Hao Peng
3314 SC
201 North Goodwin Avenue
Urbana, IL 61801
I am an Assistant Professor at the Siebel School of Computing and Data Science at the University of Illinois Urbana-Champaign (UIUC). I received my Ph.D. from the University of Washington, where I was advised by Noah Smith, and my bachelor's degree from Peking University.
I’m broadly interested in large language models (LLMs). My recent focus is on AI that is capable of:
- Solving complex reasoning problems in a generalizable and data-efficient way; I believe that learning from experience (e.g., through reinforcement learning) and insights from human cognition are crucial to this goal;
- Understanding and reasoning about the world causally;
- Positively impacting society;
- Advancing the frontier of human knowledge and contributing to scientific discovery, which I see as the ultimate demonstration of true generalization beyond the human knowledge these models have been trained on.
Outside work, I cater to the whims of a quartet of furry overlords: Meera, Loki, Sylvie, and Kea. When they release me from their service, I cycle in the summer, and (backcountry) ski in the winter.
Undergraduate and master’s students: We love hearing from motivated undergraduate and master’s students! If you’d like to collaborate with us, send me an email right away if
- you have experience with Lean and are excited about mathematical theorem proving; or
- you’re comfortable with post-training engineering and interested in building foundation models for chemistry.
If neither describes you, please read one of our recent papers that excites you and ask a couple of questions about it in your email; please begin your subject line with [Meera+Loki+Sylvie+Kea].
news
| Date | News |
|---|---|
| Feb 17, 2026 | LLM-as-a-judge is alarmingly easy to game: rewriting an agent’s chain-of-thought (without changing actions or observations) can inflate false positives by up to 90%; see our new preprint Gaming the Judge. |
| Feb 17, 2026 | Stronger SFT can hurt downstream RL; our PEAR reweighting makes SFT checkpoints better starters for RL and improves post-RL performance. See our new preprint Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning. |
| Feb 17, 2026 | Surprisingly, plain SGD matches (or beats) AdamW for RL in LLMs while updating <0.02% of parameters; see our new preprint Do We Need Adam?. |
recent publications
- RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments (2025)