Hao Peng


3314 SC

201 North Goodwin Avenue

Urbana, IL 61801

I am an Assistant Professor at the Siebel School of Computing and Data Science, University of Illinois at Urbana-Champaign (UIUC). I received my Ph.D. from the University of Washington, where I was advised by Noah Smith, and my bachelor's degree from Peking University.

I’m broadly interested in large language models (LLMs). Recently, my focus has been on AI that is capable of:

  • Solving complex reasoning problems in a generalizable and data-efficient way; I believe that learning from experience (e.g., through reinforcement learning) and insights from human cognition are crucial to this goal;
  • Understanding and reasoning about the world causally;
  • Positively impacting society;
  • Advancing the frontier of human knowledge and contributing to scientific discovery, which I see as the ultimate demonstration of true generalization beyond the human knowledge models have been trained on.

Outside work, I cater to the whims of a quartet of furry overlords: Meera, Loki, Sylvie, and Kea. When they release me from their service, I cycle in the summer, and (backcountry) ski in the winter.

Undergraduate and master’s students: We love hearing from motivated undergraduate and master’s students! If you’d like to collaborate with us, send me an email right away if

  • you have experience with Lean and are excited about mathematical theorem proving; or
  • you’re comfortable with post-training engineering and interested in building foundation models for chemistry.

If neither describes you, please read one of our recent papers that excites you and ask a couple of questions about it in your email; please begin your subject line with [Meera+Loki+Sylvie+Kea].

news

Feb 17, 2026 LLM-as-a-judge is alarmingly easy to game: rewriting an agent’s chain-of-thought (without changing actions/observations) can inflate false positives by up to 90%—see our new preprint Gaming the Judge.
Feb 17, 2026 Stronger SFT can hurt downstream RL; our PEAR reweighting makes SFT checkpoints better starters for RL and improves post-RL performance—new preprint Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning.
Feb 17, 2026 Surprisingly, plain SGD matches (or beats) AdamW for RL in LLMs while updating <0.02% of parameters—see our new preprint Do We Need Adam?.

recent publications

  1. Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation
    Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Yunxiang Zhang, Moontae Lee, Hao Peng, Lu Wang, and Honglak Lee
    2026
  2. Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs
    Sagnik Mukherjee, Lifan Yuan, Pavan Jayasinha, Dilek Hakkani-Tür, and Hao Peng
    2026
  3. Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning
    Dylan Zhang, Yufeng Xu, Haojin Wang, Qingzhi Chen, and Hao Peng
    2026
  4. RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
    Zhiyuan Zeng, Hamish Ivison, Yiping Wang, Lifan Yuan, Shuyue Stella Li, Zhuorui Ye, Siting Li, Jacqueline He, Runlong Zhou, Tong Chen, and 7 more authors
    2025
  5. Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
    Minhui Zhu, Minyang Tian, Xiaocheng Yang, Tianci Zhou, Penghao Zhu, Eli Chertkov, Shengyan Liu, Yufeng Du, Lifan Yuan, Ziming Ji, and 55 more authors
    2025