Hao Peng

3314 SC

201 North Goodwin Avenue

Urbana, IL 61801

I am an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC).

I received my Ph.D. from the University of Washington, working with Noah Smith, and my bachelor's degree from Peking University. I spent one year at the Allen Institute for Artificial Intelligence as a Young Investigator, and interned at Microsoft Research, Google, and DeepMind.

My research broadly spans natural language processing and machine learning. My current interests primarily include:

  • Post-pretraining algorithms and data, especially for tackling complex reasoning, coding, and mathematical problems
  • Long context and efficiency
  • LLMs for accelerating scientific research

Outside work, I cater to the whims of a quartet of furry overlords: Meera, Loki, Sylvie, and Kea. When they release me from their service, I cycle in the summer, and (backcountry) ski in the winter.

recent publications

  1. The Best Instruction-Tuning Data are Those That Fit
    Dylan Zhang, Qirun Dai, and Hao Peng
    2025
  2. Process Reinforcement through Implicit Rewards
    Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, Qixin Xu, Weize Chen, and 13 more authors
    2025
  3. Free Process Rewards without Process Labels
    Lifan Yuan, Wendi Li, Huayu Chen, Ganqu Cui, Ning Ding, Kaiyan Zhang, Bowen Zhou, Zhiyuan Liu, and Hao Peng
    2024
  4. S2-Attention: Hardware-Aware Context Sharding Among Attention Heads
    Xihui Lin, Yunan Zhang, Suyu Ge, Liliang Ren, Barun Patra, Vishrav Chaudhary, Hao Peng, and Xia Song
    2024
  5. A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
    Suyu Ge, Xihui Lin, Yunan Zhang, Jiawei Han, and Hao Peng
    In Proceedings of the International Conference on Learning Representations (ICLR), 2025
  6. Retrieval Head Mechanistically Explains Long-Context Factuality
    Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, and Yao Fu
    In Proceedings of the International Conference on Learning Representations (ICLR), 2025 (oral)
  7. SciCode: A Research Coding Benchmark Curated by Scientists
    Minyang Tian, Luyu Gao, Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, and 19 more authors
    In The Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024