Hao Peng


3314 SC

201 North Goodwin Avenue

Urbana, IL 61801

I am an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC).

I received my Ph.D. from the University of Washington, where I was advised by Noah Smith, and my bachelor's degree from Peking University. I spent one year at the Allen Institute for Artificial Intelligence as a Young Investigator, and interned at Microsoft Research, Google, and DeepMind.

My research interests broadly span natural language processing and machine learning. My current focus includes making language AI more efficient and accessible; evaluating and improving large language models' reasoning capabilities, factuality, and trustworthiness; and applying them in the scientific domain.

Outside of work, I cater to the whims of a trio of furry overlords: Meera, Loki, and Sylvie. When they release me from their service, I cycle in the summer and (backcountry) ski in the winter.

news

Apr 16, 2024 I will give a talk at the Midwest Speech and Language Days at UMich.
Apr 12, 2024 I will give a talk at UChicago and TTIC.
Apr 11, 2024 I will give a talk at the Argonne National Laboratory.
Apr 2, 2024 Check out Eurus, our state-of-the-art open-source LLMs!
Feb 15, 2024 Pretrained LLMs can be adapted to handle 128K-long contexts with a surprisingly small amount of continual pretraining. Check out our new preprint!

recent publications

  1. SciCode: A Research Coding Benchmark Curated by Scientists
    Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, and 20 more authors
    arXiv preprint, 2024
  2. PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models
    Dylan Zhang, Shizhe Diao, Xueyan Zou, and Hao Peng
    arXiv preprint, 2024
  3. Advancing LLM Reasoning Generalists with Preference Trees
    Lifan Yuan, Ganqu Cui, Hanbin Wang, Ning Ding, Xingyao Wang, Jia Deng, Boji Shan, Huimin Chen, Ruobing Xie, Yankai Lin, and 5 more authors
    arXiv preprint, 2024
  4. Source-Aware Training Enables Knowledge Attribution in Language Models
    Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, and Hao Peng
    In Proceedings of the Conference on Language Modeling (COLM), 2024
  5. Data Engineering for Scaling Language Models to 128K Context
    Yao Fu, Rameswar Panda, Xinyao Niu, Xiang Yue, Hannaneh Hajishirzi, Yoon Kim, and Hao Peng
    arXiv preprint, 2024