Hao Peng

Announcement_5

February 17, 2026


Stronger SFT can hurt downstream RL. Our PEAR reweighting makes SFT checkpoints better starting points for RL and improves post-RL performance. See our new preprint, "Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning."

© Copyright 2026 Hao Peng. Last updated: February 27, 2026.