December 2, 2024
You don’t need step-wise labels to train strong process reward models. Check out our new preprint.