You don’t need step-wise labels to train strong process reward models. Check out our new preprint.