Scaling up reinforcement learning (RL) with verifiable environments yields surprisingly strong performance. Check out our preprint!