Reasoning Gym is a new open-source project that aims to make it easier to train reasoning models with finetuning techniques such as reinforcement learning.


Background

The open-source release of DeepSeek-R1 stirred up a lot of interest, both in international markets and in the open-source AI community. Two of the key innovations justifying the excitement are:

  • GRPO (Group Relative Policy Optimization), originally published in 2024 with DeepSeekMath, is more cost-effective than PPO, the dominant algorithm for reinforcement learning on LLMs. GRPO removes the value function, which is typically a neural model of similar size to the policy model, substantially reducing resource usage. A sketch of the group-relative advantage computation follows this list.
  • Algorithmically verifiable problems as training data, allowing rule-based rewards in the RL process rather than a (difficult to train and often unreliable) neural reward model.
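
To make the first point concrete, here is a minimal sketch of the group-relative advantage computation that lets GRPO drop the value function. This is plain illustrative Python rather than code from any particular training framework: each prompt gets a group of sampled completions, and each completion's advantage is its reward standardized against the group's mean and standard deviation.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages for one group of completions of the same prompt.

    Each completion's reward is standardized against the group statistics,
    which replaces the learned value baseline that PPO would use.
    """
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: 4 completions sampled for one prompt, scored by a rule-based verifier.
print(group_relative_advantages(np.array([1.0, 0.0, 0.0, 1.0])))
# Positive advantages for the correct completions, negative for the incorrect ones.
```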

Goal

Reasoning Gym seeks to take advantage of both of these DeepSeek innovations by building a procedural dataset generator for algorithmically verifiable problems. This allows anyone to train reasoning models with an RL technique like GRPO without first curating huge volumes of data.

Many of the generators have configurable complexity or difficulty parameters, so effectively unlimited data can be produced at a range of challenge levels.
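
As an illustration, the snippet below sketches how one of these procedural datasets can be created and verified. The generator name is just an example, and the function and method names (create_dataset, score_answer) reflect the library's interface as I understand it; check the repository documentation for the current API.

```python
import reasoning_gym

# Procedurally generate a small, reproducible dataset.
# The generator name and size are illustrative; many other generators exist.
data = reasoning_gym.create_dataset("futoshiki", size=5, seed=42)

for entry in data:
    print(entry["question"])   # puzzle statement shown to the model
    print(entry["answer"])     # reference solution

    # Rule-based verification: score a candidate answer against this entry.
    reward = data.score_answer(answer=entry["answer"], entry=entry)
    print(reward)              # 1.0 for a correct answer
```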

To achieve this, contributors write dataset classes across a range of categories, including games, logic puzzles, algorithms, coding, and geometry. My own first contribution to the project was a Futoshiki puzzle generator. A rough sketch of the shape such a generator takes is shown below.
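
The standalone sketch that follows illustrates the general shape of a generator: deterministic generation from a seed plus index, a question/answer pair per item, and a rule-based scorer. It is deliberately simplified and is not the project's actual base class or registration mechanism; the contributing guide describes the real interface.

```python
import random

class ToyArithmeticDataset:
    """Toy procedural generator: simple addition problems.

    In Reasoning Gym proper, a generator like this would subclass the
    project's procedural dataset base class and be registered under a
    name and config, but the overall shape is the same.
    """

    def __init__(self, size: int = 100, seed: int = 0, max_value: int = 100):
        self.size = size
        self.seed = seed
        self.max_value = max_value

    def __len__(self) -> int:
        return self.size

    def __getitem__(self, idx: int) -> dict:
        # Seed per item so every entry is reproducible.
        rng = random.Random(self.seed + idx)
        a, b = rng.randint(0, self.max_value), rng.randint(0, self.max_value)
        return {
            "question": f"What is {a} + {b}?",
            "answer": str(a + b),
            "metadata": {"a": a, "b": b},
        }

    def score_answer(self, answer: str, entry: dict) -> float:
        # Rule-based verification: exact match against the reference answer.
        return 1.0 if answer.strip() == entry["answer"] else 0.0
```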


Further Work

Going forward, a key aim is to build a “curriculum”, in which data is generated at progressively higher levels of complexity depending on the learning progress of the model being trained.
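
As a rough illustration of the idea (not an implementation that exists in the project today), a curriculum could track the model's recent success rate on a generator's current difficulty configuration and promote it to the next level once accuracy crosses a threshold:

```python
class DifficultyCurriculum:
    """Hypothetical curriculum: advance to harder generator configs once the
    model's recent accuracy on the current level crosses a threshold."""

    def __init__(self, levels: list[dict], promote_at: float = 0.8, window: int = 200):
        self.levels = levels          # e.g. [{"board_size": 4}, {"board_size": 5}, ...]
        self.promote_at = promote_at  # accuracy required to advance a level
        self.window = window          # number of recent episodes to average over
        self.level = 0
        self.recent: list[float] = []

    def record(self, reward: float) -> None:
        """Record a rule-based reward (0.0 or 1.0) and promote if warranted."""
        self.recent = (self.recent + [reward])[-self.window:]
        at_capacity = len(self.recent) == self.window
        can_promote = self.level + 1 < len(self.levels)
        if at_capacity and can_promote and sum(self.recent) / self.window >= self.promote_at:
            self.level += 1
            self.recent = []

    def current_config(self) -> dict:
        """Generator configuration to sample the next training data from."""
        return self.levels[self.level]
```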

The project repository also contains examples of training LLMs with RL using various frameworks, including OpenRLHF, trl, and veRL.
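
As a minimal sketch of how one such setup could look, the snippet below wires a procedural generator into trl's GRPOTrainer. The model checkpoint, generator name, and exact-match reward are placeholders of mine rather than the project's example code; in practice the generator's own verifier would serve as the reward, and the repository's example scripts are the authoritative reference.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

import reasoning_gym

# Turn a procedural generator into a prompt/answer training set.
# The generator name here is illustrative.
gen = reasoning_gym.create_dataset("futoshiki", size=1000, seed=0)
train_dataset = Dataset.from_list(
    [{"prompt": e["question"], "answer": e["answer"]} for e in gen]
)

def rule_based_reward(completions, answer, **kwargs):
    # Exact-match stand-in for the generator's own rule-based scorer.
    return [1.0 if c.strip() == a.strip() else 0.0 for c, a in zip(completions, answer)]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",               # any causal LM checkpoint
    reward_funcs=rule_based_reward,
    args=GRPOConfig(output_dir="grpo-demo", num_generations=8),
    train_dataset=train_dataset,
)
trainer.train()
```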

Finally, building supplementary reasoning datasets that are not procedurally generated will be explored, in order to augment the procedural generators.


Contributing

Anyone can contribute to Reasoning Gym by commenting on issues in the GitHub repository and raising pull requests. The project README also details how to join the community on Discord.

