Oliver Stanley

Machine learning engineer and researcher specialising in large language models, agents, and reinforcement learning.

I work at Scale AI, bridging the gap between ML research and applied ML systems. Previously I worked at Netcall, where I led ML research and engineering in a small team, and at Kainos, where I delivered language and vision ML projects for clients across the defence and insurance industries.

Outside of my job, I do independent ML research with a focus on open-source projects. The most notable are:

Currently, Reasoning Gym. We built procedural dataset generators of algorithmically verifiable problems, and used them to train reasoning LLMs via reinforcement learning from verifiable rewards. You can read a preprint of our paper on arXiv.
Previously, Open Assistant. We built the first open-source alternative to ChatGPT, published a dataset of high-quality instruction data, and trained LLMs using supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). Our paper was accepted to NeurIPS 2023.

Outside of work, I take an interest in economics, policy, and football.

If you’d like to get in touch, feel free to reach out via LinkedIn. You can also view my Google Scholar profile.

Latest Posts

Jun 2, 2025
Reasoning Gym Paper
Evaluation and RL training experiments with Reasoning Gym
Feb 20, 2025
Intro to Reasoning Gym
Building procedural data generators to train reasoning models
Apr 15, 2023
First Open Assistant Models and Dataset Release
New open-source language models tuned to follow instructions
