SkyRL Refactor to the Tinker API
tl;dr The SkyRL team is kicking off a refactor to make SkyRL fully Tinker-compatible. We gladly welcome contributions to this effort! If you are interested, please comment in this issue or message us in the Slack workspace.
Overview
Tinker is a simple training API introduced by Thinking Machines Lab that cleanly separates algorithm logic from infrastructure logic. We've spent quite a lot of time testing out the Tinker API through our SkyRL tx project, and we've found it compelling for a few key reasons:
- API at the right level: Tinker's API surface is small and simple (`forward_backward`, `optim_step`, `sample`, checkpointing), and we've found it very effective at pushing all infrastructure logic below the API (worker onloading/offloading, micro-batching, gradient management) while leaving users with full algorithm and dataflow control above the API surface.
- Flexibility: Tinker's low-level API of `forward_backward` and `optim_step` supports any weight-updating algorithm, allowing flexible adaptation to how researchers and practitioners want to perform post-training. In contrast, SkyRL's training worker APIs are currently specific to algorithms like GRPO and PPO (with explicit worker roles such as Policy, Reference, etc.). Moving to the Tinker API allows the underlying worker implementations to also easily support SFT, DPO, or other weight-updating algorithms.
- Standardization & network effects: We believe Tinker can become a widely adopted standard for post-training, akin to the OpenAI API's adoption for inference, and we are excited about the network effects this can unlock in open source. Today, it is difficult to adapt code written for one post-training framework to run effectively on another. In a future where more post-training stack developers implement Tinker backends and more researchers build atop the Tinker API, training scripts and libraries can be shared and adopted more easily across the community.
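As a rough sketch of the small API surface described above (the type and method names here are illustrative placeholders, not the exact upstream Tinker signatures):

```python
from typing import Any, Protocol


class TrainingClient(Protocol):
    """Illustrative sketch of a Tinker-style training surface.

    Everything below this interface (worker onloading/offloading,
    micro-batching, gradient management) is infrastructure; everything
    above it (loss choice, dataflow, algorithm) belongs to the user.
    """

    def forward_backward(self, batch: list[dict[str, Any]], loss_fn: str) -> dict:
        """Run forward + backward on a batch and accumulate gradients."""
        ...

    def optim_step(self, lr: float) -> None:
        """Apply the accumulated gradients with the optimizer."""
        ...

    def sample(self, prompt_tokens: list[int], max_tokens: int) -> list[int]:
        """Generate tokens from the current weights."""
        ...

    def save_state(self, path: str) -> None:
        """Checkpoint model and optimizer state."""
        ...
```

Because gradient accumulation (`forward_backward`) and the optimizer update (`optim_step`) are separate calls, micro-batching and custom update schedules fall out naturally on the user's side of the API.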
Our goal
- Full Tinker API compatibility: Training scripts that target the Tinker API and can run on Thinking Machines' hosted service will run without any code changes on your own hardware using SkyRL's `skyrl-train` library.
- Performance & scalability: Retain SkyRL's existing high-performance training and inference backends (FSDP2, Megatron, vLLM) while refactoring them behind the Tinker API.
High-level Implementation Plan
This refactor will involve breaking changes, but we will stage them as carefully and minimally as possible, and communicate clearly via this issue and in the SkyRL Slack workspace (invite here). We intend to maintain support for all existing workloads throughout and after the refactor; the Tinker API is flexible enough to support them. A high-level breakdown of the planned changes is as follows:
Phase 1 — Training backends
- Refactor training workers to expose `forward_backward` and `optim_step` instead of the current `ppo_train` API
- Introduce a Tinker-style checkpoint interface
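To illustrate the shape of this decomposition (a toy scalar model, not SkyRL's actual worker code): a monolithic train-step entrypoint splits into a gradient-accumulating `forward_backward` and a separate `optim_step`, so multiple micro-batches can be accumulated before one optimizer update.

```python
class ToyWorker:
    """Toy stand-in for a training worker, to show the API split only.

    The 'model' is y = w * x with a mean-squared-error loss; real workers
    would hold sharded transformer weights and optimizer state instead.
    """

    def __init__(self, w: float = 0.0):
        self.w = w
        self._grad = 0.0  # gradient accumulator, cleared by optim_step

    def forward_backward(self, xs: list[float], ys: list[float]) -> float:
        """Accumulate d(loss)/dw for one (micro-)batch; return the batch loss."""
        loss = 0.0
        n = len(xs)
        for x, y in zip(xs, ys):
            err = self.w * x - y
            loss += err * err / n
            self._grad += 2.0 * err * x / n
        return loss

    def optim_step(self, lr: float) -> None:
        """Apply accumulated gradients (plain SGD here) and reset them."""
        self.w -= lr * self._grad
        self._grad = 0.0
```

Calling `forward_backward` several times before a single `optim_step` is exactly gradient accumulation, which is why the split subsumes PPO-style steps as well as SFT or DPO updates.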
Phase 2 — Training loop
- Push infrastructure logic (worker onloading/offloading, cache management, weight sync) out of the training loop (`train()`) to a layer below the Tinker API
- Implement a new training loop targeting the Tinker API
  - We will first implement this side-by-side with the existing `train()` method for minimal disruption, but intend to make it the default once ready
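The division of labor above can be sketched as follows; `StubClient` is a hypothetical stand-in for whatever sits below the Tinker API, and the reward is a placeholder, but the loop itself shows what remains on the algorithm side once infrastructure is pushed down:

```python
class StubClient:
    """Hypothetical stand-in for the layer below the Tinker API.

    A real backend would hide worker onloading/offloading, cache
    management, and weight sync behind these three calls.
    """

    def sample(self, prompt: list[int], max_tokens: int) -> list[int]:
        return [0] * max_tokens  # placeholder generation

    def forward_backward(self, batch: list[dict]) -> dict:
        return {"loss": sum(ex["advantage"] for ex in batch)}

    def optim_step(self, lr: float) -> None:
        pass


def train_loop(client, prompts: list[list[int]], steps: int = 2, lr: float = 1e-5) -> list[float]:
    """Pure algorithm/dataflow logic: no infrastructure concerns in sight."""
    losses = []
    for _ in range(steps):
        batch = []
        for p in prompts:
            completion = client.sample(p, max_tokens=4)
            reward = float(len(completion))  # placeholder reward function
            batch.append({"tokens": p + completion, "advantage": reward})
        out = client.forward_backward(batch)  # gradients accumulate below the API
        client.optim_step(lr)                 # optimizer update below the API
        losses.append(out["loss"])
    return losses
```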
Phase 3 — Sampling
- Add Tinker's `sample` entrypoint to the inference interface
- Update existing endpoints (i.e., OpenAI API-based endpoints) to hit the `sample` endpoint under the hood while maintaining token-in/token-out semantics.
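The layering in the second bullet can be sketched like this; the character-level tokenizer and echoing sampler are toy placeholders (not SkyRL internals), but the structure shows a text-level endpoint delegating to a single token-in/token-out `sample` call underneath:

```python
def encode(text: str) -> list[int]:
    """Toy tokenizer: one token per character."""
    return [ord(c) for c in text]


def decode(tokens: list[int]) -> str:
    """Inverse of the toy tokenizer."""
    return "".join(chr(t) for t in tokens)


def sample(prompt_tokens: list[int], max_tokens: int) -> list[int]:
    """Placeholder token-level sampler: echoes the tail of the prompt."""
    return prompt_tokens[-max_tokens:]


def completions_endpoint(prompt: str, max_tokens: int) -> str:
    """Text-level (OpenAI-style) endpoint layered on the token-level sampler.

    Text in -> tokens -> sample -> tokens -> text out; `sample` stays the
    single token-in/token-out source of truth underneath.
    """
    return decode(sample(encode(prompt), max_tokens))
```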
Phase 4 — Tinker API Server
- Adopt the API server developed in SkyRL tx to support Tinker's HTTP-based interface
- At this point, scripts from the `tinker-cookbook` will be runnable in SkyRL without code modifications.
Phase 5 — Reproduction Runs
- Run several reproductions of prior training runs to verify correctness.
We’ll use this issue to track progress, link PRs, and update timelines. Feedback is welcome!