SkyRL Refactor to the Tinker API
tl;dr The SkyRL team is kicking off a refactor to make SkyRL fully Tinker-compatible. We gladly welcome contributions to this effort! If you are interested, please comment in this issue or message us in the Slack workspace.
Overview
Tinker is a simple training API introduced by Thinking Machines Lab that cleanly separates algorithm logic from infrastructure logic. We've spent quite a lot of time testing out the Tinker API through our SkyRL tx project, and we've found it compelling for a few key reasons:
- API at the right level: Tinker's API surface is small and simple (`forward_backward`, `optim_step`, `sample`, checkpointing), and we've found it very effective at pushing all infrastructure logic below the API (worker onloading/offloading, micro-batching, gradient management) while leaving users with full algorithm and dataflow control above the API surface.
- Flexibility: Tinker's low-level API of `forward_backward` and `optim_step` supports any weight-updating algorithm, allowing flexible adaptation to how researchers and practitioners want to perform post-training. In contrast, SkyRL's training worker APIs are currently specific to algorithms like GRPO and PPO (with explicit worker roles such as Policy, Reference, etc.). Moving to the Tinker API allows the underlying worker implementations to also easily support SFT, DPO, or other weight-updating algorithms.
- Standardization & network effects: We believe Tinker can become a widely adopted standard for post-training, akin to the OpenAI API's adoption for inference, and we are excited about the network effects this can unlock in open source. Today, it is difficult to adapt code written for one post-training framework to run effectively on another. In a future where more post-training stack developers implement Tinker backends and more researchers build atop the Tinker API, training scripts and libraries can be shared and adopted more easily across the community.
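As a rough sketch of the small API surface described above (the type and method names here are illustrative placeholders, not the exact upstream Tinker signatures):

```python
from typing import Any, Protocol


class TrainingClient(Protocol):
    """Illustrative sketch of a Tinker-style training surface.

    Everything below this interface (worker onloading/offloading,
    micro-batching, gradient management) is infrastructure; everything
    above it (loss choice, dataflow, algorithm) belongs to the user.
    """

    def forward_backward(self, batch: list[dict[str, Any]], loss_fn: str) -> dict:
        """Run forward + backward on a batch and accumulate gradients."""
        ...

    def optim_step(self, lr: float) -> None:
        """Apply the accumulated gradients with the optimizer."""
        ...

    def sample(self, prompt_tokens: list[int], max_tokens: int) -> list[int]:
        """Generate tokens from the current weights."""
        ...

    def save_state(self, path: str) -> None:
        """Checkpoint model and optimizer state."""
        ...
```

Because gradient accumulation (`forward_backward`) and the optimizer update (`optim_step`) are separate calls, micro-batching and custom update schedules fall out naturally on the user's side of the API.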
Our goal
- Full Tinker API compatibility: Training scripts that target the Tinker API and can run on Thinking Machines' hosted service will run without any code changes on your own hardware using SkyRL's `skyrl-train` library.
- Performance & scalability: Retain SkyRL's existing high-performance training and inference backends (FSDP2, Megatron, vLLM) while refactoring them behind the Tinker API.
High-level Implementation Plan
This refactor will involve breaking changes, but we will stage them as carefully and minimally as possible, and communicate clearly via this issue and in the SkyRL Slack workspace (invite here). We intend to maintain support for all existing workloads throughout and after the refactor; the Tinker API is flexible enough to support them. A high-level breakdown of the planned changes is as follows:
Phase 1 — Training backends
- Refactor training workers to expose `forward_backward` and `optim_step` instead of the current `ppo_train` API
- Introduce a Tinker-style checkpoint interface
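To illustrate the shape of this decomposition (a toy scalar model, not SkyRL's actual worker code): a monolithic train-step entrypoint splits into a gradient-accumulating `forward_backward` and a separate `optim_step`, so multiple micro-batches can be accumulated before one optimizer update.

```python
class ToyWorker:
    """Toy stand-in for a training worker, to show the API split only.

    The 'model' is y = w * x with a mean-squared-error loss; real workers
    would hold sharded transformer weights and optimizer state instead.
    """

    def __init__(self, w: float = 0.0):
        self.w = w
        self._grad = 0.0  # gradient accumulator, cleared by optim_step

    def forward_backward(self, xs: list[float], ys: list[float]) -> float:
        """Accumulate d(loss)/dw for one (micro-)batch; return the batch loss."""
        loss = 0.0
        n = len(xs)
        for x, y in zip(xs, ys):
            err = self.w * x - y
            loss += err * err / n
            self._grad += 2.0 * err * x / n
        return loss

    def optim_step(self, lr: float) -> None:
        """Apply accumulated gradients (plain SGD here) and reset them."""
        self.w -= lr * self._grad
        self._grad = 0.0
```

Calling `forward_backward` several times before a single `optim_step` is exactly gradient accumulation, which is why the split subsumes PPO-style steps as well as SFT or DPO updates.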
Phase 2 — Training loop
- Push infrastructure logic (worker onloading/offloading, cache management, weight sync) out of the training loop (`train()`) to a layer below the Tinker API
- Implement a new training loop targeting the Tinker API
  - We will first implement this side-by-side with the existing `train()` method for minimal disruption, but intend to make it the default once ready
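The division of labor above can be sketched as follows; `StubClient` is a hypothetical stand-in for whatever sits below the Tinker API, and the reward is a placeholder, but the loop itself shows what remains on the algorithm side once infrastructure is pushed down:

```python
class StubClient:
    """Hypothetical stand-in for the layer below the Tinker API.

    A real backend would hide worker onloading/offloading, cache
    management, and weight sync behind these three calls.
    """

    def sample(self, prompt: list[int], max_tokens: int) -> list[int]:
        return [0] * max_tokens  # placeholder generation

    def forward_backward(self, batch: list[dict]) -> dict:
        return {"loss": sum(ex["advantage"] for ex in batch)}

    def optim_step(self, lr: float) -> None:
        pass


def train_loop(client, prompts: list[list[int]], steps: int = 2, lr: float = 1e-5) -> list[float]:
    """Pure algorithm/dataflow logic: no infrastructure concerns in sight."""
    losses = []
    for _ in range(steps):
        batch = []
        for p in prompts:
            completion = client.sample(p, max_tokens=4)
            reward = float(len(completion))  # placeholder reward function
            batch.append({"tokens": p + completion, "advantage": reward})
        out = client.forward_backward(batch)  # gradients accumulate below the API
        client.optim_step(lr)                 # optimizer update below the API
        losses.append(out["loss"])
    return losses
```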
Phase 3 — Sampling
- Add Tinker's `sample` entrypoint to the inference interface
- Update existing endpoints (i.e., OpenAI API-based endpoints) to hit the `sample` endpoint under the hood while maintaining token-in/token-out semantics.
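The layering in the second bullet can be sketched like this; the character-level tokenizer and echoing sampler are toy placeholders (not SkyRL internals), but the structure shows a text-level endpoint delegating to a single token-in/token-out `sample` call underneath:

```python
def encode(text: str) -> list[int]:
    """Toy tokenizer: one token per character."""
    return [ord(c) for c in text]


def decode(tokens: list[int]) -> str:
    """Inverse of the toy tokenizer."""
    return "".join(chr(t) for t in tokens)


def sample(prompt_tokens: list[int], max_tokens: int) -> list[int]:
    """Placeholder token-level sampler: echoes the tail of the prompt."""
    return prompt_tokens[-max_tokens:]


def completions_endpoint(prompt: str, max_tokens: int) -> str:
    """Text-level (OpenAI-style) endpoint layered on the token-level sampler.

    Text in -> tokens -> sample -> tokens -> text out; `sample` stays the
    single token-in/token-out source of truth underneath.
    """
    return decode(sample(encode(prompt), max_tokens))
```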
Phase 4 — Tinker API Server
- Adopt the API server developed in SkyRL tx to support Tinker's HTTP-based interface
- At this point, scripts from the `tinker-cookbook` will be runnable in SkyRL without code modifications.
Phase 5 — Reproduction Runs
- Run several reproductions of prior training runs to verify correctness.
We’ll use this issue to track progress, link PRs, and update timelines. Feedback is welcome!