
SkyRL Refactor to the Tinker API #812

@tyler-griggs

Description


tl;dr The SkyRL team is kicking off a refactor to make SkyRL fully Tinker-compatible. We gladly welcome contributions to this effort! If you are interested, please comment on this issue or message us in the Slack workspace.

Overview

Tinker is a simple training API introduced by Thinking Machines Lab that nicely separates algorithm logic from infrastructure logic. We've spent quite a lot of time testing out the Tinker API through our SkyRL tx project, and we’ve found the Tinker API to be compelling for a few key reasons:

  • API at the right level: Tinker’s API surface is small and simple (forward_backward, optim_step, sample, checkpointing), and we’ve found it very effective at pushing all infrastructure logic below the API (worker onloading/offloading, micro-batching, gradient management) while leaving users with full algorithm and dataflow control above the API surface.
  • Flexibility: Tinker’s low-level API of forward_backward and optim_step supports any weight-updating algorithm, allowing for flexible adaptation to how researchers and practitioners want to perform post-training. In contrast, SkyRL’s training worker APIs are currently specific to algorithms like GRPO and PPO (with explicit worker roles such as Policy and Reference). Moving to the Tinker API allows the underlying worker implementations to also easily support SFT, DPO, or other weight-updating algorithms.
  • Standardization & network effects: We believe Tinker can become a widely-adopted standard for post-training, akin to OpenAI API’s adoption for inference, and are excited about the network effects this can unlock in open source. Today, it is difficult to adapt code written for one post-training framework to run effectively on another. In a future where more post-training stack developers implement Tinker backends and more researchers build atop the Tinker API, training scripts and libraries can be shared and adopted more easily across the community.
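The small API surface described above can be sketched as a Python protocol. This is a hypothetical illustration of the shape of such an interface, not the actual Tinker signatures; all names and parameters below are illustrative.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class TrainingBackend(Protocol):
    """Illustrative sketch of a Tinker-style training surface.

    Method names follow the issue text; signatures are assumptions.
    Everything below this interface (offloading, micro-batching,
    gradient management) is the backend's concern, not the caller's.
    """

    def forward_backward(self, batch: list[dict[str, Any]]) -> dict[str, float]:
        """Run a forward pass and accumulate gradients; return metrics."""
        ...

    def optim_step(self) -> None:
        """Apply accumulated gradients via the optimizer and clear them."""
        ...

    def sample(self, prompt_tokens: list[int], max_tokens: int) -> list[int]:
        """Generate completion tokens from prompt tokens (token-in/token-out)."""
        ...

    def save_checkpoint(self, path: str) -> None:
        """Persist model and optimizer state."""
        ...
```

Because the protocol is `runtime_checkable`, any backend (FSDP2, Megatron, or a mock) can be verified against it structurally, without inheriting from a framework base class.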

Our goal

  • Full Tinker API compatibility: Training scripts that target the Tinker API and run on Thinking Machines’ hosted service will run without any code changes on your own hardware using SkyRL's skyrl-train library.
  • Performance & scalability: Retain SkyRL’s existing high-performance training and inference backends (FSDP2, Megatron, vLLM) while refactoring them behind the Tinker API.

High-level Implementation Plan

This refactor will involve breaking changes, but we will stage them as carefully and minimally as possible and communicate clearly via this issue and in the SkyRL Slack workspace (invite here). We intend to maintain support for all existing workloads throughout and after the refactor; the Tinker API is flexible enough to support them. A high-level breakdown of the planned changes is as follows:

Phase 1 — Training backends

  • Refactor training workers to expose forward_backward and optim_step instead of the current ppo_train API
  • Introduce a Tinker-style checkpoint interface
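To make the Phase 1 split concrete, here is a toy worker exposing forward_backward and optim_step separately, in place of a monolithic ppo_train-style call. This is a minimal sketch using 1-D linear regression with SGD; real SkyRL workers would wrap FSDP2 or Megatron, and all names here are illustrative.

```python
class ToyWorker:
    """Toy stand-in for a training worker exposing the split API.

    Fits y = w * x by mean-squared error. The key property mirrored
    from the refactor: forward_backward only accumulates gradients,
    and optim_step is the only method that mutates weights.
    """

    def __init__(self, lr: float = 0.1):
        self.w = 0.0      # model "weights"
        self.grad = 0.0   # accumulated gradient
        self.lr = lr

    def forward_backward(self, batch: list[tuple[float, float]]) -> dict[str, float]:
        # Accumulate the MSE gradient without touching the weights,
        # so multiple micro-batches can be folded into one optim_step.
        loss = 0.0
        for x, y in batch:
            err = self.w * x - y
            loss += err * err
            self.grad += 2.0 * err * x / len(batch)
        return {"loss": loss / len(batch)}

    def optim_step(self) -> None:
        # Apply the accumulated gradient, then clear it.
        self.w -= self.lr * self.grad
        self.grad = 0.0


worker = ToyWorker()
for _ in range(50):
    worker.forward_backward([(1.0, 2.0), (2.0, 4.0)])
    worker.optim_step()
```

Because gradient accumulation and the optimizer step are decoupled, the same worker serves PPO, SFT, or DPO: only the loss computed inside forward_backward changes.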

Phase 2 — Training loop

  • Push infrastructure logic (worker onloading/offloading, cache management, weight sync) out of the training loop (train()) to a layer below the Tinker API
  • Implement a new training loop targeting the Tinker API
    • We will first implement this side-by-side with the existing train() method for minimal disruption, but intend to make it the default once ready
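The Phase 2 goal is a training loop written purely in terms of the API, with no infrastructure concerns. A minimal sketch of what such a loop could look like (function and parameter names are assumptions, not the planned SkyRL interface):

```python
from typing import Any


def train_loop(worker: Any, dataset: list, epochs: int = 1) -> list[float]:
    """Hypothetical training loop written entirely above the API surface.

    Note what is absent: no worker onloading/offloading, no cache
    management, no weight sync. Per the refactor plan, all of that
    lives inside `worker`, below the API.
    """
    history = []
    for _ in range(epochs):
        for batch in dataset:
            metrics = worker.forward_backward(batch)
            worker.optim_step()
            history.append(metrics["loss"])
    return history
```

A loop like this is backend-agnostic by construction: swapping FSDP2 for Megatron, or local workers for a remote service, changes nothing above the API.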

Phase 3 — Sampling

  • Add Tinker's sample entrypoint to the inference interface
  • Update existing endpoints (i.e., OpenAI API-based endpoints) to hit the sample endpoint under the hood while maintaining token-in/token-out semantics.
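The second bullet above can be illustrated with a small adapter: an OpenAI-style chat endpoint that encodes the request to tokens, delegates to a token-in/token-out sample function, and decodes the result. This is a hedged sketch; the function names, tokenizer interface, and response shape are assumptions for illustration only.

```python
from typing import Callable


def chat_completion(
    sample_fn: Callable[[list[int], int], list[int]],
    encode: Callable[[str], list[int]],
    decode: Callable[[list[int]], str],
    messages: list[dict[str, str]],
    max_tokens: int = 16,
) -> dict:
    """Hypothetical adapter: an OpenAI-style chat endpoint implemented
    on top of a token-in/token-out sample entrypoint.

    A real adapter would apply the model's chat template; here we
    simply concatenate message contents to keep the sketch minimal.
    """
    prompt = "".join(m["content"] for m in messages)
    prompt_tokens = encode(prompt)
    completion_tokens = sample_fn(prompt_tokens, max_tokens)
    return {
        "choices": [
            {"message": {"role": "assistant", "content": decode(completion_tokens)}}
        ]
    }
```

Keeping tokens as the boundary means the OpenAI-compatible layer is a thin translation on top of sample, rather than a second inference path to maintain.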

Phase 4 — Tinker API Server

  • Adopt the API server developed in SkyRL tx to support Tinker's HTTP-based interface
  • At this point, scripts from the tinker-cookbook will be runnable in SkyRL without modifying code.
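At its core, a server like the one described above maps HTTP routes onto worker methods. A minimal, framework-free sketch of that dispatch (route names and payload shapes are illustrative assumptions; the real server lives in SkyRL tx):

```python
from typing import Any, Callable


def make_router(worker: Any) -> Callable[[str, dict], dict]:
    """Hypothetical request router for a Tinker-style HTTP server.

    Each route unpacks a JSON-like body and forwards it to the
    corresponding worker method. A production server would add
    serialization, auth, and async request handling on top.
    """
    routes: dict[str, Callable[[dict], dict]] = {
        "forward_backward": lambda body: worker.forward_backward(body["batch"]),
        "optim_step": lambda body: worker.optim_step() or {"ok": True},
        "sample": lambda body: {
            "tokens": worker.sample(body["prompt_tokens"], body["max_tokens"])
        },
    }

    def handle(path: str, body: dict) -> dict:
        return routes[path](body)

    return handle
```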

Phase 5 — Reproduction Runs

  • Run several reproductions of prior training runs to verify correctness.

We’ll use this issue to track progress, link PRs, and update timelines. Feedback is welcome!
