Welcome to OpenThoughts-Agent (OT-Agent for short), a large-scale research project dedicated to creating the best tooling and finding the best data for training small agentic models.
OT-Agent is a research codebase! Conventions will change, files will move and workflows will break as we continue to grow. Please bear with us and open an issue if you discover a bug.
If you are new to the project, start here to get up and running.
- Create a clean Python 3.12 virtual environment using your preferred manager (conda, venv, pixi, etc.).
- Install uv if you don’t already have it:

  ```bash
  pip install --upgrade uv
  ```

- From the repo root, install the base OT-Agent stack:

  ```bash
  uv pip install -e .
  ```
Optional extras (append the extra names to the command above, e.g. `uv pip install -e ".[datagen]"`):

- HPC datagen runtime (Ray clusters + vLLM serving): `.[datagen]` pulls CUDA-heavy wheels on Linux/Windows and automatically falls back to CPU-friendly packages on macOS.
- SweSmith-specific datagen helpers (extends the above with bespoke tools): `.[datagen,datagen-swesmith]` (or `.[datagen-swesmith]` if you already pulled the base datagen extra).
- Cloud orchestration helpers (SkyPilot, Docker tooling): `.[cloud]`
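Extras can be combined in a single editable install; for example, to pull the datagen and cloud stacks together:

```bash
uv pip install -e ".[datagen,cloud]"
```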
Fresh Ubuntu/GCP quickstart:

```bash
sudo snap install astral-uv --classic   # installs uv system-wide
uv venv --python 3.12                   # creates .venv with CPython 3.12
source .venv/bin/activate
uv pip install -e ".[datagen]"          # add ,cloud or other extras as needed
```

- SFT stack:
  - Install the new `[sft]` extra to auto-sync the submodule and pull its heavy dependencies in one go (runs `git submodule update --init --remote sft/llamafactory` unless you set `OT_AGENT_SKIP_SFT_SYNC=1`):

    ```bash
    uv pip install -e ".[sft]"                 # only SFT runtime
    uv pip install -e ".[datagen,sft]"         # convenient combined env
    uv pip install -e ".[datagen,sft,cloud]"   # pull whatever extras you need
    ```

  - Under the hood we install LLaMA-Factory from `sft/llamafactory` with the `hf-kernels`, `liger-kernel`, `deepspeed`, `bitsandbytes` extras. If you need additional LLaMA-Factory extras, continue to `cd sft/llamafactory && uv pip install -e .[...more...]`.
  - Training configs that pair with OT-Agent live under `sft/lf_configs/**`; refer to `sft/llamafactory/README.md` for detailed flags and dependency notes.
- Data stack:
  - Dataset tooling docs live under `data/README.md`; install per-generator requirements in addition to the `datagen` extras above when needed.
Many OT-Agent launch modes JIT-compile CUDA/C++ extensions (e.g., flash-infer, flash-attn, triton). Those builds are sensitive to compiler and CUDA versions, so verify that the toolchain you expose to Python matches the version of PyTorch you installed (`python -c 'import torch; print(torch.version.cuda)'`). We primarily test on CUDA 12.8/12.9 with GCC ≥ 12.
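A quick way to confirm the pieces line up (a minimal sketch; adjust if your compiler or CUDA toolkit lives elsewhere):

```bash
# Toolchain sanity check: the host compiler, the CUDA toolkit, and PyTorch's CUDA build should agree
gcc --version | head -n1
nvcc --version | grep release
python -c 'import torch; print(torch.__version__, torch.version.cuda)'
```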
Cluster modules. If your HPC environment exposes the right stack, loading modules is the path of least resistance:

```bash
module load gcc/14.2.0
module load cuda/12.8
```

Container shells. Some centers publish pre-baked CUDA images. Binding your workspace into one of those containers often guarantees a clean toolchain:
```bash
singularity shell --nv \
    --bind $SCRATCH/ot-agent \
    $SCRATCH/cuda-img/cuda-cudnn-12.8-ubuntu22.sif
```

Conda-provisioned toolchains. When neither modules nor containers provide what you need, install the compilers and sysroot via mamba. Keep the packages pinned so minor upgrades don’t silently change ABI compatibility:
```bash
mamba install -c conda-forge c-compiler cxx-compiler -y
mamba install -c conda-forge gcc_linux-64 gxx_linux-64 sysroot_linux-64 -y
mamba install -c conda-forge libstdcxx-ng=12 libgcc-ng=12 gcc_impl_linux-64 \
    gxx_impl_linux-64 sysroot_linux-64 -y
```

Environment variables. Point CUDA- and GCC-aware tools at the locations you provisioned. Adjust the paths below if your install lives somewhere else:
```bash
GCC_ROOT="$(dirname "$(dirname "$(which gcc)")")"
export CUDA_HOME=/usr/local/cuda
export CPATH="$CUDA_HOME/include${CPATH:+:$CPATH}"
export LIBRARY_PATH="$CUDA_HOME/lib64${LIBRARY_PATH:+:$LIBRARY_PATH}"
export LD_LIBRARY_PATH="$GCC_ROOT/lib64:$GCC_ROOT/lib:$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
```

Heavyweight builds. Once the toolchain is stable, the JIT pieces compile automatically on import. Some packages (like flash-attn) still require manual builds—install those last so you know the rest of the environment is steady, and make sure `TORCH_CUDA_ARCH_LIST`, `NVCC_THREADS`, etc. match your hardware:
```bash
UV_COMPILE_THREADS=4 MAX_JOBS=4 NVCC_THREADS=4 TORCH_CUDA_ARCH_LIST="9.0" \
    pip install -v --no-build-isolation "flash-attn==2.8.1"
```

Most scripts expect credentials (HF tokens, Daytona keys, W&B API keys, Supabase creds, etc.) to live in a private env file that is not committed to this repo. Point OT-Agent at your private file by exporting:
```bash
export DC_AGENT_SECRET_ENV=/secure/path/to/my_dc_agent_secrets.env
```

That file should `export DAYTONA_API_KEY=...`, `export HF_TOKEN=...`, `export WANDB_API_KEY=...`, `export SUPABASE_*`, etc. The launcher and auxiliary scripts now read `DC_AGENT_SECRET_ENV`; legacy `KEYS`/`SECRET_ENV_PATH` variables are still accepted for backward compatibility but will be removed once everyone migrates.
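A minimal secrets file might look like the sketch below (values are placeholders; include only the keys your workflows need, and use whichever `SUPABASE_*` names your scripts expect):

```bash
# /secure/path/to/my_dc_agent_secrets.env — placeholder values, not real keys
export DAYTONA_API_KEY=dtn_xxxxxxxxxxxx
export HF_TOKEN=hf_xxxxxxxxxxxx
export WANDB_API_KEY=xxxxxxxxxxxx
export SUPABASE_URL=https://xxxx.supabase.co   # exact SUPABASE_* variable names depend on your setup
export SUPABASE_KEY=xxxxxxxxxxxx
```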
OT-Agent's job launchers are designed to work with HPC (high-performance computing) clusters. Different launchers exist for different job types. OT-Agent's launchers are modular, making it relatively straightforward to add your own preferred cluster. Every invocation of python -m hpc.launch must explicitly set --job_type; use sft for standard finetuning jobs, sft_mca when you need the Megatron Core Adapter sbatch templates, pretokenize for tokenization-only runs, datagen for generator/trace work, and consolidate for ZeRO merges.
Datagen jobs are launched via the generic HPC launcher and use --job_type datagen plus a generator script.
Need to verify a Harbor eval locally before burning queue time? eval/local/run_eval.py spins up a single-node Ray cluster, launches a vLLM controller, and executes a Harbor job against the newly created endpoint.
Prereqs:
- Python environment with the base install plus the `datagen` extra (Ray + vLLM).
- CUDA GPUs (or Apple Silicon if you just need a dry run) and the expected driver stack.
- Harbor/Daytona/Supabase credentials exposed via the usual `DC_AGENT_SECRET_ENV`.
Example:
```bash
python eval/local/run_eval.py \
    --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve_131k_1xH200.yaml \
    --harbor_config hpc/harbor_yaml/trace_16concurrency_eval_ctx131k.yaml \
    --dataset terminal-bench@2.0 \
    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --agent terminus-2 \
    --gpus 1 \
    --eval_benchmark_repo DCAgent2/Qwen3Coder30B-terminus2-terminal-bench-2.0-test-lambda
```

What happens:
- Datagen defaults (tensor parallelism, vLLM overrides, etc.) are pulled from `--datagen_config`.
- Ray and vLLM are booted locally; endpoint metadata lands in `eval_runs/vllm_endpoint.json`.
- The script builds the Harbor command, injects required agent kwargs, and runs it while streaming output to `eval_runs/logs/harbor.log`.
- Traces, logs, and Harbor artifacts accumulate under `eval_runs/` (change via `--experiments_dir`).
Handy flags:
- `--agent_kwarg foo=bar` (repeatable) to forward Harbor agent settings.
- `--harbor_extra_arg ...` for advanced Harbor CLI knobs (e.g., filtering datasets).
- `--harbor_env daytona|docker|modal` to select the Harbor sandbox backend (default: `daytona`).
- `--harbor_log /tmp/harbor.log` to redirect the live Harbor TUI.
- `--dry_run` to validate configs without launching Ray/Harbor.
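Combining a few of these with the example above gives a cheap end-to-end configuration check before any GPUs spin up (a sketch reusing the same configs; the agent kwarg is illustrative):

```bash
python eval/local/run_eval.py \
    --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve_131k_1xH200.yaml \
    --harbor_config hpc/harbor_yaml/trace_16concurrency_eval_ctx131k.yaml \
    --dataset terminal-bench@2.0 \
    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --agent terminus-2 \
    --gpus 1 \
    --agent_kwarg temperature=0.2 \
    --harbor_log /tmp/harbor.log \
    --dry_run
```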
Once your eval behaves locally, promote the same Harbor YAML/datagen config to your HPC launcher or the cloud wrappers (see Cloud Launchers below).
OT-Agent provides SkyPilot-based cloud launchers for running trace generation and eval jobs on cloud VMs (GCP, AWS, Lambda, Kubernetes, etc.) without SLURM. These live under data/cloud/ and eval/cloud/.
Launches data/local/run_tracegen.py on a cloud GPU node:
```bash
python data/cloud/launch_tracegen_cloud.py \
    --harbor_config hpc/harbor_yaml/trace_16concurrency_ctx131k.yaml \
    --datagen_config hpc/datagen_yaml/gpt_oss_120b_vllm_serve_131k_1xH200.yaml \
    --tasks_input_path DCAgent/stackexchange-tor-sandboxes \
    --model openai/gpt-oss-120b \
    --secrets_env /path/to/secrets.env \
    --accelerator "H100:1" \
    --cloud_provider gcp \
    --region us-central1
```

Launches eval/local/run_eval.py on a cloud GPU node:
```bash
python eval/cloud/launch_eval_cloud.py \
    --harbor_config hpc/harbor_yaml/trace_16concurrency_eval_ctx131k.yaml \
    --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve_131k_1xH200.yaml \
    --dataset terminal-bench@2.0 \
    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --agent terminus-2 \
    --secrets_env /path/to/secrets.env \
    --accelerator "H100:1" \
    --cloud_provider gcp
```

- `--cloud_provider` - Cloud backend: `gcp`, `aws`, `lambda`, `kubernetes`, etc. (comma-separated for fallbacks)
- `--accelerator` - GPU spec, e.g., `H100:1`, `A100:4` (comma-separated for fallbacks)
- `--region` - Preferred region(s)
- `--harbor_env` - Harbor environment backend: `daytona` (default), `docker`, or `modal`
  - Note: Use `daytona` (default) for cloud VMs. The `docker` backend requires Docker-in-Docker, which is not available in SkyPilot's container runtime. The `docker` and `modal` backends are primarily for local development.
- `--secrets_env` - Path to secrets file sourced inside the container
- `--autostop N` - Auto-stop cluster after N minutes idle (set to 0 for Kubernetes)
- `--retry_until_up` - Keep retrying until resources are available (useful for scarce GPUs)
- `--down` - Tear down cluster after job completes
- `--list_providers` - Show available cloud providers and exit
Logs and outputs sync periodically to `--local_sync_dir` during execution.
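For scarce accelerators, a launch that tries several providers/GPU specs, keeps retrying until capacity appears, and tears the cluster down afterwards might look like this (a sketch reusing the eval example and the flags listed above):

```bash
python eval/cloud/launch_eval_cloud.py \
    --harbor_config hpc/harbor_yaml/trace_16concurrency_eval_ctx131k.yaml \
    --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve_131k_1xH200.yaml \
    --dataset terminal-bench@2.0 \
    --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --agent terminus-2 \
    --secrets_env /path/to/secrets.env \
    --accelerator "H100:1,A100:4" \
    --cloud_provider gcp,lambda \
    --retry_until_up \
    --down
```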
- Ensure your cluster environment is set up (dotenv, conda env, etc.). For TACC/Vista-style machines, follow the checklist in `hpc/README.md` and use `hpc/dotenv/tacc.env` as a starting point for your environment variables.
- Activate your environment and source the dotenv:

  ```bash
  source hpc/dotenv/<your-cluster>.env
  eval "$DCFT_ACTIVATE_ENV"
  cd "$DCFT"
  ```

  The dotenvs now export `PYTHONPATH="${DCFT_PRIVATE:-$DCFT}:$PYTHONPATH"` so `python -m hpc.launch` resolves even on clusters that strip the working directory from `sys.path`. If you maintain a custom dotenv, mirror this line to keep the launcher importable.
- Choose or write a datagen script under `data/...` implementing `BaseDataGenerator` (see `data/generation/base.py` and existing generators for examples).
- Run the launcher from a login node:

  ```bash
  python -m hpc.launch \
      --job_type datagen \
      --datagen_script data/<dataset>/generate_abstract.py \
      --datagen_config hpc/datagen_yaml/<model>_vllm_serve.yaml \
      --datagen_target_repo <org/dataset-tasks> \
      --enable_task_gen True \
      --experiments_dir "$DCFT/experiments" \
      --time_limit 12:00:00
  ```

- To also generate traces, add `--enable_trace_gen True`, `--trace_target_repo <org/dataset-traces>`, `--trace_harbor_config path/to/harbor_job.yaml`, and any of the `trace_*` overrides documented in `hpc/README.md` (see the combined example below).
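Putting the trace flags together with the base invocation, a combined task + trace generation submission looks roughly like this (paths and repo names are placeholders):

```bash
python -m hpc.launch \
    --job_type datagen \
    --datagen_script data/<dataset>/generate_abstract.py \
    --datagen_config hpc/datagen_yaml/<model>_vllm_serve.yaml \
    --datagen_target_repo <org/dataset-tasks> \
    --enable_task_gen True \
    --enable_trace_gen True \
    --trace_target_repo <org/dataset-traces> \
    --trace_harbor_config hpc/harbor_yaml/<harbor_job>.yaml \
    --experiments_dir "$DCFT/experiments" \
    --time_limit 12:00:00
```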
The launcher will synthesize and submit one or more sbatch scripts under "$experiments_dir/sbatch_scripts" and write configs to "$experiments_dir/configs". Use --dry_run to inspect scripts without actually calling sbatch.
SFT jobs are also launched via hpc.launch with --job_type sft and a LLaMA Factory config.
- Pull and install the SFT submodule (once per checkout) and install its dependencies in-place:

  ```bash
  git submodule update --init --remote sft/llamafactory
  cd sft/llamafactory
  pip install -e .[train,liger-kernel,deepspeed]   # pick the extras you need
  cd -
  ```

- Configure your cluster dotenv and environment as in the Datagen section.
- Pick a training config under `sft/lf_configs` or create your own YAML alongside the existing presets.
- From a login node, run:
  ```bash
  python -m hpc.launch \
      --job_type sft \
      --train_config_path sft/lf_configs/<path-to-config>.yaml \
      --dataset <org/dataset> \
      --num_nodes 8 \
      --time_limit 24:00:00 \
      --experiments_dir "$DCFT/experiments"
  ```

- Optionally override LLaMA Factory flags via `--train_extra_args "..."` (see `hpc/README.md` and `sft/llamafactory/README.md` for full argument lists).
The launcher will construct a per-run YAML in "$experiments_dir/configs", generate an sbatch script, and then submit the job. Training metadata and summaries are written into the run’s output_dir.
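As with datagen jobs, `--dry_run` lets you inspect the generated sbatch script and per-run YAML before anything is submitted (a sketch; we assume the flag behaves the same way for `--job_type sft`):

```bash
python -m hpc.launch \
    --job_type sft \
    --train_config_path sft/lf_configs/<path-to-config>.yaml \
    --dataset <org/dataset> \
    --num_nodes 8 \
    --time_limit 24:00:00 \
    --experiments_dir "$DCFT/experiments" \
    --dry_run
```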
Everything you need to evaluate models lives under eval/. Pick the mode that fits how much infrastructure you have available:
- **Terminal-Bench smoke tests (local or Daytona).** `eval/example_tbench.py` wraps the `terminal_bench` CLI so you can point Harbor at a hosted vLLM endpoint. Start (or SSH-tunnel to) a vLLM server first (`python eval/start_ray.py` spins up a Ray Serve-backed OpenAI-compatible endpoint bound to `127.0.0.1:8000`). Then run:

  ```bash
  python eval/example_tbench.py   # tweak dataset_name/version, backend, agent, model_name, agent_kwargs, n_concurrent_trials
  ```

  This creates a run with Daytona sandboxes and prints the aggregated score. Use it to verify that your model + Harbor wiring works before touching HPC.
- **HF RL task batches.** `eval/example_rltasks_eval.py` downloads a HuggingFace dataset, optionally slices it, and launches Harbor with your preferred agent + model:

  ```bash
  python eval/example_rltasks_eval.py \
      --model openai/gpt-5-chat-latest \
      --dataset DCAgent/nl2bash \
      --agent terminus-2 \
      --n-tasks 100 \
      --n-concurrent 8 \
      --n-attempts 3 \
      --env daytona
  ```

  The script converts parquet task dumps via `scripts/sandboxes/tasks_parquet_converter.py`, stages results under `./jobs`, and accepts `--job-name`/`--output` so you can checkpoint multiple experimental runs.
- **Cluster-scale Harbor eval (TACC template).** The `eval/tacc/` subtree packages the scripts we use on TACC GH nodes:
  - Drop database + HF credentials into `eval/tacc/secret.env` (see `eval/tacc/README.md` for the expected keys) and make sure the shared conda env in that README is accessible from your account.
  - Submit `sbatch eval/tacc/tacc_eval_harbor.sbatch <hf_model_id> <hf_dataset_repo>`; both arguments default to the canonical OT-Agent eval model/dataset if omitted.
  - The sbatch file loads CUDA/GCC modules, boots a vLLM server, hydrates datasets via `snapshot_download.py`, and then calls `harbor jobs start` with the concurrency specified in line 109 (set `--n-concurrent` here to scale up/down). Behavior is controlled by `eval/tacc/dcagent_eval_config.yaml`.
  - Logs land in `experiments/logs/` and full Harbor outputs go under `jobs/<RUN_TAG>/`. `eval/tacc/README.md` documents troubleshooting tips plus alternate sbatch variants (`tacc_eval_multi_gpu.sbatch`, `tb2_eval_harbor.sbatch`, SweBench listeners, etc.).
- **Other clusters/benchmarks.** The remaining files in `eval/` are copyable templates: `run_tb_wx.sbatch` and the `run_tb_vllm_wx_qwen3_t*.sbatch` scripts schedule Terminal-Bench runs with hosted vLLM models, and `JSC/launch_tbench.sh` shows how to adapt the flow to JSC (interactive SLURM allocation + reverse tunnel + Apptainer). Reuse these when porting eval to a new site—each script labels the environment variables and SLURM resources you will likely need to swap out.
- **Launcher-driven evals.** Prefer `python -m hpc.launch --job_type eval ...` whenever you want the same CLI ergonomics as SFT/datagen jobs. Key flags:
  - `--datagen_config hpc/datagen_yaml/<model>.yaml` (required). This is the same engine file you would pass to `--job_type datagen` and determines how the vLLM/Ray controller boots.
  - `--trace_model` optionally overrides both `engine.model` and `vllm_server.model_path` so a single YAML can serve multiple finetunes.
  - `--trace_harbor_config hpc/harbor_yaml/<cluster>_eval_*.yaml` (filename must include `_eval_`).
  - Either `--trace_input_path /path/to/tasks` or `--harbor_dataset terminal-bench@2.0` (mutually exclusive).
  - `--eval_benchmark_repo <org/dataset>` so Supabase rows can track the benchmark.
  - Agent knobs now default to the Harbor YAML: `--trace_agent_name`, `--trace_env`, and `--trace_n_concurrent` are optional overrides when you need to deviate.
  - `--trace_agent_kwargs '{"temperature":0.2,"max_tokens":2048}'` overlays JSON on top of the Harbor + datagen defaults. When the datagen config uses `vllm_local`, the launcher injects the computed `api_base`/`metrics_endpoint` from `vllm_endpoint.json` automatically unless you explicitly provide them.
  When a `vllm_local` engine is selected, the eval launcher reuses the datagen hosting flow: it spins up the Ray/vLLM server, waits for the generated endpoint JSON to pass health checks, feeds the derived OpenAI-compatible URL to Harbor, and tears the server down once the eval finishes. No ad-hoc `--trace_eval_only` hacks are needed—the vLLM bootstrap, Supabase bookkeeping, and HF uploads all run in one job.

  Example (local dataset path):

  ```bash
  python -m hpc.launch \
      --job_type eval \
      --job_name tacc-qwen2-eval \
      --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve.yaml \
      --trace_harbor_config hpc/harbor_yaml/tacc_eval_daytona_eval.yaml \
      --trace_input_path $SCRATCH/dev_set_71_tasks \
      --eval_benchmark_repo mlfoundations-dev/dev_set_71_tasks \
      --trace_model hosted_vllm/qwen2.5-7b-instruct \
      --trace_agent_name terminus-2 \
      --trace_agent_kwargs '{"api_base":"http://127.0.0.1:8000/v1","key":"fake_key"}' \
      --trace_n_concurrent 128
  ```
  Example (Harbor registry dataset slug):

  ```bash
  python -m hpc.launch \
      --job_type eval \
      --job_name tb2-claude-eval \
      --datagen_config hpc/datagen_yaml/qwen3_coder_30b_a3b_vllm_serve.yaml \
      --trace_harbor_config hpc/harbor_yaml/tacc_eval_daytona_eval.yaml \
      --harbor_dataset terminal-bench@2.0 \
      --eval_benchmark_repo DCAgent/dev_set_71_tasks \
      --trace_model anthropic/claude-opus-4-1 \
      --trace_agent_name claude-code \
      --trace_env daytona \
      --trace_n_concurrent 64
  ```

  Valid `--harbor_dataset` slugs come from the upstream registry (`../harbor/registry.json`). Popular options include `terminal-bench@2.0`, `terminal-bench-pro@head`, and `hello-world@head`; run `harbor datasets list` for the rest. The launcher validates slugs against the registry when it is available locally.
Regardless of the path you choose, make sure `DC_AGENT_SECRET_ENV` or the cluster-specific `secret.env` is exported so Harbor can read HF, Daytona, and database credentials. Use `--dry-run` and a small `--n-concurrent` first to validate that Harbor, the sandbox provider, and your model endpoint all respond as expected.
- Job bookkeeping starts before Harbor launches. The sbatch drivers (for example `eval/tacc/tacc_eval_harbor.sbatch`) call `database/unified_db/utils.py:create_job_entry_started` to write a Supabase `sandbox_jobs` row with `job_status="Started"`, the intended `n_trials` (Harbor tasks), and `n_rep_eval` (copied from `config.n_attempts`, default 3). This dedupes model/benchmark pairs and records agent/model provenance before GPUs are consumed.
- Runs must finish cleanly before we touch the database. After Harbor exits, the sbatch script inspects `jobs/<RUN_TAG>/result.json` and bails if too many Daytona errors show up (TACC skips the upload if there are >3). Only when the run directory exists and passes that gate do we proceed.
- Averaged metrics come straight from Harbor. `_extract_job_metadata()` (see `database/unified_db/utils.py`) parses `stats.evals.*.metrics` inside `result.json`, grabs the `mean` values that Harbor already computed across repeated attempts, and stores them with `n_total_trials`. We never recompute scores—what Harbor reports is what lands in Supabase.
- Trial completeness is enforced. `_extract_trial_metadata()` refuses to register a trial unless both `agent_execution` and `verifier_result` blocks are present, so partially executed tasks never pollute the leaderboard averages. Harbor’s retry knobs (`--n-attempts`, `n_rep_eval`) ensure enough fully verified trials exist to compute the mean.
- Uploads are two-phased. `upload_eval_results()` first pushes traces to HuggingFace (if `hf_repo_id` is supplied) and then updates the pre-created job row with `job_status="Finished"`, metrics, stats, and the HF dataset URL via `upload_job_and_trial_records()`. Trial + usage rows are inserted in the same pass, so Supabase has both the aggregate score and every per-task attempt. `error_mode` controls whether a failed HF upload rolls everything back or simply marks the job with warnings.
- Listeners keep things consistent. Automations like `eval/tacc/tb2_eval_listener.py` poll Supabase for recent models, detect stale `sandbox_jobs` stuck in `Started`, and only submit new sbatch runs when a model/benchmark pair still needs coverage. That feedback loop guarantees the averages you see on the leaderboard come from completed Harbor jobs with trace artifacts uploaded.
Please check rl/README.md.
Adding a new cluster involves defining its resources, sbatch templates, and a dotenv file so hpc.launch can target it.
- Create a dotenv for your cluster under `hpc/dotenv/`, following `tacc.env` as a template. At a minimum, define:
  - `DCFT` (path to your OpenThoughts-Agent checkout on the cluster)
  - `DCFT_ACTIVATE_ENV` (command to activate the Python env)
  - paths for `EXPERIMENTS_DIR`, `DATASETS_DIR`, `MODELS_DIR`, and any cluster-specific SIF/Apptainer images (see the sketch after this list).
- Register basic cluster metadata by exporting `HPC_NAME` and related fields in your dotenv or by passing them on the CLI: `--name`, `--account`, `--partition`, `--gpus_per_node`, `--cpus_per_node`, etc. (see `hpc/README.md` and `hpc/hpc.py`).
- Create sbatch templates in `hpc/sbatch_data/` for your cluster:
  - Copy an existing template for a similar machine (GPU type / internet access) and adjust `#SBATCH` headers and module loads.
  - Keep placeholders like `{time_limit}`, `{job_name}`, `{experiments_dir}`, etc. intact; they will be filled by `hpc.launch`.
- Test with a dry run:

  ```bash
  source hpc/dotenv/<your-cluster>.env
  eval "$DCFT_ACTIVATE_ENV"
  cd "$DCFT"
  python -m hpc.launch \
      --job_type datagen \
      --datagen_script data/<dataset>/generate.py \
      --datagen_target_repo test-org/test-dataset \
      --experiments_dir "$DCFT/experiments" \
      --dry_run
  ```

- Once sbatch scripts look correct, drop `--dry_run` to submit real jobs. If your cluster needs special handling (login vs compute nodes, proxies, etc.), add it to `hpc/hpc.py` and, if necessary, `hpc/launch.py` (for example, see the existing logic for JURECA/JUWELS internet nodes).
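A minimal dotenv might look like the sketch below (paths and the activation command are placeholders; `HPC_NAME` and the `PYTHONPATH` line mirror the conventions described above):

```bash
# hpc/dotenv/my-cluster.env — a minimal sketch with placeholder paths
export DCFT=/path/to/OpenThoughts-Agent
export DCFT_ACTIVATE_ENV="conda activate ot-agent"
export EXPERIMENTS_DIR="$DCFT/experiments"
export DATASETS_DIR="$SCRATCH/datasets"
export MODELS_DIR="$SCRATCH/models"
export HPC_NAME=my-cluster
# keeps `python -m hpc.launch` importable even when the cluster strips the CWD from sys.path
export PYTHONPATH="${DCFT_PRIVATE:-$DCFT}:$PYTHONPATH"
```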
To learn more about the details of how HPC Launch works, please refer to hpc/README.md.
For non-HPC users, we provide a tutorial notebook under `notebook/datagen_sft_tutorial.ipynb` with an example of how we generate data from the inferredbugs dataset and perform SFT.
OT-Agent relies on Harbor to launch containerized tools for datagen and eval. Harbor supports multiple backends (Docker, Daytona, Modal, e2b, etc.), but most HPC centers either forbid Docker outright or only allow Apptainer/Singularity. In practice this means:
- Remote container providers are the default. We run large-scale datagen on managed platforms (Daytona, Modal, e2b) where Docker is available and network egress is unrestricted. Daytona is our primary provider; their infrastructure has handled millions of launches/day for OT-Agent workloads.
- HPC clusters stay lean. Login/compute nodes focus on scheduling, storage, and GPU time. When those jobs need containerized helpers (e.g., Harbor trace agents), they call out to the remote provider rather than trying to build Docker images locally.
- Configuration lives in Harbor YAMLs. Pick a template under `hpc/harbor_yaml/`, set the `type` field to your provider (e.g., `daytona`, `modal`, `modal-ray`), and make sure any required secrets/API keys are present in your runtime env (`DC_AGENT_SECRET_ENV` is sourced automatically).
- Bring-your-own backend is fine. Any Harbor-compatible provider works as long as its CLI/API is reachable from the cluster you launch jobs on. If you need Podman or another backend we don’t support yet, open an issue/PR—Harbor makes it straightforward to add.
Once the Harbor YAML points at the right backend and credentials, OT-Agent’s launch scripts will provision containers, stream logs, and tear everything down automatically.
We are a collaboration led by researchers and engineers from Stanford, UC Berkeley, UT Austin, NYU, University of Washington, UCSD, ASU, CMU, UCLA, UNC Chapel Hill, TUM, LAION, and other partners focused on building the best datasets (and therefore the best models). See our previous work at datacomp.ai and mlfoundations.
We currently organize via the terminal-bench Discord; go there if you need help.
- For RL: Please contact Charlie Ruan
- For SFT: Please contact Benjamin Feuer
- For Data: Please contact Etash Guha
- For Eval: Please contact Negin Raoof
- For Project Management (includes cluster and account access): Please contact Ryan Marten
```bibtex
@misc{openthoughts-agent,
  author       = {Team, OpenThoughts-Agent},
  month        = dec,
  title        = {{OpenThoughts-Agent}},
  howpublished = {https://open-thoughts.ai/agent},
  year         = {2025}
}
```
