Enable MAST client mode #405

allenwang28 · 2025-10-14T18:25:56Z

Most of the usage info is in the README

Example run: https://www.internalfb.com/mlhub/pipelines/runs/mast/allencwang-forge-mni5vl?job_attempt=0&version=0&tab=logs&env=PRODUCTION

Only tested with the 1.7B, 8B, and 32B models. Please let me know if it doesn't work.

codecov-commenter · 2025-10-16T17:59:02Z

Codecov Report

❌ Patch coverage is 11.36364% with 39 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.43%. Comparing base (633b219) to head (d33662b).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/forge/controller/launcher.py | 11.36% | 39 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #405      +/-   ##
==========================================
+ Coverage   64.69%   73.43%   +8.73%     
==========================================
  Files          79       81       +2     
  Lines        7775     7829      +54     
==========================================
+ Hits         5030     5749     +719     
+ Misses       2745     2080     -665


LucasLLC

This is incredible. I love how all the manual steps have explanations and scripts. Really solid.

vidhyav · 2025-10-16T18:13:31Z

.meta/mast/README.md

+export HF_HOME=/mnt/wsfuse/teamforge/hf
+
+# Download the model (replace with your desired model)
+huggingface-cli download Qwen/Qwen3-8B --local-dir /mnt/wsfuse/teamforge/hf_artifacts/qwen3_8b
+```
+
+#### 2. Hydrate the HuggingFace Cache
+
+After downloading the weights, you need to hydrate the HuggingFace cache so that the transformers library can find the model metadata:
+
+```bash
+# Set HF_HOME to the OilFS path
+export HF_HOME=/mnt/wsfuse/teamforge/hf
+
+# Hydrate the cache for the model
+python .meta/mast/hydrate_cache.py --model-id Qwen/Qwen3-8B
+```
+
+This ensures that when MAST runs with `HF_HUB_OFFLINE=1`, the transformers library can locate all necessary files from the cache.
+
+### Directory Structure
+
+Both cache and model files are stored under:
+- **Cache**: `/mnt/wsfuse/teamforge/hf` (set via `HF_HOME`)
+- **Model weights**: `/mnt/wsfuse/teamforge/hf/<model_name>`
+
+Make sure your MAST config files point to the correct paths in `hf_artifacts`.


Why can't you automate this?

allenwang28 · 2025-10-16

This is the million-dollar question, Vidhya lol
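For reference, the two manual steps quoted from the README could in principle be chained in a small wrapper. The following is only a sketch under stated assumptions, not what this PR implements: it assumes `huggingface-cli` is on `PATH`, that `.meta/mast/hydrate_cache.py` accepts `--model-id` as shown in the README excerpt, and that hydration populates the standard Hugging Face hub cache layout (`$HF_HOME/hub/models--<org>--<name>`). The function names `build_steps`, `expected_cache_dir`, and `prepare_model` are hypothetical.

```python
import os
import subprocess
from pathlib import Path

# Cache location taken from the README excerpt above.
HF_HOME = "/mnt/wsfuse/teamforge/hf"


def build_steps(model_id: str, local_dir: str) -> list[list[str]]:
    """Return the two README commands as argv lists, in the order they run."""
    return [
        # Step 1: download the weights to the shared filesystem.
        ["huggingface-cli", "download", model_id, "--local-dir", local_dir],
        # Step 2: hydrate the cache (script path and flag as quoted in the README).
        ["python", ".meta/mast/hydrate_cache.py", "--model-id", model_id],
    ]


def expected_cache_dir(model_id: str, hf_home: str = HF_HOME) -> Path:
    """Standard hub cache directory for a model: $HF_HOME/hub/models--<org>--<name>."""
    return Path(hf_home) / "hub" / ("models--" + model_id.replace("/", "--"))


def prepare_model(model_id: str, local_dir: str, hf_home: str = HF_HOME) -> Path:
    """Run both steps with HF_HOME set, then check that the cache entry exists."""
    env = {**os.environ, "HF_HOME": hf_home}
    for argv in build_steps(model_id, local_dir):
        # check=True stops at the first failing step instead of continuing.
        subprocess.run(argv, env=env, check=True)
    cache_dir = expected_cache_dir(model_id, hf_home)
    # Assumption: a hydrated cache shows up at the standard hub location.
    if not cache_dir.is_dir():
        raise FileNotFoundError(f"expected hydrated cache at {cache_dir}")
    return cache_dir
```

With the paths from the README excerpt, a call would look like `prepare_model("Qwen/Qwen3-8B", "/mnt/wsfuse/teamforge/hf_artifacts/qwen3_8b")`. The final directory check is a cheap pre-flight guard before launching a job with `HF_HUB_OFFLINE=1`, where a missing cache entry would otherwise only surface at model load time.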

vidhyav · 2025-10-16T18:15:23Z

src/forge/controller/launcher.py

+       - Only worker roles (GPU hosts) are launched in MAST
+       - Client connects to workers remotely via provisioner
+
+    2. Detached mode (detached=True):


allenwang28 added 3 commits October 13, 2025 15:10

changes for forge client mode

d8d0a33

initial commit

63b3274

park

834f4f9

meta-cla bot added the CLA Signed label Oct 14, 2025

allenwang28 added 17 commits October 14, 2025 12:29

park changes

6d38ae6

fixes

d308862

nmerge

4fdc2bc

almost there, but keeps failing on dcp load despite me not setting that

f5d6eb9

park again

be3c446

gsm local

5a4ddeb

?

3e7b605

1.7b is running!

e6a0025

Merge branch 'main' into mast_client

cc14604

cleanups

ed832f4

update all yamls

d589ffc

fix path

c630f55

comment out hf_home

4a00205

affinity

42d83de

hydrate cache

a2b3a9e

change hf_artifacts to hf

37276d6

Merge branch 'main' into mast_client

d33662b

allenwang28 requested a review from LucasLLC October 16, 2025 17:51

allenwang28 marked this pull request as ready for review October 16, 2025 17:51

allenwang28 changed the title ~~[wip] MAST client mode~~ Enable MAST client mode Oct 16, 2025

LucasLLC approved these changes Oct 16, 2025


update readme

c84f796

vidhyav reviewed Oct 16, 2025


allenwang28 merged commit cf55407 into meta-pytorch:main Oct 16, 2025
9 checks passed
