Enable MAST client mode #405
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #405      +/-   ##
==========================================
+ Coverage   64.69%   73.43%   +8.73%
==========================================
  Files          79       81       +2
  Lines        7775     7829      +54
==========================================
+ Hits         5030     5749     +719
+ Misses       2745     2080     -665
```
This is incredible. I love how all the manual steps have explanations and scripts. Really solid.
```bash
export HF_HOME=/mnt/wsfuse/teamforge/hf

# Download the model (replace with your desired model)
huggingface-cli download Qwen/Qwen3-8B --local-dir /mnt/wsfuse/teamforge/hf_artifacts/qwen3_8b
```

#### 2. Hydrate the HuggingFace Cache

After downloading the weights, you need to hydrate the HuggingFace cache so that the transformers library can find the model metadata:

```bash
# Set HF_HOME to the OilFS path
export HF_HOME=/mnt/wsfuse/teamforge/hf

# Hydrate the cache for the model
python .meta/mast/hydrate_cache.py --model-id Qwen/Qwen3-8B
```

This ensures that when MAST runs with `HF_HUB_OFFLINE=1`, the transformers library can locate all necessary files from the cache.

### Directory Structure

Both cache and model files are stored under:
- **Cache**: `/mnt/wsfuse/teamforge/hf` (set via `HF_HOME`)
- **Model weights**: `/mnt/wsfuse/teamforge/hf/<model_name>`

Make sure your MAST config files point to the correct paths in `hf_artifacts`.
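For what it's worth, the hydration step works because the Hugging Face hub cache uses a predictable on-disk layout under `$HF_HOME/hub`: each repo lives in a `models--<org>--<name>` directory. A small sketch of that naming convention (the helper below is hypothetical, for illustration only; the real logic lives in `.meta/mast/hydrate_cache.py` and the `huggingface_hub` library):

```python
import os

# Hypothetical helper for illustration only -- shows where offline runs
# (HF_HUB_OFFLINE=1) will look for a hydrated model in the hub cache.
def expected_cache_dir(model_id: str, hf_home: str) -> str:
    # The hub cache stores each repo under
    # $HF_HOME/hub/models--<org>--<name>, containing refs/, snapshots/, blobs/.
    return os.path.join(hf_home, "hub", "models--" + model_id.replace("/", "--"))

print(expected_cache_dir("Qwen/Qwen3-8B", "/mnt/wsfuse/teamforge/hf"))
# -> /mnt/wsfuse/teamforge/hf/hub/models--Qwen--Qwen3-8B
```

If that directory is empty or missing after hydration, offline runs will fail to resolve the model metadata.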
Why can't you automate this?
this is the million dollar question Vidhya lol
   - Only worker roles (GPU hosts) are launched in MAST
   - Client connects to workers remotely via provisioner
2. Detached mode (`detached=True`):
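The two modes above can be read as a toggle over which roles get launched in MAST. A purely illustrative sketch, assuming detached mode launches the client role in MAST as well (the names below are hypothetical, not the actual MAST/provisioner API):

```python
from dataclasses import dataclass

# Illustrative sketch only -- LaunchConfig and roles_to_launch are
# hypothetical names, not the real MAST/provisioner API.
@dataclass
class LaunchConfig:
    detached: bool = False  # False -> client mode, True -> detached mode

def roles_to_launch(cfg: LaunchConfig) -> list[str]:
    # Client mode: only GPU worker roles run in MAST; the client process
    # stays local and connects to the workers via the provisioner.
    roles = ["worker"]
    if cfg.detached:
        # Detached mode (assumption): the client role also runs inside MAST.
        roles.append("client")
    return roles
```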
NICE.
Most of the usage info is in the README
Example run: https://www.internalfb.com/mlhub/pipelines/runs/mast/allencwang-forge-mni5vl?job_attempt=0&version=0&tab=logs&env=PRODUCTION
Only tested the 1.7B, 8B, and 32B models so far; please let me know if it doesn't work.