Reduce ray job cold time #825
base: main
@@ -0,0 +1,19 @@
# Dockerfile.huggingface-cache
FROM python:3.11-slim

# Install required packages: transformers and huggingface_hub
RUN pip install --no-cache-dir transformers huggingface_hub

# Ensure the cache directory exists (snapshot_download will create its own subfolders)
RUN mkdir -p /home/ray/.cache/huggingface/hub

# Use the huggingface_hub API to download the model exactly as the Hub does.
# This will create a folder with the proper structure (e.g. blobs, refs, snapshots).
RUN python -c "\
from huggingface_hub import snapshot_download; \
model_path = snapshot_download('facebook/bart-large-cnn', cache_dir='/home/ray/.cache/huggingface/hub'); \
print('Model downloaded to:', model_path)\
"
Review comment: If we want to have more than one model, perhaps we could have something like … WDYT?

Reply: Definitely happy with the addition to have more than one model (look at my comment above). Not sure I follow about the path.

Reply: My bad, sorry, I did not explain it properly!
# Exit immediately (this container's only job is to populate the cache)
CMD ["/bin/true"]
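Following up on the multi-model suggestion in the review thread above, here is a minimal sketch of what a multi-model download step could look like. It is illustrative only: the PR as written downloads just facebook/bart-large-cnn, and the script name and second model are assumptions taken from the discussion (roberta-large is mentioned later for the bertscore metric).

# download_models.py - hypothetical helper, reusing the cache path from the Dockerfile above
from huggingface_hub import snapshot_download

CACHE_DIR = "/home/ray/.cache/huggingface/hub"
MODELS = [
    "facebook/bart-large-cnn",  # the model this PR currently caches (used to generate ground truth)
    "roberta-large",            # suggested in the review for the bertscore metric
]

for model in MODELS:
    path = snapshot_download(model, cache_dir=CACHE_DIR)
    print(f"{model} downloaded to: {path}")

Copying a script like this into the image and invoking it with a single RUN instruction would also avoid the long inline python -c string while producing the same cache layout.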
@@ -2,6 +2,16 @@ name: lumigator

services:

  inference-model:
    build:
      context: .
      dockerfile: cache/Dockerfile.model-inference
    platform: linux/${ARCH}
    command: /bin/true
    volumes:
      - huggingface_cache_vol:/home/ray/.cache/huggingface
    profiles:
      - local
  minio:
    labels:
      ai.mozilla.product_name: lumigator

@@ -68,6 +78,8 @@ services:
    depends_on:
      redis:
        condition: service_healthy
      inference-model:
        condition: service_completed_successfully
Review comment: As this will take a while, what are we planning to do with the other services in the meantime? Options: …
    ports:
      - "6379:6379"
      - "${RAY_DASHBOARD_PORT}:${RAY_DASHBOARD_PORT}"
@@ -83,18 +95,18 @@ services:
      - -c
      - |
        set -eaux
        mkdir -p /tmp/ray_pip_cache
        mkdir -p /home/ray/.cache/ && mkdir -p /tmp/ray_pip_cache
        sudo chmod -R 777 /home/ray/.cache/ && sudo chmod -R 777 /tmp/ray_pip_cache/ || true
        RAY_JOB_ALLOW_DRIVER_ON_WORKER_NODES=1 RAY_REDIS_ADDRESS=redis:6379 ray start --head --dashboard-port=${RAY_DASHBOARD_PORT} --port=6379 --dashboard-host=0.0.0.0 --ray-client-server-port 10001
        # If the file was mounted in a volume instead of
        # a shared dir, permissions need to be setup
        # ... || true allows this to fail (-e is set)
        sudo chmod -R 777 /tmp/ray_pip_cache/ || true
        RAY_JOB_ALLOW_DRIVER_ON_WORKER_NODES=1 RAY_REDIS_ADDRESS=redis:6379 ray start --head --dashboard-port=${RAY_DASHBOARD_PORT} --port=6379 --dashboard-host=0.0.0.0 --ray-client-server-port 10001
        mkdir -p /tmp/ray/session_latest/runtime_resources/pip
        rmdir /tmp/ray/session_latest/runtime_resources/pip/ && ln -s /tmp/ray_pip_cache /tmp/ray/session_latest/runtime_resources/pip
        sleep infinity
    shm_size: 2g
    volumes:
      - ${HOME}/.cache/huggingface:/home/ray/.cache/huggingface
      - huggingface_cache_vol:/home/ray/.cache/huggingface
Review comment: Is there any strong reason for moving this to a volume? It makes Lumigator's cache not interoperable with the HF cache that might already reside on users' machines.

Reply: This is definitely one of the problems this PR may introduce (on top of slower CI times). I moved this to a volume because we create that volume beforehand, so we ensure the bart image is already there (reducing the time to a first experiment). Without it being a volume, I'm not sure how I could add this to Ray's cache.
      - ray-pip-cache:/tmp/ray_pip_cache
    deploy:
      resources:
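On the volume-versus-bind-mount question raised above: the on-disk layout is the standard huggingface_hub cache in both cases, so the difference is only whether /home/ray/.cache/huggingface lives in a Docker-managed volume or in the user's existing ${HOME}/.cache/huggingface. A small illustrative snippet (not part of this PR) showing that the same download code works against either location:

# Illustrative only: resolve the cache directory roughly the way huggingface_hub does by default,
# i.e. an explicit HF_HUB_CACHE override, else ~/.cache/huggingface/hub.
import os
from huggingface_hub import snapshot_download

cache_dir = os.environ.get(
    "HF_HUB_CACHE",
    os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub"),
)
snapshot_download("facebook/bart-large-cnn", cache_dir=cache_dir)

With a bind mount, blobs already present in the user's cache would be reused; with the named volume, the cache is populated from the pre-built inference-model image instead.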
@@ -242,6 +254,9 @@ volumes:
  redis-data:
    labels:
      ai.mozilla.product_name: lumigator
  huggingface_cache_vol:
    labels:
      ai.mozilla.product_name: lumigator
  ray-pip-cache:
    labels:
      ai.mozilla.product_name: lumigator
Review comment: I think it'd be great if we added to the docs (1) which models are downloaded (right now it's bart alone, right? I'd suggest roberta-large too, for the bertscore metric), (2) their exact size (bart + roberta are less than 3 GB), and (3) how this can be disabled if, e.g., someone has no intention of ever running bart. WDYT?
Reply: Bart has to run right now to generate GT, which is why I added it as a kind of mandatory model. As it is, we can't disable it (apart from manually removing the service from docker-compose, which is not very user friendly, I'd say).
We could maybe add a variable listing the models you want to pre-download into Ray's cache. Would that work?
Reply: Yes, I think that'd be great! For instance, right now we are using roberta-large for bertscore evaluations, so the models are already two, and having a list we could point users to makes it easy for them to customise it. Thank you!
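A rough sketch of the configurable list idea discussed in this thread, assuming a single comma-separated environment variable (the name PRELOAD_MODELS and the default value are made up for illustration, not something this PR defines):

# Hypothetical pre-download step driven by an environment variable.
import os
from huggingface_hub import snapshot_download

CACHE_DIR = "/home/ray/.cache/huggingface/hub"
DEFAULT_MODELS = "facebook/bart-large-cnn,roberta-large"  # bart for GT generation, roberta-large for bertscore

models = [m.strip() for m in os.environ.get("PRELOAD_MODELS", DEFAULT_MODELS).split(",") if m.strip()]
for model in models:
    print(f"Pre-downloading {model} ...")
    snapshot_download(model, cache_dir=CACHE_DIR)

Documenting that list together with each model's approximate size would also cover points (1) and (2) from the comment above.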