Silent Failure if custom image puts something into /opt/ml/code #222

Open
njbrake opened this issue Sep 5, 2024 · 0 comments

njbrake commented Sep 5, 2024

Hi, I was making a new Docker image for training:

FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04
COPY src/requirements.txt /opt/ml/code/requirements.txt
RUN pip install --no-cache-dir -r /opt/ml/code/requirements.txt

When I do that, the training container can no longer find the files that usually get copied in when it starts. I traced it back to this line, which checks whether the /opt/ml/code folder exists; if it exists at all, it skips the step that copies over the sourcedir.tar.gz file from its URI.

Should the logic be changed so that it doesn't skip downloading the file, or maybe at least it should give a warning that it's skipping the download?
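
To illustrate the warning option, here is a minimal sketch. It is not the toolkit's actual code: the helper name `should_skip_source_download` and the logger setup are hypothetical, and I'm assuming the existing check is equivalent to `os.path.exists` on the code folder.

```python
import logging
import os

logger = logging.getLogger(__name__)

CODE_DIR = "/opt/ml/code"  # folder named in this issue


def should_skip_source_download(code_dir: str = CODE_DIR) -> bool:
    """Hypothetical helper: keep the 'skip if the folder exists' behavior,
    but log a warning instead of skipping silently."""
    if not os.path.exists(code_dir):
        # Folder is absent, so the source archive should be fetched as usual.
        return False

    # Folder exists (e.g. created by a COPY in a custom image): report what
    # is in it so the skipped download is visible in the training logs.
    contents = sorted(os.listdir(code_dir)) if os.path.isdir(code_dir) else []
    logger.warning(
        "%s already exists (contents: %s); skipping download of sourcedir.tar.gz",
        code_dir,
        contents,
    )
    return True
```

With the Dockerfile above, the folder would contain only requirements.txt, so the warning would point straight at the cause.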
