Turn pushing off for intermediate results #4868
Could this share the same flag as #4581?
DVC does two kinds of jobs with data: one is backup and sharing (for security), the other is local caching (for speed). Are we currently mixing them up?
#4581 introduces all the machinery required; the only thing missing is a flag in the user interface. However, that PR has already grown well beyond its initial scope, so the best approach may be to wait until it is merged and then add the flag in a small follow-up PR.
@charlesbaynham Any updates on this issue? As the ticket is still open, I assume the changes have not been released yet?
@charlesbaynham @brnd42 Sorry for the delay, guys. Likely the
According to the documentation here: https://dvc.org/doc/command-reference/run
For anyone reading this
This can now be accomplished by setting
My original request:
Is there a way to mark certain intermediate results so that they are never pushed? For example, if I have a script extract_features.py that writes features as HDF5 files, which train.py then consumes, I would like to never push those HDF5 files but still keep them in the local cache. We don't want to back up files that can be reproduced exactly, but I also don't want to regenerate them every time.
I could also just gc the remote from time to time, but I don't really want to delete anything there :/
ruslan wrote on Discord:
Hi @brnd42! Unfortunately there is no special feature for that right now, and as I understand it you still want to cache them locally, so --outs-no-cache doesn't work for you :slight_frown: It would be a very good feature, though! To implement it, we would need to introduce something like push: false in dvc.yaml for specific outputs and then take it into consideration when collecting cache in get_used_cache. It is pretty straightforward. Could you create a feature request on our GitHub, please?
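To make the proposal concrete, here is a minimal, self-contained sketch of the idea: each output carries a push flag (defaulting to true), and the step that collects cache entries for a push skips outputs marked push: false, while local operations still see them. Everything here is hypothetical and for illustration only: the Output class, the STAGES dict (loosely mirroring the extract_features.py / train.py pipeline from the request), the placeholder hashes, and the used_cache function are not DVC's actual internals or its real get_used_cache.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


# Hypothetical model of a stage output. "push" mirrors the proposed
# push: false field in dvc.yaml and defaults to True.
@dataclass(frozen=True)
class Output:
    path: str
    md5: str  # placeholder hash, not a real checksum
    cache: bool = True
    push: bool = True


# Hypothetical pipeline: extract_features.py writes HDF5 features
# (reproducible, so not pushed) and train.py produces a model.
STAGES: Dict[str, List[Output]] = {
    "extract_features": [Output("features.h5", "aaa111", push=False)],
    "train": [Output("model.pkl", "bbb222")],
}


def used_cache(stages: Dict[str, List[Output]], for_push: bool) -> Set[str]:
    """Collect cache hashes referenced by stage outputs.

    Outputs with cache=False never enter the cache at all. When for_push
    is True, outputs marked push=False are skipped, so they remain in the
    local cache but are never uploaded to the remote.
    """
    hashes: Set[str] = set()
    for outputs in stages.values():
        for out in outputs:
            if not out.cache:
                continue
            if for_push and not out.push:
                continue
            hashes.add(out.md5)
    return hashes


if __name__ == "__main__":
    print(sorted(used_cache(STAGES, for_push=False)))  # local view
    print(sorted(used_cache(STAGES, for_push=True)))   # what would be pushed
```

With this split, a local cleanup would use the for_push=False view and keep features.h5, while a push would upload only model.pkl. This separates the two jobs discussed above: local caching for speed and remote backup for safety.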