directory structure for production deployment #734

Closed
analystanand opened this issue May 29, 2018 · 5 comments

Comments

@analystanand

We are currently planning to use DVC for our projects.

  • Is anyone using DVC in production? It can put everything (e.g. model.p) under DVC, which hides those files from the repo.
@efiop
Contributor

efiop commented May 29, 2018

Hi @analystanand !

Could you please clarify your question? Are you asking whether you can hide .dvc files in a separate directory? If so, the answer is that you can, but we do not recommend it and instead suggest storing dvcfiles alongside your code. That being said, if you really need to hide them, consider using the -f and -c options for dvc run, and make sure that the commands you run are aware that they are being run from the directories where the respective dvcfiles are located. Also be aware that dvc add will only put the dvcfile alongside the data file that you are adding, so if you really need to hide it, you will have to modify the paths in the respective dvcfiles. Dvcfiles have a very simple YAML format, so editing them by hand should not be a problem.

Thanks,
Ruslan

@efiop added the "awaiting response" label May 29, 2018
@analystanand
Author

analystanand commented May 29, 2018

I find it quite easy to reproduce experiments from one system to another with the help of DVC.

But when we deploy a project, we have to manually take model.p from the master branch and put it into git so that it can be loaded at prediction time.

In general, while building the project we add data/* to DVC, but to deploy from our remote repo we need just model.pickle, which has to be manually placed outside of DVC.

My question is: can we create a model.pickle that is part of the pipeline and also available in the git repo, so that we can deploy directly from continuous integration?

See the structure: https://drivendata.github.io/cookiecutter-data-science/

@efiop
Contributor

efiop commented May 29, 2018

Ah, I see. Sure, just use -O model.pickle instead of -o model.pickle in dvc run when defining your pipeline. -O is equivalent to --outs-no-cache, which tells dvc not to cache this output, so that you can git add it yourself and store it in git. Alternatively, set cache: false for model.pickle in the respective dvcfile that has model.pickle as an output.
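
For illustration, a dvcfile with caching turned off for the model might look roughly like the sketch below. This is an assumption-laden example, not output copied from DVC: the train.py script is hypothetical, the layout is assumed from the stage files DVC generated around this time, and the checksum fields that DVC writes itself are left out.

```yaml
# Hypothetical dvcfile for the training stage; names are illustrative only.
cmd: python train.py        # assumed training command
deps:
- path: train.py            # assumed dependency of the stage
outs:
- path: model.pickle
  cache: false              # output is not cached by DVC, so it can be committed to git
```

With the output left out of the DVC cache, model.pickle remains a pipeline output while still being a regular file that you can git add and ship from CI.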

@analystanand
Author

Got it, thanks.

@efiop removed the "awaiting response" label May 29, 2018
@efiop
Contributor

efiop commented May 29, 2018

You are welcome :) Closing for now, feel free to reopen.

@efiop closed this as completed May 29, 2018