Data pipeline using DVC #6805
ijlyttle started this conversation in Show and Tell
-
Thanks for sharing, @ijlyttle! DVC is indeed an interesting tool! I won't be at conf in person, but I'm happy to discuss all this virtually or the next time we meet!
-
This sounds neat. Hope the talk went well. Have you published any demo code or a presentation from your session? Would love to have a closer look at dvc cml + quarto.
-
Demo of DVC with Quarto
DVC is an open-source project, an acronym for "data version control". It can be used to:
This sounds a lot like targets (for R) and pins (for R and Python), both of which I respect greatly. One of the big differences is that DVC can live outside your code, as an orchestrator, whereas my understanding is that targets and pins are integrated into the pipeline code.
Since DVC acts as an orchestrator, I wanted to try it out as a complementary tool to Quarto, so I made a small pipeline (repo). This pipeline is made using Python, but like Quarto, I don't think DVC "cares" about the languages used in the pipeline; it just runs `quarto render` on a file if it is invalidated. This required me to `freeze` the Quarto files in the pipeline, so that a global `quarto render` will work for the "rest" of the project.
I use an S3 bucket as backing, but DVC supports a variety of remote storage.
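To make "runs `quarto render` on a file if it is invalidated" a bit more concrete, here is a toy sketch of the idea in Python. It is not how DVC is implemented, and it is not taken from the repo; the function names (`file_hash`, `is_invalidated`, `run_stage`) and the JSON lock file are hypothetical, made up for illustration. The only real command it calls is `quarto render <file>`.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def file_hash(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def current_hashes(deps):
    """Map each dependency path to the hash of its current content."""
    return {str(dep): file_hash(dep) for dep in deps}


def is_invalidated(deps, lock_path):
    """True if any dependency differs from what the (hypothetical) lock file recorded."""
    lock = Path(lock_path)
    recorded = json.loads(lock.read_text()) if lock.exists() else {}
    return current_hashes(deps) != recorded


def run_stage(qmd, deps, lock_path, runner=subprocess.run):
    """Render `qmd` only if it, or one of its deps, changed since the last run."""
    all_deps = [qmd, *deps]
    if not is_invalidated(all_deps, lock_path):
        return False  # up to date, nothing to do
    runner(["quarto", "render", str(qmd)], check=True)
    # Record hashes after a successful render. This only works as a cache key
    # because rendering a .qmd file does not modify the .qmd file itself.
    Path(lock_path).write_text(json.dumps(current_hashes(all_deps), indent=2))
    return True
```

Calling `run_stage("step.qmd", ["data.csv"], "stage.lock.json")` twice in a row (file names are placeholders) would render on the first call and skip on the second, until one of the files changes.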
One point that I wanted to bring up here is that I found `.qmd` files to be suitable for the pipeline, but not `.ipynb` files. I may be putting this imprecisely, but running a `.qmd` file does not change the `.qmd` file; however, running a `.ipynb` file changes it. If a pipeline's dependencies include a `.ipynb` file, I can see things becoming circular.
That said, there is an issue at DVC where they discuss allowing a dependency to filter for the (in this case) Python code, but it seems to have stalled.
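Here is a small sketch of why a `.ipynb` dependency can look "changed" after every run, and what filtering for the code might look like. This is only an illustration of the idea, not a DVC feature and not the approach discussed in that issue; `raw_hash`, `code_only_hash`, and `make_notebook` are hypothetical helpers, and the two notebooks are hand-built stand-ins for a file before and after execution.

```python
import hashlib
import json


def raw_hash(notebook_text):
    """Hash the whole file, outputs and execution counts included."""
    return hashlib.sha256(notebook_text.encode("utf-8")).hexdigest()


def code_only_hash(notebook_text):
    """Hash only the source of code cells, ignoring outputs and metadata."""
    notebook = json.loads(notebook_text)
    digest = hashlib.sha256()
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # notebook JSON allows a string or a list of lines
            source = "".join(source)
        digest.update(source.encode("utf-8"))
        digest.update(b"\0")  # separator so cell boundaries affect the hash
    return digest.hexdigest()


def make_notebook(execution_count, outputs):
    """Build a one-cell notebook as JSON text (a stand-in for a real file)."""
    cell = {
        "cell_type": "code",
        "source": ["print(1 + 1)\n"],
        "metadata": {},
        "execution_count": execution_count,
        "outputs": outputs,
    }
    return json.dumps({"cells": [cell], "metadata": {}, "nbformat": 4, "nbformat_minor": 5})


before_run = make_notebook(None, [])
after_run = make_notebook(1, [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}])

# Running the notebook rewrites the file, so a whole-file hash changes...
print(raw_hash(before_run) == raw_hash(after_run))  # False
# ...while a hash over the code cells alone stays the same.
print(code_only_hash(before_run) == code_only_hash(after_run))  # True
```

With a whole-file hash, the stage would invalidate its own dependency every time it runs; a code-only hash avoids that, at the cost of ignoring changes to markdown cells and metadata.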
FWIW, I'll be at posit::conf() if anyone wants to discuss. Thanks!