bundle diff and update #214
@jakedsouza @galvare2 here's a design sketch of the CLI to review
**initial draft notes:** I haven't written a particularly performant diffing algorithm. More specifically, matching hashes are taken to mean that the files are the same. Some empty directories might be left over.
If there is no difference, bundle update will do nothing, correct? So there would be no need to first do bundle diff and then only do bundle update if a diff is found?
I think for flood and seismic immediate requirements, bundle update seems more important than diff.
yes, this is correct: under the hood, […]
**inter-process concurrency (IPC)**

*requirements:* in addition to the design sketch mentioned above, we need to address the possibility of multiple […]

*discussion:* a usual methodology for IPC by version control systems is lockfiles. in […], so that, specifically, existence of the […]. in datamon, the same […]. another option is using a lockfile at the command level, decoupling the locking on operations at a particular […]. it's possible that there's a channel-based way to implement IPC, yet lockfiles seem like a good place to start since they're a better-understood solution. finally, in the case of […]
in order to move data from Argo data-science workflows to production services (that is, the programs more directly connected to an externally visible web program) via datamon, the plan is to use either bundle ids or labels (which resolve to bundle ids) to provide a means of referring to workflow output from the production services. now, from one such reference (bundle id) to the next, there could in general be a fair amount of duplicate files in the data-science artifact that production needs to access. production is using `bundle download` (not the FUSE fs) to access the data, so supposing that there's already a bundle downloaded to the production environment, there needs to be a way, given the next reference, to (1) describe the diff between the currently downloaded bundle and the referenced one (in terms of file lists, not file contents) as well as (2) update the downloaded bundle on disk such that only the differences are resolved, not such that the entirety of the bundle is downloaded again.

as is, the result of a `bundle download` contains a `.datamon/` directory with the bundle's metadata (file lists with names and hashes). so to provide the diff, it'll suffice to compare the metadata available on GCS to the local `.datamon/` directory. then updating the bundle on disk will consist of concurrently iterating through the diff, (i) adding missing files, (ii) removing additional files, and (iii) replacing any file with a differing hash. afterward, the `.datamon/` directory metadata will need to be updated such that the result of an update is the same as the result of a fresh `bundle download`: there is no local history being stored within `.datamon/` as with `.git/`.

note that this issue is distinct from the similarly-titled #204: that issue has to do with updating a bundle stored in GCS via local changes (specifically, via the FUSE filesystem abstraction). this issue is about updating a bundle stored locally (after a `bundle download`) with changes from GCS.