Feature: Cherry-pick a commit #3162
Labels: area/cataloger, contributor, proposal, team/versioning-engine
Feature Request
One of two:
- Add a single method that copies a `<path>` from a `<ref>` to a target branch `<branch>`. The copied files would be staged on the target branch. The method would only copy metadata. The underlying physical files would still only have one copy and would remain untouched by this method.
- Add a cherry-pick method that copies a `<commit>` to a target branch `<branch>`. The behavior would mirror `git cherry-pick`.
Motivation
We plan to use lakeFS to back numerous data workflows. Our ingestion workflows run regularly on their own schedules to bring external data into our system. Other workflows then operate on that data to produce data that we then share with our customers. When the latter workflows run, they need a consistent state from which to run.
We plan to use a `main` branch to store the latest versions of the external data. We will create release branches from `main` when we need to run the other workflows to produce data that is shared with our customers. When the release is ready, we will tag the commit.
While this setup seems like it will meet all of our needs, there is an exception: we may want to bring a newer version of some external datasets from `main` onto the release branch.
- We cannot delay branching from `main`, because the first workflows to run on the release branch take a long time (days) and do not depend on these datasets, so we choose to start them as soon as possible.
- We cannot merge `main` into the release branch (essentially rebasing the release branch), because we only want to update a subset of the datasets. If we updated all of the datasets, we would need to re-run all of the workflows on the release branch.
It would be nice if there were a first-class method for "copying" a dataset from one branch to another (metadata only: there should still be only one instance of the data in the underlying filesystem). Some operations currently exist that can be assembled to do this, but a single unified operation would be preferable.
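To make the requested semantics concrete, here is a minimal sketch of a metadata-only copy using a toy in-memory model. Everything in it (`Entry`, `Branch`, `copy_path`, the paths and addresses) is hypothetical and made up for illustration; it is not the lakeFS API or data model, only a picture of the behavior being asked for.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    """Hypothetical metadata record: a pointer to one physical object."""
    physical_address: str
    checksum: str


@dataclass
class Branch:
    """Toy branch: committed entries plus a staging area, keyed by path."""
    committed: dict[str, Entry] = field(default_factory=dict)
    staged: dict[str, Entry] = field(default_factory=dict)


def copy_path(source: Branch, target: Branch, prefix: str) -> list[str]:
    """Stage every committed entry under `prefix` from `source` onto `target`.

    Only the Entry records are copied. The physical objects they point to
    are shared between branches and are never read or rewritten.
    """
    copied = []
    for path, entry in sorted(source.committed.items()):
        if path.startswith(prefix):
            target.staged[path] = entry
            copied.append(path)
    return copied


# `main` has a newer weather dataset than the release branch.
main = Branch(committed={
    "external/weather/part-0.parquet": Entry("s3://bucket/obj-a2", "c2"),
    "external/prices/part-0.parquet": Entry("s3://bucket/obj-b1", "d1"),
})
release = Branch(committed={
    "external/weather/part-0.parquet": Entry("s3://bucket/obj-a1", "c1"),
    "external/prices/part-0.parquet": Entry("s3://bucket/obj-b1", "d1"),
})

# Bring only the weather dataset onto the release branch.
copied = copy_path(main, release, "external/weather/")
```

After the call, the weather entry is staged on `release` and points at the same physical address as on `main`, while the prices dataset is untouched. The sketch ignores deletions (paths that exist under the prefix on the target but not on the source), which a real implementation would need to decide how to handle.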
We plan to merge release branches back into `main` after their tags are created. The benefit of merging into `main` is that the derived data on the release branch is updated on `main`. Actions can then be triggered to do something with the derived data on `main`, such as updating our customer-facing APIs (yes, these actions could be triggered by tags as well). If we do this merge, we will likely run into merge conflicts between the release branch and `main` if both updated the same external datasets. If we could do the copying at the commit level with the same commit ID (i.e. cherry-pick the commit from `main` onto the release branch), then we may be able to avoid the merge conflict.
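The conflict concern can be illustrated with a toy three-way merge over path-to-object-ID maps. This is a sketch of generic three-way merge logic, not a description of how lakeFS merges; `three_way_merge` and all of the paths and IDs are hypothetical. It assumes that a path changed identically on both sides is not treated as a conflict.

```python
def three_way_merge(base, ours, theirs):
    """Merge two path -> object-ID maps against their common ancestor.

    Returns (merged, conflicts). A path conflicts only when both sides
    changed it relative to `base` and ended up with different values.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o      # both sides agree, including identical changes
        elif o == b:
            result = t      # only theirs changed
        elif t == b:
            result = o      # only ours changed
        else:
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Common ancestor: the state of `main` when the release branch was created.
base = {"external/weather": "v1"}
# `main` has since ingested a newer weather dataset.
main_state = {"external/weather": "v2"}

# Case 1: the release branch re-ingested the dataset independently.
release_reingested = {"external/weather": "v2-reingested", "derived/report": "r1"}
_, conflicts_reingested = three_way_merge(base, main_state, release_reingested)

# Case 2: the release branch received an identical copy of what is on `main`.
release_copied = {"external/weather": "v2", "derived/report": "r1"}
merged_copied, conflicts_copied = three_way_merge(base, main_state, release_copied)
```

In this model, case 1 reports a conflict on `external/weather`, while case 2 merges cleanly and carries `derived/report` over to `main`.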