Feature: Cherry-pick a commit #3162

Closed
clintonmonk opened this issue Apr 3, 2022 · 0 comments · Fixed by #5483
Labels: area/cataloger, contributor, proposal, team/versioning-engine

Comments

@clintonmonk

Feature Request

Either of the following:

  1. Add a single method that copies a <path> from a <ref> to a target branch <branch>. The copied files would be staged on the target branch. The method would copy metadata only: each underlying physical file would still exist exactly once and would not be touched by this method.

  2. Add a cherry-pick method that copies a <commit> to a target branch <branch>. The behavior would mirror git cherry-pick.
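
To make the difference between the two options concrete, here is a minimal sketch in Python over a toy in-memory model. Nothing in it is a lakeFS API: `Commit`, `Branch`, `copy_path`, `cherry_pick`, the paths, and the object addresses are all hypothetical names chosen for illustration. A snapshot is assumed to be a mapping from a logical path to the address of a physical object, so copying an entry copies a pointer and never the data. The cherry-pick here does no conflict detection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Toy model only: none of these names are lakeFS APIs.
# A snapshot maps a logical path to the address of a physical object.
Snapshot = Dict[str, str]


@dataclass
class Commit:
    id: str
    parent: Optional["Commit"]
    snapshot: Snapshot


@dataclass
class Branch:
    head: Commit
    staged: Snapshot = field(default_factory=dict)  # uncommitted metadata


def copy_path(source: Snapshot, prefix: str, target: Branch) -> int:
    """Option 1: stage every entry under `prefix` on `target` (pointers only)."""
    copied = 0
    for path, address in source.items():
        if path.startswith(prefix):
            target.staged[path] = address
            copied += 1
    return copied


def cherry_pick(commit: Commit, target: Branch, new_id: str) -> Commit:
    """Option 2: replay what `commit` changed relative to its parent onto `target`."""
    base = commit.parent.snapshot if commit.parent else {}
    result = dict(target.head.snapshot)
    for path in base.keys() | commit.snapshot.keys():
        before, after = base.get(path), commit.snapshot.get(path)
        if before == after:
            continue  # this commit did not touch the path
        if after is None:
            result.pop(path, None)  # the commit deleted the path
        else:
            result[path] = after  # the commit added or changed the path
    target.head = Commit(new_id, target.head, result)
    return target.head


# The release branch is cut from c0; main then receives two ingestion commits.
root = Commit("c0", None, {"ext/a/1.parquet": "obj-a1", "ext/b/1.parquet": "obj-b1"})
release = Branch(head=root)
main_c1 = Commit("c1", root, {**root.snapshot, "ext/a/1.parquet": "obj-a2"})
main_c2 = Commit("c2", main_c1, {**main_c1.snapshot, "ext/b/1.parquet": "obj-b2"})

# Option 1 selects by path; option 2 selects by commit.
copied = copy_path(main_c2.snapshot, "ext/a/", release)
picked = cherry_pick(main_c2, release, "c3")
```

In this model, option 1 leaves the newer `ext/a/` pointer staged on the release branch, while option 2 creates a new commit on the release branch that carries only the change made in `c2`.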

Motivation

We plan to use lakeFS to back numerous data workflows. Our ingestion workflows run regularly on their own schedules to bring external data into our system. Other workflows then operate on that data to produce data that we share with our customers. When the latter workflows run, they need a consistent state from which to run.

We plan to use a main branch to store the latest versions of the external data. We will create release branches from main when we need to run the other workflows to produce data that is shared with our customers. When the release is ready, we will tag the commit.

While this setup seems like it will meet all of our needs, there is an exception: we may want to bring a newer version of some external datasets from main onto the release branch.

  • We choose not to wait to create the release branch until the newer datasets are ready on main because the first workflows to run on the release branch take a long time (days) and do not depend upon these datasets, so we choose to start them as soon as possible.
  • We choose not to merge main into the release branch (essentially rebasing the release branch) because we only want to update a subset of the datasets. If we updated all of the datasets, we would need to re-run all of the workflows on the release branch.
  • We could change our branching strategy, but I am hopeful that we can continue to use a single release branch for a given release.

It would be nice if there were a first-class method for "copying" a dataset from one branch to another (metadata only; there should still be only one instance of the data in the underlying filesystem). Some existing operations can be assembled to do this, but a single unified operation would be preferable.

We plan to merge release branches back into main after their tags are created. The benefit of merging into main is that the derived data on the release branch is updated on main. Actions can then be triggered to do something with the derived data on main, such as updating our customer-facing APIs (yes, these actions could be triggered by tags as well). If we do this merge, we will likely run into merge conflicts between the release branch and main if both updated the same external datasets. If we could do the copying at the commit level with the same commit ID (i.e. cherry-pick the commit from main onto the release branch), then we may be able to avoid the merge conflict.
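
The conflict scenario can be sketched with a toy three-way merge in Python. This is an illustration under stated assumptions, not a description of how lakeFS merges: `three_way_merge`, the paths, and the object addresses are hypothetical, and the model assumes that an identical change on both sides (the same path pointing at the same object address) is not treated as a conflict.

```python
from typing import Dict, List, Tuple

# Toy model only: a snapshot maps a logical path to a physical object address.
Snapshot = Dict[str, str]


def three_way_merge(
    base: Snapshot, ours: Snapshot, theirs: Snapshot
) -> Tuple[Snapshot, List[str]]:
    """Merge two snapshots against their common ancestor, path by path."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            value = o  # both sides agree: unchanged, or the identical change
        elif o == b:
            value = t  # only their side changed the path
        elif t == b:
            value = o  # only our side changed the path
        else:
            conflicts.append(path)  # both sides changed it differently
            continue
        if value is not None:
            merged[path] = value
    return merged, conflicts


base = {"ext/a": "obj-a1"}
main = {"ext/a": "obj-a2"}

# The release branch re-ingested the dataset itself, producing a different object.
release_reingested = {"ext/a": "obj-a2-copy", "derived/report": "obj-r1"}
# The release branch took main's pointer instead (metadata copy or cherry-pick).
release_picked = {"ext/a": "obj-a2", "derived/report": "obj-r1"}

_, conflicts_reingested = three_way_merge(base, main, release_reingested)
merged_picked, conflicts_picked = three_way_merge(base, main, release_picked)

# If main ingests again after the pick, the two sides differ once more.
main_later = {"ext/a": "obj-a3"}
_, conflicts_later = three_way_merge(base, main_later, release_picked)
```

In this model, the merge is clean only while the release branch and main point at the same object for the shared dataset, which is why the sketch supports "may be able to avoid" rather than "will avoid".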

@arielshaqed added the area/cataloger and proposal labels on Apr 3, 2022
@itaiad200 added the team/versioning-engine label on Apr 4, 2022
@itaiad200 self-assigned this on Mar 14, 2023