Deprecate Stages and introduce Labels instead #218

Merged
merged 10 commits into from
Jul 25, 2022
109 changes: 83 additions & 26 deletions README.md
@@ -8,10 +8,10 @@ Git Tag Ops. Turn your Git repository into an Artifact Registry:

- Registry: Track new artifacts and their versions for releases and significant
changes.
- Lifecycle Management: Promote or roll back versions among a structured set of
stages.
- Lifecycle Management: Create actionable labels for versions, marking the status
of an artifact or its readiness to be consumed by a specific environment.
- GitOps: Signal CI/CD automation or other downstream systems to act upon these
lifecycle updates.
new versions and lifecycle updates.
- Enrichments: Annotate and query artifact metadata with additional information.

GTO works by creating annotated Git tags in a standard format.
@@ -64,31 +64,57 @@ several versions in a given commit, ordered by their automatic version numbers.

</details>

### Promoting
### Create a label

Promote a specific artifact version to a lifecycle stage with `gto promote`.
Stages can be seen as the status of your artifact, signaling readiness for usage
by downstream systems, e.g. via CI/CD or web hooks. For example: redeploy an ML
model.
Create an actionable label for a specific artifact version with `gto label`.
Labels can mark its readiness for a specific consumer. You can plug in a real
downstream system via CI/CD or web hooks. For example: redeploy an ML model.

```console
$ gto promote awesome-model prod
Created git tag 'awesome-model#prod#1' that promotes 'v0.0.1'
$ gto label awesome-model prod
Created git tag 'awesome-model#prod#1' that adds label 'prod' to 'v0.0.1'
```

<details summary="What happens under the hood?">

GTO creates a special Git tag for the artifact promotion, in a standard format:
GTO creates a special Git tag in a standard format:
`{artifact_name}#{stage}#{e}`.

The event is now associated to the latest version of the artifact. There can be
multiple events for a given version, ordered by an automatic incremental event
number (`{e}`). This will keep the history of your promotions.
number (`{e}`). This keeps a history of the labels you create.

Note: if you prefer, you can use simple promotion tag format without the
incremental `{e}`, but this will disable the `gto history` command. This is
because promoting an artifact where a promotion tag already existed will require
deleting the existing tag.
Note: if you prefer, you can use a simple label tag format without the incremental
`{e}`, but this will disable the `gto history` command. This is because labeling
an artifact version where a label tag already exists requires deleting the
existing tag.

</details>

### Remove a label

Sometimes you need to mark an artifact version as no longer ready for a specific consumer, and maybe signal a downstream system about this. You can use `gto unlabel` for that:

```console
$ gto unlabel awesome-model prod
Created git tag 'awesome-model#prod#2!' that removes label 'prod' from 'v0.0.1'
```

<details summary="Some details and options">

GTO creates a special Git tag in a standard format:
`{artifact_name}#{stage}#{e}!`.

Note that you can create this label again later, if you need to, by calling `gto label`.

You may also want to delete the Git tag instead of creating a new one. This is useful if you don't want to keep extra tags in your Git repo, don't need history, and don't want to trigger CI/CD or another downstream system. For that, you can use:

```console
$ gto unlabel --delete
Deleted git tag 'awesome-model#prod#1' that added label 'prod' to 'v0.0.1'
To push the changes upstream, run:
git push origin awesome-model#prod#1 --delete
```

</details>
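As a rough illustration of the two tag formats quoted above, a downstream script could split a label tag name into its parts. This is a minimal sketch, not GTO's own parser: the regular expression, the `LabelEvent` type, and `parse_label_tag` are hypothetical names, and the pattern assumes that artifact and label names never contain `#`.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for `{artifact_name}#{stage}#{e}` with an optional
# trailing `!` that marks a label removal. Not taken from GTO's source.
LABEL_TAG = re.compile(
    r"^(?P<artifact>[^#]+)#(?P<label>[^#]+)#(?P<event>\d+)(?P<removed>!?)$"
)


class LabelEvent(NamedTuple):
    artifact: str
    label: str
    event: int
    removed: bool


def parse_label_tag(tag: str) -> Optional[LabelEvent]:
    """Parse a label tag name; return None if it does not fit the pattern."""
    match = LABEL_TAG.match(tag)
    if match is None:
        return None
    return LabelEvent(
        artifact=match["artifact"],
        label=match["label"],
        event=int(match["event"]),
        removed=match["removed"] == "!",
    )


print(parse_label_tag("awesome-model#prod#1"))
print(parse_label_tag("awesome-model#prod#2!"))
```

A CI job triggered by a pushed tag could use the `removed` field to decide whether to deploy or tear down, for example.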

@@ -125,7 +151,7 @@ GTO the artifact file is committed to Git.
<details summary="Virtual vs. Physical artifacts">

- Physical files/directories are committed to the repo. When you register a new
version or promote it, Git guarantees that it's immutable -- you can return a
version or label it, Git guarantees that it's immutable -- you can return a
year later and get the same artifact by providing a version.

- Virtual artifacts could be an external path (e.g. `s3://mybucket/myfile`) or a
@@ -149,15 +175,15 @@ Let's look at the usage of the `gto show` and `gto history`.
### Show the current state

This is the entire state of the registry: all artifacts, their latest versions,
and what is promoted to stages right now.
and the greatest versions for each label.

```console
$ gto show
╒═══════════════╤══════════╤════════╤═════════╤════════════╕
│ name │ latest │ #dev │ #prod │ #staging │
╞═══════════════╪══════════╪════════╪═════════╪════════════╡
│ churn │ v3.1.0 │ - │ v3.0.0 │ v3.1.0 │
│ segment │ v0.4.1 │ v0.4.1 │ - │ - │
│ churn │ v3.1.0 │ v3.0.0 │ v3.0.0 │ v3.1.0 │
│ segment │ v0.4.1 │ v0.4.1 │ - │ v0.4.1 │
│ cv-class │ v0.1.13 │ - │ - │ - │
│ awesome-model │ v0.0.1 │ - │ v0.0.1 │ - │
╘═══════════════╧══════════╧════════╧═════════╧════════════╛
@@ -167,7 +193,7 @@ Here we'll see both artifacts that have Git tags only and those annotated in
`artifacts.yaml`. Use `--all-branches` or `--all-commits` to read
`artifacts.yaml` from more commits than just `HEAD`.

Add an artifact name to print all og its versions instead:
Add an artifact name to print all of its versions instead:

```console
$ gto show churn
@@ -179,6 +205,29 @@ $ gto show churn
╘════════════╧═══════════╧═════════╧═════════════════════╧═══════════════════╧══════════════╛
```

#### Enabling Stages/Kanban workflow

In some cases, you may want the latest label assigned to an artifact version to
replace all of its previous labels, so that each version has a single label.
This resembles a Kanban workflow, where you "move" an artifact version from one
column ("label1") to another ("label2"). This is how MLflow and some other
model registries work.

To achieve this, you can use `--last-label-for-version` flag (or `--last` for
Contributor

@dberenbaum dberenbaum Jul 20, 2022

Just my take since we discussed this: I don't think the long --last-label-for-version flag is needed, and the help description for the flag is probably enough to specify that --last means the last label for the version (not the last version for the label).

Also, we use latest in other places, so it might be worth considering whether last and latest have different meanings (in which case we might want to clarify each), or else use one of those words consistently.

Edit: I see latest and last do have different meanings. Still not sure about the flag name 🤔. Some ideas:

  • --collapse-versions
  • --current-labels

Not sure those are any better, and it doesn't need to block this PR, but it might be good to get more feedback on it.

Contributor Author

Yes, they have different meanings. I had an idea of introducing two flags: `gto show --greatest` (sort by semver) and `gto show --latest` (sort by timestamp), with greatest as the default.

Re options: Maybe --last-stage or --latest-stage could work?

Contributor

What about --hide-old-stages? Anyway, to be clear, it shouldn't block the PR.

Contributor

@aguschin @dberenbaum
Suggestion - How about we use a per-label prefix that will make this specific label behave like a stage?
We would be able to support behaviors in parallel:

  • free-form labels (user needs to assign and remove)
  • stages (assigning a stage overrides previous stages assignments per-artifact-version)
  • environments (assigning an environment to an artifact-version will cause this environment to be "unassigned" from other versions of that artifact)

Contributor

That summary of the three different behaviors is great @omesser.

TBH I like the current approach by @aguschin where you can switch between those by choosing how to view the labels rather than having it coded into the tag names. It makes it much easier to play around with the different approaches without being locked into one of them. The only part I don't like is that naming/explaining is not that easy, but I don't think it would be easy no matter what approach we take.

Contributor

@omesser omesser Jul 24, 2022

I think the main downside of having this determined by the "querying"/"viewing" command is that you cannot mix and match. This prevents you from using GTO to manage environments if you already use it for kanban-like stages, and vice versa.
To put it another way: if I build my workflow and CI to treat gto labels (or specific labels) like kanban stages, it will never make sense to view those as something else. If I use them as deployment environments, it will never make sense to query them as anything else. Querying them the wrong way only leaves room for error; there is no upside to that flexibility once the label mechanism has been decided at assignment.

Contributor Author

@omesser, I agree with your argument, but I think flexibility is more important currently for us to see what people want. Also, an important question you raised is how this will behave in Studio when a team collaborates. A misunderstanding may appear when different users interpret the labels with different mechanics (e.g. one with free-form labels and one with stages).

On the second point, I don't think I'll be able to implement this before the Studio release, and I doubt it will be handy without Studio BE/FE support because it just won't be shown there. So let's continue to discuss this and gather some users' feedback to see how they want this to work, and then implement it in a separate PR.

Contributor

Seems like we don't have much choice in the short term. In the medium/long term, separate concepts of "stages" and "environments" seem useful.

short):

```console
$ gto show --last
╒═══════════════╤══════════╤════════╤═════════╤════════════╕
│ name │ latest │ #dev │ #prod │ #staging │
╞═══════════════╪══════════╪════════╪═════════╪════════════╡
│ churn │ v3.1.0 │ - │ v3.0.0 │ v3.1.0 │
│ segment │ v0.4.1 │ v0.4.1 │ - │ - │
│ cv-class │ v0.1.13 │ - │ - │ - │
│ awesome-model │ v0.0.1 │ - │ v0.0.1 │ - │
╘═══════════════╧══════════╧════════╧═════════╧════════════╛
```
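The difference between the default view and the `--last` view can be sketched in a few lines of Python. This is only an illustration of the semantics described in this section, not GTO's implementation: `versions_by_label` and `parse_version` are hypothetical helpers, and the event log is an assumed history chosen so that the result matches the `churn` row in the two tables.

```python
from collections import defaultdict


def parse_version(version):
    """Turn 'v3.1.0' into (3, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def versions_by_label(events, last_only=False):
    """Map each label to the greatest version carrying it.

    `events` is a chronological list of (version, label) pairs.
    With `last_only=True`, only the most recent label of each version counts,
    which mimics the Kanban-style view.
    """
    labels_of = defaultdict(list)
    for version, label in events:
        labels_of[version].append(label)

    result = {}
    for version, labels in labels_of.items():
        active = labels[-1:] if last_only else labels
        for label in active:
            current = result.get(label)
            if current is None or parse_version(version) > parse_version(current):
                result[label] = version
    return result


# Hypothetical event history for `churn`, oldest first.
churn_events = [("v3.0.0", "dev"), ("v3.0.0", "prod"), ("v3.1.0", "staging")]

print(versions_by_label(churn_events))
print(versions_by_label(churn_events, last_only=True))
```

In the default view `v3.0.0` appears under both `dev` and `prod`; in the Kanban-style view it appears only under `prod`, its last label.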

### See the history of an artifact

`gto history` will print a journal of the events that happened to an artifact.
@@ -265,18 +314,26 @@ $ gto which churn prod --ref
churn#prod#2
```

<details summary="Kanban/Stages workflow">
If you prefer the Kanban/Stages workflow described above, you can use the `--last` flag:

```console
$ gto which churn prod --last
v3.1.0
```

This takes into account only the last label for each version.

</details>

To get details about an artifact (from `artifacts.yaml`) use `gto describe`:

```console
$ gto describe churn
```

```yaml
{
"type": "model",
"path": "models/churn.pkl",
"virtual": false
}
{ "type": "model", "path": "models/churn.pkl", "virtual": false }
```

> The output is in JSON format for ease of parsing programmatically.
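
For instance, a script could read the fields with Python's standard `json` module. This is a minimal sketch that uses the sample output shown above as a literal string; `parse_describe` is a hypothetical helper, and in a real script the string would presumably come from capturing the command's standard output.

```python
import json

# Sample output copied from the `gto describe churn` example above.
raw = '{ "type": "model", "path": "models/churn.pkl", "virtual": false }'


def parse_describe(output):
    """Decode the JSON text printed by `gto describe` into a dict."""
    return json.loads(output)


info = parse_describe(raw)
print(info["type"], info["path"], info["virtual"])
```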