Write metadata about packaged docker image to dist/ #16999

Closed
huonw opened this issue Sep 26, 2022 · 6 comments · Fixed by #17299
Labels
backend: Docker Docker backend-related issues enhancement

Comments

@huonw
Contributor

huonw commented Sep 26, 2022

Is your feature request related to a problem? Please describe.

Currently there doesn't seem to be an easy way to determine info about a docker image that was built via ./pants package some/path:some_docker_image (and/or published via ./pants publish ...). This is unfortunate when the image repositories and/or tags are dynamic, or if one wants to use the image ID. In particular, machine use of this info (e.g. using it in terraform/cloudformation/... templates) seems to require parsing the human-focused output of the pants commands.

There's currently no record in dist/ of a docker image being packaged, since the image is managed/stored by docker itself.

This was discussed in slack at https://pantsbuild.slack.com/archives/C046T6T9U/p1663916571660779.

Describe the solution you'd like

There was a suggestion of writing out a JSON file like dist/some.path/some_docker_image.docker_info.json that contains metadata about the image, effectively acting as a "link" to the compiled artefact.

For example:

{
    "repositories": ["example.repo"],
    "image_tags": ["pants-hash-123456789", "latest"],
    "image_id": "1234567890"
}

Questions:

  • should this be versioned somehow?
  • anything else to include?
  • is there a different format that may work better? (e.g. I imagine it may be common to want to use a shell script to interpret this output)

Describe alternatives you've considered

None, yet.

Additional context

#14657 may be tangentially related, since I could imagine it may result in layers being written out to dist/ (maybe?).

@huonw
Contributor Author

huonw commented Sep 27, 2022

Hm, thinking about it a bit more. Potentially the schema should be different:

  • When something is templated/computed, the output should be keyed by its input pattern, so that there's always a fixed string that can be used to find the relevant tag, such as "image_tags": {"pants-hash-{pants.hash}": "pants-hash-123456789", "latest": "latest"}. This allows indexing by "pants-hash-{pants.hash}" rather than having to do string matching with startswith or something.
  • It looks like the registries, repositories and tags aren't independent of each other, so it may be better to model that structure more directly.
  • Generally docker reasons about the 'full name' of an image, rather than registry, repository and tag separately.
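
To make the first point concrete, here's a small Python comparison of the two shapes. The variable names are only for illustration, and the values are the placeholder ones from above:

```python
# Original shape: a flat list, so finding the hashed tag needs string matching.
tags_list = ["pants-hash-123456789", "latest"]
hashed_from_list = next(t for t in tags_list if t.startswith("pants-hash-"))

# Alternative shape: keyed by the tag template, so the key is a fixed string
# even though the rendered tag changes from build to build.
tags_by_template = {
    "pants-hash-{pants.hash}": "pants-hash-123456789",
    "latest": "latest",
}
hashed_from_dict = tags_by_template["pants-hash-{pants.hash}"]

print(hashed_from_list, hashed_from_dict)
```

Both lookups find the same tag, but only the second one keeps working unchanged if another tag happens to share the prefix.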

To spell out a proposal of the schema, in dataclasses syntax:

from __future__ import annotations  # allows forward references to Registry/ImageTag

from dataclasses import dataclass
from typing import Literal

@dataclass(kw_only=True)  # kw_only (Python 3.10+) lets the defaulted `version` stay first
class DockerInfo:
    version: Literal[1] = 1 # (edit: added 2022-09-29)
    image_id: str
    digest: str
    # registry alias or address -> Registry
    # (using a dict rather than just a list[Registry] for more convenient look-ups when multiple values exist)
    registries: dict[str, Registry]

@dataclass
class Registry:
    # set if registry was specified as `@something`
    alias: None | str
    address: str
    repository: str
    # for convenience, include the name (using the ...@sha256:... digest)
    name_digest: str
    # tag template -> ImageTag
    tags: dict[str, ImageTag]

@dataclass
class ImageTag:
    template: str
    tag: str
    # for convenience, include the name (using this tag)
    name: str
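
As a rough illustration of how classes like these could be turned into the JSON file, here's a self-contained Python sketch. It redeclares simplified copies of the classes (with `version` as a plain int moved last, so it also runs on Python 3.9), and `build_registry` is a hypothetical helper, not an existing Pants API:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageTag:
    template: str
    tag: str
    name: str


@dataclass
class Registry:
    alias: Optional[str]
    address: str
    repository: str
    name_digest: str
    tags: dict[str, ImageTag]


@dataclass
class DockerInfo:
    image_id: str
    digest: str
    registries: dict[str, Registry]
    version: int = 1


def build_registry(alias, address, repository, digest, rendered_tags):
    """Hypothetical helper: fill in the denormalized name fields.

    `rendered_tags` maps each tag template to its rendered tag.
    """
    tags = {
        template: ImageTag(template, tag, f"{address}/{repository}:{tag}")
        for template, tag in rendered_tags.items()
    }
    return Registry(alias, address, repository, f"{address}/{repository}@{digest}", tags)


digest = "sha256:abcdef123456"
info = DockerInfo(
    image_id="1234567890",
    digest=digest,
    registries={
        "@company-registry1": build_registry(
            "company-registry1",
            "reg1.company.internal",
            "demo",
            digest,
            {"pants-hash-{pants.hash}": "pants-hash-123456789", "dev": "dev"},
        )
    },
)

# asdict() recurses through nested dataclasses and dicts, so the result is
# plain JSON-serializable data.
as_json = json.dumps(asdict(info), indent=2)
print(as_json)
```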

Putting that together into an example:

# pants.toml
[docker]
default_repository = "{name}"

[docker.registries.company-registry1]
address = "reg1.company.internal"
default = true
extra_image_tags = ["dev"]

[docker.registries.company-registry2]
address = "reg2.company.internal"
repository = "example/{name}"
# BUILD
docker_image(
    name="demo",
    registries=[
        "@company-registry1",
        "@company-registry2",
        "ext-registry.company-b.net:8443",
    ],
    image_tags=["pants-hash-{pants.hash}"],
)
# dist/.../demo.docker_info.json
{
  "version": 1,
  "image_id": "1234567890",
  "digest": "sha256:abcdef123456",
  "registries": {
    "@company-registry1": {
      "alias": "company-registry1",
      "address": "reg1.company.internal",
      "repository": "demo",
      "name_digest": "reg1.company.internal/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789",
          "name": "reg1.company.internal/demo:pants-hash-123456789"
        },
        "dev": {
          "template": "dev",
          "tag": "dev",
          "name": "reg1.company.internal/demo:dev"
        }
      }
    },
    "@company-registry2": {
      "alias": "company-registry2",
      "address": "reg2.company.internal",
      "repository": "example/demo",
      "name_digest": "reg2.company.internal/example/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789",
          "name": "reg2.company.internal/example/demo:pants-hash-123456789"
        }
      }
    },
    "ext-registry.company-b.net:8443": {
      "alias": null,
      "address": "ext-registry.company-b.net:8443",
      "repository": "demo",
      "name_digest": "ext-registry.company-b.net:8443/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789",
          "name": "ext-registry.company-b.net:8443/demo:pants-hash-123456789"
        }
      }
    }
  }
}

A consumer of this can then load the file and read the properties:

  • for a name using a tag: data["registries"]["@company-registry1"]["tags"]["pants-hash-{pants.hash}"]["name"]
  • for a name using a digest: data["registries"]["ext-registry.company-b.net:8443"]["name_digest"]
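
As a rough sketch of what such a consumer might look like in Python: the helper names `tagged_name` and `digest_name` are hypothetical, and the inline JSON is a trimmed copy of the example standing in for the proposed file under dist/.

```python
import json

# Trimmed copy of the example; a real consumer would instead read the
# proposed dist/.../demo.docker_info.json file from disk.
EXAMPLE = """
{
  "version": 1,
  "image_id": "1234567890",
  "digest": "sha256:abcdef123456",
  "registries": {
    "@company-registry1": {
      "alias": "company-registry1",
      "address": "reg1.company.internal",
      "repository": "demo",
      "name_digest": "reg1.company.internal/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789",
          "name": "reg1.company.internal/demo:pants-hash-123456789"
        }
      }
    }
  }
}
"""


def tagged_name(data: dict, registry: str, tag_template: str) -> str:
    """Full image name for a registry key and tag template (hypothetical helper)."""
    return data["registries"][registry]["tags"][tag_template]["name"]


def digest_name(data: dict, registry: str) -> str:
    """Full image name pinned by digest for a registry key (hypothetical helper)."""
    return data["registries"][registry]["name_digest"]


data = json.loads(EXAMPLE)
# Refuse to guess at the layout of versions this sketch doesn't know about.
if data["version"] != 1:
    raise ValueError(f"unsupported docker_info version: {data['version']}")

print(tagged_name(data, "@company-registry1", "pants-hash-{pants.hash}"))
print(digest_name(data, "@company-registry1"))
```

The keys are the fixed strings from pants.toml and the BUILD file, so the script doesn't need to know the rendered hash or digest in advance.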

@benjyw
Contributor

benjyw commented Sep 28, 2022

This is interesting! Not sure I'm grokking what Registry.name_digest and ImageTag.name refer to?

@huonw
Contributor Author

huonw commented Sep 28, 2022

Docker commands generally reason in terms of an image's registry, repository and tag all together (or digest instead of tag). That is, it's docker run reg1.company.internal/demo:dev, rather than docker run demo:dev or something, and similarly our AWS CloudFormation templates encode the full name some.registry.host/repository:tag.

Thus, since that's how this output is usually used (and often by shell scripts), it seems nicer to pre-do the concatenation (f"{address}/{repository}:{tag}") rather than require every consumer of the JSON file to do it themselves.
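
For illustration, a minimal Python sketch of the concatenation that every consumer would otherwise have to repeat, using the `/` separator that the full names in the example use. The function names are made up here, not anything in Pants:

```python
def image_name(address: str, repository: str, tag: str) -> str:
    """Join registry address, repository and tag into a full image name."""
    return f"{address}/{repository}:{tag}"


def image_name_digest(address: str, repository: str, digest: str) -> str:
    """Same, but pinned by digest (...@sha256:...) instead of a tag."""
    return f"{address}/{repository}@{digest}"


print(image_name("reg1.company.internal", "demo", "dev"))
print(image_name_digest("reg1.company.internal", "demo", "sha256:abcdef123456"))
```

With `name` and `name_digest` already in the file, a shell script can pass the value straight to docker run without any string handling.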

@benjyw
Contributor

benjyw commented Sep 29, 2022

Ah, makes sense

@benjyw
Contributor

benjyw commented Sep 29, 2022

So it is denormalized for convenience

@calleo

calleo commented Jan 5, 2023

So happy someone thought of (and solved) this already 👍👍👍


4 participants