Write metadata about packaged docker image to dist/ #16999

huonw · 2022-09-26T00:21:47Z

Is your feature request related to a problem? Please describe.

Currently there doesn't seem to be an easy way to determine info about a docker image that was built via ./pants package some/path:some_docker_image (and/or published via ./pants publish ...). This is unfortunate when the image repositories and/or tags are dynamic, or if one wants to use the image ID. In particular, machine use of this info (e.g. using it in terraform/cloudformation/... templates) seems to require parsing the human-focused output of the pants commands.

There's currently no record of a packaged docker image in dist/, since the image is stored and managed by docker itself.

This was discussed in slack at https://pantsbuild.slack.com/archives/C046T6T9U/p1663916571660779.

Describe the solution you'd like

There was a suggestion of writing out a JSON file like dist/some.path/some_docker_image.docker_info.json that contains metadata about the image, effectively acting as a "link" to the compiled artefact.

For example:

{
    "repositories": ["example.repo"],
    "image_tags": ["pants-hash-123456789", "latest"],
    "image_id": "1234567890"
}
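Because the file is JSON, most languages can consume it directly. Below is a minimal Python sketch of a consumer for this first, flat proposal. It reads an inline copy of the example (written as valid JSON) instead of a real file, and `image_names` is a hypothetical helper:

```python
import json

# Inline copy of a hypothetical dist/some.path/some_docker_image.docker_info.json
# in the flat format proposed above.
RAW = """
{
    "repositories": ["example.repo"],
    "image_tags": ["pants-hash-123456789", "latest"],
    "image_id": "1234567890"
}
"""


def image_names(info: dict) -> list:
    """Combine every repository with every tag into a repository:tag name."""
    return [f"{repo}:{tag}" for repo in info["repositories"] for tag in info["image_tags"]]


info = json.loads(RAW)
for name in image_names(info):
    print(name)
```

The cross product treats repositories and tags as independent. The follow-up comment questions exactly that assumption.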

Questions:

should this be versioned somehow?
anything else to include?
is there a different format that may work better? (e.g. I imagine it may be common to want to use a shell script to interpret this output)

Describe alternatives you've considered

None, yet.

Additional context

#14657 may be tangentially related, since I could imagine it may result in layers being written out to dist/ (maybe?).


huonw · 2022-09-27T22:16:27Z

Hm, thinking about it a bit more. Potentially the schema should be different:

When something is templated/computed, the output should be keyed by the input pattern, so that there's always a fixed string that can be used to find the relevant value. For example, "image_tags": {"pants-hash-{pants.hash}": "pants-hash-123456789", "latest": "latest"} allows indexing by "pants-hash-{pants.hash}" rather than having to do string matching with startswith or similar.
It looks like the sets of registries, repositories and tags aren't independent, so it may be better to model that structure more directly.
Generally docker reasons about the 'full name' of an image, rather than registry, repository and tag separately.
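The next sketch shows what the template-keyed shape changes for a consumer. It is plain Python with hypothetical variable names. A keyed lookup uses a string the consumer already knows. A list of rendered tags forces the consumer to guess what the rendered tag looks like:

```python
# Tags keyed by their template (as proposed) versus a plain list of rendered tags.
tags_by_template = {
    "pants-hash-{pants.hash}": "pants-hash-123456789",
    "latest": "latest",
}
tags_list = ["pants-hash-123456789", "latest"]

# Keyed lookup: the template string is fixed, so consumers can hard-code it.
hash_tag = tags_by_template["pants-hash-{pants.hash}"]

# List lookup: consumers must match on a prefix and hope nothing else shares it.
hash_tag_from_list = next(t for t in tags_list if t.startswith("pants-hash-"))

print(hash_tag, hash_tag_from_list)
```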

To spell out a proposal of the schema, in dataclasses syntax:

from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


# kw_only (Python 3.10+) lets `version` keep its default while still being listed first
@dataclass(kw_only=True)
class DockerInfo:
    version: Literal[1] = 1 # (edit: added 2022-09-29)
    image_id: str
    digest: str
    # registry alias or address -> Registry
    # (using a dict rather than just a list[Registry] for more convenient look-ups when multiple values exist)
    registries: dict[str, Registry]

@dataclass
class Registry:
    # set if the registry was specified as `@something`
    alias: None | str
    address: str
    repository: str
    # for convenience, include the full name (using the ...@sha256:... digest)
    name_digest: str
    # tag template -> ImageTag
    tags: dict[str, ImageTag]

@dataclass
class ImageTag:
    template: str
    tag: str
    # for convenience, include the full name (using this tag)
    name: str

Putting that together into an example:

# pants.toml
[docker]
default_repository = "{name}"

[docker.registries.company-registry1]
address = "reg1.company.internal"
default = true
extra_image_tags = ["dev"]

[docker.registries.company-registry2]
address = "reg2.company.internal"
repository = "example/{name}"

# BUILD
docker_image(
    name="demo",
    registries=[
        "@company-registry1",
        "@company-registry2",
        "ext-registry.company-b.net:8443",
    ],
    image_tags=["pants-hash-{pants.hash}"]
)

# dist/.../demo.docker_info.json
{
  "version": 1,
  "image_id": "1234567890",
  "digest": "sha256:abcdef123456",
  "registries": {
    "@company-registry1": {
      "alias": "company-registry1",
      "address": "reg1.company.internal",
      "repository": "demo",
      "name_digest": "reg1.company.internal/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789", 
          "name": "reg1.company.internal/demo:pants-hash-123456789"
        },
        "dev": {
          "template": "dev",
          "tag": "dev", 
          "name": "reg1.company.internal/demo:dev"
        }
      }
    },
    "@company-registry2": {
      "alias": "company-registry2",
      "address": "reg2.company.internal",
      "repository": "example/demo",
      "name_digest": "reg2.company.internal/example/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789", 
          "name": "reg2.company.internal/example/demo:pants-hash-123456789"
        }
      }
    },
    "ext-registry.company-b.net:8443": {
      "alias": null,
      "address": "ext-registry.company-b.net:8443",
      "repository": "demo",
      "name_digest": "ext-registry.company-b.net:8443/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789", 
          "name": "ext-registry.company-b.net:8443/demo:pants-hash-123456789"
        }
      }
    }
  }
}
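The "registries" entries in the example follow from the configuration. An `@alias` reference picks up that registry's address, repository template and extra tags. A bare address falls back to `[docker].default_repository` and gets no extra tags. The Python sketch below approximates that resolution. It was reverse-engineered from the example, not taken from the Pants implementation. `RegistryOptions` and `resolve_registries` are hypothetical names, and the sketch does not render the `{pants.hash}` tag templates:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryOptions:
    """Hypothetical in-memory form of a [docker.registries.<alias>] section."""
    address: str
    repository: str | None = None  # overrides [docker].default_repository when set
    extra_image_tags: list[str] = field(default_factory=list)


def resolve_registries(
    name: str,
    registries: list[str],
    configured: dict[str, RegistryOptions],
    default_repository: str,
    image_tags: list[str],
) -> dict[str, dict]:
    """Map each docker_image(registries=...) entry to address, repository and tag templates."""
    resolved = {}
    for ref in registries:
        if ref.startswith("@"):
            alias = ref[1:]
            opts = configured[alias]
        else:
            # A bare address: no alias and no per-registry options.
            alias = None
            opts = RegistryOptions(address=ref)
        repository = (opts.repository or default_repository).format(name=name)
        resolved[ref] = {
            "alias": alias,
            "address": opts.address,
            "repository": repository,
            "tag_templates": image_tags + opts.extra_image_tags,
        }
    return resolved


configured = {
    "company-registry1": RegistryOptions("reg1.company.internal", extra_image_tags=["dev"]),
    "company-registry2": RegistryOptions("reg2.company.internal", repository="example/{name}"),
}
resolved = resolve_registries(
    "demo",
    ["@company-registry1", "@company-registry2", "ext-registry.company-b.net:8443"],
    configured,
    default_repository="{name}",
    image_tags=["pants-hash-{pants.hash}"],
)
for ref, entry in resolved.items():
    print(ref, entry["repository"], entry["tag_templates"])
```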

A consumer of the JSON file can then load it and read the properties:

for a name using a tag: data["registries"]["@company-registry1"]["tags"]["pants-hash-{pants.hash}"]["name"]
for a name using a digest: data["registries"]["ext-registry.company-b.net:8443"]["name_digest"]
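Here is a minimal Python sketch of such a consumer. It parses a trimmed inline copy of the example (one registry, one tag), not a real file. The helper names are hypothetical. The version check assumes consumers should reject schema versions they don't know:

```python
import json

# Trimmed copy of the example docker_info.json (one registry, one tag).
EXAMPLE = """
{
  "version": 1,
  "image_id": "1234567890",
  "digest": "sha256:abcdef123456",
  "registries": {
    "@company-registry1": {
      "alias": "company-registry1",
      "address": "reg1.company.internal",
      "repository": "demo",
      "name_digest": "reg1.company.internal/demo@sha256:abcdef123456",
      "tags": {
        "pants-hash-{pants.hash}": {
          "template": "pants-hash-{pants.hash}",
          "tag": "pants-hash-123456789",
          "name": "reg1.company.internal/demo:pants-hash-123456789"
        }
      }
    }
  }
}
"""


def load_docker_info(text: str) -> dict:
    info = json.loads(text)
    # Refuse to guess at a schema version this consumer doesn't understand.
    if info.get("version") != 1:
        raise ValueError(f"unsupported docker_info version: {info.get('version')!r}")
    return info


def name_for_tag(info: dict, registry: str, template: str) -> str:
    """Full image name for a tag, looked up by its fixed template string."""
    return info["registries"][registry]["tags"][template]["name"]


def name_for_digest(info: dict, registry: str) -> str:
    """Full image name pinned by digest."""
    return info["registries"][registry]["name_digest"]


info = load_docker_info(EXAMPLE)
print(name_for_tag(info, "@company-registry1", "pants-hash-{pants.hash}"))
print(name_for_digest(info, "@company-registry1"))
```

To read a real file, pass in the text from `open(path).read()`. A shell script could do the same lookup with jq, e.g. `jq -r '.registries["@company-registry1"].tags["pants-hash-{pants.hash}"].name' <file>`.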

benjyw · 2022-09-28T17:57:29Z

This is interesting! Not sure I'm grokking what Registry.name_digest and ImageTag.name refer to?

huonw · 2022-09-28T23:28:45Z

Docker commands generally reason in terms of an image's registry, repository and tag all together (or digest instead of tag). That is, it's docker run reg1.company.internal/demo:dev, rather than docker run demo:dev or something, and similarly our AWS CloudFormation templates encode the full name some.registry.host/repository:tag.

Thus, since that's how this output is usually used (and often by shell scripts), it seems nicer to do the concatenation up front (f"{address}/{repository}:{tag}") rather than requiring every consumer of the JSON file to do it themselves.
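The two denormalized names in the proposal are the tag form and the digest form. This minimal Python sketch builds both. The helper names are hypothetical, and it assumes a registry address is always present:

```python
def tagged_name(address: str, repository: str, tag: str) -> str:
    """Full name with a tag, e.g. reg1.company.internal/demo:dev."""
    return f"{address}/{repository}:{tag}"


def digest_name(address: str, repository: str, digest: str) -> str:
    """Full name pinned to a digest, e.g. reg/repo@sha256:..."""
    return f"{address}/{repository}@{digest}"


print(tagged_name("reg1.company.internal", "demo", "dev"))
print(digest_name("reg2.company.internal", "example/demo", "sha256:abcdef123456"))
```

Note the separators differ: `:` goes before a tag and `@` goes before a digest.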

benjyw · 2022-09-29T17:48:58Z

Ah, makes sense

benjyw · 2022-09-29T17:49:27Z

So it is denormalized for convenience

calleo · 2023-01-05T19:30:03Z

So happy someone thought (and solved) this already 👍👍👍

huonw added the enhancement label Sep 26, 2022

thejcannon added the backend: Docker Docker backend-related issues label Sep 30, 2022

huonw mentioned this issue Oct 20, 2022

Export metadata about a packaged docker image #17299

Merged

kaos closed this as completed in #17299 Nov 15, 2022
