
Update for automated multi-tagging and reducing the number of stack files #214

Merged: 17 commits, Aug 11, 2021

Conversation

eitsupi
Member

@eitsupi eitsupi commented Aug 10, 2021

Currently, the stack files (and the files generated from them) do not contain the so-called "shared tags" such as rocker/r-ver:latest.
This makes it difficult to track the tags of currently published images.
So this PR updates make-stacks.R so that all shared tags are automatically included in the stack files.
This makes it easy to build tasks such as downloading all tags from DockerHub and recording the contents of the images in reports.
Using jq (pre-installed on GitHub Actions runners), we can easily extract values from a JSON file and loop through them with foreach in Makefiles.
For example:

$ jq '.stack[].tags[]' -r stacks/4.1.1.json
docker.io/rocker/r-ver:4.1.1
docker.io/rocker/r-ver:4.1
docker.io/rocker/r-ver:4
docker.io/rocker/r-ver:latest
docker.io/rocker/rstudio:4.1.1
docker.io/rocker/rstudio:4.1
docker.io/rocker/rstudio:4
docker.io/rocker/rstudio:latest
docker.io/rocker/tidyverse:4.1.1
docker.io/rocker/tidyverse:4.1
docker.io/rocker/tidyverse:4
docker.io/rocker/tidyverse:latest
docker.io/rocker/verse:4.1.1
docker.io/rocker/verse:4.1
docker.io/rocker/verse:4
docker.io/rocker/verse:latest
docker.io/rocker/geospatial:4.1.1
docker.io/rocker/geospatial:4.1
docker.io/rocker/geospatial:4
docker.io/rocker/geospatial:latest
docker.io/rocker/shiny:4.1.1
docker.io/rocker/shiny:4.1
docker.io/rocker/shiny:4
docker.io/rocker/shiny:latest
docker.io/rocker/shiny-verse:4.1.1
docker.io/rocker/shiny-verse:4.1
docker.io/rocker/shiny-verse:4
docker.io/rocker/shiny-verse:latest
docker.io/rocker/binder:4.1.1
docker.io/rocker/binder:4.1
docker.io/rocker/binder:4
docker.io/rocker/binder:latest
docker.io/rocker/cuda:4.1.1-cuda10.1
docker.io/rocker/cuda:4.1-cuda10.1
docker.io/rocker/cuda:4-cuda10.1
docker.io/rocker/cuda:cuda10.1
docker.io/rocker/cuda:4.1.1
docker.io/rocker/cuda:4.1
docker.io/rocker/cuda:4
docker.io/rocker/cuda:latest
docker.io/rocker/r-ver:4.1.1-cuda10.1
docker.io/rocker/ml:4.1.1-cuda10.1
docker.io/rocker/ml:4.1-cuda10.1
docker.io/rocker/ml:4-cuda10.1
docker.io/rocker/ml:cuda10.1
docker.io/rocker/ml:4.1.1
docker.io/rocker/ml:4.1
docker.io/rocker/ml:4
docker.io/rocker/ml:latest
docker.io/rocker/ml-verse:4.1.1-cuda10.1
docker.io/rocker/ml-verse:4.1-cuda10.1
docker.io/rocker/ml-verse:4-cuda10.1
docker.io/rocker/ml-verse:cuda10.1
docker.io/rocker/ml-verse:4.1.1
docker.io/rocker/ml-verse:4.1
docker.io/rocker/ml-verse:4
docker.io/rocker/ml-verse:latest

$ jq '.stack[].tags[]' -r stacks/4.1.0.json
docker.io/rocker/r-ver:4.1.0
docker.io/rocker/rstudio:4.1.0
docker.io/rocker/tidyverse:4.1.0
docker.io/rocker/verse:4.1.0
docker.io/rocker/geospatial:4.1.0
docker.io/rocker/shiny:4.1.0
docker.io/rocker/shiny-verse:4.1.0
docker.io/rocker/binder:4.1.0
docker.io/rocker/cuda:4.1.0-cuda10.1
docker.io/rocker/cuda:4.1.0
docker.io/rocker/r-ver:4.1.0-cuda10.1
docker.io/rocker/ml:4.1.0-cuda10.1
docker.io/rocker/ml:4.1.0
docker.io/rocker/ml-verse:4.1.0-cuda10.1
docker.io/rocker/ml-verse:4.1.0
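
The extraction above can also be sketched in Python, which may help when jq is not available. The stack-file layout below is a minimal, hypothetical stand-in for the real schema (only the `stack` array and per-entry `tags` field queried by `jq '.stack[].tags[]'` are assumed):

```python
import json

# Hypothetical, minimal stack file mirroring only the structure queried
# by `jq '.stack[].tags[]'` above -- not the full rocker schema.
stack_file = json.dumps({
    "stack": [
        {"IMAGE": "r-ver",
         "tags": ["docker.io/rocker/r-ver:4.1.1",
                  "docker.io/rocker/r-ver:4.1",
                  "docker.io/rocker/r-ver:4",
                  "docker.io/rocker/r-ver:latest"]},
        {"IMAGE": "rstudio",
         "tags": ["docker.io/rocker/rstudio:4.1.1",
                  "docker.io/rocker/rstudio:latest"]},
    ]
})

# Equivalent of: jq '.stack[].tags[]' -r <file>
tags = [tag
        for entry in json.loads(stack_file)["stack"]
        for tag in entry["tags"]]
for tag in tags:
    print(tag)
```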

This PR adds shared tags such as rocker/r-ver:4.0 that are not currently used, but these tags will also be published on DockerHub once we switch to building with the docker buildx bake command.

Since it is desirable to have a single stack file when extracting tags as shown above, I will also merge the stack files currently built with GitHub Actions into a single file per R version.

ToDo

  • Merge with Automatic update of container definition files #213 and reflect the release of R 4.1.1.
  • Integration of stack files.
    • Check the effect on the generated files.
    • Fix workflow for building devel images. Generate core-devel.yml from devel.json.
    • Update the build workflow, core.yml.
    • Update sync-template-vars.R. (Includes build core latest daily images every day #194 unfinished feature, sync with core-latest-daily.json)
    • Update documentation.
    • Update the Makefile
  • Test GitHub Actions on my fork

@eitsupi eitsupi mentioned this pull request Aug 10, 2021
@eitsupi
Member Author

eitsupi commented Aug 10, 2021

With this improvement, we can give the tag rocker/ml:cuda10.1 to the latest version of rocker/ml:X.Y.Z-cuda10.1.
Does it look good?

@cboettig
Member

Yeah, this is definitely an issue. Long term I think this will be better when we can consolidate the stack files rather than duplicate them for each tag, yes?

With this improvement, we can give the tag rocker/ml:cuda10.1 to the latest version of rocker/ml:X.Y.Z-cuda10.1.

🎉 🎉

@eitsupi
Member Author

eitsupi commented Aug 10, 2021

Yeah, this is definitely an issue. Long term I think this will be better when we can consolidate the stack files rather than duplicate them for each tag, yes?

Yes, I think we can make a single stack file for each R version like X.Y.Z.json in the near future.

One of the problems preventing this at the moment is that IMAGE is set to the same value, r-ver, in both ml-cuda10.1 and core.
We use this IMAGE as part of the tag during docker-compose build, so I think we can do the work of eliminating the duplicate r-ver and unifying the stack files after we change the build method to docker buildx bake.

@cboettig
Member

One of the problems preventing this at the moment is that IMAGE is set to the same value, r-ver, in both ml-cuda10.1 and core.

You mean this, right?

Yeah, we should probably change the image name there anyway; saying this is just r-ver with an additional tag added is probably not the best way to describe that image -- rocker/cuda might be a more accurate name. Generally we use tags only to specify versions, not to add content. That build adds CUDA libs, so that's probably a better naming convention. I doubt too many people use rocker/r-ver:cuda10.1 directly, so hopefully the change would not have much impact downstream.

@eitsupi
Member Author

eitsupi commented Aug 10, 2021

Sorry, my explanation was not clear.

First of all, docker-bake.json has the following structure, and no target with the same name can exist.

{
    "target": {
        "r-ver": {
        },
        "rstudio": {
        }
    }
}

And make-bakejson.R transcribes the IMAGE value of each stack-file entry into a target name in docker-bake.json.

name = purrr::map_chr(value, "IMAGE", .default = NA),

So there should not be two r-ver entries in one stack file. (In practice it does work, because the R process suffixes the duplicate name and treats it as an alias.)
This affects only the target name; the name given to the built image is determined by make-stacks.R, as included in this PR.
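
That de-duplication can be sketched roughly in Python (the suffixing rule here is purely illustrative; the actual behavior lives in make-bakejson.R and may differ):

```python
def build_targets(stack_entries):
    """Map stack entries to bake-style targets, suffixing duplicate
    IMAGE names so that target keys stay unique (illustrative only)."""
    targets = {}
    for entry in stack_entries:
        name = entry["IMAGE"]
        suffix = 1
        while name in targets:
            suffix += 1
            name = f'{entry["IMAGE"]}{suffix}'  # e.g. "r-ver" -> "r-ver2"
        targets[name] = {"tags": entry.get("tags", [])}
    return targets

# Two stacks both declaring IMAGE "r-ver", as with ml-cuda10.1 and core:
merged = build_targets([
    {"IMAGE": "r-ver", "tags": ["docker.io/rocker/r-ver:4.1.1"]},
    {"IMAGE": "r-ver", "tags": ["docker.io/rocker/r-ver:4.1.1-cuda10.1"]},
])
print(sorted(merged))  # target keys stay unique
```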

When merging them into one stack file, we can name one of them something other than r-ver (e.g. r-ver-cuda), but this will affect the image name defined in the compose files generated by write-compose.R.

Therefore, I think it is better to integrate stack files after docker buildx bake is available (at least for images that can be built with GitHub Actions) and docker-compose build is no longer necessary.
With this PR we should be able to set the required tags in docker-bake.json, so docker buildx bake will probably be ready to use in a few days.

we should probably change the image name there anyway; saying this is just r-ver with an additional tag added is probably not the best way to describe that image -- rocker/cuda might be a more accurate name. Generally we use tags only to specify versions, not to add content. That build adds CUDA libs, so that's probably a better naming convention. I doubt too many people use rocker/r-ver:cuda10.1 directly, so hopefully the change would not have much impact downstream.

Given the comparison with rocker/r-ver:X.Y.Z-cuda11.1, which is built from a different image, I think the current name is fine.
It certainly seems odd that it is based on rocker/r-ver:X.Y.Z, but, for example, buster-slim is buster with things removed, and I think it is common for images with different content to be represented by different tags.

@cboettig
Member

Right, in other docker projects it's common to use tags for content as well as versions, and many major projects have far more tag permutations than we do. I was just comparing to usage in other places within rocker.

I do think things would be cleaner if rocker/r-ver:X.Y.Z-cuda10.1 was rocker/cuda:X.Y.Z-cuda10.1 and likewise for rocker/cuda:X.Y.Z-cuda11.1. That would also simplify the issue for buildx too, right?

@eitsupi
Member Author

eitsupi commented Aug 10, 2021

I do think things would be cleaner if rocker/r-ver:X.Y.Z-cuda10.1 was rocker/cuda:X.Y.Z-cuda10.1 and likewise for rocker/cuda:X.Y.Z-cuda11.1. That would also simplify the issue for buildx too, right?

You're right.
It is also possible to set both rocker/r-ver:X.Y.Z-cuda10.1 and rocker/cuda:X.Y.Z-cuda10.1 in tags, so it might be a good idea to have them coexist (for a while).
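
One way to sketch that coexistence in Python (the helper below is hypothetical, not the actual make-stacks.R code; repository names and tag format follow the examples in this thread):

```python
def cuda_tags(version, cuda_version, repos=("rocker/r-ver", "rocker/cuda")):
    """Emit the same version/cuda tag under both the old and the new
    repository name, so the two can coexist for a while (illustrative)."""
    return [f"docker.io/{repo}:{version}-cuda{cuda_version}" for repo in repos]

print(cuda_tags("4.1.1", "10.1"))
```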

@cboettig
Member

nice, sounds like a plan

@eitsupi
Member Author

eitsupi commented Aug 10, 2021

If you can create a rocker/cuda repository on DockerHub soon, I think I can merge the stack files into one in this PR. What do you think?

To be honest, it's hard to target multiple files when extracting the tag list using jq as described above, so having a single stack file makes it easier to report on published images and build images using buildx.

@cboettig
Member

Yeah, I think that's the way to go. Simplifying the build chain whenever possible definitely helps maintenance, even with automation.

Ok, I've pushed a fresh local build of what we called rocker/r-ver:4.1.0-cuda10.1; see https://hub.docker.com/repository/docker/rocker/cuda. (Note that rocker/cuda actually already existed from several years ago as an experimental, un-maintained cuda9.0 / debian:stretch version on R 3.6.x; this push updated the image with the appropriate tags.)

@eitsupi eitsupi changed the title Update for automated multi-tagging Update for automated multi-tagging and reducing the number of stack files Aug 11, 2021
@eitsupi
Member Author

eitsupi commented Aug 11, 2021

The workflow seems to work well on my fork. (No push; it only builds images and publishes reports.)
[screenshot: successful workflow runs]

@eitsupi eitsupi marked this pull request as ready for review August 11, 2021 10:11
@cboettig
Member

Oh, very nice! Great work.

So, thoughts on the images that are not part of stacks/X.Y.Z.json, e.g. cuda-11.1, geospatial-dev-osgeo, geospatial-ubuntugis? For the moment these appear to be outside the workflow. (Perhaps this indicates a deeper challenge with these images: while they mostly provide just different versions of the same thing, doing so requires altered build logic/install scripts, not just different env vars as we use elsewhere...)

I think I understand why you're organizing stack files by version; it makes sense from a workflow perspective to say "build everything in 4.1.1.json" rather than say "build the 4.1.1 version of core.json, geospatial.json, ...." But from a conceptual perspective grouping by version still feels awkward. I think my ideal setup would have a set of version-agnostic 'stack' files, core.json, geospatial.json, ml.json, with the version matrix either included in those or maybe declared as some versions.json file? Or is that crazy?

@eitsupi
Member Author

eitsupi commented Aug 11, 2021

So, thoughts on the images that are not part of stacks/X.Y.Z.json, e.g. cuda-11.1, geospatial-dev-osgeo, geospatial-ubuntugis? For the moment these appear to be outside the workflow. (Perhaps this indicates a deeper challenge with these images: while they mostly provide just different versions of the same thing, doing so requires altered build logic/install scripts, not just different env vars as we use elsewhere...)

The reason I didn't include them in X.Y.Z.json this time is simply that we don't build them in the current workflow, core.yml.
Also, they share few common layers and the images are large, so I couldn't be sure we could build them in a single workflow.
How about considering them for integration in the future?

I think I understand why you're organizing stack files by version; it makes sense from a workflow perspective to say "build everything in 4.1.1.json" rather than say "build the 4.1.1 version of core.json, geospatial.json, ...." But from a conceptual perspective grouping by version still feels awkward. I think my ideal setup would have a set of version-agnostic 'stack' files, core.json, geospatial.json, ml.json, with the version matrix either included in those or maybe declared as some versions.json file? Or is that crazy?

I used to think about that approach too. (https://github.com/eitsupi/r-ver/blob/cb6dde632287e04e8d36c82a6e153038b468d147/buildargs/versions.json)
However, if major changes are made, such as replacing the scripts used (#192 (comment)), versions.json will eventually grow larger and larger, and become closer and closer to the current X.Y.Z.json.
So now I'm thinking that the current format of a stack file per version, which leaves room for manual configuration, is fine.

@cboettig
Member

@eitsupi that makes sense, thanks for explaining. Re the images outside core.yml: I agree those images are probably too large and slow-building (the geospatial images build packages from source since they are not using the pre-packaged Ubuntu libs in RSPM), and will probably always need to be in separate GH Actions files.

Right, isolating versions in separate files does have the advantage of making it easier to modify the builds going forward without altering frozen images, which will no doubt become only more important as time passes. I'm on board :shipit:
