we should consider removing build tools from Helix images #805

Open
wfurt opened this issue Feb 16, 2023 · 8 comments

Comments

@wfurt
Member

wfurt commented Feb 16, 2023

I somewhat stumbled on this one while running out of space on my local machine. I noticed that most of the Helix images are pretty large. For example, Ubuntu 18.04 comes in at about 1.5 GB:

$ docker image ls
mcr.microsoft.com/dotnet-buildtools/prereqs         ubuntu-18.04-helix-amd64-20230216023557-4443d0d   6d7d1241e2c6   7 seconds ago   1.52GB

It seems like many of the images pull in build tools, either directly or indirectly by relying on the prereq base image.
In both cases they seem to be geared toward building the runtime rather than executing tests. This may be for historical reasons.

I did a little fiddling (https://github.com/dotnet/dotnet-buildtools-prereqs-docker/compare/ubu18?expand=1) and I can get the image down to about a third of its size:

mcr.microsoft.com/dotnet-buildtools/prereqs         ubuntu-18.04-helix-amd64-20230216060058-4443d0d   07e33080f83d   5 seconds ago    593MB

Even less with a restricted locale:

mcr.microsoft.com/dotnet-buildtools/prereqs         ubuntu-18.04-helix-amd64-20230216061525-4443d0d   c2d0fb3b5d9c   52 seconds ago   463MB
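
For context, here is a minimal sketch of the kind of trimming I mean. The real diff is in the compare link above; the base image choice and package list below are just illustrative assumptions, not the actual changes:

```dockerfile
# Hypothetical slimmed-down Helix test image (illustrative only): start from
# the plain OS image instead of the build-oriented prereqs base and install
# only what is needed to *run* tests. The package list is an assumption,
# not the actual diff from the compare link.
FROM ubuntu:18.04

# Test-time dependencies only; no compilers, SDKs, or other build tools.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates curl libicu60 libssl1.1 python3 sudo \
    && rm -rf /var/lib/apt/lists/*

# "Restricted locale" variant: drop locale data the tests never touch
# (the exact set to keep is a guess).
RUN find /usr/share/locale -mindepth 1 -maxdepth 1 ! -name 'en*' -exec rm -rf '{}' +
```

The real savings would of course depend on what the runtime test payloads actually need at run time.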

Is the ~1 GB saving per image interesting and worth the trouble, @MattGal @mthalman?
I'm reasonably confident that the resulting image would be capable of running runtime tests,
but I don't really have visibility into other repos or any historical reasons.
If we decouple the dependency on the base image we may see more duplication across Dockerfiles, but probably not too much.

@mthalman
Member

mthalman commented Feb 16, 2023

cc @ChadNedzlek

1 GB of space is significant and would help with the overall storage costs associated with the images we produce.

@MattGal
Member

MattGal commented Feb 16, 2023

No objection to installing less stuff, other than to point out the obvious that there will be some teething pains in getting the new list correct and some tests will fail.

To my knowledge, most of the "why the test images look like the build images" is just an artifact of folks being new to using docker and there being only that list of dependencies available back when the images were created.

That said, why would you even have a ubuntu-18.04-helix-amd64... image? Even with smaller images it would be significantly more performant to just run on the ubuntu.1804.amd64.* Helix "normal" queues. Docker should be reserved for environments we don't already have readily available in Helix, since the readily available ones don't require a docker pull. If it's about needing things like msquic installed, we can add artifacts (and indeed, the work going on right now is to unify how this works) to make the non-Docker versions work the way you want.
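
Roughly, the two targeting styles look like this (the specific image tag is just the one from the first comment):

```text
# Plain Helix queue: the work item runs directly on the VM
ubuntu.1804.amd64.open

# Docker-on-Helix: the same VM first pulls the image, then runs the
# work item inside the container
ubuntu.1804.amd64.open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-amd64-20230216023557-4443d0d
```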

@wfurt
Member Author

wfurt commented Feb 16, 2023

I agree with the point, @MattGal. There will be some pain, so the question is whether it would be worth it (I can only guess).
And if we feel like doing it, it may need coordination beyond runtime.

I picked ubuntu-18.04 as an example, but the other Helix images also look somewhat large:

mcr.microsoft.com/dotnet-buildtools/prereqs         alpine-3.17-helix-amd64-20230210201636-609d24f    67b176d4e97d   5 days ago       2.1GB

Runtime tests do consume many Docker images, and I'm not sure whether there is a desire (or possibility) to replace all of them with full queues. I also don't have visibility into the operational and maintenance costs. While Docker is not quite the same, since it lacks a matching kernel, I do like the ease of updates and using Docker to investigate test failures.

I would like to reach consensus before jumping into a sweeping cleanup.
(We may do it opportunistically as we onboard new OS versions.)

@ChadNedzlek
Member

The "ease of updates" is going to go away with some new changes, as they are going to be managed identically to the VM images Matt's talking about, but investigating failures is an interesting bit. The VM images can also be used to spin up an Azure VM, so it's still possible to use, and we could potentially work at making that easier.

@wfurt
Member Author

wfurt commented Feb 17, 2023

Making updates more difficult is ... interesting.
I personally see the problems with the VMs as:

  • takes a long time to start
  • adds additional cost
  • does not allow direct access via VPN
  • lacks development support (for example, I can easily map new artifacts from my dev machine into Docker, as well as share tools and scripts)

@MattGal
Member

MattGal commented Feb 17, 2023

I personally see the problems with the VMs as:

  • takes a long time to start

Taking longer for "plain" Helix VMs to start than Docker ones is impossible. Our "docker hosts" are the plain Helix VMs: if you send work to ubuntu.1804.amd64.open@some-ubuntu-1804-dockertag, you are literally first spinning up an Ubuntu 18.04 machine, doing all the same steps as a normal Helix work item, then downloading all the layers of the Docker image, then starting the container. There's no way this is faster than the same thing minus two steps.

  • adds additional cost

Again I disagree. The Docker scenario slows things down for the above reasons, and VMs cost money per hour, so even though the connection to the Microsoft Container Registry from within the data center is fast and free, the extra time spent on a Docker Helix work item definitely costs more than not doing it.

  • does not allow direct access via VPN

I can try to poke more on this with the DDFUN folks. I agree that the repro machines not being available off corpnet is not an inclusive behavior for our remote coworkers.

  • lacks development support (for example, I can easily map new artifacts from my dev machine into Docker, as well as share tools and scripts)

No argument here, but in both cases you still have to merge a pull request and wait in order to change test agent behavior.

@wfurt
Member Author

wfurt commented Feb 17, 2023

I'm talking about the developer experience, @MattGal. I have no visibility into the operational part. I can run a container on my laptop for as long as I want, and start/restart is fast. And there are no emails about running machines and cost savings.

It is also easy to prototype and test changes.
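
For example, something along these lines, where the host paths are just placeholders for my local build output and helper scripts:

```sh
# Hypothetical local repro: run the Helix test image and map in freshly
# built artifacts and scripts from the dev machine (host paths are placeholders).
docker run -it --rm \
  -v "$HOME/runtime/artifacts:/artifacts" \
  -v "$HOME/repro-scripts:/scripts" \
  mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-18.04-helix-amd64-20230216023557-4443d0d \
  /bin/bash
```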

@MattGal
Member

MattGal commented Feb 17, 2023

Ah. Yeah, no disagreement there; prototyping is a Docker strong point.

Projects
Status: Backlog
Development

No branches or pull requests

4 participants