Investigate reducing the count of apt sources on Hosted Ubuntu images #2951

Closed
maxim-lobanov opened this issue Mar 16, 2021 · 15 comments
Labels: investigate, OS: Ubuntu

@maxim-lobanov
Contributor

maxim-lobanov commented Mar 16, 2021

Description
Currently, Hosted Ubuntu images have a pretty large list of apt sources:

azure.archive.ubuntu.com
dl.google.com
ppa.launchpad.net
security.ubuntu.com
apt.postgresql.org
cli-assets.heroku.com
dl.bintray.com
dl.yarnpkg.com
download.mono-project.com
download.opensuse.org
packagecloud.io
packages.cloud.google.com
packages.microsoft.com
repo.mongodb.org
storage.googleapis.com

It causes two types of problems:

  • apt update takes a long time
  • apt update fails if any one of the sources is unavailable. Community-managed sources can go down, and when they do, a lot of user builds break.

Ideally, we should keep a minimal number of sources on images:

azure.archive.ubuntu.com
security.ubuntu.com
ppa.launchpad.net

We should consider reducing the list of sources to improve reliability of apt update.
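
To see which hosts a given image actually pulls from, something along these lines could be used. It is a minimal sketch, not part of the image tooling: `list_apt_hosts` is a hypothetical helper name, and it assumes the classic one-line `deb ...` format in `*.list` files (deb822 `.sources` files are not handled).

```shell
# List the unique hosts referenced by active "deb"/"deb-src" lines.
# Usage: list_apt_hosts [file-or-dir ...]
# (hypothetical helper; assumes one-line-style *.list files only)
list_apt_hosts() {
  # Default to the standard apt locations when no paths are given.
  if [ "$#" -eq 0 ]; then
    set -- /etc/apt/sources.list /etc/apt/sources.list.d
  fi
  # Concatenate every *.list file, keep only uncommented deb lines,
  # pull out "scheme://host", strip the scheme, and de-duplicate.
  find "$@" -type f -name '*.list' -exec cat {} + 2>/dev/null |
    grep -E '^[[:space:]]*deb(-src)?[[:space:]]' |
    grep -oE '[a-z][a-z0-9+.-]*://[^/[:space:]]+' |
    sed -E 's#^[a-z][a-z0-9+.-]*://##' |
    sort -u
}
```

Calling `list_apt_hosts` with no arguments reads the default apt locations; passing a directory makes it easy to try against a copy of the files.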

What should be investigated / considered for every source:

  • How many packages come from the source? Can we get rid of it and install the tool without apt?
  • Can we remove the apt source after tool installation? What impact will that have?
  • What kind of communication do we need if we remove the source from the image?
  • If we remove the source after tool installation, should we document it somehow in case customers would like to add it at runtime?

I don't expect these sources to be removed within the scope of this issue. Let's start with the investigation and share its results.

Related issue: #2919

maxim-lobanov changed the title from "Investigate removing some apt sources on Hosted Ubuntu images" to "Investigate reducing the count of apt sources on Hosted Ubuntu images" on Mar 16, 2021
maxim-lobanov added the "investigate" label and removed the "needs triage" label on Mar 16, 2021
@catthehacker
Contributor

catthehacker commented Mar 16, 2021

Shouldn't packages.microsoft.com be left intact? It provides moby-engine, moby-cli, moby-containerd and moby-runc, since the Docker licence doesn't allow usage of their own distributed binaries on GHA.

postgresql doesn't seem to have any recent versions anywhere except Ubuntu default repos (pgsql12 for 20.04, pgsql10 for 18.04, pgsql95 for 16.04). Latest version at launchpad is 9.5.

The heroku cli client is available source-only at https://github.com/heroku/cli, with binaries at snapcraft or their official repo.

buildah seems to be available only from the opensuse repos, with source code at https://github.com/opencontainers/buildah.

@oberstet

All of those PPAs should be removed IMO. Only official Ubuntu sources, that is, apt origins that are also present in the official Ubuntu distro, should be present.

  • Recently, our CI was broken twice by TLS certificate problems related to PPAs (https://support.github.com/ticket/personal/0/1064630):
    • dl.bintray.com
    • download.opensuse.org
  • Having PPAs, and such a big list, is a major security issue and attack surface. Our CI runs on a box that blindly updates from "bintray" and a dozen others? wtf, really?
  • if I need a PPA, I can add it in a workflow. under my control.

I would highly appreciate it if you'd offer, e.g., an ubuntu-20.04-vanilla image that does not add anything beyond what Ubuntu ships with.
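
Until something like that exists, a workflow step could approximate it by switching off the extra list files before running apt. The following is only a sketch under assumptions: `disable_extra_sources` is a hypothetical helper, and it assumes Ubuntu's own entries live in /etc/apt/sources.list while everything added by the image lives in /etc/apt/sources.list.d. Tools installed from the disabled sources would stay installed but no longer receive updates through apt.

```shell
# Disable every extra source list by renaming *.list to *.list.disabled.
# apt only reads files in sources.list.d whose names end in .list or
# .sources, so renamed files are ignored but can be restored later.
# Usage: disable_extra_sources [sources-dir]   (hypothetical helper)
disable_extra_sources() {
  dir=${1:-/etc/apt/sources.list.d}
  for f in "$dir"/*.list; do
    # When nothing matches, the pattern stays unexpanded; skip it.
    if [ -e "$f" ]; then
      mv -- "$f" "$f.disabled" || return 1
    fi
  done
}
```

The real directory is root-owned, so on a runner this would have to run from a script invoked with sudo, followed by `sudo apt-get update`.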

@dsame
Contributor

dsame commented Mar 17, 2021

dl.google.com - google-chrome-stable - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/google-chrome.sh

ppa.launchpad.net - haskell, ghc - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/haskell.sh

apt.postgresql.org - postgresql - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/postgresql.sh

cli-assets.heroku.com - heroku - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/heroku.sh

dl.bintray.com - sbt - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/sbt.sh

dl.yarnpkg.com - yarn - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/nodejs.sh

download.mono-project.com - xamarin - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/android.sh

download.opensuse.org - (podman buildah skopeo) - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/containers.sh

packagecloud.io - git-lfs - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/git.sh

packages.cloud.google.com - google-cloud-sdk, kubectl - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/kubernetes-tools.sh, https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/google-cloud-sdk.sh

packages.microsoft.com - moby*, azure-cli, dotnet-, odbc - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/docker-moby.sh, https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/azure-cli.sh, https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/dotnetcore-sdk.sh, images/linux/scripts/installers/mssql-cmd-tools.sh

repo.mongodb.org - mongodb - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/mongodb.sh

storage.googleapis.com - bazel - https://github.com/actions/virtual-environments/blob/main/images/linux/scripts/installers/bazel.sh

@sirosen

sirosen commented Apr 14, 2021

Has there been further action on this?

The workaround I have been using just started failing (bash -x snippet):

+ sudo rm /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
rm: cannot remove '/etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list': No such file or directory

Which is fine. I'll add -f until I have clarity on what's going on.
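
The `-f` variant can be sketched as a small helper. This is only an illustration: `remove_apt_source` is a hypothetical name, and the optional directory argument exists so it can be tried against a scratch directory rather than the real /etc/apt/sources.list.d.

```shell
# Remove an apt source list file if it is present, and succeed quietly
# when it is already gone.
# Usage: remove_apt_source <list-file-name> [sources-dir]  (hypothetical helper)
remove_apt_source() {
  name=$1
  dir=${2:-/etc/apt/sources.list.d}
  # Unlike plain rm, rm -f exits 0 when the file does not exist.
  rm -f -- "$dir/$name"
}
```

On a hosted runner the directory is root-owned, so the equivalent one-liner would be `sudo rm -f -- '/etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list'`.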

So... what is going on? Am I getting broken VMs or was there a change within the last 24-hours?
I don't see any relevant merged PRs in this repo in that window.


Aside: I don't fully agree with treating this as a major security issue, but nor do I disagree.

Installing packages from the public Internet involves a huge amount of trust.
There's an argument that the moment you install and run mongodb, you might as well trust repo.mongodb.org.

Counterpoint: we should follow the news. PHP development just moved to GitHub because running their own git servers was an unnecessary point of vulnerability.
I think GitHub should remove as many of these as can reasonably be removed. Do we need to trust bintray.com or yarnpkg.com? If they can be removed without causing tremendous issues, they should be removed.

@catthehacker
Contributor

> Has there been further action on this?

Yes

> The workaround I have been using just started failing (bash -x snippet):
>
> + sudo rm /etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list
> rm: cannot remove '/etc/apt/sources.list.d/devel:kubic:libcontainers:stable.list': No such file or directory

The apt source lists were removed.

> So... what is going on? Am I getting broken VMs or was there a change within the last 24-hours?
> I don't see any relevant merged PRs in this repo in that window.

It was done here: #3077

> I think GitHub should remove as many of these as can reasonably be removed. Do we need to trust bintray.com or yarnpkg.com? If they can be removed without causing tremendous issues, they should be removed.

It's an ongoing process; properly replacing the sources takes time.

@maxim-lobanov
Contributor Author

@sirosen, I think this workaround is not needed anymore because the apt source was removed.

@sirosen

sirosen commented Apr 14, 2021

Awesome! Thanks so much for the speedy replies!

I just wanted to know what was up. Without activity on this issue, and given that it takes time from the PR merging until the user (me) sees a change, it wasn't clear if this work was active or stalled.

@maxim-lobanov
Contributor Author

After all these changes are deployed, the list of repos should be:

http://azure.archive.ubuntu.com
http://ppa.launchpad.net
http://security.ubuntu.com
https://apt.kubernetes.io
https://download.mono-project.com
https://packages.cloud.google.com
https://packages.microsoft.com

Significantly shorter than the initial list at the top of this issue. We will continue working to cut it further.

@maxim-lobanov
Contributor Author

Hello everyone!

Posting a new update based on the image that we will deploy next week.

  • ubuntu-20.04 or ubuntu-latest:
http://azure.archive.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
https://packages.microsoft.com/ubuntu/20.04/prod
http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
  • ubuntu-18.04:
http://azure.archive.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
https://packages.microsoft.com/ubuntu/18.04/prod
http://ppa.launchpad.net/ondrej/php/ubuntu
http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
  • ubuntu-16.04
http://azure.archive.ubuntu.com/ubuntu
http://security.ubuntu.com/ubuntu
https://packages.microsoft.com/ubuntu/16.04/prod
http://ppa.launchpad.net/ondrej/php/ubuntu
http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu
http://ppa.launchpad.net/ansible/ansible/ubuntu
http://ppa.launchpad.net/mercurial-ppa/releases/ubuntu
https://esm.ubuntu.com/infra/ubuntu

(Potentially, we could also remove http://ppa.launchpad.net/ansible/ansible/ubuntu, http://ppa.launchpad.net/mercurial-ppa/releases/ubuntu and https://esm.ubuntu.com/infra/ubuntu from 16.04, but according to #3287 Ubuntu 16.04 is going to be deprecated in 3 months, so we have decided to leave them.)

Some information about the remaining repos:

Initially, there were about 20 apt repos; this number has been reduced to 4 on Ubuntu 20.04, which looks like a good improvement for image reliability. I will close this issue since we have decided to leave the remaining repos on the images for now.
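
For anyone who wants their workflow to notice if this list grows again, a check against an allowlist is one option. The sketch below is hedged: `unexpected_hosts` is a hypothetical helper, it expects one host per line on stdin, and the allowlist passed to it is whatever the caller considers acceptable (for example the four hosts listed above for ubuntu-20.04).

```shell
# Print every host read from stdin that is not in the allowlist given
# as arguments. Exit status is 0 if all hosts are allowed, 1 otherwise.
# Usage: <command printing hosts> | unexpected_hosts host1 host2 ...
# (hypothetical helper; the allowlist is supplied by the caller)
unexpected_hosts() {
  status=0
  while IFS= read -r host; do
    # Ignore blank lines in the input.
    [ -n "$host" ] || continue
    found=no
    for allowed in "$@"; do
      if [ "$host" = "$allowed" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      printf '%s\n' "$host"
      status=1
    fi
  done
  return "$status"
}
```

Fed with the hosts extracted from /etc/apt/sources.list and /etc/apt/sources.list.d/*.list, a non-zero exit status would fail the step and print the hosts that were not expected.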

@oberstet

> Initially, there were about 20 apt repos and this number was reduced to 4 repos on Ubuntu 20.04 that looks like a good improvement for image reliability.

fantastic! this is highly appreciated. thanks a lot!

of the remaining 6 repositories, only 2 are published by Canonical, one of these is "Ubuntu" itself, the other I suspect (but don't know for sure) is specifically for Ubuntu on Azure (hypervisor stuff and such).

> I will close this issue since we have decided to leave remain repos on images for now.

For a regular Ubuntu user, the other 4 repos are not needed, but a security risk:

  • the Microsoft repo: why would I need that as an Ubuntu/Linux user?
  • those 2 PPAs: huh, scary. certainly unneeded. "customers" (I guess MS, not original GitHub) should fix their shit instead of the rest of the userbase being forced to accept the security risks of some random PPA. this is reversing incentives.

It's a pity that GitHub was bought by MS. Now we all have to pay a price because "MS customers" can't get their CI/CD pipelines right (as in, not depend on random external PPAs or even use PHP in the first place;)

anyways, IMO closing this issue (without a follow up one) is the wrong action - it will only get worse. at the very least, GitHub users should be warned in the docs that they do not get a plain vanilla Ubuntu experience / security profile. but again, thanks @maxim-lobanov for your heroic efforts and progress!

@oberstet

> https://packages.microsoft.com/ubuntu/20.04/prod - is official Microsoft repository and shouldn't be removed

it is broken (404), with resulting fallout: https://www.theregister.com/2021/06/17/microsoft_packages_404/

why is Microsoft forcing users to reference and try-update-fail on PPAs that are non-standard, unneeded and might open additional attack surfaces?

why would I care that the (broken) PPA is "MS official" when I'm using Ubuntu and the PPA is not "Canonical official"?

@catthehacker
Contributor

> the other I suspect (but don't know for sure) is specifically for Ubuntu on Azure (hypervisor stuff and such).

Docker, .NET, PowerShell, MSSQL, etc.

> For a regular Ubuntu user, the other 4 repos are not needed, but a security risk:

There is no such thing as "regular Ubuntu user"

> why is Microsoft forcing users to reference and try-update-fail on PPAs that are non-standard, unneeded and might open additional attack surfaces?

Because there is nothing standard here, GitHub Actions (as per documentation) is a platform that is managed by staff so that users don't have to do it themselves.

> why would I care that the (broken) PPA is "MS official" when I'm using Ubuntu and the PPA is not "Canonical official"?

Because you are using an ubuntu-* GitHub Actions runner, which uses Ubuntu as its underlying base, not plain Ubuntu.
If you want just Ubuntu, run your own self-hosted runner.

>   • those 2 PPAs: huh, scary. certainly unneeded. "customers" (I guess MS, not original GitHub) should fix their shit instead of the rest of the userbase being forced to accept the security risks of some random PPA. this is reversing incentives.
>
> It's a pity that GitHub was bought by MS. Now we all have to pay a price because "MS customers" can't get their CI/CD pipelines right (as in, not depend on random external PPAs or even use PHP in the first place;)

Why is it my responsibility to add the Microsoft repository and not your responsibility to remove it during a workflow?
Maybe you should fix your "shit"?

> anyways, IMO closing this issue (without a follow up one) is the wrong action - it will only get worse. at the very least, GitHub users should be warned in the docs that they do not get a plain vanilla Ubuntu experience / security profile. but again, thanks @maxim-lobanov for your heroic efforts and progress!

Maybe you will understand if I use Linux people language: RTFM!
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#preinstalled-software

I'm sure that there are more people who are happy when most of their stuff works out of the box thanks to the tools pre-installed in GHA runners than there are people who bitch about it.

@oberstet

oberstet commented Jun 19, 2021

> There is no such thing as "regular Ubuntu user"

Ok, fair enough! Let me rephrase: none of those MS repos is default (== comes in a Canonical install) or necessary to install or use Ubuntu.

> Because there is nothing standard here

Sure. Standard == everything, but only that (iow: exactly) all software sources that come with an official Canonical distro. Last time I looked, that did not include those MS PPAs.

> If you want just Ubuntu, run your own self-hosted runner.

Yes, we're doing that for some repos already, but it is a lot of work, since historically, we had everything on hosted runners.

> Maybe you will understand if I use Linux people language: RTFM!

I did read the manual - when GitHub was still an independent company. Then MS decided to change stuff. and sure, I missed the announcement (I assume there was one) that new PPAs would be added to the hosted images. My bad.

My whole point is: if I use an image "ubuntu", I expect it to expose me to Canonical SW upstream only - but not MS. if there were a separate "ubuntu-microsoft", that would make everything crystal clear and non-controversial. anyways, just my 2cts. I very much appreciate the work that Maxim did on clearing up the set of PPAs!

> I'm sure that there are more people who are happy when most of their stuff works out of the box thanks to the tools pre-installed in GHA runners than there are people who bitch about it.

I doubt that, but from my perspective, the problem is the unwanted additional attack / problem surface. a price paid by all other users (ones that don't need any MS stuff).

@sirosen

sirosen commented Jun 19, 2021

I don't really need my inbox spammed just because someone read an El Reg article about an outage. If you just want to complain about the fact that GitHub was bought by Microsoft, please just open another issue so that someone from GitHub can close it with prejudice.

This is the price you pay when you use a CI service, rather than running your own Jenkins box or whatever. You don't have full control over the platform.

They definitely need to have docker installed to make the platform work ("services" and docker actions), and doing it from a Microsoft repo is a very reasonable choice. Use of that one repo is not a part of the prior issue that there were simply too many repos, many of them maintained neither by Canonical nor by Microsoft.
One of the first comments on this issue noted why the Microsoft repo is needed.

The GitHub team has handled this issue in pretty much the best way I can imagine, removing as many sources as they have found to be possible. I'm super happy with everything about how this went down. ... Except for the part where I get irrelevant mail months after the fact because someone disagrees with the decisions they made on what to keep. I'm unsubscribing from this issue so that doesn't happen again.

6 participants