Container tagging / naming #1

Closed
shykes opened this issue Jan 20, 2013 · 17 comments

Comments

@shykes
Contributor

shykes commented Jan 20, 2013

In previous conversations I think we agreed that the ability to tag containers would be nice, but there was no compelling reason to add it to the core. Especially with globally unique IDs, it's super easy for a user to store all the metadata he needs himself: users, applications, services, versions, source repository, source tag, whatever.

However there is one thing that is only possible in the core: atomic operations on a set of containers matching certain tags. In the future that might be a necessary feature, for a number of reasons:

a) Performance (to avoid running 200 duplicate commands for 200 containers)
b) Reliability (e.g. fewer moving parts when coordinating many dockers)
c) Ease of development

I don't have a set opinion, but wanted to write this down for later discussion.
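
To make the "operation on a set of containers matching certain tags" idea concrete, here is a minimal sketch. It is not Docker's API: the `Container` record, the `key=value` tag strings, and the `select_by_tags` / `stop_all` helpers are all hypothetical names made up for illustration, and the "operation" is just flipping a flag in memory.

```python
from dataclasses import dataclass


@dataclass
class Container:
    """Hypothetical in-memory record; not a real Docker structure."""
    id: str
    tags: frozenset = frozenset()
    running: bool = True


def select_by_tags(containers, required):
    """Return the containers that carry every tag in `required`."""
    required = frozenset(required)
    return [c for c in containers if required <= c.tags]


def stop_all(containers, required):
    """Stop every matching container in one call and return their ids."""
    matched = select_by_tags(containers, required)
    for c in matched:
        c.running = False
    return [c.id for c in matched]
```

The point of the sketch is only that the caller issues one command for the whole set instead of one command per container; real atomicity and error handling would need more than this.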

@shykes
Contributor Author

shykes commented Mar 27, 2013

@synack and @progrium (among others) both asked for the ability to name containers - which seems like a special case of tagging.

So maybe we want this in the core after all?

@progrium
Contributor

docker is a manager so it should play index. a pid file approach would suck (means containers have to know their id, be able to write it somewhere). and any third party tool would have to work with docker to figure out the id of every container somehow, and that's already a problem. but that problem would go away if you knew the way to reference a container before you made it.

@progrium
Contributor

I was skeptical of tags. But I think containers should work similarly to images now. I don't know about the name "repository" but having a base name and optional tag would be nice to help look up ids. In theory, you should never have to work with ids. Imagine a PaaS and you named containers (using dotcloud terms) <user>/<app>/<server>.<instance> and then tags would be used for deployments. v1, v2, v3 ... they'd all have their own ids and all be unique containers.
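
As a rough illustration of that naming scheme, the sketch below parses a reference of the form `<user>/<app>/<server>.<instance>` with an optional tag. The `:` separator before the tag is an assumption borrowed from how image names are written, and `ContainerRef` / `parse_ref` are hypothetical names, not anything Docker provides.

```python
from typing import NamedTuple, Optional


class ContainerRef(NamedTuple):
    """Hypothetical parsed form of <user>/<app>/<server>.<instance>[:tag]."""
    user: str
    app: str
    server: str
    instance: int
    tag: Optional[str]


def parse_ref(ref: str) -> ContainerRef:
    """Split a container reference into its parts; raise ValueError if malformed."""
    # Assumption: the optional tag follows a ':' as it does for images.
    name, sep, tag = ref.partition(":")
    parts = name.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected <user>/<app>/<server>.<instance>, got {ref!r}")
    user, app, last = parts
    server, dot, instance = last.rpartition(".")
    if not dot or not server or not instance.isdigit():
        raise ValueError(f"expected <server>.<instance> with a numeric instance, got {last!r}")
    return ContainerRef(user, app, server, int(instance), tag if sep else None)
```

With something like this, `alice/blog/www.0:v1` and `alice/blog/www.0:v2` would name two deployments of the same logical instance, each still backed by its own container id.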

@dstrctrng

+1, names would let me autoconfigure based on convention, like memcacheXX containers would get written to a memcached.yml, or mysql-master, mysql-replica would get paired together without each knowing about the other. The devs using my containers would focus on the logical names instead of implementation details like IDs.

@jpetazzo
Contributor

jpetazzo commented Apr 5, 2013

Some thoughts about naming, based on our experience at dotCloud...

EC2 uses tags, and doesn't have a way to name instances (an instance will be i-42ab69cd). Oh, right, you can give a Name tag, and it will show up in the console in the "Name" column; but you can't address instances by name and uniqueness is not enforced. In practice this is nice when you use the console, but overall a bit error-prone, because uniqueness is not enforced. On the dotCloud platform, we set the Name tag to be the FQDN of the instance, and we use meaningful names (admin-8, gateway-7, ...). This is very useful when working under pressure (e.g. when an outage happens), for the following reasons.

  1. It's easier to shout across the room "check foo-55, I think it's blowing up!" rather than "check i-42f0a... no wait... i-42f9a... oh crap" (yes, you can also copy paste on IRC/e-mail)
  2. Some well-known resources are "pinned" to specific instances. E.g. you can have a large cluster of MySQL instances, but know that your main MySQL database masterdb is on instances tagged db-1 and db-2. Most of the time, you don't care, because you have a resource location service that will tell you instantly the exact location of any resource; but when said service is down (and relies on said db), you're very happy to have some fallback.

For those reasons, we decided that the naming scheme on the dotCloud platform would be different; i.e. for a given scaled service, each service instance would get an assigned number (0,1,2...) and uniqueness would be enforced. This, however, has other shortcomings.

  1. Uniqueness is only local to a machine. Nothing prevents you from having two identical containers on two different hosts; and then, of course, you're in trouble.
  2. Relying on that "hard-coded in your brain" mapping (evoked in the previous paragraph) doesn't work that well, either. It's nice to know that your super important masterdb is on host db-1, but when it gets migrated, it doesn't work anymore, and you're back to square zero (e.g. having to parallel-ssh to your whole cluster to figure out where the database is, because the naming system is broken).

I assume that we can't / don't want to ensure global uniqueness of container names (that would require some distributed naming system, and suddenly a herd of zookeepers, doozers, and other weird beasts are hammering at the gates to get in!); however, some way to easily find a container "when the sh*t hits the fan" would be really great. Picture the following scenario: you need to stop (or enter) a specific instance of a given service, but your global container db (maintained by you, outside of docker) is down. Let's hope that you have some way to locate the docker host running the instance you're looking for. Now, how do you locate the specific container easily (=not with a 4-line obscure shell pipeline that takes you 5 minutes to grok correctly through SSH), quickly (=not by running a command on each of the hundreds or thousands of containers running on the machine), and reliably (=not yielding 4 false positives of down or unrelated containers before pointing to the right one)? Even if docker's structures have been corrupted/messed with?

Any solution to the last problem gets my immediate buy-in :-)

@shykes
Contributor Author

shykes commented Apr 5, 2013

Hi Jerome, I'm not sure what you're asking or suggesting.

Names will be a convenience for single-machine use. They should not be
used to name or look up containers across multiple hosts.

@progrium
Contributor

progrium commented Apr 5, 2013

Sometimes global uniqueness is just scoping. Put the host machine's name in
the name. That's why I was suggesting we use the hostname option to let you
specify a referenceable name.

IMHO, we shouldn't try to address global uniqueness. It should be something
the user can do on their own, augmented by other systems.

In your scenario, I would solve it with a better service discovery system.
One that doesn't have a central DB. Also one that is not necessarily Doozer
or Zookeeper.

@jpetazzo
Contributor

jpetazzo commented Apr 5, 2013

Well, I was merely explaining my use case (which is the use case
experienced by ops on the dotCloud platform): "sometimes, you need to be
able to find a given process easily and quickly; i.e. on which host it is,
and in which container it is exactly". In my use case, said "process" is
identified by an arbitrary tag which is set by the operator when the
process starts (and in our specific case, this tag is the container
hostname, but it doesn't need to be).

Then there is a very wide spectrum of possible implementations.

  1. docker doesn't let you set a tag, so you have to retrieve the container
    ID when it starts, and store it in some database (zookeeper, redis, dns,
    whatever lets you do an easy mapping).
  2. docker lets you set a tag, and retrieve it as well, and indexes the tags
    locally. Now you don't need to store the full container ID anymore, just
    the host running your container.
  3. docker lets you set a tag, and the tags happen to be stored as flat
    files with a well-known layout (or something similar).

Notes:
(1) seems to be fine and well, until the central database is unavailable /
corrupted / inconsistent;
(2) makes things easier than (1) because as long as containers don't move
arbitrarily, your database doesn't change (and also, you can efficiently
use DNS to map containers to hosts);
(3) makes things easier than (2) when docker breaks, because finding a
container can be done with simple shell commands.
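
Option (3) could look something like the sketch below. The layout is purely an assumption for illustration (one directory per container id under some root, each holding a one-line `tag` file); nothing here reflects how docker actually stores its state, and `write_tag` / `find_by_tag` are hypothetical helpers.

```python
from pathlib import Path
from typing import List


def write_tag(root: Path, container_id: str, tag: str) -> None:
    """Record a container's tag as <root>/<container_id>/tag (assumed layout)."""
    container_dir = root / container_id
    container_dir.mkdir(parents=True, exist_ok=True)
    (container_dir / "tag").write_text(tag + "\n")


def find_by_tag(root: Path, tag: str) -> List[str]:
    """Return the sorted ids of all containers whose tag file matches `tag`."""
    return sorted(
        tag_file.parent.name
        for tag_file in root.glob("*/tag")
        if tag_file.read_text().strip() == tag
    )
```

Because the tags are plain files in a predictable place, the same lookup stays possible with ordinary shell tools when the daemon itself is unavailable, which is the property note (3) is after.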

@brynary

brynary commented Jun 17, 2013

Container tagging feels like a core concept to me, and seems like a building block for the "container groups" feature that @shykes mentioned.

@warpfork
Contributor

warpfork commented Jul 8, 2013

Naming containers worries me. Even if the feature were implemented, I think that if I found myself tempted to use it, I'd suspect a bad smell and try to redesign to avoid using it.

Docker is powerful because it removes magic numbers from my life. It makes it possible for me to run, for example, two instances of nginx on my one host machine, without making a mess. Now what if I start working in a company that has a script that relies on one of their containers being named "nginx"? Or worse, "main"? Now try to set these up in Jenkins or something that's trying to run self-contained/concurrent tests. Whoops.

Maybe it's possible to build scripts around naming to make sure names are generated with unique suffixes to avoid that kind of problem... but I'm not sure it's reasonable to expect people to consistently do so, and even if it is done, then is that any better than just dealing with random IDs as they stand? If there was support for commands that run on sets of containers based on globbing of names, then I could see a potential gain, but otherwise, I see nothing.
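
For what it's worth, the two ideas in that paragraph (unique suffixes, and commands that select containers by globbing names) can be sketched in a few lines. `unique_name` and `match_names` are hypothetical helpers operating on plain strings, not Docker commands; the only library calls used are Python's standard `uuid.uuid4` and `fnmatch.fnmatchcase`.

```python
import uuid
from fnmatch import fnmatchcase


def unique_name(base):
    """Return `base` plus a random 8-hex-digit suffix so concurrent runs don't collide."""
    return f"{base}-{uuid.uuid4().hex[:8]}"


def match_names(names, pattern):
    """Return the names matching a shell-style glob, preserving input order."""
    return [n for n in names if fnmatchcase(n, pattern)]
```

A test runner could then start `unique_name("nginx")` for each job and later address all of them at once with a pattern such as `nginx-*`, which is the potential gain over bare random IDs.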

@progrium
Contributor

progrium commented Jul 8, 2013

@heavenlyhash so it sounds like you'd prefer tags.

@shykes
Contributor Author

shykes commented Sep 7, 2013

Tentatively scheduling for 0.8

tianon added a commit that referenced this issue Oct 23, 2013
@shykes
Contributor Author

shykes commented Oct 30, 2013

Just a heads up, this is confirmed for release in 0.6.5 tomorrow :)

@shykes
Contributor Author

shykes commented Oct 31, 2013

This has been implemented in Docker 0.6.5.

@shykes shykes closed this as completed Oct 31, 2013
@jamescarr
Contributor

👍

crosbymichael referenced this issue in crosbymichael/docker Nov 5, 2013
vieux pushed a commit that referenced this issue Nov 15, 2013
Add warning about SYS_BOOT capability with pre-3.4 kernels and pre-0.8 LXC.
trebonian pushed a commit to trebonian/docker that referenced this issue Jun 3, 2021
yousong pushed a commit to yousong/moby that referenced this issue Apr 27, 2022
yousong pushed a commit to yousong/moby that referenced this issue Apr 27, 2022
thaJeztah added a commit that referenced this issue Aug 21, 2023
I had a CI run fail to "Upload reports":

    Exponential backoff for retry #1. Waiting for 4565 milliseconds before continuing the upload at offset 0
    Finished backoff for retry #1, continuing with upload
    Total file count: 211 ---- Processed file #160 (75.8%)
    ...
    Total file count: 211 ---- Processed file #164 (77.7%)
    Total file count: 211 ---- Processed file #164 (77.7%)
    Total file count: 211 ---- Processed file #164 (77.7%)
    A 503 status code has been received, will attempt to retry the upload
    ##### Begin Diagnostic HTTP information #####
    Status Code: 503
    Status Message: Service Unavailable
    Header Information: {
      "content-length": "592",
      "content-type": "application/json; charset=utf-8",
      "date": "Mon, 21 Aug 2023 14:08:10 GMT",
      "server": "Kestrel",
      "cache-control": "no-store,no-cache",
      "pragma": "no-cache",
      "strict-transport-security": "max-age=2592000",
      "x-tfs-processid": "b2fc902c-011a-48be-858d-c62e9c397cb6",
      "activityid": "49a48b53-0411-4ff3-86a7-4528e3f71ba2",
      "x-tfs-session": "49a48b53-0411-4ff3-86a7-4528e3f71ba2",
      "x-vss-e2eid": "49a48b53-0411-4ff3-86a7-4528e3f71ba2",
      "x-vss-senderdeploymentid": "63be6134-28d1-8c82-e969-91f4e88fcdec",
      "x-frame-options": "SAMEORIGIN"
    }
    ###### End Diagnostic HTTP information ######
    Retry limit has been reached for chunk at offset 0 to https://pipelinesghubeus5.actions.githubusercontent.com/Y2huPMnV2RyiTvKoReSyXTCrcRyxUdSDRZYoZr0ONBvpl5e9Nu/_apis/resources/Containers/8331549?itemPath=integration-reports%2Fubuntu-22.04-systemd%2Fbundles%2Ftest-integration%2FTestInfoRegistryMirrors%2Fd20ac12e48cea%2Fdocker.log
    Warning: Aborting upload for /tmp/reports/ubuntu-22.04-systemd/bundles/test-integration/TestInfoRegistryMirrors/d20ac12e48cea/docker.log due to failure
    Error: aborting artifact upload
    Total file count: 211 ---- Processed file #165 (78.1%)
    A 503 status code has been received, will attempt to retry the upload
    Exponential backoff for retry #1. Waiting for 5799 milliseconds before continuing the upload at offset 0

As a result, the "Download reports" continued retrying:

    ...
    Total file count: 1004 ---- Processed file #436 (43.4%)
    Total file count: 1004 ---- Processed file #436 (43.4%)
    Total file count: 1004 ---- Processed file #436 (43.4%)
    An error occurred while attempting to download a file
    Error: Request timeout: /Y2huPMnV2RyiTvKoReSyXTCrcRyxUdSDRZYoZr0ONBvpl5e9Nu/_apis/resources/Containers/8331549?itemPath=integration-reports%2Fubuntu-20.04%2Fbundles%2Ftest-integration%2FTestCreateWithDuplicateNetworkNames%2Fd47798cc212d1%2Fdocker.log
        at ClientRequest.<anonymous> (/home/runner/work/_actions/actions/download-artifact/v3/dist/index.js:3681:26)
        at Object.onceWrapper (node:events:627:28)
        at ClientRequest.emit (node:events:513:28)
        at TLSSocket.emitRequestTimeout (node:_http_client:839:9)
        at Object.onceWrapper (node:events:627:28)
        at TLSSocket.emit (node:events:525:35)
        at TLSSocket.Socket._onTimeout (node:net:550:8)
        at listOnTimeout (node:internal/timers:559:17)
        at processTimers (node:internal/timers:502:7)
    Exponential backoff for retry #1. Waiting for 5305 milliseconds before continuing the download
    Total file count: 1004 ---- Processed file #436 (43.4%)

And, it looks like GitHub doesn't allow cancelling the job, possibly
because it is defined with `if: always()`?

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit that referenced this issue Aug 21, 2023
(cherry picked from commit d6f340e)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
thaJeztah added a commit that referenced this issue Mar 14, 2024
…f v1.5.4

full diffs:

- protocolbuffers/protobuf-go@v1.31.0...v1.33.0
- golang/protobuf@v1.5.3...v1.5.4

From the Go security announcement list;

> Version v1.33.0 of the google.golang.org/protobuf module fixes a bug in
> the google.golang.org/protobuf/encoding/protojson package which could cause
> the Unmarshal function to enter an infinite loop when handling some invalid
> inputs.
>
> This condition could only occur when unmarshaling into a message which contains
> a google.protobuf.Any value, or when the UnmarshalOptions.UnmarshalUnknown
> option is set. Unmarshal now correctly returns an error when handling these
> inputs.
>
> This is CVE-2024-24786.

In a follow-up post;

> A small correction: This vulnerability applies when the UnmarshalOptions.DiscardUnknown
> option is set (as well as when unmarshaling into any message which contains a
> google.protobuf.Any). There is no UnmarshalUnknown option.
>
> In addition, version 1.33.0 of google.golang.org/protobuf inadvertently
> introduced an incompatibility with the older github.com/golang/protobuf
> module. (golang/protobuf#1596) Users of the older
> module should update to github.com/golang/protobuf@v1.5.4.

govulncheck results in our code:

    govulncheck ./...
    Scanning your code and 1221 packages across 204 dependent modules for known vulnerabilities...

    === Symbol Results ===

    Vulnerability #1: GO-2024-2611
        Infinite loop in JSON unmarshaling in google.golang.org/protobuf
      More info: https://pkg.go.dev/vuln/GO-2024-2611
      Module: google.golang.org/protobuf
        Found in: google.golang.org/protobuf@v1.31.0
        Fixed in: google.golang.org/protobuf@v1.33.0
        Example traces found:
          #1: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Peek
          #2: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Read
          #3: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls protojson.Unmarshal

    Your code is affected by 1 vulnerability from 1 module.
    This scan found no other vulnerabilities in packages you import or modules you
    require.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
vvoland pushed a commit that referenced this issue May 9, 2024
…f v1.5.4

full diffs:

- protocolbuffers/protobuf-go@v1.31.0...v1.33.0
- golang/protobuf@v1.5.3...v1.5.4

From the Go security announcement list;

> Version v1.33.0 of the google.golang.org/protobuf module fixes a bug in
> the google.golang.org/protobuf/encoding/protojson package which could cause
> the Unmarshal function to enter an infinite loop when handling some invalid
> inputs.
>
> This condition could only occur when unmarshaling into a message which contains
> a google.protobuf.Any value, or when the UnmarshalOptions.UnmarshalUnknown
> option is set. Unmarshal now correctly returns an error when handling these
> inputs.
>
> This is CVE-2024-24786.

In a follow-up post:

> A small correction: This vulnerability applies when the UnmarshalOptions.DiscardUnknown
> option is set (as well as when unmarshaling into any message which contains a
> google.protobuf.Any). There is no UnmarshalUnknown option.
>
> In addition, version 1.33.0 of google.golang.org/protobuf inadvertently
> introduced an incompatibility with the older github.com/golang/protobuf
> module. (golang/protobuf#1596) Users of the older
> module should update to github.com/golang/protobuf@v1.5.4.

govulncheck results in our code:

    govulncheck ./...
    Scanning your code and 1221 packages across 204 dependent modules for known vulnerabilities...

    === Symbol Results ===

    Vulnerability #1: GO-2024-2611
        Infinite loop in JSON unmarshaling in google.golang.org/protobuf
      More info: https://pkg.go.dev/vuln/GO-2024-2611
      Module: google.golang.org/protobuf
        Found in: google.golang.org/protobuf@v1.31.0
        Fixed in: google.golang.org/protobuf@v1.33.0
        Example traces found:
          #1: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Peek
          #2: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Read
          #3: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls protojson.Unmarshal

    Your code is affected by 1 vulnerability from 1 module.
    This scan found no other vulnerabilities in packages you import or modules you
    require.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 1ca89d7)
Signed-off-by: Austin Vazquez <macedonv@amazon.com>
aepifanov added a commit to aepifanov/moby that referenced this issue May 17, 2024
…protobuf to v1.33.0

These vulnerabilities were found by govulncheck:

Vulnerability #1: GO-2024-2611
    Infinite loop in JSON unmarshaling in google.golang.org/protobuf
  More info: https://pkg.go.dev/vuln/GO-2024-2611
  Module: google.golang.org/protobuf
    Found in: google.golang.org/protobuf@v1.28.1
    Fixed in: google.golang.org/protobuf@v1.33.0
    Example traces found:
      #1: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Peek
      #2: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls json.Decoder.Read
      #3: daemon/logger/gcplogs/gcplogging.go:154:18: gcplogs.New calls logging.Client.Ping, which eventually calls protojson.Unmarshal

Vulnerability #2: GO-2023-2153
    Denial of service from HTTP/2 Rapid Reset in google.golang.org/grpc
  More info: https://pkg.go.dev/vuln/GO-2023-2153
  Module: google.golang.org/grpc
    Found in: google.golang.org/grpc@v1.50.1
    Fixed in: google.golang.org/grpc@v1.56.3
    Example traces found:
      #1: api/server/router/grpc/grpc.go:20:29: grpc.NewRouter calls grpc.NewServer
      #2: daemon/daemon.go:1477:23: daemon.Daemon.RawSysInfo calls sync.Once.Do, which eventually calls grpc.Server.Serve
      #3: daemon/daemon.go:1477:23: daemon.Daemon.RawSysInfo calls sync.Once.Do, which eventually calls transport.NewServerTransport

full diffs:
 - https://github.com/grpc/grpc-go/compare/v1.50.1..v1.56.3
 - https://github.com/protocolbuffers/protobuf-go/compare/v1.28.1..v1.33.0
 - https://github.com/googleapis/google-api-go-client/compare/v0.93.0..v0.114.0
 - https://github.com/golang/oauth2/compare/v0.1.0..v0.7.0
 - https://github.com/census-instrumentation/opencensus-go/compare/v0.23.0..v0.24.0
 - https://github.com/googleapis/gax-go/compare/v2.4.0..v2.7.1
 - https://github.com/googleapis/enterprise-certificate-proxy/compare/v0.1.0..v0.2.3
 - https://github.com/golang/protobuf/compare/v1.5.2..v1.5.4
 - https://github.com/cespare/xxhash/compare/v2.1.2..v2.2.0
 - https://github.com/googleapis/google-cloud-go/compare/v0.102.1..v0.110.0
 - https://github.com/googleapis/go-genproto v0.0.0-20230410155749-daa745c078e1
 - https://github.com/googleapis/google-cloud-go/compare/logging/v1.4.2..logging/v1.7.0
 - https://github.com/googleapis/google-cloud-go/compare/compute/v1.7.0..compute/v1.19.1

Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
thaJeztah added a commit that referenced this issue Jun 6, 2024
api: Make EnableIPv6 optional (impl #1 - pointer-based)
thaJeztah added a commit that referenced this issue Dec 19, 2024
contains a fix for CVE-2024-45338 / https://go.dev/issue/70906,
but it doesn't affect our codebase:

    govulncheck -show=verbose ./...
    Scanning your code and 1260 packages across 211 dependent modules for known vulnerabilities...
    ...
    Vulnerability #1: GO-2024-3333
        Non-linear parsing of case-insensitive content in golang.org/x/net/html
      More info: https://pkg.go.dev/vuln/GO-2024-3333
      Module: golang.org/x/net
        Found in: golang.org/x/net@v0.32.0
        Fixed in: golang.org/x/net@v0.33.0

    Your code is affected by 0 vulnerabilities.
    This scan also found 0 vulnerabilities in packages you import and 1
    vulnerability in modules you require, but your code doesn't appear to call these
    vulnerabilities.

full diff: golang/net@v0.32.0...v0.33.0

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
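
The govulncheck reports quoted in the commit messages above all follow the same text layout: a `Vulnerability #N: <ID>` line followed by `Module:`, `Found in:` and `Fixed in:` lines. As a rough sketch only, and assuming that layout stays stable, the affected and fixed versions could be pulled out of such a report with the Go standard library. The `Finding` type and `parseFindings` function are hypothetical names made up for this illustration; they are not part of govulncheck or of this repository.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Finding holds the fields read from one "Vulnerability #N" block.
// Hypothetical type, used only for this illustration.
type Finding struct {
	ID, Module, FoundIn, FixedIn string
}

// parseFindings scans govulncheck's human-readable text output and
// returns one Finding per "Vulnerability #N: <ID>" block.
// It assumes the line prefixes seen in the reports quoted above.
func parseFindings(report string) []Finding {
	var out []Finding
	sc := bufio.NewScanner(strings.NewReader(report))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "Vulnerability #"):
			// "Vulnerability #1: GO-2024-2611" -> keep the part after ": ".
			_, id, _ := strings.Cut(line, ": ")
			out = append(out, Finding{ID: id})
		case len(out) == 0:
			// Ignore everything before the first vulnerability block.
			continue
		case strings.HasPrefix(line, "Module: "):
			out[len(out)-1].Module = strings.TrimPrefix(line, "Module: ")
		case strings.HasPrefix(line, "Found in: "):
			out[len(out)-1].FoundIn = strings.TrimPrefix(line, "Found in: ")
		case strings.HasPrefix(line, "Fixed in: "):
			out[len(out)-1].FixedIn = strings.TrimPrefix(line, "Fixed in: ")
		}
	}
	return out
}

func main() {
	// Excerpt copied from the golang.org/x/net report above.
	report := `
Vulnerability #1: GO-2024-3333
    Non-linear parsing of case-insensitive content in golang.org/x/net/html
  More info: https://pkg.go.dev/vuln/GO-2024-3333
  Module: golang.org/x/net
    Found in: golang.org/x/net@v0.32.0
    Fixed in: golang.org/x/net@v0.33.0
`
	for _, f := range parseFindings(report) {
		fmt.Printf("%s: %s -> %s (module %s)\n", f.ID, f.FoundIn, f.FixedIn, f.Module)
	}
}
```

Scraping the text output like this is brittle; it is shown only to make the structure of the reports explicit, and anything that depends on the result should probably not rely on this format.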