Update docker client version to 18.06.3 #4430

style95 · 2019-04-05T10:18:11Z

Description

With this change, it is required to install Docker=18.06.3 on target servers.
This is a breaking change.

Related issue and scope

I opened an issue to propose and discuss this change (Update docker version #4336)

My changes affect the following components

Types of changes

Bug fix (generally a non-breaking change which closes an issue).
Enhancement or new feature (adds new functionality).
Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

I signed an Apache CLA.
I reviewed the style guides and followed the recommendations (Travis CI will check :).
I added tests to cover my changes.
My changes require further changes to the documentation.
I updated the documentation where necessary.

style95 · 2019-04-05T10:19:01Z

Todo:

Update controller docker cli version ~~there is no docker cli in the controller~~ There "is" docker cli for lean deployment.
performance comparison
Update documents accordingly
Update Travis build configuration accordingly

style95 · 2019-04-11T03:30:03Z

In the latest docker, runc does not show any running containers which are created by docker.
So container is not paused by runc and just removed upon error(container not found).

so I am looking into the way to utilize runc.

style95 · 2019-04-11T09:28:20Z

The issue was previously container information was stored in /run/runc but now it is stored in /run/docker/runtime-runc/moby.

After changing mounting directory, it seems working as expected.

style95 · 2019-04-12T07:57:40Z

I have discussed this issue with @sven-lange-last.
Let me share the results here.

Generally, it takes some times to upgrade docker version for many reasons.
So it would require some time for downstream to use the latest docker version in their production.
But at the same time CVE should be properly handled.
This CVE, https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-5736 requires docker >=18.06.2

So if we upgrade the docker version to 18.06.x rather than 18.09, it would give more time to downstream at the same time resolve the CVE.

style95 · 2019-04-12T11:36:00Z

I did benchmarking to comparison any performance issue in docker-18.06.3.
It seems there is no big performance difference.

17.05: 12,500 TPS
18.06: 12,800 TPS

For more information, you can refer to this.
https://cwiki.apache.org/confluence/display/OPENWHISK/Docker+version+comparison

I think it would be good to go with docker-18.06.3

rabbah

LGTM - I think there's a readme/doc we have to update also regarding the docker version.

rabbah

https://github.com/apache/incubator-openwhisk/blob/e588323fc38c15c6e0d900e5cadd33e82bf97e3c/tools/macos/docker-machine/README.md#L30

and

https://github.com/apache/incubator-openwhisk/blob/e588323fc38c15c6e0d900e5cadd33e82bf97e3c/tools/macos/README.md#L29

rabbah

I think you need a similar change in the core/controller/Dockerfile as well for the "lean" deployment.

style95 · 2019-04-12T11:55:24Z

@rabbah
I readded WIP in the title.
Actually there are still few things to handle: #4430 (comment)

I will cover what you mentioned as well

markusthoemmes · 2019-04-15T08:01:01Z

So it would require some time for downstream to use the latest docker version in their production.

We should really try to decouple from this. Sure, downstream will always be slower than upstream but we should not need to hold back on innovation and advancement because of that fact.

Is there any possibility for us to have different versions of DockerClient that flatten out the differences and make it a deployment/configuration concern which one is used? We could have a deprecation period for older clients to clean up after some time.

style95 · 2019-04-15T08:15:09Z

I think we need to differentiate the build step of the invoker Dockerfile to achieve that.
It is mainly related to the docker version inside the invoker.

Regarding the downstream version, I will defer to @sven-lange-last as he wanted to make sure this PR does not break IBM production system.

As the binary name of runc is changed from docker-runc to runc in docker-18.09, if we decide to upgrade to 18.09, that should be addressed as well.

rabbah · 2019-04-15T09:31:53Z

I like the suggestion from Markus. Did I understand correctly that the conditioning is relatively easier for ansible than the dockerfile?

markusthoemmes · 2019-04-15T09:50:53Z

The Dockerfile only ships docker binaries, right? How much overhead would it be to have multiple versions inside of the Dockerfile that we simply pick through configuration?

style95 · 2019-04-15T10:20:55Z

@markusthoemmes
With the assumption that we will move forward with 18.09 and just including 18.06 for the deprecation period, docker-18.06 would add about 43MB(36 + 7.2 MB) of overhead.

$ ll -h docker-18.06
total 141M
drwxrwxr-x 2 style95 style95 4.0K Feb 20 11:26 ./
drwxr-xr-x 4 style95 style95 4.0K Apr 15 19:03 ../
-rwxr-xr-x 1 style95 style95  36M Feb 20 11:26 docker*
-rwxr-xr-x 1 style95 style95  26M Feb 20 11:26 docker-containerd*
-rwxr-xr-x 1 style95 style95  15M Feb 20 11:26 docker-containerd-ctr*
-rwxr-xr-x 1 style95 style95 4.0M Feb 20 11:26 docker-containerd-shim*
-rwxr-xr-x 1 style95 style95  51M Feb 20 11:26 dockerd*
-rwxr-xr-x 1 style95 style95 747K Feb 20 11:26 docker-init*
-rwxr-xr-x 1 style95 style95 2.8M Feb 20 11:26 docker-proxy*
-rwxr-xr-x 1 style95 style95 7.2M Feb 20 11:26 docker-runc*

@rabbah
Let me clarify the issues.
There are the following changes.

Docker 1.12 -> 18.06 or higher

runc no more use default(/run/runc) directory as a root directory.
It uses /run/docker/runtime-runc/moby.

Docker 18.06 -> 18.09

The name of runc is changed from docker-runc to runc.

IMHO, there are few options to have multiple versions of docker, but I think it would be easier to configure the version at the deployment time.

For example, we can have both versions of docker binaries in the invoker as @markusthoemmes suggested.
It will have two binaries, docker-runc and just runc.
Based on the configuration, DockerClient(RuncClient) will selectively look for one of them.
(Currently, it is hardcoded to /usr/bin/docker-runc)

Keeping docker-1.12

Besides of the above things, If we want to keep docker-1.12, we need to differentiate the ansible script based on the version flag as well.
The mounting directory for runc will vary between /run/runc and /run/docker/runtime-runc/moby.

markusthoemmes · 2019-04-15T10:27:17Z

I see, thanks for the writeup. It looks like moving to 18.06 introduces the least friction for now and makes all parties kinda happy?

If so, it's reasonable to make that move first and think about unblocking future upgrade later.

sven-lange-last · 2019-05-07T10:47:14Z

@style95 thanks a lot for providing this PR - very thorough work. Particularly, the performance tests are helpful for deciding whether the update to a newer Docker version is a feasible.

Some thoughts around the transient data directory docker-runc is using for containers:

When Docker 18.06 uses docker-runc, directory /run/docker/runtime-runc/moby is used as base directory.
docker-runc itself has different default directories that are used when not specifying the --root option. These are /run/runc for root (uid 0) and /run/user/<uid>/runc for non-root users.
Apparently, all uses of docker-runc need to make sure that they use the same data directory as Docker - otherwise docker-runc won't be able to work on the containers created by Docker before.
In ansible/roles/invoker/tasks/clean.yml, you explicitly use the --root option to make sure the proper directory is used.
You have no changes in the Scala RuncClient used by the invoker. (BTW - the comment is wrong.)
https://github.com/apache/incubator-openwhisk/blob/c68fc94212009d3a5cd98d6ea7e06482272c487b/core/invoker/src/main/scala/org/apache/openwhisk/core/containerpool/docker/RuncClient.scala#L56-L64
The Scala RuncClient works because you always mount the (configurable) transient data directory location to /run/runc in file ansible/roles/invoker/tasks/deploy.yml and docker-runc is run by root user.
What happens if the invoker process is run by a non-root user? Won't docker-runc invoked by Scala RuncClient then fall back to the default transient data directory, i.e. /run/user/<uid>/runc?
I do not know the details of lean setup - but what happens in other setup variants? Do we always make sure that the proper transient data directory location is available at /run/runc when running Scala RuncClient? Which user runs the process?
Can we make things more robust by also using the --root /run/runc option in Scala RuncClient?

sven-lange-last · 2019-05-07T13:27:19Z

@style95 concerning the failing Jenkins test:

Jenkin machine is using Docker-17.05 so it uses /run/runc as a root directory.
But this PR will use /run/docker/runc-runtime/moby as it is used in Docker-18.06.
So ConcurrencyTests are failing because it could not find the container.

Wouldn't it be possible to work around this problem by setting invoker_runcdir: "/run/runc" in the Ansible environment for Jenkins?

sven-lange-last · 2019-05-07T13:51:34Z

tests/src/test/scala/ha/ShootComponentsTests.scala


-    Seq(docker, "--host", host + ":" + dockerPort, cmd, component)
+    Seq(docker, cmd, component)


How is the test supposed to work if there is more than one controller running and the controllers run on different systems?

The original test code determines which host is running the first controller and would ask the remote host's Docker daemon to start, stop and restart the controller container. With the change, the test permanently fails in our test environment that uses multiple systems in parallel.

I just want to completely disable remote Docker API from the code as we don't use it anymore in our normal use cases.
It seems remote docker API is only required for test cases.

We can selectively enable/disable remote API based on the deployment type such as
production or unit tests.
But it depends on the OW operator, there is still a possibility that Docker remote API is enabled without consciousness. And it could induce a security breach.

How about making this test rely on ssh command instead?
Then what we need is just to make ssh connection available between target hosts and machine which runs tests.
(Generally, tests are executed in build machines such as jenkins, this requirement would be highly likely already met.)

I actually opt for keeping the current approach with running Docker commands against remote systems. From a security perspective, it's a matter of how you run your Docker daemon. In test environments, you can open a TCP listening socket to accept connections from remote systems - and on production systems, you can just disable the TCP listening socket. The rest is how you configure openwhisk.

Today, the DockerClientWithFileAccess() is created with an empty host parameter anyway such that the local Docker daemon BSD stream socket is used.

Make sense.
I just wanted to clean up this configuration as well: https://github.com/apache/incubator-openwhisk/pull/4430/files#diff-85b6ff9e51a5005c641ac4690b975b2cL55
But that's also a viable option.
I will work on it.

style95 · 2019-05-08T01:30:33Z

@sven-lange-last

Wouldn't it be possible to work around this problem by setting invoker_runcdir: "/run/runc" in the Ansible environment for Jenkins?

Thank you for the suggestion, I will try it.

style95 · 2019-05-09T10:09:15Z

@sven-lange-last It seems introducing invoker_runcdir with Docker 17.05 does not work.
This is because two different versions of runc are not compatible with each other.

[2019-05-09T10:04:50.544Z] [ERROR] [#tid_sid_invokerNanny] [RuncClient] info: command was unsuccessful, code: 1 (unsuccessful), stdout: , stderr: json: cannot unmarshal string into Go struct field State.init_process_start of type uint64 [marker:invoker_runc.pause_error:1322355:73]

Yeah, that's because you have two runc versions that are sharing incompatible state. That's not really a supported configuration (though we should be more friendly about not bailing when an error occurs).

opencontainers/runc#1592 (comment)

…nd runc and the type of user

style95 · 2019-05-10T09:26:35Z

Let me share the discussion with @sven-lange-last on Slack for history.

He suggested using invoker_use_runc: false in Jenkins CI environments and I applied it.

We discussed to add --root option in RuncClient as well, and it seems it's not required as of now because invoker is running as a root user.
So RuncClient client will always have access to /run/runc dir.
Since it can be broken at any time if the invoker runs as a non-root user, he also suggested leaving comments about its implications and it is also applied accordingly.

Now let's see what happens in Jenkins CI.

sven-lange-last · 2019-05-10T12:20:45Z

@style95 thanks a lot for the valuable discussion and your recent updates. Jenkins is working now.

We still have two different problems in Travis:

The code scanner complains about whitespace.
services.KafkaConnectorTests > Kafka connector should send and receive a kafka message even after shutdown one of instances fails because a remote Docker daemon cannot be connected.

The problem with services.KafkaConnectorTests is in the last line of this code snippet:

https://github.com/style95/openwhisk/blob/faf765cd9d3fedd3139298a19f911247f127c91a/tests/src/test/scala/services/KafkaConnectorTests.scala#L146-L152

  if (kafkaHosts.length > 1) {
    it should "send and receive a kafka message even after shutdown one of instances" in {
      kafkaHosts.indices.foreach { i =>
        val message = createMessage()
        val kafkaHost = kafkaHosts(i).split(":")(0)
        val startLog = s", started"
        val prevCount = startLog.r.findAllMatchIn(commandComponent(kafkaHost, "logs", s"kafka$i").stdout).length

sven-lange-last · 2019-05-13T06:40:09Z

PG 4 / 2840 completed successfully. No failures.

style95 · 2019-05-13T09:45:41Z

Finally!
This is ready to merge! 😄

sven-lange-last · 2019-05-13T10:44:49Z

ansible/environments/jenkins/group_vars/openwhisk-vm1-he-de

@@ -36,3 +36,4 @@ runtimes_enable_concurrency: true
 limit_action_concurrency_max: 500

 invoker1_machine: openwhisk-vm3-he-de
+invoker_use_runc: false


@houshengbo do you have plans to update Jenkins VMs to a higher Docker version once this PR is merged? This PR will update Docker to a higher version on the CLI side so that we needed to switch off runc usage in the invoker...

sven-lange-last

Thanks a lot for addressing all review feedback. Looks good to me now.

What do you think about giving a heads-up on the dev mailing list (again)? When merging this PR, a high Docker version is expected in development, test and production environments. So I suggest to ask whether somebody anticipates a general problem and to announce that this PR gets merged in 48 hours if nobody objects. Does that approach make sense to you?

style95 · 2019-05-13T14:58:53Z

@sven-lange-last Got it.
I would open a confirmation thread (based on the silent assent) in dev list.

tools/macos/README.md

tools/macos/docker-machine/README.md

codecov-io · 2019-05-14T04:30:58Z

Codecov Report

Merging #4430 into master will decrease coverage by 3.06%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4430      +/-   ##
==========================================
- Coverage   83.97%   80.91%   -3.07%     
==========================================
  Files         170      170              
  Lines        7940     7943       +3     
  Branches      536      539       +3     
==========================================
- Hits         6668     6427     -241     
- Misses       1272     1516     +244

Impacted Files	Coverage Δ
...pache/openwhisk/core/invoker/InvokerReactive.scala	`80.86% <100%> (+16.58%)`	⬆️
...core/database/cosmosdb/RxObservableImplicits.scala	`0% <0%> (-100%)`	⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala	`0% <0%> (-95.46%)`	⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala	`0% <0%> (-92.67%)`	⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala	`0% <0%> (-84.62%)`	⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala	`4% <0%> (-52%)`	⬇️
...in/scala/org/apache/openwhisk/common/Counter.scala	`40% <0%> (-20%)`	⬇️
...penwhisk/core/database/cosmosdb/CosmosDBUtil.scala	`81.81% <0%> (-15.16%)`	⬇️
...nwhisk/core/database/cosmosdb/CosmosDBConfig.scala	`94.11% <0%> (-5.89%)`	⬇️
...in/scala/org/apache/openwhisk/common/Logging.scala	`69.75% <0%> (-0.62%)`	⬇️
... and 26 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4740d40...31603d2. Read the comment docs.

* Update docker version in the controller and guides * Update travis configuration to install docker=18.06.3 * Separate docker installation from the travis setup script * Add docker setup script * Add CoordinatedShutdown to cleanup runtime containers * add a configuration for root runc dir * Disable runc use in Jenkins environment * Add comments which explain the correlation among the version docker and runc and the type of user * Reenable Docker remote API again

style95 mentioned this pull request Apr 7, 2019

Invoker use outdated docker API to pause container for docker version 18.09 #4431

Closed

Update Docker version to 18.06.3-ce

19f03eb

style95 force-pushed the update-docker-version branch from 7c8cab1 to 19f03eb Compare April 12, 2019 11:39

style95 changed the title ~~[WIP] Update docker client version to 18.09.4~~ Update docker client version to 18.06.3 Apr 12, 2019

rabbah reviewed Apr 12, 2019

View reviewed changes

style95 changed the title ~~Update docker client version to 18.06.3~~ [WIP]Update docker client version to 18.06.3 Apr 12, 2019

rabbah reviewed Apr 12, 2019

View reviewed changes

rabbah added deployment docker awaits-contributor The contributor needs to respond to comments from reviewer. labels Apr 12, 2019

style95 added 2 commits April 15, 2019 16:13

Update docker version in the controller and guides

63e957a

Update travis configuration to install docker=18.06.3

037418b

style95 changed the title ~~[WIP]Update docker client version to 18.06.3~~ Update docker client version to 18.06.3 Apr 15, 2019

style95 closed this Apr 16, 2019

style95 reopened this Apr 16, 2019

style95 closed this Apr 19, 2019

style95 reopened this Apr 19, 2019

sven-lange-last reviewed May 7, 2019

View reviewed changes

style95 added 4 commits May 9, 2019 19:10

Add CoordinatedShutdown to cleanup runtime containers

b37ba53

add a configuration for root runc dir

8cf9ec6

Apply ScalaFmt

84d524d

Fix bug

bfa6fa9

style95 force-pushed the update-docker-version branch 2 times, most recently from 2265246 to bfa6fa9 Compare May 9, 2019 11:40

style95 added 2 commits May 10, 2019 18:16

Disable runc use in Jenkins environment

ef6439e

Add comments which explain the correlation among the version docker a…

faf765c

…nd runc and the type of user

style95 added 2 commits May 13, 2019 10:19

Remove trailling whitespace

e27ec04

Reenable Docker remote API again

3f22d34

style95 force-pushed the update-docker-version branch from a377698 to 3f22d34 Compare May 13, 2019 04:53

sven-lange-last reviewed May 13, 2019

View reviewed changes

sven-lange-last approved these changes May 13, 2019

View reviewed changes

rabbah approved these changes May 13, 2019

View reviewed changes

tools/macos/README.md Outdated Show resolved Hide resolved

tools/macos/docker-machine/README.md Outdated Show resolved Hide resolved

Update tips

31603d2

style95 force-pushed the update-docker-version branch from 81bea87 to 31603d2 Compare May 14, 2019 02:25

rabbah merged commit d353d26 into apache:master May 15, 2019

style95 mentioned this pull request May 17, 2019

Update docker version #4336

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update docker client version to 18.06.3 #4430

Update docker client version to 18.06.3 #4430

style95 commented Apr 5, 2019 •

edited

Loading

style95 commented Apr 5, 2019 •

edited

Loading

style95 commented Apr 11, 2019

style95 commented Apr 11, 2019

style95 commented Apr 12, 2019

style95 commented Apr 12, 2019

rabbah left a comment

rabbah left a comment

rabbah left a comment

style95 commented Apr 12, 2019 •

edited

Loading

markusthoemmes commented Apr 15, 2019

style95 commented Apr 15, 2019 •

edited

Loading

rabbah commented Apr 15, 2019

markusthoemmes commented Apr 15, 2019

style95 commented Apr 15, 2019 •

edited

Loading

markusthoemmes commented Apr 15, 2019

sven-lange-last commented May 7, 2019 •

edited

Loading

sven-lange-last commented May 7, 2019

sven-lange-last May 7, 2019

style95 May 8, 2019

sven-lange-last May 8, 2019

style95 May 8, 2019

style95 commented May 8, 2019

style95 commented May 9, 2019 •

edited

Loading

style95 commented May 10, 2019

sven-lange-last commented May 10, 2019 •

edited

Loading

sven-lange-last commented May 13, 2019 •

edited

Loading

style95 commented May 13, 2019

sven-lange-last May 13, 2019

sven-lange-last left a comment

style95 commented May 13, 2019

codecov-io commented May 14, 2019


		Seq(docker, "--host", host + ":" + dockerPort, cmd, component)
		Seq(docker, cmd, component)

Update docker client version to 18.06.3 #4430

Update docker client version to 18.06.3 #4430

Conversation

style95 commented Apr 5, 2019 • edited Loading

Description

Related issue and scope

My changes affect the following components

Types of changes

Checklist:

style95 commented Apr 5, 2019 • edited Loading

style95 commented Apr 11, 2019

style95 commented Apr 11, 2019

style95 commented Apr 12, 2019

style95 commented Apr 12, 2019

rabbah left a comment

Choose a reason for hiding this comment

rabbah left a comment

Choose a reason for hiding this comment

rabbah left a comment

Choose a reason for hiding this comment

style95 commented Apr 12, 2019 • edited Loading

markusthoemmes commented Apr 15, 2019

style95 commented Apr 15, 2019 • edited Loading

rabbah commented Apr 15, 2019

markusthoemmes commented Apr 15, 2019

style95 commented Apr 15, 2019 • edited Loading

Docker 1.12 -> 18.06 or higher

Docker 18.06 -> 18.09

Keeping docker-1.12

markusthoemmes commented Apr 15, 2019

sven-lange-last commented May 7, 2019 • edited Loading

sven-lange-last commented May 7, 2019

sven-lange-last May 7, 2019

Choose a reason for hiding this comment

style95 May 8, 2019

Choose a reason for hiding this comment

sven-lange-last May 8, 2019

Choose a reason for hiding this comment

style95 May 8, 2019

Choose a reason for hiding this comment

style95 commented May 8, 2019

style95 commented May 9, 2019 • edited Loading

style95 commented May 10, 2019

sven-lange-last commented May 10, 2019 • edited Loading

sven-lange-last commented May 13, 2019 • edited Loading

style95 commented May 13, 2019

sven-lange-last May 13, 2019

Choose a reason for hiding this comment

sven-lange-last left a comment

Choose a reason for hiding this comment

style95 commented May 13, 2019

codecov-io commented May 14, 2019

Codecov Report

style95 commented Apr 5, 2019 •

edited

Loading

style95 commented Apr 5, 2019 •

edited

Loading

style95 commented Apr 12, 2019 •

edited

Loading

style95 commented Apr 15, 2019 •

edited

Loading

style95 commented Apr 15, 2019 •

edited

Loading

sven-lange-last commented May 7, 2019 •

edited

Loading

style95 commented May 9, 2019 •

edited

Loading

sven-lange-last commented May 10, 2019 •

edited

Loading

sven-lange-last commented May 13, 2019 •

edited

Loading