-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix spurious deadlock in overlay driver #2180
Fix spurious deadlock in overlay driver #2180
Conversation
Commit bd613df prevented data corruption due to simultaneous driver.CreateNetwork()/driver.DeleteNetwork() by holding the network lock through the read/modify part of the operation. However, part of the DeleteNetwork operation entails sending a message to the peerDB to tell that goroutine to flush entries on deletion. This can lead to a deadlock where: * driver.DeleteNetwork() starts and acquires driver.Lock() * peerDB receives some other request (e.g. EventNotify) and blocks on driver.Lock() * driver.DeleteNetwork() attempts a peerDB flush and blocks waiting on the synchronous peerDB operation channel This patch fixes the issue by deferring the peerDB flush operation until after DeleteNetwork() unlocks driver.Lock(). Commit bd613df only modified CreateNetwork() and DeleteNetwork() and the critical section that driver.Lock() protects in CreateNetwork() does not perform any peerDB notifications or other locks of driver data structures. So this solution should be a complete fix for any regressions introduced in bd613df. Signed-off-by: Chris Telfer <ctelfer@docker.com>
related to #2143 (adding for easier discoverability) |
Codecov Report
@@ Coverage Diff @@
## master #2180 +/- ##
=========================================
Coverage ? 40.57%
=========================================
Files ? 139
Lines ? 22490
Branches ? 0
=========================================
Hits ? 9126
Misses ? 12031
Partials ? 1333
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
includes; - moby/libnetwork#2178 Fix possible race on ingress programming - moby/libnetwork#2180 Fix spurious deadlock in overlay driver Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
includes; - moby/libnetwork#2178 Fix possible race on ingress programming - moby/libnetwork#2180 Fix spurious deadlock in overlay driver Signed-off-by: Sebastiaan van Stijn <github@gone.nl> Upstream-commit: 6630f214fae3bc7c8b802a0b58e1bacd9adc6148 Component: engine
includes; - moby/libnetwork#2178 Fix possible race on ingress programming - moby/libnetwork#2180 Fix spurious deadlock in overlay driver Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
…top (#3) * MSFT:17850093:fix docker to use registry policy rather than product sku to decide if argon is allowed * Make 'process isolation policy' 3 options: default, allow, deny * Fix fd leak on attach With a full attach, each attach was leaking 4 goroutines. This updates attach to use errgroup instead of the hodge-podge of waitgroups and channels. In addition, the detach event was never being sent. Signed-off-by: Brian Goff <cpuguy83@gmail.com> * Close readclosers returned by DecompressStream Signed-off-by: Joe Ferguson <joe@infosiftr.com> * Refactor and cleanup the intermediate container creation This PR is trying to refactor the `probeAndCreate` and cleanup related codes based on the refactoring. Signed-off-by: Dennis Chen <dennis.chen@arm.com> * Add support for `init` on services It's already supported by `swarmkit`, and act the same as `HostConfig.Init` on container creation. Signed-off-by: Vincent Demeester <vincent@sbr.pm> * Added network package to integration/internal to refactor integration tests calls to client.NetworkCreate Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * refactored integration tests under integration/network/macvlan to use network.Create Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * use unique names for resources in create service integration tests Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Add image metrics for push and pull Signed-off-by: Daniel Nephin <dnephin@gmail.com> * When id is empty for overlay2/overlay, do not remove the directories. Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Use stdlib TLS dialer Since go1.8, the stdlib TLS net.Conn implementation implements the `CloseWrite()` interface. Signed-off-by: Brian Goff <cpuguy83@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix panic on daemon restart with running plugin Scenario: Daemon is ungracefully shutdown and leaves plugins running (no live-restore). Daemon comes back up. The next time a container tries to use that plugin it will cause a daemon panic because the plugin client is not set. This fixes that by ensuring that the plugin does get shutdown. Note, I do not think there would be any harm in just re-attaching to the running plugin instead of shutting it down, however historically we shut down plugins and containers when live-restore is not enabled. [kir@: consolidate code to deleteTaskAndContainer, a few minor nits] Signed-off-by: Brian Goff <cpuguy83@gmail.com> Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * Allow vim be case insensitive for D in dockerfile Signed-off-by: Kunal Tyagi <tyagi.kunal@live.com> * Clean up in TestNegotiateAPIVersionEmpty Signed-off-by: John Stephens <johnstep@docker.com> * Fix race condition between exec start and resize Signed-off-by: David Wang <00107082@163.com> * Fix link to Docker Toolbox Signed-off-by: Francesco Mari <mari.francesco@gmail.com> * vendor: update containerd to 63522d9 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * bump libnetwork to 19279f0492417475b6bfbd0aa529f73e8f178fb5 includes; - moby/libnetwork#2178 Fix possible race on ingress programming - moby/libnetwork#2180 Fix spurious deadlock in overlay driver Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * refactor delete network integration tests to use network package Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Dockerfile*: bump Go to 1.10.3 Signed-off-by: Cristian Staretu <unclejack@users.noreply.github.com> * builder: snapshotter and exporter Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * daemon: access to distribution internals Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * layer: relax graphdriver ID format Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: experimental buildkit base Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: add graceful cancellation endpoint Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: add cache-from support to buildkit Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * vendor: add buildkit dependency Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: adapter update after vendor update Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: produce duplicate cache keys on pull Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: export build cache records Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: add usage to snapshotter adapter Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: expand prune to buildkit Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: fix compiling with buildkit on windows and integration tests Signed-off-by: Tibor Vass <tibor@docker.com> * builder: fixes after rebase Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: pass DOCKER_BUILDKIT to enable buildkit in tests Signed-off-by: Tibor Vass <tibor@docker.com> * builder: Add TODOBuildkit test requirement, specifically for TestBuildCancellationKillsSleep Signed-off-by: Tibor Vass <tibor@docker.com> * builder: patch incomplete download handling Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: add support for building from tarball Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: have TestBuildDockerignoringBadExclusion pass with buildkit Signed-off-by: Tibor Vass <tibor@docker.com> * builder: protect early progress writes Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: add support for separate upload-request Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: enable gateway through syntax directive Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: support for images without layers Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * integration-cli: fix health test Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: lint fixes Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: move tagging to exporter Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: fix cancellation context issue Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: notify output buffering on body close Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * api: update godoc Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: more experimental/windows validation Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: correct output buffering order Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * integration-cli: fix error message for non-buildkit Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * vendor: update runc for helper packages Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * Add support for schema 1 pull Signed-off-by: Derek McGowan <derek@mcgstyle.net> * builder: override history dates from ref metadata Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * Replace gotestyourself by gotest.tools github.com/gotestyourself/gotestyourself moved to gotest.tools with version 2.0.0. Moving to that one, bumping it to v2.1.0. Signed-off-by: Vincent Demeester <vincent@sbr.pm> * builder: updates for newer containerd Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: update ID of trace messages Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * Upgrade imdario/mergo to v0.3.5 Mainly to get inline with `docker/cli` version of that dependency Signed-off-by: Vincent Demeester <vincent@sbr.pm> * refactored network integration tests under integration/network/service_test.go to use network package Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Update tests to use gotest.tools 👼 Signed-off-by: Vincent Demeester <vincent@sbr.pm> * Fix link anchors in CONTRIBUTING.md This is a follow up of moby#35168. Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * create service integration tests use network package Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Whitelist syscalls linked to CAP_SYS_NICE in default seccomp profile * Update profile to match docker documentation at https://docs.docker.com/engine/security/seccomp/ Signed-off-by: Nicolas V Castet <nvcastet@us.ibm.com> * refactor ipvlan network integration tests to use network.Create Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Don't fail to start daemon if builder source is not available Signed-off-by: John Howard <jhoward@microsoft.com> * Fix compilation on 32 bit systems * Update runc commit to ad0f525 * Update buildkit to dbf67a6 Signed-off-by: Eli Uriegas <eli.uriegas@docker.com> * migrate TestAPINetworkCreateDelete from integration-cli/ to integration/ Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * lcow: Allow the client to add or remove capabilities Signed-off-by: John Starks <jostarks@microsoft.com> * Updated path to be consistent w/ current Windows build process Signed-off-by: Benjamin Baker <Benjamin.baker@utexas.edu> * lcow: Allow the client to add device cgroup rules Signed-off-by: John Starks <jostarks@microsoft.com> * fix build on OpenBSD by defining Self() Signed-off-by: Fabian Raetz <fabian.raetz@gmail.com> * Update to containerd v1.1.1-rc.1 Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Update overlay2 to use naive diff for changes The archive changes function is not implemented correctly to handle opaque directories. Signed-off-by: Derek McGowan <derek@mcgstyle.net> * Fix TestDaemonNoSpaceLeftOnDeviceError This test is testing if any "no space left on device" errors that occur during `docker pull` will not be masked by other errors. To test for this, a new loopback-device was created, and used as `--data-dir` ("/var/lib/docker"). However, `/var/lib/docker` is used for storing various other things, including a `cache.db` database, used by BuildKit, which is created during startup of the daemon. Creation of that file failed (due to `--data-dir` path being on a mount with limited size), which caused daemon start to fail before the test was able to run. This patch changes the size-limited mount to be used for the storage-driver directory only, so that the test is not affected by other parts of the code attempting to write files in it. To have a predictable path; the daemon used in this test is configured to use the `vfs` storage-driver. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix flaky test TestServiceGet Signed-off-by: Catalin Pirvu <pirvu.catalin94@gmail.com> * LCOW: Auto-select OS Signed-off-by: John Howard <jhoward@microsoft.com> Addresses moby#35089 (comment). This change enables the daemon to automatically select an image under LCOW that can be used if the API doesn't specify an explicit platform. For example: FROM supertest2014/nyan ADD Dockerfile / And docker build . will download the linux image (not a multi-manifest image) And similarly docker pull ubuntu will match linux/amd64 * add integration test guidelines Signed-off-by: Anda Xu <anda.xu@docker.com> * refactor network inspect integration tests to use network package Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * refactored remaining macvlan integration tests to use network package for creating networks Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * refactored integration/service/network integration tests to use network package Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Just satisfying my OCD - fixed comment spacing and removed a hidden character Signed-off-by: Martin Muzatko <martin@happy-css.com> * Fix a small spacing issue As a follow up to moby#37331 (review) Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Initial support for OCI multi-platform image Add the OCI spec compatible image support in client side. Signed-off-by: Dennis Chen <dennis.chen@arm.com> * Update comments about `InitRouter` This is a follow-up of commit 408c7ad (PR: moby#32453) Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * vendor: update buildkit to cce2080ddb Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: buildkit rebase update Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * api: fix platform type Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * add unique names to integration/network/service_test.go Signed-off-by: Lotus Fenn <fenn.lotus@gmail.com> * vendor: update containerd to 08f7ee98 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * LCOW: lazycontext: Use correct lstat, fix archive check Signed-off-by: John Howard <jhoward@microsoft.com> * Update containerd to v1.1.1-rc.2 Signed-off-by: Derek McGowan <derek@mcgstyle.net> * Move network conversions out of API router This stuff doesn't belong here and is causing imports of libnetwork into the router, which is not what we want. Signed-off-by: Brian Goff <cpuguy83@gmail.com> * distribution: fix passing platform struct to puller Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * builder: update platform support to puller Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * system: add back lcow validation function Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * distribution: remove custom matcher code Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * Register OCI image media types OCI types are backwards compatible with Docker manifest types, however the media types must be registered. Signed-off-by: Derek McGowan <derek@mcgstyle.net> * Update Microsoft/go-winio to 0.4.8 Fixes named pipe support for hyper-v isolated containers Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * LCOW: Prefer Windows over Linux in a manifest list When a manifest list contains both Linux and Windows images, always prefer Windows when the platform OS is unspecified. Also, filter out any Windows images with a higher build than the host, since they cannot run. Signed-off-by: John Stephens <johnstep@docker.com> * Re-add support for a custom .bashrc file in build env Signed-off-by: Harald Albers <github@albersweb.de> * Update moby to use scalable-lb libnetwork APIs This patch is required for the updated version of libnetwork and entails two minor changes. First, it uses the new libnetwork.NetworkDeleteOptionRemoveLB option to the network.Delete() method to automatically remove the load balancing endpoint for ingress networks. This allows removal of the deleteLoadBalancerSandbox() function whose functionality is now within libnetwork. The second change is to allocate a load balancer endpoint IP address for all overlay networks rather than just "ingress" and windows overlay networks. Swarmkit is already performing this allocation, but moby was not making use of these IP addresses for Linux overlay networks (except ingress). The current version of libnetwork makes use of these IP addresses by creating a load balancing sandbox and endpoint similar to ingress's for all overlay network and putting all load balancing state for a given node in that sandbox only. This reduces the amount of linux kernel state required per node. In the prior scheme, libnetwork would program each container's network namespace with every piece of load balancing state for every other container that shared *any* network with the first container. This meant that the amount of kernel state on a given node scaled with the square of the number of services in the cluster and with the square of the number of containers per service. With the new scheme, kernel state at each node scales linearly with the number of services and the number of containers per service. This also reduces the number of system calls required to add or remove tasks and containers. Previously the number of system calls required grew linearly with the number of other tasks that shared a network with the container. Now the number of system calls grows linearly only with the number of networks that the task/container is attached to. This results in a significant performance improvement when adding and removing services to a cluster that already heavily loaded. The primary disadvantage to this scheme is that it requires the allocation of an additional IP address per node per subnet for every node in the cluster that has a task on the given subnet. However, as mentioned, swarmkit is already allocating these IP addresses for every node and they are going unused. Future swarmkit modifications should be examined to only allocate said IP addresses when nodes actually require them. Signed-off-by: Chris Telfer <ctelfer@docker.com> * Adds a few more names to the name generator. Signed-off-by: Debayan De <debayande@users.noreply.github.com> * bump libnetwork to b0186632 Bump libnetwork to b0186632522c68f4e1222c4f6d7dbe518882024f. This includes the following changes: * Dockerize protocol buffer generation and update (78d9390a..e12dd44c) * Use new plugin interfaces provided by plugin pkg (be94e134) * Improve linux load-balancing scalability (5111c24e..366b9110) Signed-off-by: Chris Telfer <ctelfer@docker.com> * bump libnetwork to 430c00a Bump libnetwork to 430c00a6a6b3dfdd774f21e1abd4ad6b0216c629. This includes the following moby-affecting changes: * Update vendoring for go-sockaddr (8df9f31a) * Fix inconsistent subnet allocation by preventing allocation of overlapping subnets (8579c5d2) * Handle IPv6 literals correctly in port bindings (474fcaf4) * Update vendoring for miekg/dns (8f307ac8) * Avoid subnet reallocation until required (9756ff7ed) * Bump libnetwork build to use go version 1.10.2 (603d2c1a) * Unwrap error type returned by PluginGetter (aacec8e1) * Update vendored components to match moby (d768021dd) * Add retry field to cluster-peers probe (dbbd06a7) * Fix net driver response loss on createEndpoint (1ab6e506) (fixes docker/for-linux#348) Signed-off-by: Chris Telfer <ctelfer@docker.com> * Pass endpoint to the CloudWatch Logs logging driver Signed-off-by: haikuoliu <haikuo@amazon.com> * Update tests w/ new libnetwork contraints The TestDockerNetworkIPAMMultipleNetworks test allocates several networks simultaneously with overlapping IP addresses. Libnetwork now forbids this. Adjust the test case to use distinct IP ranges for the networks it creates. Signed-off-by: Chris Telfer <ctelfer@docker.com> * Fix bindmount autocreate race When using the mounts API, bind mounts are not supposed to be automatically created. Before this patch there is a race condition between valiating that a bind path exists and then actually setting up the bind mount where the bind path may exist during validation but was removed during mountpooint setup. This adds a field to the mountpoint struct to ensure that binds created over the mounts API are not accidentally created. Signed-off-by: Brian Goff <cpuguy83@gmail.com> * Update documents of `dispatchAdd` `ADD` does not support git. Ref: moby#14704 (comment) Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * Update documents of `Detect` By 0296797, `progressReader` and `remoteURL` were removed from arguments. So developers who use `Detect` not need to care about when `ProgressReaderFunc` is used. Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * builder: return image ID in API when using buildkit Signed-off-by: Tibor Vass <tibor@docker.com> * api: Change Platform field back to string (temporary workaround) This partially reverts moby#37350 Although specs.Platform is desirable in the API, there is more work to be done on helper functions, namely containerd's platforms.Parse that assumes the default platform of the Go runtime. That prevents a client to use the recommended Parse function to retrieve a specs.Platform object. With this change, no parsing is expected from the client. Signed-off-by: Tibor Vass <tibor@docker.com> * builder: do not send duplicate status for completed jobs Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * vendor: update buildkit to 9acf51e491 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * add vim-plug setting this should work ( tried on my machine) Signed-off-by: Ian Chen <ianre657@gmail.com> * update fsnotify to v1.4.7 Fixes a possible deadlock on closing the watcher on kqueue Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Migrate some ipcmode tests to integration This fix migrates some ipcmode tests in integration-cli to integration tests. Signed-off-by: Yong Tang <yong.tang.github@outlook.com> * Update cloudflare/cfssl to 1.3.2 Matching the version that is used in SwarmKit Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Replaced "--update-cache" argument with "--no-cache" in apk call to reduce alpine base image by 10-12% (avoid useless indexes in /var/cache/apk) Signed-off-by: Mickaël Remars <github@remars.com> * Add /proc/acpi to masked paths The deafult OCI linux spec in oci/defaults{_linux}.go in Docker/Moby from 1.11 to current upstream master does not block /proc/acpi pathnames allowing attackers to modify host's hardware like enabling/disabling bluetooth or turning up/down keyboard brightness. SELinux prevents all of this if enabled. Signed-off-by: Antonio Murdaca <runcom@redhat.com> * Removed the "-i -t" arguments from the smoke test calling printf (these flags seem not really needed, and break jenkins builds with error "the input device is not a TTY") Signed-off-by: Mickaël Remars <github@remars.com> * Bump swarmkit to include task reaper fixes and more metrics. This includes the following behavior-modifying PRs: - moby/swarmkit#2673 - moby/swarmkit#2669 - moby/swarmkit#2675 - moby/swarmkit#2664 Signed-off-by: Ying Li <ying.li@docker.com> * Bump libnetwork to 3ac297bc Bump libnetwork to 3ac297bc7fd0afec9051bbb47024c9bc1d75bf5b in order to get fix 0c3d9f00 which addresses a flaw that the scalable load balancing code revealed. Attempting to print sandbox IDs where the sandbox name was too short results in a goroutine panic. This can occur with sandboxes with names of 1 or 2 characters in the previous code. But due to naming updates in the scalable load balancing code, it could now occur for networks whose name was 3 characters and at least one of the integration tests employed such networks (named 'foo', 'bar' and 'baz'). This update also brings in several changes as well: * 6c7c6017 - Fix error handling about bridgeSetup * 5ed38221 - Optimize networkDB queue * cfa9afdb - ndots: produce error on negative numbers * 5586e226 - improve error message for invalid ndots number * 449672e5 - Allows to set generic knobs on the Sandbox * 6b4c4af7 - do not ignore user-provided "ndots:0" option * 843a0e42 - Adjust corner case for reconnect logic Signed-off-by: Chris Telfer <ctelfer@docker.com> * Get err type in removeNetworks() w/ errors.Cause() Commit c0bc14e wrapped the return value of nw.Delete() with some extra information. However, this breaks the code in containerAdaptor.removeNetworks() which ignores certain specific libnetwork error return codes. Said codes actually don't represent errors, but just regular conditions to be expected in normal operation. The removeNetworks() call checked for these errors by type assertions which the errors.Wrap(err...) breaks. This has a cascading effect, because controller.Remove() invokes containerAdaptor.removeNetworks() and if the latter returns an error, then Remove() fails to remove the container itself. This is not necessarily catastrophic since the container reaper apparently will purge the container later, but it is clearly not the behavior we want. Signed-off-by: Chris Telfer <ctelfer@docker.com> * vendor: update containerd to b41633746 Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * Ensure RUN instruction to run without Healthcheck Before this commit Healthcheck run if HEALTHCHECK instruction appears before RUN instruction. By passing `withoutHealthcheck` to `copyRunConfig`, always RUN instruction run without Healthcheck. Fix: moby#37362 Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * Fix typo on test.md It said `TESTFLAGS='-test.run ^TestValidateIPAddress$' make test-unit` runs `TestBuild` test, but actually runs `TestValidateIPAddress` test. Signed-off-by: Donghwa Kim <shanytt@gmail.com> * Pass log-level to containerd dockerd allows the `--log-level` to be specified, but this log-level was not forwarded to the containerd process. This patch sets containerd's log-level to the same as dockerd if a custom level is provided. Now that `--log-level` is also passed to containerd, the default "info" is removed, so that containerd's default (or the level configured in containerd.toml) is still used if no log-level is set. Before this change: containerd would always be started without a log-level set (only the level that's configured in `containerd.toml`); ``` root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml ``` After this change: when running `dockerd` without options (same as current); ``` root 1014 2.5 2.1 496484 43468 pts/0 Sl+ 12:23 0:00 dockerd root 1023 1.2 1.1 681768 23832 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml ``` when running `dockerd --debug`: ``` root 600 0.8 2.1 512876 43180 pts/0 Sl+ 12:20 0:00 dockerd --debug root 608 0.6 1.1 624428 23672 ? Ssl 12:20 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug ``` when running `dockerd --log-level=panic` ``` root 747 0.6 2.1 496548 43996 pts/0 Sl+ 12:21 0:00 dockerd --log-level=panic root 755 0.7 1.1 550696 24100 ? Ssl 12:21 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level panic ``` combining `--debug` and `--log-level` (`--debug` takes precedence): ``` root 880 2.7 2.1 634692 43336 pts/0 Sl+ 12:23 0:00 dockerd --debug --log-level=panic root 888 1.0 1.1 616232 23652 ? Ssl 12:23 0:00 \_ docker-containerd --config /var/run/docker/containerd/containerd.toml --log-level debug ``` Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Bump gometalinter to v2.0.6 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Compile fix Go 1.11beta1 (rightfully) complains: > 15:38:37 daemon/cluster/controllers/plugin/controller.go:183: > Entry.Debugf format %#T has unrecognized flag # This debug print was added by commit 72c3bcf. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * aufs: fix Wrapf args Fix the following go-1.11beta1 build error: > daemon/graphdriver/aufs/aufs.go:376: Wrapf format %s reads arg #1, but call has 0 args While at it, change '%s' to %q. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * loggerutils: build fixes, improve errors There are two build errors when using go-1.11beta1: > daemon/logger/loggerutils/logfile.go:367: Warningf format %q arg f.Name is a func value, not called > daemon/logger/loggerutils/logfile.go:564: Debug call has possible formatting directive %v In the first place, the file name is actually not required as error message already includes it. While at it, fix a couple of other places for more correct messages, and make sure to not add a file name if an error already has it. Fixes: f69f09f Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix ineffassign linting Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix golint issues Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Bump libnetwork to d00ceed44cc447c77f25cdf5d59e83163bdcb4c9 The absence of the file /proc/sys/net/ipv6/conf/all/disable_ipv6 doesn't appear to affect functionality, at least at this time. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Bump containerd daemon to v1.1.1 Signed-off-by: Brian Goff <cpuguy83@gmail.com> * builder: fix duplicate calls to mountable Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com> * daemon/*.go: fix some Wrap[f]/Warn[f] errors In particular, these two: > daemon/daemon_unix.go:1129: Wrapf format %v reads arg #1, but call has 0 args > daemon/kill.go:111: Warn call has possible formatting directive %s and a few more. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Return error if basename is expanded to blank Fix: moby#37325 Signed-off-by: Yuichiro Kaneko <spiketeika@gmail.com> * lcow: fix debug in startServiceVMIfNotRunning() When go-1.11beta1 is used for building, the following error is reported: > 14:56:20 daemon\graphdriver\lcow\lcow.go:236: Debugf format %s reads > arg #2, but call has 1 arg While fixing this, let's also fix a few other things in this very function (startServiceVMIfNotRunning): 1. Do not use fmt.Printf when not required. 2. Use `title` whenever possible. 3. Don't add `id` to messages as `title` already has it. 4. Remove duplicated colons. 5. Try to unify style of messages. 6. s/startservicevmifnotrunning/startServiceVMIfNotRunning/ ... In general, logging/debugging here is a mess and requires much more love than I can give it at the moment. Areas for improvement: 1. Add a global var logger = logrus.WithField("storage-driver", "lcow") and use it everywhere else in the code. 2. Use logger.WithField("id", id) whenever possible (same for "context" and other similar fields). 3. Revise all the errors returned to be uniform. 4. Make use of errors.Wrap[f] whenever possible. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * Fix error string in docker CLI test Signed-off-by: Sandeep Bansal <sabansal@microsoft.com> * vendor: update continuity to 0377f7d767206 This is to include the Go 1.11 fix (containerd/continuity#120). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * Remove stray uses of "golang.org/x/net/context" Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix API template to not use "golang.org/x/net/context" Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Refactor daemon.info to reduce cyclomatic complexity Before this change; gocyclo daemon/info.go 17 daemon (*Daemon).SystemInfo daemon/info.go:27:1 2 daemon (*Daemon).SystemVersion daemon/info.go:150:1 1 daemon (*Daemon).showPluginsInfo daemon/info.go:195:1 After this change; gocyclo daemon/info.go 8 daemon (*Daemon).fillSecurityOptions daemon/info.go:150:1 5 daemon operatingSystem daemon/info.go:201:1 3 daemon (*Daemon).fillDriverInfo daemon/info.go:121:1 2 daemon hostName daemon/info.go:172:1 2 daemon memInfo daemon/info.go:192:1 2 daemon kernelVersion daemon/info.go:182:1 1 daemon (*Daemon).SystemVersion daemon/info.go:81:1 1 daemon (*Daemon).SystemInfo daemon/info.go:27:1 1 daemon (*Daemon).fillPluginsInfo daemon/info.go:138:1 Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Unexport daemon.FillPlatformInfo Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix golint warning on generated "volume" types Should fix ``` api/types/volume/volume_create.go Line 10: warning: comment on exported type VolumeCreateBody should be of the form "VolumeCreateBody ..." (with optional leading article) (golint) api/types/volume/volume_list.go Line 12: warning: comment on exported type VolumeListOKBody should be of the form "VolumeListOKBody ..." (with optional leading article) (golint) ``` Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Fix flakyness in TestDockerNetworkInternalMode Instead of waiting for the DNS to fail, try to access a specific external IP and verify that 100% of the pakcets are being lost. Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com> * moved integration tests from docker_cli_config_create_test.go to integration/config Signed-off-by: Arash Deshmeh <adeshmeh@ca.ibm.com> * Bump containerd daemon to v1.1.2 Updates cri version to 1.0.4, to add `max-container-log-line-size` Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * pkg/tarsum: fix unit test for Go 1.11+ Since go-1.11beta1 archive/tar, tar headers with Typeflag == TypeRegA (numeric 0) (which is the default unless explicitly initialized) are modified to have Typeflag set to either tar.TypeReg (character value '0', not numeric 0) or tar.TypeDir (character value '5') [1]. This results in different Typeflag value in the resulting header, leading to a different Checksum, and causing the following test case errors: > 12:09:14 --- FAIL: TestTarSums (0.05s) > 12:09:14 tarsum_test.go:393: expecting > [tarsum+sha256:8bf12d7e67c51ee2e8306cba569398b1b9f419969521a12ffb9d8875e8836738], > but got > [tarsum+sha256:75258b2c5dcd9adfe24ce71eeca5fc5019c7e669912f15703ede92b1a60cb11f] > ... (etc.) All the other code explicitly sets the Typeflag field, but this test case is not, causing the incompatibility with Go 1.11. Therefore, the fix is to set TypeReg explicitly, and change the expected checksums in test cases). Alternatively, we can vendor archive/tar again (for the 100th time), but given that the issue is limited to the particular test case it does not make sense. This fixes the test for all Go versions. [1] https://go-review.googlesource.com/c/go/+/85656 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * vendor: buildkit to 98f1604134f945d48538ffca0e18662337b4a850 Signed-off-by: Tibor Vass <tibor@docker.com> * builder: set buildkit's exported product variable via PRODUCT This introduces a PRODUCT environment variable that is used to set a constant at dockerversion.ProductName. That is then used to set BuildKit's ExportedProduct variable in order to show useful error messages to users when a certain version of the product doesn't support a BuildKit feature. Signed-off-by: Tibor Vass <tibor@docker.com> * validate: please vet Signed-off-by: Tibor Vass <tibor@docker.com> * Fix flaky TestExternalGraphDriver/pull test This test occassionally fails on s390x and Power; 03:16:04 --- FAIL: TestExternalGraphDriver/pull (1.08s) 03:16:04 external_test.go:402: assertion failed: error is not nil: Error: No such image: busybox:latest Most likely these failures are caused due to Docker Hub updating the busybox:latest image, but not all architectures yet being available. Instead of using `:latest`, pull an image by digest, so that the test doesn't depend on Docker Hub having all architectures available for `:latest`. I selected the same digest as is currently used as "frozen image" in the Dockerfile. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Use constant for task runtime value Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * Update swarmkit to 6826639 changes included: - swarmkit moby#2706 address unassigned task leak when service is removed - swarmkit moby#2676 Fix racy batching on the dispatcher - swarmkit moby#2693 Fix linting issues revealed by Go 1.11 Signed-off-by: Sebastiaan van Stijn <github@gone.nl> * vendor: bump google/certificate-transparency-go to 1.0.20 This is to include the Windows + Go1.11 fix (google/certificate-transparency-go#284). Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * Add --device support for Windows Implements the --device forwarding for Windows daemons. This maps the physical device into the container at runtime. Ex: docker run --device="class/<clsid>" <image> <cmd> Signed-off-by: Justin Terry (VM) <juterry@microsoft.com> * Add osusergo build tar for static binaries Go 1.11 includes a fix to os/user to be working in a static binary (fixing golang/go#23265). The fix requires `osusergo` build tag to be set for static binaries, which is what this commit adds (also for containerd). [v2: sort build tags alphabetically] Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com> * Adds LinuxMetadata support by default on Windows 1. Sets the LinuxMetadata flag by default on Windows LCOW v1 MappedDirectories. Signed-off-by: Justin Terry (VM) <juterry@microsoft.com> * Dont submit * Initial check-in for Docker Windows\ARM32 * MSFT:17850093:fix docker to use registry policy rather than product sku to decide if argon is allowed * Make 'process isolation policy' 3 options: default, allow, deny * Fixing process isolation policy enforcement * Rebase merge conflict resdue * Merge conflict error fix * Tabs fix * Typo fix in GetProcessIsolationPolicy()
Commit bd613df prevented data corruption due to simultaneous driver.CreateNetwork()/driver.DeleteNetwork() by holding the network lock through the read/modify part of the operation. However, part of the DeleteNetwork operation entails sending a message to the peerDB to tell that goroutine to flush entries on deletion. This can lead to a deadlock where:
This patch fixes the issue by deferring the peerDB flush operation until after DeleteNetwork() unlocks driver.Lock(). Commit bd613df only modified CreateNetwork() and DeleteNetwork() and the critical section that driver.Lock() protects in CreateNetwork() does not perform any peerDB notifications or other locks of driver data structures. So this solution should be a complete fix for any regressions introduced in bd613df.