17.06 won't deploy stack anymore. no suitable node (unsupported platform on 3 nodes) on armhf docker cluster #2294

Closed
trunet opened this issue Jul 2, 2017 · 51 comments

@trunet

trunet commented Jul 2, 2017

Hello,

I have an armhf cluster, and since installing 17.06 I'm getting this error: my stacks don't come up and stay pending forever. There are no constraints. This was working perfectly on 17.03.

Inspecting the task, I'm getting:

        "Status": {
            "Timestamp": "2017-07-02T12:51:00.556959128Z",
            "State": "pending",
            "Message": "no suitable node (unsupported platform on 3 nodes)",
            "ContainerStatus": {},
            "PortStatus": {}
        },

docker info:

Containers: 1
 Running: 1
 Paused: 0
 Stopped: 0
Images: 9
Server Version: 17.06.0-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local nfs
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: chzqrk30d8aph7ikg60owdbjz
 Is Manager: true
 ClusterID: 1ucdfzovu4whdawkzv8wbfeb6
 Managers: 3
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 192.168.178.6
 Manager Addresses:
  192.168.178.6:2377
  192.168.178.7:2377
  192.168.178.8:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: cfb82a876ecc11b5ca0977d1733adbe58599088a
runc version: 2d41c047c83e09a6d61d464906feb2a2f3c52aa4
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.34-45
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: armv7l
CPUs: 8
Total Memory: 1.949GiB
Name: odroid01.casaams.wsartori.com
ID: DRWB:PHMO:GCXF:GDTK:QXDF:V2WM:6ZAM:MC5G:GYID:7YOH:PBQC:BOYE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 registry:5000
 127.0.0.0/8
Live Restore Enabled: false
@trunet
Author

trunet commented Jul 2, 2017

Probably introduced in 8edbb92 by @nishanttotla.

@nishanttotla
Contributor

@trunet can you show the output for service inspect and also what docker info on the nodes looks like?

@nishanttotla nishanttotla self-assigned this Jul 2, 2017
@trunet
Author

trunet commented Jul 2, 2017

docker info is in the first comment.

docker service inspect:

[
    {
        "ID": "n0bhpy9ibyiiv2wkux65x1dsv",
        "Version": {
            "Index": 2286
        },
        "CreatedAt": "2017-07-02T21:13:59.560944718Z",
        "UpdatedAt": "2017-07-02T21:13:59.56740151Z",
        "Spec": {
            "Name": "portal_portal",
            "Labels": {
                "com.docker.stack.image": "arm32v7/httpd:latest",
                "com.docker.stack.namespace": "portal"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "arm32v7/httpd:latest@sha256:90fb0d45f7267ac96523f2030ab842d61c77c65999024b5f67771f599ae4bfa8",
                    "Labels": {
                        "com.docker.stack.namespace": "portal"
                    },
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/mnt/glusterfs/portal",
                            "Target": "/usr/local/apache2/htdocs"
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {}
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Platforms": [
                        {
                            "Architecture": "arm",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "jzho3a68yjsqvakajo7b9yqv3",
                        "Aliases": [
                            "portal"
                        ]
                    }
                ],
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 2
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip",
                "Ports": [
                    {
                        "Protocol": "tcp",
                        "TargetPort": 80,
                        "PublishedPort": 80,
                        "PublishMode": "ingress"
                    }
                ]
            },
            "Ports": [
                {
                    "Protocol": "tcp",
                    "TargetPort": 80,
                    "PublishedPort": 80,
                    "PublishMode": "ingress"
                }
            ],
            "VirtualIPs": [
                {
                    "NetworkID": "sposg3gl89sjhlhy1hdpqbwt6",
                    "Addr": "10.255.0.8/16"
                },
                {
                    "NetworkID": "jzho3a68yjsqvakajo7b9yqv3",
                    "Addr": "10.0.1.2/24"
                }
            ]
        }
    }
]

@nishanttotla
Contributor

I see the issue.

It seems like the node reports

OSType: linux
Architecture: armv7l

but the image reports

OSType: linux
Architecture: arm

and that causes scheduling to fail. We will need to add more normalization for architecture names. I'll do that but it would be useful to have a full list of architecture name variations.

cc @thaJeztah @aaronlehmann currently we only do this for x86_64 and amd64: https://github.com/docker/swarmkit/blob/master/manager/scheduler/filter.go#L288

@trunet
Author

trunet commented Jul 3, 2017

I can think of aarch64 and armv7l, but they are completely different architectures: aarch64 is 64-bit, armv7l is 32-bit.

also, check this: https://github.com/docker-library/official-images#architectures-other-than-amd64

@trunet
Author

trunet commented Jul 3, 2017

@nishanttotla Is there a way to build the image with the armv7l architecture instead of only arm? I didn't find any option on commit, push, or build to set this. How is Docker Hub handling that? I suppose all images under https://hub.docker.com/r/arm32v7/ should use armv7l?

@nishanttotla
Contributor

nishanttotla commented Jul 3, 2017

@trunet I'm not sure about that, but we'll see what's the best way to resolve this. Until then, you can get around this issue by using the --no-resolve-image flag on service create/update which will not add platform information to your service spec.

@trunet
Author

trunet commented Jul 3, 2017

@nicolaka as I'm using stacks, I couldn't find an equivalent of --no-resolve-image in the compose 3.3 reference.

@trunet
Author

trunet commented Jul 3, 2017

@nishanttotla
Contributor

@trunet apologies, yes for stacks you can use --resolve-image=never for now.

@thaJeztah
Member

We will need to add more normalization for architecture names.

Is there a list of valid architectures in the image's data? (Same for nodes)?

ARM is especially tricky, as there are a lot of variations and ways to express them (armhf, armv6, armv7, etc.).

If there's a list of valid values, we can decide to ignore other values

@nishanttotla
Contributor

Related: opencontainers/image-spec#661

@trunet I'm working on figuring out what the right fix is for this case.

@stevvooe
Contributor

stevvooe commented Jul 5, 2017

@tianon Do you know how Config.Architecture is getting set for images in arm32v7? Are these just using GOARCH directly or is there a special build process?

@tianon
Member

tianon commented Jul 5, 2017

@stevvooe that'd be GOARCH directly -- the builder for those actually used to be a 64bit build (and thus was setting Architecture to arm64), but I'm now using a 32bit build of Docker on a 64bit machine and using linux32 to ensure the kernel claims it's also 32bit (just to help cut down on the types of things that can go wrong when cross-building like this), so they should all be set to arm (which is as specific as an individual image manifest can get without a manifest list, as you know)

I think way back in Docker's history, someone thought it'd be helpful to include "multiarch" metadata on images, and just threw in GOOS and GOARCH as-is, and it hasn't been adjusted since. IMO that'd be better served by having a full platform object ala manifest lists, but figured for individual non-multiarch images it really didn't make much difference either way (especially for image-spec's 1.0).

@tianon
Member

tianon commented Jul 5, 2017

In the manifest list pushing code we've got (which again, as you know, isn't deployed to library/ just yet), arm32v7 turns into something like the following:

platform:
  os: linux
  architecture: arm
  variant: v7
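
The mapping from official-images namespace names to platform objects can be sketched as a simple lookup table. This is only an illustration: the table is reconstructed from the manifest-tool output later in this thread plus the official-images naming convention (arm32v5, arm32v6), not taken from the actual manifest list pushing code, and `Platform`, `officialArches`, and `platformFor` are hypothetical names.

```go
package main

import "fmt"

// Platform mirrors the os/architecture/variant triple shown above.
// It is a hypothetical type for illustration, not a Docker API type.
type Platform struct {
	OS           string
	Architecture string
	Variant      string
}

// officialArches maps official-images namespace names to platforms.
// Assumption: values follow the manifest-tool output in this thread.
var officialArches = map[string]Platform{
	"amd64":   {OS: "linux", Architecture: "amd64"},
	"arm32v5": {OS: "linux", Architecture: "arm", Variant: "v5"},
	"arm32v6": {OS: "linux", Architecture: "arm", Variant: "v6"},
	"arm32v7": {OS: "linux", Architecture: "arm", Variant: "v7"},
	"arm64v8": {OS: "linux", Architecture: "arm64", Variant: "v8"},
	"i386":    {OS: "linux", Architecture: "386"},
	"ppc64le": {OS: "linux", Architecture: "ppc64le"},
	"s390x":   {OS: "linux", Architecture: "s390x"},
}

// platformFor looks up the platform for a namespace such as "arm32v7".
// The boolean is false when the name is not in the table.
func platformFor(name string) (Platform, bool) {
	p, ok := officialArches[name]
	return p, ok
}

func main() {
	p, ok := platformFor("arm32v7")
	fmt.Println(p.OS, p.Architecture, p.Variant, ok)
}
```

Note that a kernel-style name such as armv7l is deliberately absent from the table; it is a node property, not an image namespace.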

@tianon
Member

tianon commented Jul 5, 2017

Ah, see moby/moby@f9359f5 for where those fields were first populated with GOARCH (appears it was hard-coded to x86_64 before, and no OS info).

@stevvooe
Contributor

stevvooe commented Jul 5, 2017

@tianon The problem with GOARCH on images, is that we lose variant when assembling Platform from an image configuration: https://github.com/moby/moby/blob/master/api/server/router/distribution/distribution_routes.go#L120.

For the vast majority of cases, I think we can assume variant is v7, but I am sure that will break.

@tianon
Member

tianon commented Jul 5, 2017

Indeed -- I'm 100% with you on GOARCH being insufficient, which is why I wish individual non-list manifests had a full platform object so that this could be set properly. 😞

To give another example that's completely broken, all the images under i386 will have Architecture set to amd64, since I don't build them on real 32bit hardware (and didn't want to compile a custom Docker binary just to do those builds). They only exist (in my mind) as a rough sanity check for a simple way to test multiarch (since they can build on any amd64 machine as-is), but their metadata being wrong irks me a lot (it'll be correct when they're included in a manifest list, as seen under trollin, but the individual image metadata will be wrong until/unless we get additional support in Docker itself for informing it what it should be putting there).

It's even conceivable to build s390x images on an amd64 daemon, as long as they don't have any RUN lines (ie, FROM scratch, ADD tarball /, etc), or as long as appropriate qemu bits are in the right places somehow.

Here's a more concrete example:

$ manifest-tool inspect trollin/httpd:latest
Name:   trollin/httpd:latest (Type: application/vnd.docker.distribution.manifest.list.v2+json)
Digest: sha256:6f43aac63ceb4fbfee6a724f3d8aeec4366bb83cd7ef37a9905ccf2256ba8557
 * Contains 6 manifest references:
1    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
1       Digest: sha256:8f58a3ef340038615498cead8b83fa3b31e4fe5c16961c6c3635e973ac9303ed
1  Mfst Length: 1780
1     Platform:
1           -      OS: linux
1           - OS Vers: 
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant: 
1           - Feature: 
1     # Layers: 7
         layer 1: digest = sha256:9f0706ba7422412cd468804fee456786f88bed94bf9aea6dde2a47f770d19d27
         layer 2: digest = sha256:47bacf36113fe830a77280f81a8203a228fa4bb4536f145a333709b9da0f7cf7
         layer 3: digest = sha256:56798d8e5a3081371b84de38415c30c0f1ae034e4d57d82383c468a95521fc53
         layer 4: digest = sha256:94b25413538ac04aa893f729c216553d9c86b0ea69318cfe903e16eb078492b7
         layer 5: digest = sha256:97d879f4e260dd8d46a4eaf0c29f253a501fa82337840c30a6274de23227ab6e
         layer 6: digest = sha256:2a4f7e960a3e3b32308101dec9c1525b961a242dca799737d2ad0c3e19350b42
         layer 7: digest = sha256:12f5eb5312902eedee33d3b8d0f2abc7750f3ae7e32733b26547371f8eb6b034

2    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
2       Digest: sha256:90fb0d45f7267ac96523f2030ab842d61c77c65999024b5f67771f599ae4bfa8
2  Mfst Length: 1780
2     Platform:
2           -      OS: linux
2           - OS Vers: 
2           - OS Feat: []
2           -    Arch: arm
2           - Variant: v7
2           - Feature: 
2     # Layers: 7
         layer 1: digest = sha256:72c70f9f7d679945bc71d954dc0c7de236e0067af495d09e9bea24f497cc79b7
         layer 2: digest = sha256:c3529be5f104abaa0a6777ce5aa3d62a14d637886a4eebc1fc8ed730c4009654
         layer 3: digest = sha256:d8da62524d5de9ecb71f0046c66d3a291e6b2f81496a07d28e30b4e81f04ac51
         layer 4: digest = sha256:7d7da9871d55dac5ea74a2b5f684f5c94bb5ca9b51ff5f2a81330facfcf67122
         layer 5: digest = sha256:c2c771d498cd865c5dff06aa02340b0c859cebcbe6a88246c7e87e287ab70c48
         layer 6: digest = sha256:de5a89172d87cbecee5e7924eb3c003458ad751d51e25eaed5d8a89912ef3ccb
         layer 7: digest = sha256:a922909dfad204174854c24774fc54c9016d96d587480b6d40c0c56ae73612a7

3    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
3       Digest: sha256:21920abf229fd278a820287cb8fbe9db206ecbbb4b5bd8a34aab0af8b9c0f470
3  Mfst Length: 1780
3     Platform:
3           -      OS: linux
3           - OS Vers: 
3           - OS Feat: []
3           -    Arch: arm64
3           - Variant: v8
3           - Feature: 
3     # Layers: 7
         layer 1: digest = sha256:0dcb325d306f51780dbde5818f38ed8b0e8f478b56132f46bb5c30385f86eaa6
         layer 2: digest = sha256:0a2d58b6ee283f9f76e8f06d3fac65e85b9e289df29ca44dc9314586e055cecc
         layer 3: digest = sha256:eac331a2af3233030fcf1ad5300f9684767ebc08b8447e52b00783cb43de360a
         layer 4: digest = sha256:b382ea768f87d2a04cac208183cd6ee2b487748fb7fe33a9f0e6c4ab0912a029
         layer 5: digest = sha256:313fc6a3542b845ea9d51e727b1b0270de45d71e58a4cbd5e68c427761ce8107
         layer 6: digest = sha256:1a9a6cd7e8894b6345a3d83a4cefba76609c8e096d5163a67c0dee33c7927ffb
         layer 7: digest = sha256:26af40e76c32da3f9b7eac5f62d73ee6d51c30f0b257e37596da4bcbb7e34877

4    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
4       Digest: sha256:261d6c4753982e203308fc5c637a74bdcd986a2edbab7d8aac39a391cae7a683
4  Mfst Length: 1780
4     Platform:
4           -      OS: linux
4           - OS Vers: 
4           - OS Feat: []
4           -    Arch: 386
4           - Variant: 
4           - Feature: 
4     # Layers: 7
         layer 1: digest = sha256:2fa359c89a0e952ec2fe14e3c584ee13d6ec919c73a7dcac34ba320a459e2a62
         layer 2: digest = sha256:19378a3c2be7bef3f223e1ca4a0391c966e6330d8692ed17042471f25509713b
         layer 3: digest = sha256:9e17073700efe44252f1c00a14cf28030458d929d0de85f4873189e23b218530
         layer 4: digest = sha256:38c6ce3cfa9a4835b70425932f1da52dc06e8416369192b5b2fd0768956b907d
         layer 5: digest = sha256:23815ae9a9b769433bf361d90f0654af265d56f7792633aaae368cc17d8af505
         layer 6: digest = sha256:7951cc644f4c9b2aa30c2a8653ce4d2b7ac8390fb72dab2e6f5d3f522bbe738d
         layer 7: digest = sha256:381d02ad82455287029be0a8cf1a1cae9915c26289bac3c24ed7636ba77c6795

5    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
5       Digest: sha256:343ea34611a882700bf162ad929c42259446088d9933991addc64bc90af8f17e
5  Mfst Length: 1780
5     Platform:
5           -      OS: linux
5           - OS Vers: 
5           - OS Feat: []
5           -    Arch: ppc64le
5           - Variant: 
5           - Feature: 
5     # Layers: 7
         layer 1: digest = sha256:a5561821dba4ceb47be1d2f5f108a24b391df9d6a3a764d2c04ea8ac29410625
         layer 2: digest = sha256:39c4390b614987e9576d36928eb4d12a6d5e160fb83b5095198efd400b6b8c5b
         layer 3: digest = sha256:d14c381984db757a959e0f35eb170b4863f5c5e8570702bb7529f4f88998f73a
         layer 4: digest = sha256:90d23acf5ff39880a4a787a1d41e6bc4b204a7e01fc2e60183892c20aa60c5c4
         layer 5: digest = sha256:77e6d79cd73f74c4071006e03344deccab113906c139d8755461d4fd72604b1e
         layer 6: digest = sha256:e3649c2d25a8bf9910fe5f9f940924ef00277ca9a003c5fcaaeb487f116def62
         layer 7: digest = sha256:67e875c10908fb1f66c807700e155e51ccc053f596e93f0d0f1aff26240aa46f

6    Mfst Type: application/vnd.docker.distribution.manifest.v2+json
6       Digest: sha256:3c121d29d5f0c2bb4fdc54fa2e206bb9d6a4b751e29de571b4ef5af7b80c20f6
6  Mfst Length: 1780
6     Platform:
6           -      OS: linux
6           - OS Vers: 
6           - OS Feat: []
6           -    Arch: s390x
6           - Variant: 
6           - Feature: 
6     # Layers: 7
         layer 1: digest = sha256:29420dd727d39cbedfb85562111f49e24b0b96adda04de4663d2099fbbf4f993
         layer 2: digest = sha256:0eb60e029192948c200dc12526bcb19c36747de3cedae3822c3009d6250b95b9
         layer 3: digest = sha256:f2cee80bd07440bd13eb077947699252cbfba9b263af2a43bd94f31378a5c35c
         layer 4: digest = sha256:b628a0b163be3123753102d5269918189e0ff4f2cecc25b8ebb685a7e292435e
         layer 5: digest = sha256:9c67f623c538f560836daac0ea7ab1ffa883c09b87ae7b426fed7e4b828e466b
         layer 6: digest = sha256:0bf51bf880e17e555e66c3af906cd603deae0af6095d1816ffb364b9d16e426d
         layer 7: digest = sha256:16c1cb76b65df18dfffd6e1f23e843c3ddebd950f5e8ea24fd98be6c8b5997ad

(which was built by combining amd64/httpd:latest, arm32v7/httpd:latest, etc)

The info for busybox:latest is likely even better, since it'll have several arm variants:

$ manifest-tool inspect trollin/busybox:latest | grep -A7 'Platform:'
1     Platform:
1           -      OS: linux
1           - OS Vers: 
1           - OS Feat: []
1           -    Arch: amd64
1           - Variant: 
1           - Feature: 
1     # Layers: 1
--
2     Platform:
2           -      OS: linux
2           - OS Vers: 
2           - OS Feat: []
2           -    Arch: arm
2           - Variant: v5
2           - Feature: 
2     # Layers: 1
--
3     Platform:
3           -      OS: linux
3           - OS Vers: 
3           - OS Feat: []
3           -    Arch: arm
3           - Variant: v6
3           - Feature: 
3     # Layers: 1
--
4     Platform:
4           -      OS: linux
4           - OS Vers: 
4           - OS Feat: []
4           -    Arch: arm
4           - Variant: v7
4           - Feature: 
4     # Layers: 1
--
5     Platform:
5           -      OS: linux
5           - OS Vers: 
5           - OS Feat: []
5           -    Arch: arm64
5           - Variant: v8
5           - Feature: 
5     # Layers: 1
--
6     Platform:
6           -      OS: linux
6           - OS Vers: 
6           - OS Feat: []
6           -    Arch: 386
6           - Variant: 
6           - Feature: 
6     # Layers: 1
--
7     Platform:
7           -      OS: linux
7           - OS Vers: 
7           - OS Feat: []
7           -    Arch: ppc64le
7           - Variant: 
7           - Feature: 
7     # Layers: 1
--
8     Platform:
8           -      OS: linux
8           - OS Vers: 
8           - OS Feat: []
8           -    Arch: s390x
8           - Variant: 
8           - Feature: 
8     # Layers: 1

@tianon
Member

tianon commented Jul 5, 2017

For building the individual architecture images, I'm not doing anything special or custom beyond docker build, docker tag, docker push.

@stevvooe
Contributor

stevvooe commented Jul 6, 2017

Indeed -- I'm 100% with you on GOARCH being insufficient

For arm, I should have qualified. All other architectures have sufficient information.

We'll either need this on the manifest or all the way down on the config (not sure if we really want to take that on). Seems, as is, there is no way to resolve the arm properties from the config itself and that the contents of Config.Architecture are generally dubious (at least for cross-builds).

Do we need to file a bug with OCI image-spec?

@stevvooe
Contributor

stevvooe commented Jul 7, 2017

@tianon ^

As far as a solution, I think we should remove the automatic constraint on architecture unless reporting on a multi-platform image. That will allow swarm to meet the image matching use case for windows/linux.
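
A rough sketch of that idea, under stated assumptions: the architecture constraint is kept only when the image is a multi-platform manifest list, while the OS is always kept so windows/linux matching still works. The `Platform` type, the `placementPlatforms` and `matches` helpers, and the convention that an empty architecture means "any" are all hypothetical and are not the actual swarmkit or moby code.

```go
package main

import "fmt"

// Platform is a hypothetical OS/architecture pair used as a
// scheduling constraint in this sketch.
type Platform struct {
	OS           string
	Architecture string
}

// placementPlatforms returns the platforms to record on the service.
// For a single-platform image the architecture is cleared, because the
// value in the image config may be unreliable (for example on
// cross-builds). The OS is always preserved.
func placementPlatforms(platforms []Platform, isManifestList bool) []Platform {
	out := make([]Platform, 0, len(platforms))
	for _, p := range platforms {
		if !isManifestList {
			p.Architecture = "" // assumption: empty means "any architecture"
		}
		out = append(out, p)
	}
	return out
}

// matches reports whether a node satisfies a constraint, treating
// empty constraint fields as wildcards.
func matches(constraint, node Platform) bool {
	if constraint.OS != "" && constraint.OS != node.OS {
		return false
	}
	if constraint.Architecture != "" && constraint.Architecture != node.Architecture {
		return false
	}
	return true
}

func main() {
	node := Platform{OS: "linux", Architecture: "armv7l"}
	single := placementPlatforms([]Platform{{OS: "linux", Architecture: "arm"}}, false)
	fmt.Println(matches(single[0], node))
}
```

With this approach a single-platform arm image is no longer rejected on an armv7l node, at the cost of no architecture check at all for such images.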

@tianon
Member

tianon commented Jul 7, 2017

I think that makes sense, especially since if it's not a manifest list, the notion of it being "multiarch" is kind of bogus anyhow (since the arch information is likely also bogus, or at least misleading). The OS field might be accurate, but it's hard to say in that case. These architecture constraints can be directly added by hand, right?

As for filing a bug with the image-spec, I think that makes sense, but I didn't want to try and rock that boat pre-1.0 (since as shown by the commit I linked, it's a real old boat), and hadn't looked to see whether the image config itself was covered by the spec (but it makes sense for it to be). 😇

@tianon
Member

tianon commented Jul 7, 2017

(I have to confess that I haven't tried building FROM scratch on Windows to see if I could cross-build a Linux image that way. 😅)

@stevvooe
Contributor

stevvooe commented Jul 7, 2017

@tianon The image config is covered by the spec, unfortunately. For the cases where cross builds are happening, the image will have to be modified post-build. In the case where we just want to route a workload to a host that can run that image, we likely don't have enough information, as that will have to be wrapped in a manifest list (index) to provide the actual platform information in the ARM case.

@broersa

broersa commented Jul 7, 2017

I ran into the same issue today. I run a manager on amd64 and have a node which is armv6l. It's a raspberrypi model b+. I can run the image on the node itself, but when I want to run it as a swarm service it says:
"Message": "no suitable node (unsupported platform on 4 nodes)"

Btw. I also have two amd64 nodes.

It seems it can't map the arm image on armv6l in the swarm scheduler.

@StefanScherer

The Scaleway C1 machine shows armv7l, as well as Raspberry Pi 2/3
Raspberry Pi 0/1 show armv6l
Docker images have arm as architecture.

So probably the platformEqual() should have the same normalization as for x86_64 -> amd64:

	switch imgPlatform.Architecture {
	case "x86_64":
		imgPlatform.Architecture = "amd64"
	case "armv6l", "armv7l":
		imgPlatform.Architecture = "arm"
	case "aarch64":
		imgPlatform.Architecture = "arm64"			
	}
	switch nodePlatform.Architecture {
	case "x86_64":
		nodePlatform.Architecture = "amd64"
	case "armv6l", "armv7l":
		nodePlatform.Architecture = "arm"
	case "aarch64":
		nodePlatform.Architecture = "arm64"			
	}

Something like this?
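
For anyone who wants to try the proposal outside swarmkit, here is a self-contained sketch of the same normalization. The `Platform` struct and the `normalizeArch` and `platformEqual` helpers below are hypothetical stand-ins rather than the real swarmkit code, and the alias list covers only the names mentioned in this thread.

```go
package main

import "fmt"

// Platform is a hypothetical stand-in for the OS/architecture pair
// the scheduler compares; it is not the real swarmkit type.
type Platform struct {
	OS           string
	Architecture string
}

// normalizeArch maps kernel-style machine names (as shown by docker
// info) to the GOARCH-style names found in image metadata. Unknown
// names are returned unchanged.
func normalizeArch(arch string) string {
	switch arch {
	case "x86_64":
		return "amd64"
	case "armv6l", "armv7l":
		return "arm"
	case "aarch64":
		return "arm64"
	}
	return arch
}

// platformEqual reports whether an image platform and a node platform
// match once both architecture names are normalized.
func platformEqual(img, node Platform) bool {
	return img.OS == node.OS &&
		normalizeArch(img.Architecture) == normalizeArch(node.Architecture)
}

func main() {
	img := Platform{OS: "linux", Architecture: "arm"}
	node := Platform{OS: "linux", Architecture: "armv7l"}
	fmt.Println(platformEqual(img, node)) // prints: true
}
```

One caveat: collapsing armv6l and armv7l into arm discards the variant, so this comparison alone cannot stop a v7 image from being placed on a v6 node.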

@stevvooe
Contributor

@alexellis I agree. This is not a good situation and I am worried that having these constraints will create mutual incompatibilities.

Could you see if moby/moby#34021 will work?

@Toshik

Toshik commented Aug 20, 2017

All my nodes are Orange Pi PCs with platform linux/armv7l, and I have to use --no-resolve-image; otherwise swarm services stay in a constant pending state.

Docker: 17.06.1-ce

@alexellis

alexellis commented Aug 20, 2017

@stevvooe the PR by @nishanttotla fixes the issue - I've spent the whole day testing it... :-/

Is the Docker team aware that fast ARMv7 units are available on Scaleway's infrastructure for next to nothing? It could help with future CI and testing.

moby/moby#34021 (comment)

@Toshik I don't believe your work-around is good for stacks?

@Toshik

Toshik commented Aug 20, 2017

@alexellis, no, it is an ugly workaround. And I had to recreate all services with just the docker service create command :(

@alexellis

Just take this PR; it fixes everything.

seemethere added a commit to seemethere/docker-install that referenced this issue Aug 29, 2017
Reverts raspbian changes back to https://apt.dockerproject.org, due to
issues with swarm in the newest release 17.07.0-ce related to issue
moby/swarmkit#2294.

This will stay in place for raspbian until that issue is resolved.

Signed-off-by: Eli Uriegas <eli.uriegas@docker.com>
@nishanttotla
Contributor

Closing since moby/moby#34021 was merged. A longer term resolution of ARM variants will take time, but it is no longer the case that tasks don't run, so we can consider this issue resolved.

@6be709c0

6be709c0 commented Oct 2, 2017

Same problem on 17.09; I had to roll back to 17.06.

@nishanttotla
Contributor

@MLescaudron this should be fixed now. Can you share the output of docker info on one of your worker nodes that has ARM, and the output of docker service inspect <service name> for the service that fails to run?

@6be709c0

6be709c0 commented Oct 2, 2017

Docker info (previously I was on 17.09.2-ce)

Containers: 21
 Running: 11
 Paused: 0
 Stopped: 10
Images: 14
Server Version: 17.06.2-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: vjh2cujrci6sjus9lyhjm87po
 Is Manager: true
 ClusterID: b70lb8nciap52uncs76xnfkfl
 Managers: 1
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Root Rotation In Progress: false
 Node Address: 145.239.13.20
 Manager Addresses:
  145.239.13.20:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-81-generic
Operating System: Ubuntu 16.04.3 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 14.34GiB
Name: ovh-prod-1
ID: TXNV:VBKT:MFJD:4K7W:G3JH:WDB7:LZZF:3SWD:LVQO:6SFP:WOPE:AN7E
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: nelioapp
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

docker service inspect

[
    {
        "ID": "0rvq0tmowj9nfn2gjgp0hzmyd",
        "Version": {
            "Index": 69574
        },
        "CreatedAt": "2017-09-26T22:31:04.605418562Z",
        "UpdatedAt": "2017-10-02T22:13:46.09779364Z",
        "Spec": {
            "Name": "tech_blog",
            "Labels": {
                "com.docker.stack.image": "ghost:alpine",
                "com.docker.stack.namespace": "tech"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "ghost:alpine@sha256:b7c4fcb9e78518ea85fcba80969f445301b38f76833a6668eb268da7b6e7f3a4",
                    "Labels": {
                        "com.docker.stack.namespace": "tech"
                    },
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/home/nelio/tech.nelio.io/",
                            "Target": "/var/lib/ghost/content"
                        }
                    ],
                    "StopGracePeriod": 10000000000,
                    "DNSConfig": {}
                },
                "Resources": {},
                "RestartPolicy": {
                    "Condition": "any",
                    "Delay": 5000000000,
                    "MaxAttempts": 0
                },
                "Placement": {
                    "Constraints": [
                        "node.role == manager"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "omgpnj44fbx9y7khp6pxmptq7",
                        "Aliases": [
                            "blog"
                        ]
                    }
                ],
                "LogDriver": {
                    "Options": {
                        "max-size": "50m"
                    }
                },
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "UpdateConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "RollbackConfig": {
                "Parallelism": 1,
                "FailureAction": "pause",
                "Monitor": 5000000000,
                "MaxFailureRatio": 0,
                "Order": "stop-first"
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "PreviousSpec": {
            "Name": "tech_blog",
            "Labels": {
                "com.docker.stack.image": "ghost:alpine",
                "com.docker.stack.namespace": "tech"
            },
            "TaskTemplate": {
                "ContainerSpec": {
                    "Image": "ghost:alpine@sha256:b7c4fcb9e78518ea85fcba80969f445301b38f76833a6668eb268da7b6e7f3a4",
                    "Labels": {
                        "com.docker.stack.namespace": "tech"
                    },
                    "Privileges": {
                        "CredentialSpec": null,
                        "SELinuxContext": null
                    },
                    "Mounts": [
                        {
                            "Type": "bind",
                            "Source": "/home/nelio/tech.nelio.io/",
                            "Target": "/var/lib/ghost/content"
                        }
                    ]
                },
                "Resources": {},
                "Placement": {
                    "Constraints": [
                        "node.role == manager"
                    ],
                    "Platforms": [
                        {
                            "Architecture": "amd64",
                            "OS": "linux"
                        }
                    ]
                },
                "Networks": [
                    {
                        "Target": "omgpnj44fbx9y7khp6pxmptq7",
                        "Aliases": [
                            "blog"
                        ]
                    }
                ],
                "LogDriver": {
                    "Options": {
                        "max-size": "50m"
                    }
                },
                "ForceUpdate": 0,
                "Runtime": "container"
            },
            "Mode": {
                "Replicated": {
                    "Replicas": 1
                }
            },
            "EndpointSpec": {
                "Mode": "vip"
            }
        },
        "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "omgpnj44fbx9y7khp6pxmptq7",
                    "Addr": "10.0.0.5/24"
                }
            ]
        }
    }
]

@nishanttotla
Contributor

@MLescaudron your node reports linux/x86_64 and your service placement constraints report linux/amd64. This is a compatible combination, and your service should run. There may be another reason why it's failing.
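
To illustrate why linux/x86_64 and linux/amd64 count as the same platform, here is a minimal sketch of alias normalization. The alias table and the function names `normalize_arch` and `platforms_match` are assumptions made for illustration; this is not the scheduler's actual code.

```python
# Hypothetical alias table: kernel-style names (as printed by `uname -m`)
# mapped to the Go-style names used in image metadata. Not exhaustive.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "aarch64": "arm64",
}


def normalize_arch(arch: str) -> str:
    """Return a canonical architecture name, ignoring case and whitespace."""
    arch = arch.strip().lower()
    return ARCH_ALIASES.get(arch, arch)


def platforms_match(node: tuple, wanted: tuple) -> bool:
    """Compare (os, architecture) pairs after normalizing both sides."""
    node_os, node_arch = node
    wanted_os, wanted_arch = wanted
    return (node_os.lower() == wanted_os.lower()
            and normalize_arch(node_arch) == normalize_arch(wanted_arch))


# Node reports linux/x86_64, service asks for linux/amd64: compatible.
print(platforms_match(("linux", "x86_64"), ("linux", "amd64")))
# An aarch64 node cannot satisfy a linux/amd64 placement.
print(platforms_match(("linux", "aarch64"), ("linux", "amd64")))
```

With a comparison like this, a node is only rejected when the normalized names really differ, not when the two sides merely spell the same architecture differently.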

@6be709c0

6be709c0 commented Oct 2, 2017

OK, so I have uninstalled and reinstalled docker-ce on each node.
When I launch a service, docker ps returns an empty response and the task stays in the pending state.
Same problem as @trunet, except I don't have the architecture problem.

@beatkyo

beatkyo commented Oct 3, 2017

Same here with 17.09

Docker info: linux aarch64
Docker inspect task: linux arm64
Docker inspect container: linux arm64

Task status: "State": "pending", "Message": "unsupported platform on 1 node"

@nishanttotla
Contributor

nishanttotla commented Oct 3, 2017

It seems that moby/moby#34021 should have excluded more cases in addition to just "arm", like "arm64".

cc @thaJeztah @andrewhsu

@alexellis

alexellis commented Oct 9, 2017

This appears to be broken again on ARM64 :-( In particular, I have a weird situation where it works on Dieter's ARM64 RPi image but not on Packet's Qualcomm machine; both report "aarch64" in uname.

Packet:

root@48-cores:~# docker service create --name func_prometheus  luxas/prometheus-arm64:v1.5.2
DEBU[2017-10-09T20:36:21.741818950Z] Calling GET /_ping                           
DEBU[2017-10-09T20:36:21.742930000Z] Calling GET /_ping                           
DEBU[2017-10-09T20:36:21.746669250Z] Calling GET /v1.32/distribution/luxas/prometheus-arm64:v1.5.2/json 
DEBU[2017-10-09T20:36:22.147220750Z] Calling POST /v1.32/services/create          
DEBU[2017-10-09T20:36:22.147412700Z] form data: {"EndpointSpec":{"Mode":"vip"},"Labels":{},"Mode":{"Replicated":{}},"Name":"func_prometheus","TaskTemplate":{"ContainerSpec":{"DNSConfig":{},"Image":"luxas/prometheus-arm64:v1.5.2@sha256:7011cf4a94d350cc6719b7a87eaa2cd7cbcc19bcfcf081eae4b9b6ae5f9e00d0"},"ForceUpdate":0,"Placement":{"Platforms":[{"Architecture":"amd64","OS":"linux"}]},"Resources":{"Limits":{},"Reservations":{}}}} 
DEBU[2017-10-09T20:36:22.161471000Z] Service hzxpjiytu97jfz931oduom8o8 was scaled up from 0 to 1 instances  module=node node.id=xl2vxso5gh504rosy2u9nyqkc
hzxpjiytu97jfz931oduom8o8
Since --detach=false was not specified, tasks will be created in the background.
In a future release, --detach=false will become the default.
root@48-cores:~# DEBU[2017-10-09T20:36:22.246894100Z] no suitable node available for task           module=node node.id=xl2vxso5gh504rosy2u9nyqkc task.id=yj0vos4kyogoklebm3dvw50xb
DEBU[2017-10-09T20:36:22.246938550Z] no suitable node available for task           module=node node.id=xl2vxso5gh504rosy2u9nyqkc task.id=syppv5tnsr77hzst6tahy0imq
DEBU[2017-10-09T20:36:22.336576750Z] no suitable node available for task           module=node node.id=xl2vxso5gh504rosy2u9nyqkc task.id=syppv5tnsr77hzst6tahy0imq
DEBU[2017-10-09T20:36:22.336625100Z] no suitable node available for task           module=node node.id=xl2vxso5gh504rosy2u9nyqkc task.id=yj0vos4kyogoklebm3dvw50xb
docker version
DEBU[2017-10-09T20:36:36.377204000Z] Calling GET /_ping                           
DEBU[2017-10-09T20:36:36.378193650Z] Calling GET /v1.32/version                   
Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:40:10 2017
 OS/Arch:      linux/arm64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:38:22 2017
 OS/Arch:      linux/arm64
 Experimental: false
root@48-cores:~# DEBU[0173] containerd: process exited                    id=0574e809eb8b8787684c3e76649466cdc7ea05c1d18d56be032fbf2c91fa9a5d pid=init status=255 systemPid=7790
DEBU[2017-10-09T20:38:15.945790550Z] libcontainerd: received containerd event: &types.Event{Type:"exit", Id:"0574e809eb8b8787684c3e76649466cdc7ea05c1d18d56be032fbf2c91fa9a5d", Status:0xff, Pid:"init", Timestamp:(*timestamp.Timestamp)(0x442166c150)} 
DEBU[0173] containerd: process exited                    id=3a80448475e5b29e4a7ecb8b1dd0f11b1baaffb731871a6d02386be50fd75ca2 pid=init status=255 systemPid=8386
DEBU[2017-10-09T20:38:15.965611750Z] libcontainerd: received containerd event: &types.Event{Type:"exit", Id:"3a80448475e5b29e4a7ecb8b1dd0f11b1baaffb731871a6d02386be50fd75ca2", Status:0xff, Pid:"init", Timestamp:(*timestamp.Timestamp)(0x442166c410)} 


@stevvooe
Contributor

stevvooe commented Oct 9, 2017

@alexellis Looking at the patch that should have fixed this, it seems this case is not covered. That said, the log message shows that the placement constraint clearly reads amd64.

@vielmetti

OK, there's something funny going on here.

ed$ docker run --rm mplatform/mquery luxas/prometheus-arm64:v1.5.2
Manifest List: No
 Supports: amd64/linux

It looks like @luxas has a prometheus-arm64 image that advertises that it supports amd64/linux.

@stevvooe
Contributor

stevvooe commented Oct 9, 2017

Looking at the configuration, the architecture is reported as amd64:

$ ctr fetch-object docker.io/luxas/prometheus-arm64@sha256:7663b9e02bac80a6b5e48ffb0602c2d515261dc2afcf10cb819fc563b8bc4126 | jq .
INFO[0000] resolving                                     ref="docker.io/luxas/prometheus-arm64@sha256:7663b9e02bac80a6b5e48ffb0602c2d515261dc2afcf10cb819fc563b8bc4126"
INFO[0001] fetching                                      ref="docker.io/luxas/prometheus-arm64@sha256:7663b9e02bac80a6b5e48ffb0602c2d515261dc2afcf10cb819fc563b8bc4126"
{
  "architecture": "amd64",
...

Were these "cross-built" binaries?
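
One way to catch this kind of mismatch before deploying is to compare the architecture recorded in the image configuration with the one the image name suggests. A minimal sketch, assuming the config JSON has already been fetched (for example with the `ctr fetch-object` command above); the helpers `config_arch` and `name_claims_arch`, the `-<arch>` name-suffix heuristic, and the trimmed sample JSON are assumptions for illustration only.

```python
import json


def config_arch(config_json: str) -> str:
    """Return the 'architecture' field of an image config document."""
    return json.loads(config_json)["architecture"]


def name_claims_arch(repository: str, arch: str) -> bool:
    """Heuristic (assumption): the repository name ends in '-<arch>'."""
    return repository.endswith("-" + arch)


# Trimmed sample mirroring the output above; a real config has more fields.
sample = '{"architecture": "amd64"}'
repo = "luxas/prometheus-arm64"

recorded = config_arch(sample)
if name_claims_arch(repo, "arm64") and recorded != "arm64":
    # The name says arm64, but the scheduler will see the recorded value.
    print(f"{repo}: name suggests arm64 but config records {recorded}")
```

The recorded value is what ends up in the service's placement platforms, which matches the `"Architecture":"amd64"` seen in the create request in the log above.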

@vielmetti

Looks like @luxas has a multiarch image, cf

ed$ docker run --rm mplatform/mquery luxas/prometheus:v2.0.0-rc.0
Manifest List: Yes
Supported platforms:
 - amd64/linux
 - arm/linux (variant: undefined)
 - arm64/linux (variant: undefined)

also

ed$ docker run --rm mplatform/mquery luxas/prometheus:v1.7.1
Manifest List: Yes
Supported platforms:
 - amd64/linux
 - arm/linux (variant: undefined)
 - arm64/linux (variant: undefined)
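
For context, a manifest list carries one entry per platform, and the entry matching the node's platform is the one that gets used. A rough sketch of that selection, assuming the manifest-list JSON layout (`manifests[].platform.architecture` and `os`); the digests are placeholders and `select_manifest` is a hypothetical helper, not a Docker API.

```python
from typing import Optional

# Hand-written sample shaped like a manifest list; digests are placeholders.
manifest_list = {
    "manifests": [
        {"digest": "sha256:placeholder-amd64",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:placeholder-arm",
         "platform": {"architecture": "arm", "os": "linux"}},
        {"digest": "sha256:placeholder-arm64",
         "platform": {"architecture": "arm64", "os": "linux"}},
    ]
}


def select_manifest(mlist: dict, os_name: str, arch: str) -> Optional[str]:
    """Return the digest of the first entry matching os/arch, else None."""
    for entry in mlist.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    return None


print(select_manifest(manifest_list, "linux", "arm64"))    # arm64 entry
print(select_manifest(manifest_list, "linux", "ppc64le"))  # no match
```

A single-architecture image such as luxas/prometheus-arm64 has no such list, so the only platform information available is the one field in its config.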

@luxas

luxas commented Oct 10, 2017

@vielmetti @stevvooe Yes, I'm crossbuilding everything. That's by far the best way to do things IMO. Don't know if docker lets me set the os/arch (I don't think so?)

@stevvooe
Contributor

@luxas Cross-building won't work unless you re-pack into a manifest list or re-write the configuration with the correct architecture.

@vielmetti There is no variant field in the base image config, so it won't be defined. We can add one there if this is a huge problem.

@luxas

luxas commented Oct 11, 2017

> @luxas Cross-building won't work unless you re-pack into a manifest list or re-write the configuration with the correct architecture.

@stevvooe Can you point to docs on how to change the OS/arch of a given (non-manifest-list) image, e.g. luxas/prometheus-arm64?

@vielmetti

This patch #2411 addresses the issue.
