CA replication not working 1.6beta2 #6192

Closed
tristan-weil opened this issue Jul 22, 2019 · 1 comment

@tristan-weil
Contributor

tristan-weil commented Jul 22, 2019

Overview of the Issue

Consul 1.6beta2

The CA replication from the primary DC to the secondary DC does not work.

ACLs and Intentions are replicated, but the Consul cluster in the secondary DC is not able to replicate the CA.

Reproduction Steps

2 clusters of 3 servers each in 2 different regions (tested in AWS: eu-west-1 and eu-west-3).
ACLs, Connect and TLS are enabled.
The replication, agent, default and agent_master tokens are set with appropriate policies.

Consul info for both Client and Server

Replication policy:

    acl = "write"

    operator = "write"

    service_prefix "" {
      policy = "read"
      intentions = "read"
    }

Part of the configuration in the secondary DC:

{
    "datacenter": "eu-west-1",
    "primary_datacenter": "eu-west-3",
    "connect": {
        "enabled": true
    },
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enable_key_list_policy": true,
        "enable_token_persistence": true,
        "enable_token_replication": true,
        "enabled": true
    }
}

Part of the configuration in the primary DC:

{
    "datacenter": "eu-west-3",
    "primary_datacenter": "eu-west-3",
    "connect": {
        "enabled": true
    },
    "acl": {
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "enable_key_list_policy": true,
        "enable_token_persistence": true,
        "enabled": true
    }
}

Operating system and Environment details

Debian 9 on t2.micro instances

Log Fragments

In the primary DC:

22:05:49 root@ip-10-3-0-13 ~ [0]
> consul members 
Node             Address            Status  Type    Build       Protocol  DC         Segment
ip-10-3-0-13     10.3.0.13:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29     10.3.0.29:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37     10.3.0.37:8301     alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-4      10.3.0.4:8301      alive   client  1.6.0beta2  2         eu-west-3  <default>
ip-10-3-0-7      10.3.0.7:8301      alive   client  1.6.0beta2  2         eu-west-3  <default>

22:06:03 root@ip-10-3-0-13 ~ [0]
> consul members -wan
Node                    Address         Status  Type    Build       Protocol  DC         Segment
ip-10-1-0-10.eu-west-1  10.1.0.10:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-22.eu-west-1  10.1.0.22:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-41.eu-west-1  10.1.0.41:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-3-0-13.eu-west-3  10.3.0.13:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29.eu-west-3  10.3.0.29:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37.eu-west-3  10.3.0.37:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>

In the secondary DC:

22:07:37 root@ip-10-1-0-22 ~ [0]
> consul members -wan
Node                    Address         Status  Type    Build       Protocol  DC         Segment
ip-10-1-0-10.eu-west-1  10.1.0.10:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-22.eu-west-1  10.1.0.22:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-1-0-41.eu-west-1  10.1.0.41:8302  alive   server  1.6.0beta2  2         eu-west-1  <all>
ip-10-3-0-13.eu-west-3  10.3.0.13:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-29.eu-west-3  10.3.0.29:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>
ip-10-3-0-37.eu-west-3  10.3.0.37:8302  alive   server  1.6.0beta2  2         eu-west-3  <all>

Here is the auto-generated CA in the primary DC:

22:07:12 root@ip-10-3-0-13 ~ [0]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
  "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
  "Roots": [
    {
      "ID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "37:63:3a:33:65:3a:39:38:3a:36:34:3a:30:36:3a:30:36:3a:30:61:3a:34:30:3a:33:61:3a:65:30:3a:64:31:3a:36:39:3a:31:61:3a:31:65:3a:31:38:3a:66:35:3a:37:34:3a:38:32:3a:65:63:3a:38:64:3a:33:65:3a:31:66:3a:32:66:3a:35:62:3a:38:31:3a:30:64:3a:65:33:3a:32:65:3a:38:64:3a:64:32:3a:33:39:3a:66:36",
      "ExternalTrustDomain": "594d66d8-240c-260c-33d0-71589d845d99",
      "NotBefore": "2019-07-22T21:47:11Z",
      "NotAfter": "2029-07-22T21:47:11Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWjCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\neWXKTgt8Mvzb5sQlgCeJvekBQk6in29TqJHD/ovf\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 8,
      "ModifyIndex": 8
    }
  ]
}

Here is the auto-generated CA in the secondary DC:

22:08:44 root@ip-10-1-0-22 ~ [0]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
  "TrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed.consul",
  "Roots": [
    {
      "ID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "39:66:3a:30:38:3a:32:31:3a:64:65:3a:65:30:3a:38:66:3a:61:66:3a:33:66:3a:62:65:3a:63:32:3a:65:64:3a:32:37:3a:64:64:3a:64:64:3a:65:63:3a:36:30:3a:30:65:3a:64:61:3a:33:38:3a:39:30:3a:64:39:3a:64:62:3a:30:64:3a:31:39:3a:35:39:3a:39:62:3a:62:31:3a:33:62:3a:33:39:3a:38:65:3a:65:39:3a:65:34",
      "ExternalTrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed",
      "NotBefore": "2019-07-22T21:47:09Z",
      "NotAfter": "2029-07-22T21:47:09Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWDCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\nqJRf+hLfFc1SdWq8eiMuyt422i/PSpby05pMnw==\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 8,
      "ModifyIndex": 8
    }
  ]
}

=> As we can see, the CA is not replicated in the secondary DC: its ActiveRootID and TrustDomain differ from those of the primary DC.
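The comparison above can be automated. A minimal sketch, assuming the two `/v1/connect/ca/roots` responses have already been fetched (for example with the `curl` commands shown); `ca_roots_match` is a hypothetical helper, not part of Consul:

```python
import json


def ca_roots_match(primary_json: str, secondary_json: str) -> bool:
    """Return True if the secondary DC trusts the same active root as the primary.

    Both arguments are raw JSON bodies from GET /v1/connect/ca/roots,
    one fetched in each datacenter.
    """
    primary = json.loads(primary_json)
    secondary = json.loads(secondary_json)
    # Once the CA is replicated, the secondary reports the primary's
    # active root ID and trust domain.
    return (
        primary["ActiveRootID"] == secondary["ActiveRootID"]
        and primary["TrustDomain"] == secondary["TrustDomain"]
    )


if __name__ == "__main__":
    # Values copied from the outputs in this issue (other fields omitted).
    primary = json.dumps({
        "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
        "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
    })
    secondary_before_fix = json.dumps({
        "ActiveRootID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
        "TrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed.consul",
    })
    print(ca_roots_match(primary, secondary_before_fix))  # False
```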

Here is the state of ACL replication in the secondary DC (it is working):

22:11:39 root@ip-10-1-0-22 /var/log [0]
> curl -sS http://127.0.0.1:8500/v1/acl/replication | jq .
{
  "Enabled": true,
  "Running": true,
  "SourceDatacenter": "eu-west-3",
  "ReplicationType": "tokens",
  "ReplicatedIndex": 1090,
  "ReplicatedRoleIndex": 1,
  "ReplicatedTokenIndex": 1131,
  "LastSuccess": "2019-07-22T22:10:00Z",
  "LastError": "0001-01-01T00:00:00Z"
}
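For scripted checks, the reading of this output can be expressed as a small function. A sketch, assuming the field names shown above and whole-second UTC timestamps; `acl_replication_healthy` is hypothetical. Note that this endpoint only reports ACL replication, so it can look healthy while the CA is not replicated, as in this issue.

```python
import json
from datetime import datetime

# Go's zero time.Time in RFC 3339 form: here it means "never happened".
ZERO_TIME = "0001-01-01T00:00:00Z"


def _parse(ts: str) -> datetime:
    # Assumes whole-second UTC timestamps, as in the output above.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def acl_replication_healthy(status_json: str) -> bool:
    """Judge a GET /v1/acl/replication response (hypothetical helper)."""
    status = json.loads(status_json)
    if not (status["Enabled"] and status["Running"]):
        return False
    if status["LastSuccess"] == ZERO_TIME:
        return False  # replication has never succeeded
    if status["LastError"] == ZERO_TIME:
        return True  # succeeded at least once and never failed
    # Healthy if the most recent event was a success rather than an error.
    return _parse(status["LastSuccess"]) > _parse(status["LastError"])


if __name__ == "__main__":
    observed = json.dumps({
        "Enabled": True, "Running": True, "SourceDatacenter": "eu-west-3",
        "ReplicationType": "tokens", "LastSuccess": "2019-07-22T22:10:00Z",
        "LastError": "0001-01-01T00:00:00Z",
    })
    print(acl_replication_healthy(observed))  # True
```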

The log of the leader in the secondary DC shows:

Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] consul: RPC failed to server 10.3.0.13:8300 in DC "eu-west-3": rpc error making call: rpc error making call: Permission denied
Jul 22 22:11:39 ip-10-1-0-22 consul[9365]:     2019/07/22 22:11:39 [ERR] connect: error watching primary datacenter roots: rpc error making call: rpc error making call: Permission denied

I have tried replacing all the tokens with a global-management token: same error.
I have also tried restarting, deactivating/reactivating Connect, etc.
I think the problem lies in the RPC sent by the leader of the secondary DC to check the health of the primary cluster: it does not include the replication token.
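To make the hypothesis concrete, here is a toy model of the ACL decision on the primary side, written in Python rather than Consul's Go. It assumes (not confirmed in this issue) that the health check requires `operator:read`, and, per the hypothesis, that the secondary's leader sends it with an empty token. Under that model the request fails with `default_policy = "deny"` whichever tokens are configured, which would also explain why a global-management token did not help. `authorize_operator_read` and the token secret are hypothetical.

```python
# Toy model of an ACL decision, NOT Consul's implementation.
OPERATOR_LEVELS = {"deny": 0, "read": 1, "write": 2}


def authorize_operator_read(token, token_rules, default_policy="deny"):
    """Return True if `token` may perform an operator:read action.

    token_rules maps a token secret to its operator rule: "deny", "read"
    or "write". An empty token (or one missing from token_rules) gets no
    rule here, so default_policy decides, much like an anonymous request.
    """
    rule = token_rules.get(token) if token else None
    if rule is None:
        rule = "read" if default_policy == "allow" else "deny"
    # "write" implies "read".
    return OPERATOR_LEVELS[rule] >= OPERATOR_LEVELS["read"]


if __name__ == "__main__":
    # Hypothetical secret for a token carrying the replication policy
    # from this issue (operator = "write").
    rules = {"replication-secret": "write"}
    # RPC sent without a token: refused under default_policy = "deny".
    print(authorize_operator_read("", rules))  # False
    # Same RPC carrying the replication token: allowed.
    print(authorize_operator_read("replication-secret", rules))  # True
```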

See PR #6193.

With this PR, the leader in the secondary cluster immediately replicates the CA:

Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] consul: New leader elected: ip-10-1-0-22
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL policy replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL role replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] acl: started ACL token replication
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO]  raft: pipelining replication to peer {Nonvoter 9db86766-a5f7-ad14-bf13-a8399a7df6c1 10.1.0.10:8300}
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: updated root certificates from primary datacenter
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: received new intermediate certificate from primary datacenter
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] connect: initialized secondary datacenter CA with provider "consul"
Jul 22 22:47:53 ip-10-1-0-22 consul[12366]:     2019/07/22 22:47:53 [INFO] replication: started Config Entry replication

And the old CA is replaced with the one from the primary DC: the primary's root is now active, and the old root is kept but marked inactive:

22:49:36 root@ip-10-1-0-22 /var/log [130]
> curl -sS http://127.0.0.1:8500/v1/connect/ca/roots | jq .
{
  "ActiveRootID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
  "TrustDomain": "594d66d8-240c-260c-33d0-71589d845d99.consul",
  "Roots": [
    {
      "ID": "9a:21:7a:e0:16:ae:19:58:bc:ca:b5:c4:97:3e:fe:d3:0c:8f:af:8e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "37:63:3a:33:65:3a:39:38:3a:36:34:3a:30:36:3a:30:36:3a:30:61:3a:34:30:3a:33:61:3a:65:30:3a:64:31:3a:36:39:3a:31:61:3a:31:65:3a:31:38:3a:66:35:3a:37:34:3a:38:32:3a:65:63:3a:38:64:3a:33:65:3a:31:66:3a:32:66:3a:35:62:3a:38:31:3a:30:64:3a:65:33:3a:32:65:3a:38:64:3a:64:32:3a:33:39:3a:66:36",
      "ExternalTrustDomain": "594d66d8-240c-260c-33d0-71589d845d99",
      "NotBefore": "2019-07-22T21:47:11Z",
      "NotAfter": "2029-07-22T21:47:11Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWjCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\neWXKTgt8Mvzb5sQlgCeJvekBQk6in29TqJHD/ovf\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": true,
      "CreateIndex": 691,
      "ModifyIndex": 691
    },
    {
      "ID": "ee:75:44:b2:f8:92:06:37:06:cf:55:be:7f:87:d9:93:ee:ad:c8:6e",
      "Name": "Consul CA Root Cert",
      "SerialNumber": 7,
      "SigningKeyID": "39:66:3a:30:38:3a:32:31:3a:64:65:3a:65:30:3a:38:66:3a:61:66:3a:33:66:3a:62:65:3a:63:32:3a:65:64:3a:32:37:3a:64:64:3a:64:64:3a:65:63:3a:36:30:3a:30:65:3a:64:61:3a:33:38:3a:39:30:3a:64:39:3a:64:62:3a:30:64:3a:31:39:3a:35:39:3a:39:62:3a:62:31:3a:33:62:3a:33:39:3a:38:65:3a:65:39:3a:65:34",
      "ExternalTrustDomain": "2b56ec58-87f8-dda6-bbf5-b3f3f24fa3ed",
      "NotBefore": "2019-07-22T21:47:09Z",
      "NotAfter": "2029-07-22T21:47:09Z",
      "RootCert": "-----BEGIN CERTIFICATE-----\nMIICWDCCAf+gAwIBAgIBBzAKBggqhkjOPQQDAjAWMRQwEgYDVQQDEwtDb25zdWwg\n...\nqJRf+hLfFc1SdWq8eiMuyt422i/PSpby05pMnw==\n-----END CERTIFICATE-----\n",
      "IntermediateCerts": null,
      "Active": false,
      "CreateIndex": 8,
      "ModifyIndex": 691
    }
  ]
}

@mkeeler
Member

mkeeler commented Jul 22, 2019

I think you are correct that the ServerHealth RPC would need a token in order to succeed.

I think the better solution, however, might be to use the information advertised via Serf instead of making RPC requests to all the servers to figure this out. That's what we do to determine the legacy/new ACL mode. I will be looking into this more tomorrow morning.
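A minimal sketch of the Serf-based idea, assuming the question to answer is whether every server in the primary DC runs a recent enough version, and that each WAN member carries its datacenter, role, status and build version as `consul members -wan` displays them. The decision uses only gossip data, so no RPC is made and no ACL token is involved. The dictionary layout and function names are hypothetical, not Consul's API; only alive servers are counted, and pre-release suffixes are ignored.

```python
import re


def parse_version(build: str) -> tuple:
    """Extract (major, minor, patch) from a build string such as "1.6.0beta2".

    Pre-release suffixes are ignored in this sketch, so "1.6.0beta2"
    compares equal to "1.6.0".
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", build)
    if match is None:
        raise ValueError(f"unrecognized build string: {build!r}")
    return tuple(int(part) for part in match.groups())


def servers_in_dc_meet_min_version(members, dc, min_version):
    """Decide from gossip (WAN member) data alone: no RPC, so no ACL token."""
    servers = [
        m for m in members
        if m["dc"] == dc and m["type"] == "server" and m["status"] == "alive"
    ]
    if not servers:
        return False  # nothing known about this DC yet
    wanted = parse_version(min_version)
    return all(parse_version(m["build"]) >= wanted for m in servers)


if __name__ == "__main__":
    # WAN members as listed by `consul members -wan` in this issue.
    members = [
        {"name": f"ip-{ip}.{dc}", "dc": dc, "type": "server",
         "status": "alive", "build": "1.6.0beta2"}
        for dc, ips in (("eu-west-1", ("10-1-0-10", "10-1-0-22", "10-1-0-41")),
                        ("eu-west-3", ("10-3-0-13", "10-3-0-29", "10-3-0-37")))
        for ip in ips
    ]
    print(servers_in_dc_meet_min_version(members, "eu-west-3", "1.6.0"))  # True
```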

Also thank you for the extremely detailed and clear bug report.
