Azure: consul cluster start fails during auto discovery #3875

Closed
arnodenuijl opened this issue Feb 8, 2018 · 3 comments · Fixed by #3876
Labels
type/bug Feature does not function as expected type/crash The issue description contains a golang panic and stack trace

@arnodenuijl

Description of the Issue (and unexpected/desired result)

I am trying to start a Consul server agent that is configured to use the Azure provider for retry_join.
The process exits with panic: runtime error: invalid memory address or nil pointer dereference.

I found an old bug report, #3193, with the same kind of exception. In that stack trace the panic seems to come from consul/command/agent.(*Config).discoverAzureHosts, whereas mine comes from hashicorp/go-discover/provider/azure.fetchAddrsWithTags.

Reproduction steps

Config file:

{
    "bootstrap": true,
    "server": true,
    "datacenter": "west-europe",
    "data_dir": "data",
    "recursors": ["8.8.8.8",  "8.8.4.4"],
    "ui": true,
    "retry_join": [ "provider=azure tenant_id=XXXXXX subscription_id=XXXX client_id=XXXX secret_access_key=XXXX tag_name=purpose tag_value=consul-tst"    ]
}

I start it up with
consul.exe agent -config-dir "c:\consul\consul.d" -log-level trace

And then I get

C:\Consul>consul.exe agent -config-file consul.d -log-level trace
bootstrap = true: do not enable unless necessary
==> Starting Consul agent...
==> Consul agent running!
           Version: 'v1.0.5'
           Node ID: '8a2ae81d-99e1-5808-6782-603744b5dbd0'
         Node name: 'consul-tst-001'
        Datacenter: 'west-europe' (Segment: '<all>')
            Server: true (Bootstrap: true)
       Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, DNS: 8600)
      Cluster Addr: 10.188.3.244 (LAN: 8301, WAN: 8302)
           Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false

==> Log data will now stream in as it occurs:

    2018/02/08 14:11:26 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:8a2ae81d-99e1-5808-6782-603744b5dbd0 Address:10.188.3.244:8300}]
    2018/02/08 14:11:26 [INFO] raft: Node at 10.188.3.244:8300 [Follower] entering Follower state (Leader: "")
    2018/02/08 14:11:26 [INFO] serf: EventMemberJoin: consul-tst-001.west-europe 10.188.3.244
    2018/02/08 14:11:26 [WARN] serf: Failed to re-join any previously known node
    2018/02/08 14:11:26 [INFO] serf: EventMemberJoin: consul-tst-001 10.188.3.244
    2018/02/08 14:11:26 [INFO] agent: Started DNS server 127.0.0.1:8600 (udp)
    2018/02/08 14:11:26 [WARN] serf: Failed to re-join any previously known node
    2018/02/08 14:11:26 [INFO] consul: Adding LAN server consul-tst-001 (Addr: tcp/10.188.3.244:8300) (DC: west-europe)
    2018/02/08 14:11:26 [INFO] consul: Handled member-join event for server "consul-tst-001.west-europe" in area "wan"
    2018/02/08 14:11:26 [INFO] agent: Started DNS server 127.0.0.1:8600 (tcp)
    2018/02/08 14:11:26 [INFO] agent: Started HTTP server on 127.0.0.1:8500 (tcp)
    2018/02/08 14:11:26 [INFO] agent: started state syncer
    2018/02/08 14:11:26 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce os scaleway softlayer
    2018/02/08 14:11:26 [INFO] agent: Joining LAN cluster...
    2018/02/08 14:11:26 [DEBUG] discover: Using provider "azure"
    2018/02/08 14:11:26 [DEBUG] discover-azure: using tag method. tag_name: purpose, tag_value: consul-tst
    2018/02/08 14:11:26 Sending GET https://management.azure.com/subscriptions/8086f7e2-f8ad-4eaa-aded-f43a65a0e13e/providers/Microsoft.Network/networkInterfaces?api-version=2016-09-01
    2018/02/08 14:11:27 GET https://management.azure.com/subscriptions/8084f7e2-f2ad-4eaa-aded-f43a65a0e13e/providers/Microsoft.Network/networkInterfaces?api-version=2016-09-01 received 200 OK
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x0 pc=0xd8d38c]

goroutine 44 [running]:
github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover/provider/azure.fetchAddrsWithTags(0xc0424a21f0, 0x7, 0xc0424a2210, 0xa, 0x1e20ee0, 0xc042246c80, 0x1e246e0, 0xc042325760, 0x0, 0x0, ...)
        /gopath/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover/provider/azure/azure_discover.go:115 +0x26c
github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover/provider/azure.(*Provider).Addrs(0x1fac948, 0xc04238e060, 0xc042212910, 0xc0424c5d20, 0x1, 0x1, 0x9, 0x150beb5)
        /gopath/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover/provider/azure/azure_discover.go:81 +0xfbf
github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover.(*Discover).Addrs(0xc042325640, 0xc042130000, 0xe3, 0xc042212910, 0x1, 0x1, 0x37, 0x0, 0x3)
        /gopath/src/github.com/hashicorp/consul/vendor/github.com/hashicorp/go-discover/discover.go:123 +0x2e6
github.com/hashicorp/consul/agent.(*retryJoiner).retryJoin(0xc042631f80, 0xc04214bcc0, 0x0)
        /gopath/src/github.com/hashicorp/consul/agent/retry_join.go:81 +0x40d
github.com/hashicorp/consul/agent.(*Agent).retryJoinLAN(0xc042326000)
        /gopath/src/github.com/hashicorp/consul/agent/retry_join.go:21 +0x108
created by github.com/hashicorp/consul/agent.(*Agent).Start
        /gopath/src/github.com/hashicorp/consul/agent/agent.go:369 +0x866

consul version for both Client and Server

Server: Consul v1.0.5

consul info for both Client and Server

Client:

no consul info since it won't start

Operating system and Environment details

Windows server 2016 datacenter on azure

I hope this rings a bell somewhere. If I can help by providing more info or testing something, I'd be happy to do so.

Kind regards,

Arno den Uijl

@arnodenuijl
Author

It seems to go wrong somewhere around https://github.com/hashicorp/go-discover/blob/40e569ec58fc0a103cbafe3032078c5fe9175495/provider/azure/azure_discover.go#L117

I installed Go and I'm now able to build the discover cmd tool.
If I insert the following code before line 117:

for key, value := range *v.Tags {
    fmt.Println("Key:", key, "Value:", *value)
}

I get Key: environment Value: tst

And if I print tv:

tv := (*v.Tags)[tagName] // *string
l.Printf("tv: %s", tv)

I get tv: %!s(*string=<nil>)

I don't know if this helps, because I have never programmed anything in Go. But I hope a trained eye sees something in it :-)
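
For anyone who wants to reproduce the observation above in isolation, here is a minimal sketch. It assumes the tags are a *map[string]*string, which is inferred from the *v.Tags and *value usage in the snippets; lookupTag and derefPanics are hypothetical helpers and not part of go-discover. It shows that looking up a tag name that is absent from the map yields a nil *string, and that dereferencing that pointer panics.

```go
package main

import "fmt"

// lookupTag mirrors the lookup from the snippet above. The *map[string]*string
// type is an assumption inferred from the *v.Tags and *value usage.
func lookupTag(tags *map[string]*string, tagName string) *string {
	return (*tags)[tagName] // a missing key yields the zero value, a nil *string
}

// derefPanics reports whether dereferencing p triggers a runtime panic.
func derefPanics(p *string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = *p // panics with a nil pointer dereference when p is nil
	return false
}

func main() {
	// Same tag set as printed above: only "environment" is present.
	env := "tst"
	tags := map[string]*string{"environment": &env}

	tv := lookupTag(&tags, "purpose")
	fmt.Println("tv is nil:", tv == nil)
	fmt.Println("dereference panics:", derefPanics(tv))
}
```

With the tag set from this report, the network interface has an environment tag but no purpose tag, so the lookup for purpose is the case that returns nil.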

@preetapan
Contributor

preetapan commented Feb 8, 2018

@arnodenuijl Thanks for reporting this and investigating it. This looks like a legitimate bug: the code enters this if block even when tv is nil and then attempts to dereference the value.

We will discuss internally how we want to patch this for the latest Consul release. In the meantime, could you use Consul 1.0.2 as a workaround? It should work for you unless you rely on the feature introduced in fb31d0e, which adds support for Azure VM scale sets.
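
A guard of the kind described above might look like the following. This is only a sketch under stated assumptions, not the actual patch: tagMatches is a hypothetical helper, and the *map[string]*string tag type is assumed from the snippets earlier in this thread. The idea is simply to check every pointer for nil before dereferencing it.

```go
package main

import "fmt"

// tagMatches reports whether tags contains tagName with exactly tagValue.
// It checks every pointer for nil before dereferencing it.
// Hypothetical helper; the tag type is an assumption.
func tagMatches(tags *map[string]*string, tagName, tagValue string) bool {
	if tags == nil {
		return false // the resource has no tags at all
	}
	tv := (*tags)[tagName]
	if tv == nil {
		return false // tag is missing, or present with a nil value
	}
	return *tv == tagValue
}

func main() {
	env, purpose := "tst", "consul-tst"
	untagged := map[string]*string{"environment": &env}
	tagged := map[string]*string{"purpose": &purpose}

	fmt.Println(tagMatches(&untagged, "purpose", "consul-tst")) // false
	fmt.Println(tagMatches(&tagged, "purpose", "consul-tst"))   // true
	fmt.Println(tagMatches(nil, "purpose", "consul-tst"))       // false
}
```

With a check like this, a resource that lacks the requested tag is skipped instead of crashing the agent.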

preetapan added the type/bug label Feb 8, 2018
slackpad added this to the Next milestone Feb 8, 2018
slackpad added the type/crash label Feb 8, 2018
@arnodenuijl
Author

Thanks for the workaround. That will work perfectly for now.
