failed to get zk lock on Burrow 1.0 #322

Closed
gustavosoares opened this issue Jan 8, 2018 · 13 comments

@gustavosoares

We had a fork of the old version with some custom notifiers, and I'm just merging Burrow 1.0 into it. I left it running locally overnight, and when I got back today there were heaps of "failed to get zk lock" messages. See below.

Burrow, Kafka and ZooKeeper were all running in Docker containers.

Is this a known issue?

{"level":"warn","ts":1515445808.989889,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.090661,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.190752,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.2908428,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.3909829,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.49214,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.592896,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.694001,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.795182,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.895586,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445809.9966788,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445810.097111,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445810.197451,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445810.298532,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445810.3990872,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}
{"level":"warn","ts":1515445810.499922,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}

Thanks in advance,
Gus

@toddpalino
Contributor

No. I've had no problems with the ZK lock in the master branch (or the release).

@solsson

solsson commented Jan 23, 2018

I had the same error in https://hub.docker.com/r/solsson/burrow/ built directly from source. Lots of them actually, something like 10/second for 2.5 hours. I can't see any issues or maintenance with the Kafka cluster around when it started. Could dig more though.

My setup is in Kubernetes, Yolean/kubernetes-kafka#125. The image was built prior to e47ec4c - could the issue there be related?

@solsson

solsson commented Jan 23, 2018

Pod restart solved the issue. Burrow seems to be working normally again.

@toddpalino
Contributor

Any update on this? Since it's a "tried to acquire lock twice" message, I assume it's a code bug in your fork. Haven't seen this at all in master.

If there's no update, I'll close this in a few days.

@gustavosoares
Author

Hi @toddpalino, you may go ahead and close it. I shipped our fork to production yesterday. I'll monitor it, and if the problem comes back I'll let you know. 😄 @solsson reported a similar issue though... I'm unsure whether it is a code bug in our fork, but you never know. All I did was add two new notifiers and set ShowAll to true as described in #341, which I'm assuming wouldn't cause this.

@toddpalino
Contributor

That shouldn't cause it, but if the ZK logic doesn't match up between the fork and master it could cause a problem.

@blieberman

Just observed the same issue in the first hours of running Burrow in a Docker container built from https://github.com/linkedin/Burrow/archive/v1.0.0.tar.gz. A container restart resolved it.

@c-nichols

I can reliably reproduce this on the latest master by disrupting the network for 10-20 seconds.

@cluyihunter

I'm having a similar issue, but the warning carries a different error message. Burrow is running and lists consumers and lag through the HTTP endpoints, but no notifications are sent.
Any idea how to fix this? Nothing is named "test" in the config.

{"level":"warn","ts":1533153351.2114217,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"strconv.Atoi: parsing \"test\": invalid syntax"}
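
One possible reading of this error, not confirmed anywhere in this thread: ZooKeeper lock recipes commonly order contenders by a numeric sequence suffix on the child znode names under the lock path. If a child znode literally named "test" existed under that path (for example, created by hand), parsing its suffix as an integer would fail with exactly this message. The sketch below only demonstrates that parsing failure with the Go standard library; `parseSeq` and the znode name `lock-0000000007` are hypothetical and are not taken from Burrow or its ZooKeeper client.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSeq is a hypothetical helper: it takes the text after the last "-"
// in a znode name and parses it as the sequence number.
func parseSeq(name string) (int, error) {
	parts := strings.Split(name, "-")
	return strconv.Atoi(parts[len(parts)-1])
}

func main() {
	// The first name has a numeric suffix; the second has none at all.
	for _, name := range []string{"lock-0000000007", "test"} {
		seq, err := parseSeq(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("sequence:", seq)
	}
}
```

If this reading applies, listing the children of the notifier lock path in ZooKeeper would show whether a stray node is present.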

@vivekyaji

Same issue as mentioned by @cluyihunter. Restarting the Burrow instance started the notifier again.

{"level":"warn","ts":1533736850.2858596,"msg":"failed to get zk lock","type":"coordinator","name":"notifier","error":"zk: trying to acquire a lock twice"}

@gleithall

We regularly have the same issue as @cluyihunter, with the same warning text as given by @vivekyaji.

We are running v1.1.0 in a container, with Kafka and ZooKeeper also in containers. Restarting the Burrow container solves the problem. This makes sense because, if I understand correctly, the lock uses an ephemeral znode, so stopping the container will remove the lock znode.

@toddpalino how is the zk lock released? It looks like the manageEvalLoop function in the notifier coordinator is where the warning message comes from. That function creates the lock and locks it, but I cannot see where the lock is unlocked. I'm not very familiar with the code base, so I can easily imagine the unlocking happens somewhere I haven't found.
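
To make the question above concrete, here is a minimal sketch of how a retry loop could end up logging "trying to acquire a lock twice" forever. It is a toy model under stated assumptions, not Burrow's code and not the ZooKeeper client's implementation: it assumes the lock object remembers locally that it holds a lock and rejects a second `Lock()` until `Unlock()` is called, which is what the error text suggests. `toyLock`, `retry`, and the znode path are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errDeadlock mirrors the error text seen in the logs in this thread.
var errDeadlock = errors.New("zk: trying to acquire a lock twice")

// toyLock is a hypothetical stand-in for a ZooKeeper lock object. It only
// tracks whether this object believes it already holds the lock.
type toyLock struct {
	heldPath string // non-empty while the object thinks it owns a lock znode
}

// Lock refuses a second acquisition on the same object.
func (l *toyLock) Lock() error {
	if l.heldPath != "" {
		return errDeadlock
	}
	l.heldPath = "/burrow/notifier/lock-0000000001" // illustrative path only
	return nil
}

// Unlock clears the local state so the object can be reused.
func (l *toyLock) Unlock() {
	l.heldPath = ""
}

// retry simulates a loop that tries to reacquire the lock after a session
// loss. It returns how many of the attempts failed.
func retry(l *toyLock, attempts int, resetFirst bool) int {
	if resetFirst {
		l.Unlock()
	}
	failures := 0
	for i := 0; i < attempts; i++ {
		if err := l.Lock(); err != nil {
			failures++
			continue
		}
		break // lock acquired; stop retrying
	}
	return failures
}

func main() {
	stuck := &toyLock{}
	_ = stuck.Lock() // first acquisition succeeds
	// Imagine the session is lost here: the ephemeral znode disappears on
	// the server, but the local object still remembers heldPath.
	fmt.Println("failures without reset:", retry(stuck, 5, false))

	fixed := &toyLock{}
	_ = fixed.Lock()
	fmt.Println("failures with reset:", retry(fixed, 5, true))
}
```

In this model every retry fails until the local state is cleared, which would match a warning repeating on each loop iteration and disappearing after a restart. Whether Burrow's loop actually behaves this way would need to be checked against the notifier coordinator source.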

@harbinzhang

Same issue as @cluyihunter: the HTTP endpoint server is working, but no notifications are sent out.
We are running Burrow from the master branch, with Kafka version 1.1.1.

@alvarolmedo
Contributor

Same issue with Burrow 1.2.2.

10 participants