Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consume CQ results on request timeouts #3522

Merged
merged 1 commit into from
Aug 14, 2015

Conversation

dim
Copy link
Contributor

@dim dim commented Jul 30, 2015

As a follow-up to #3517, I have been able to isolate the actual issue. When timeouts occur during the execution of a CQ, the result channel is never consumed, which causes a deadlock in the query_executor. Because we are stuck in the executor the mapper is never closed and the read-only transaction is never released, locking all subsequent writes.

@corylanou
Copy link
Contributor

Can you write a test that simulates this easily to show that this fixes it? Also, could we just close the channel instead of draining it? Not sure if that will cause a panic if it is still trying to be written to on the other side or not (I didn't read all the code).

@dim
Copy link
Contributor Author

dim commented Aug 6, 2015 via email

@corylanou
Copy link
Contributor

This service was set up using interfaces to purposefully make it testable. What dependencies specifically are causing the problem for the test? Let me know if I can help. Thanks!

@jwilder
Copy link
Contributor

jwilder commented Aug 13, 2015

@corylanou I think this makes sense. If the PointsWriter times out, the in-flight writes are not currently cancelable and will still hold onto the write locks. PointsWriter kicks off each write in a goroutine but we can't interrupt it to release the lock because it's block down in the bolt shard. The timeout causes the PointsWriter to return and then exit early from the CQ query loop but the the writing goroutine holding the lock will still live on. If we drain the results channel, the writing goroutine will be able to acquire the write and complete.

@dim If you could sign the CLA we can get this and #3517 merged.

@jwilder jwilder self-assigned this Aug 13, 2015
@dim
Copy link
Contributor Author

dim commented Aug 14, 2015

@jwilder done, signed.

jwilder added a commit that referenced this pull request Aug 14, 2015
Test script from #3517 that reproduces a CQ deadlock.  This is related
to #3522 as well.
jwilder added a commit that referenced this pull request Aug 14, 2015
Consume CQ results on request timeouts
@jwilder jwilder merged commit bb6de3b into influxdata:master Aug 14, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants