fix: sqllogic test hangs (cluster mode + clickhouse handler) #9615

Merged: 1 commit merged on Jan 16, 2023

@dantengsky (Member) commented on Jan 15, 2023

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

To keep the sqllogic test from hanging in cluster mode, this PR makes the ClickHouse handler pull the result-set stream on the "query-ctx" thread.

How to reproduce: see the summary of #9576.

While executing queries:

  1. The ClickHouse handler uses the "global" tokio runtime (the "tokio-runtime-worker" threads) to pull the result-set data stream (SendableDataBlockStream).

Pulling from the SendableDataBlockStream (when it is backed by a PipelinePullingExecutor) calls PipelinePullingExecutor::pull_data:

https://github.com/datafuselabs/databend/blob/3b45a672222b5bc928810dd0b3a64803d18a590d/src/query/service/src/pipelines/executor/pipeline_pulling_executor.rs#L176-L185

Note that the receive here is NOT async: if no data is available, the runtime thread may be stuck in this loop.

  2. The FlightExchange also uses "tokio-runtime-worker" threads to pull data over Flight RPC and forward it downstream.

https://github.com/datafuselabs/databend/blob/750852820100579243172a507d8d6e455080a768/src/query/service/src/api/rpc/flight_client.rs#L131-L147

If the runtime threads are stuck in step 1 waiting for data from the FlightExchange, while the FlightExchange's async tasks are waiting for a runtime thread to drive them, execution hangs.
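The hang described in steps 1 and 2 can be illustrated in miniature. This is a std-only Rust sketch, not databend code: the hypothetical `OneThreadPool` stands in for the tokio runtime and has a single worker so the starvation is deterministic (with more workers, the same thing happens once every worker is blocked). Job 1 plays the ClickHouse handler doing a synchronous receive; job 2 plays the FlightExchange task that would produce the data. `recv_timeout` is used only so the demo terminates instead of hanging forever.

```rust
use std::sync::mpsc::{self, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

// A unit of work executed on the pool's worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical stand-in for a runtime with a fixed set of worker threads.
/// It has a single worker, so jobs run strictly in submission order.
struct OneThreadPool {
    jobs: Sender<Job>,
    worker: thread::JoinHandle<()>,
}

impl OneThreadPool {
    fn new() -> Self {
        let (jobs, queue): (Sender<Job>, Receiver<Job>) = mpsc::channel();
        let worker = thread::Builder::new()
            .name("pool-worker".to_string())
            .spawn(move || {
                // Run queued jobs one by one until every sender is dropped.
                for job in queue {
                    job();
                }
            })
            .expect("failed to spawn worker");
        OneThreadPool { jobs, worker }
    }

    fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.jobs.send(Box::new(job)).expect("worker is gone");
    }

    fn shutdown(self) {
        drop(self.jobs); // closes the queue; the worker drains it and exits
        self.worker.join().expect("worker panicked");
    }
}

/// Returns Err(Timeout) because the blocking consumer starves the producer.
fn blocking_consumer_on_pool(wait: Duration) -> Result<u64, RecvTimeoutError> {
    let pool = OneThreadPool::new();
    let (data_tx, data_rx) = mpsc::channel::<u64>();
    let (result_tx, result_rx) = mpsc::channel();

    // Job 1 ("handler"): occupies the only worker while waiting for data.
    pool.spawn(move || {
        let _ = result_tx.send(data_rx.recv_timeout(wait));
    });
    // Job 2 ("exchange"): would produce the data, but is queued behind job 1.
    pool.spawn(move || {
        let _ = data_tx.send(42);
    });

    let outcome = result_rx.recv().expect("consumer job did not report");
    pool.shutdown();
    outcome
}

fn main() {
    let outcome = blocking_consumer_on_pool(Duration::from_millis(200));
    println!("{:?}", outcome); // Err(Timeout)
}
```

Submitting job 2 before job 1 lets the data arrive, which is the point: a blocking receive on a shared worker can starve the very task it is waiting for.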

NOTE: it seems that not all threads named tokio-runtime-worker are stuck in step 1 while the query hangs. However, a) I am not sure whether other ad-hoc tokio runtimes with default thread names exist there, and b) I do not know whether tokio's work-stealing scheduler helps here.


To avoid the hang, this PR makes the ClickHouse handler pull the data on the query-ctx thread.
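The idea of the fix can be sketched the same way, again as a hypothetical std-only illustration rather than the PR's actual change: the blocking receive moves to a dedicated thread (named `query-ctx` here only to mirror the PR), so the single pool worker stays free to run the producer and the data arrives. `spawn_one_thread_pool` and `blocking_consumer_on_dedicated_thread` are illustrative names, not databend APIs.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// A unit of work executed on the pool's worker thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical single-worker pool standing in for the shared runtime.
fn spawn_one_thread_pool() -> (mpsc::Sender<Job>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        // Run queued jobs until the sender is dropped.
        for job in rx {
            job();
        }
    });
    (tx, handle)
}

/// The consumer blocks on its own thread, so the pool worker can run the producer.
fn blocking_consumer_on_dedicated_thread(wait: Duration) -> Result<u64, mpsc::RecvTimeoutError> {
    let (pool, worker) = spawn_one_thread_pool();
    let (data_tx, data_rx) = mpsc::channel::<u64>();

    // Dedicated consumer thread (named after the PR's "query-ctx" thread).
    let consumer = thread::Builder::new()
        .name("query-ctx".to_string())
        .spawn(move || data_rx.recv_timeout(wait))
        .expect("failed to spawn consumer thread");

    // The producer runs on the pool; nothing is blocking the worker now.
    pool.send(Box::new(move || {
        let _ = data_tx.send(42);
    }))
    .expect("worker is gone");

    let outcome = consumer.join().expect("consumer panicked");
    drop(pool); // close the queue so the worker exits
    worker.join().expect("worker panicked");
    outcome
}

fn main() {
    println!("{:?}", blocking_consumer_on_dedicated_thread(Duration::from_secs(5))); // Ok(42)
}
```

In a real tokio setting the same separation is commonly achieved with a dedicated OS thread or `tokio::task::spawn_blocking`; which mechanism fits depends on the surrounding code.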

Closes #9576

to avoid sqllogic test hanging in cluster mode

@mergify bot added the pr-bugfix label (this PR patches a bug in codebase) on Jan 15, 2023
@dantengsky marked this pull request as ready for review on January 15, 2023 18:30