Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(reader): Add ClickHouse HTTPReader #819
feat(reader): Add ClickHouse HTTPReader #819
Changes from 6 commits
7d1949d
60a231f
8151f95
1d38b33
c4ead7c
d264c37
88d615a
2cc0846
ecbad6c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a new condition we did not have on native?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of.
The settings passed here are supposed to be closer to "session settings" (
client_settings
for theClickhousePool
constructor) than they are to "query settings", but we don't use ClickHouse HTTP sessions here due to not wanting to deal with session timeouts, etc. I needed to be some way to allow constant (session-like) settings here to support things likeoutput_format_json_quote_64bit_integers
without leaking that intoraw_query
. This is more of a manifestation ofquery_id
being passed as a setting, rather than some other way (e.g. a header.) I don't know how the native driver here would deal withquery_id
, such as whether or not it considers that a valid setting, or if that's an implementation detail of the way settings are implemented in the HTTP interface. (I don't know what it would do ifquery_id
was provided in both thesettings
mapping as well as thequery_id
keyword argument toexecute
.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment I guess is a bit misleading.
FORMAT JSON
doesn't implicitly addWITH TOTALS
or anything, but it does already split out thetotals
row from thedata
rows so that the caller doesn't need to. When the AST is available everywhere, it'd be better to removewith_totals
from this function signature and just use the query attribute to decide whether or not the totals row should exist or not for the native reader.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of my least favorite things about mypy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be useful to set up retries for network connection errors. Also what's the default timeout we would have here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, will look into that.
Not sure. Will investigate tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context on how we handled the retries and timeout in the queries from sentry to snuba with the reasoning around retries if it helps
getsentry/sentry@6d54866
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started writing a comment about how on second thought, I didn't think we should just blindly retry on errors without a reason to, but then I checked the distinction between
execute
andexecute_robust
inClickhousePool
and I guess we retry there as well.This basically falls through to to
socket.getdefaulttimeout
:I guess this should be able to be parameterized (the writer also has this issue.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the retries, the difference between execute_robust and using the default Retry object of urllib3 is that execute_robust also waits by default 1 second (besides being more flexible on the type of errors).
I would say we should always retry (a given amount of times) for errors when establishing the connection (NetworkError, SocketTimeout, etc.) That covers the transient network issues that we run into. In these case you will get the Exception from urllib.
Whether to retry on ServerError that is debatable. Here (this is only a read path so idempotent) it is not dangerous but it puts useless load to the DB if we do that on errors that cannot be transient. I am not sure we have a clean way to make this distinction on clickhouse ServerErrors, unless the HTTP reader clickhosue used HTTP 5xx for actual server errors and HTTP 4xx for query errors. If it does not we would have to selectively blacklist/whitelist error codes I guess.
I'd be ok being conservative and not retry at all for these cases. (Sentry retries them by the way).
I would certainly not retry on ReadTimeout errors.
How does it sound as a strategy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually the Retry object from urllib3 is already capable of making the distinction between network errors, read errors and redirect errors so you do not need to implement the logic in your code to do network connection retries:
https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, my original comment was unclear — I meant that i was surprised that
execute
also had some number of baked in retries. (That doesn't really change anything about your comment, though.) It makes some more sense why after this conversation.In my opinion, we should try and avoid implicit retries of application operations as much as possible, and allow (well, not so much allow as require) the caller to retry when necessary. I don't think a one-size-fits-all solution in the reader (or writer, or connection pool, etc.) is a good idea.
Like you mentioned, I do think it makes sense to always retry on transport level errors (connection timeouts.) I don't think of this so much as the responsibility of the
Reader
as the transport itself (specifically the connection pool, native or HTTP) so it probably makes sense to just pass the instantiated pool to the constructor, rather than constructing it in the reader itself.This also is similar to some of the conversations around #844 around connection sharing. It'd be unnecessarily wasteful to create a new connection pool for every reader if they are all communicating with the same cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be totally ok if we decided to retry only on transport errors. I am not a big fan of baking application specific retries in the Reader (or lower) layer either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use the AST for this but that would have meant either
AstClickhouseQuery
constructor so that it would be more like a dataclass, independent of theQuery
object entirelyBoth of those are a rabbit holes I have no interest in falling down into right now. We should do the second one eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it is fine not to use the AST to write a unit test at this level, anyway if you want to use it, you don't need to depend on dataset to build a
Query
object and transform it into a Clickhosue query since theQuery
object depends on a DataSource (which is the abstract class above TableStorage and can be created from scratch without all the heavy weight of the dataset.Example:
If you just want a Query object starting from the AST you can do it this way
https://github.com/getsentry/snuba/blob/master/tests/query/parser/test_query.py#L20-L32
(you have to use ASTClickhouseQuery instead of DictClickhouseQuery).