Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CSV uploads for ClickHouse Cloud #236
CSV uploads for ClickHouse Cloud #236
Changes from all commits
2b26552
7ad64a0
7fb8206
6b8935b
d1c025d
2cec411
a51c32e
847b80f
5554a61
eb3cdc7
490a558
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, select_sequential_consistency just limits which replica you query from, and doesn't affect the performance of the query on that specific replica.
This logic seems like it will break consistency for multi-replica on prem installations, and needlessly disable it on single replica ones. It seems best to just to enable it unconditionally for now?
For any multi-replica installation, enabling it for all connections seems to wipe out most of the benefits of connecting to a distributed installation, but I don't see any way around this besides reworking metabase to support separate pools for different connection flavors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're mistaking the behaviour of
select_sequential_consistency
on CH Cloud (docs here), thinking it's the same as on-premise (the link you shared). We're only enablingselect_sequential_consistency
on CH Cloud, where it's much less of an issue.There's more on SharedMergeTree in this blog post.
And I can't imagine fetching metadata from Keeper can be a bottleneck. I also got this DM from Serge on
select_sequential_consistency
:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
select_sequential_consistency
is correctly enabled for the Cloud type of instance only.For on-premise clusters (in a follow-up PR), we should just set
insert_quorum
equal to the nodes count so that replica sync won't be necessary and will be effectively done during the CSV insert, and it won't affect other read-only operations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than use the default sql-jdbc implementation, I've chosen to use the approach recommended here, to allow for large uploads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not a bottleneck, but making a protocol like
(ps-set! v ps idx)
could remove the reflection oftype
and the O(n) branching ofcondp
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, how do you imagine this being implemented exactly? The tricky part is this multimethod doesn't know the type of each column without using reflection, and values can be
nil
. I'm imagining this:(ps-set! v ps idx)
px-set!
looks up the method to use for theidx
in a volatile map, e.g.setString
etc. If the map contains a method for theidx
, it uses it. Otherwise calculate the method using a similarcondp
expression and save it to the map under the keyidx
, and execute the method.That way we'll only use reflection once for every column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing something, but I was thinking we'd just implement the protocol directly on each of these java types for
v
, eschewing any reflection or map look-up. I think you can just useObject
for the fallback case.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, anyway this seems pretty inconsequential compared to all the slow stuff we're doing on the Metabase side. I think I'd prefer to keep it simple and not change this
condp
for now