Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

importccl: lift computed column check to input converter #55845

Merged
merged 2 commits into from
Oct 27, 2020

Conversation

pbardea
Copy link
Contributor

@pbardea pbardea commented Oct 22, 2020

The current implementation of checking for validating the number of rows
for computed columns for in non-IMPORT INTO backups is inefficient. This
commit moves the check from being performed on every row to only being
performed once per import.

Fixes #55842.

Release note: None

@pbardea pbardea requested review from dt and a team October 22, 2020 03:42
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@nvanbenschoten
Copy link
Member

That was fast! I'm curious if you see a way to get rid of https://github.com/cockroachdb/cockroach/pull/55845/files#diff-e56c3f528820bd6c95bc632c127d4ef1b5ae752a8accb667690a5e07c7c9b25eR137 as well. What are the cases where len(p.expectedColumns) == 0?

@pbardea
Copy link
Contributor Author

pbardea commented Oct 22, 2020

We expect that case when there are no "targetted" columns. In IMPORT INTO my_table(a, b) ... (a, b) would be the targetted columns. I believe we would also expect it to be 0 on non-IMPORT INTO flavours.

#55846 is built on top of this, and I think that should do the trick. I took a couple of quick profiles and it looks like that change got rid of VisibleColumns from the Row() profile as expected.

@pbardea pbardea force-pushed the import-only-into branch 2 times, most recently from 92d82a2 to 555457d Compare October 22, 2020 06:32
@pbardea pbardea changed the title importccl: only support computed columns for IMPORT INTO importccl: lift computed column check to input converter Oct 22, 2020
The current implementation of checking for validating the number of rows
for computed columns for in non-IMPORT INTO backups is inefficient. This
commit moves the check from being performed on every row to only being
performed once per import.

Release note: None
This commit moves the computation of finding the expected number of data
columns to the creation of the input converter, rather than
re-calculating it on every row. The memory footprint of loading all of
the visible columns for every row was noticeable.

Release note (performance improvement): CSV imports should now be
slightly faster.
@pbardea
Copy link
Contributor Author

pbardea commented Oct 27, 2020

TFTR!
bors r=dt

@craig
Copy link
Contributor

craig bot commented Oct 27, 2020

Build succeeded:

@craig craig bot merged commit 30b9f20 into cockroachdb:master Oct 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bulk-io: Immutable.VisibleColumns responsible for over half of heap allocated memory in IMPORT
4 participants