Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop crates.textsearchable_index_col from the export #3612

Merged
merged 1 commit into from
May 14, 2021

Conversation

jtgeibel
Copy link
Member

This column is postgres specific and is normally populated via a
trigger. The trigger is now enabled during the import so that the column
can be dropped from the export.

This addresses part of what was raised in bullet point 2 of #2078. The large readme column remains because there could be people using that data, but the text search column is redundant.

r? @smarnach
cc @kornelski

Copy link
Member

@pietroalbini pietroalbini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code changes look good!

I tested locally what the impact of this would be for how long it takes to import a database dump, since the trigger would have to be executed every crate we insert. To test that I downloaded today's dump, removed version_downloads as that takes ages and is not relevant and I made a copy of the dump with the changes of this PR (enabling the trigger and removing the column from the dump):

  • Status quo: 42.3 seconds (176 MB crates.csv)
  • With this PR: 56.7 seconds (95 MB crates.csv)

So, with this importing the dump is 34% slower. We're removing 46% of the data in crates.csv though, which will definitely help download size and everyone who parses the csvs manually instead of importing them in PostgreSQL.

Personally I'd say the tradeoff is worth it.

src/tasks/dump_db/dump-import.sql.hbs Outdated Show resolved Hide resolved
This column is postgres specific and is normally populated via a
trigger. The trigger is now enabled during the import so that the column
can be dropped from the export.
@jtgeibel jtgeibel force-pushed the drop-tsvector-from-db-export branch from 0e1a3dc to 8fb7d30 Compare May 14, 2021 02:49
@jtgeibel
Copy link
Member Author

Thanks for the analysis of the space/time tradeoff. I've pushed a new commit with your consistency suggestion.

@pietroalbini
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented May 14, 2021

📌 Commit 8fb7d30 has been approved by pietroalbini

@bors
Copy link
Contributor

bors commented May 14, 2021

⌛ Testing commit 8fb7d30 with merge e421caa...

@bors
Copy link
Contributor

bors commented May 14, 2021

☀️ Test successful - checks-actions
Approved by: pietroalbini
Pushing e421caa to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants