-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop crates.textsearchable_index_col from the export #3612
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes look good!
I tested locally what the impact of this would be for how long it takes to import a database dump, since the trigger would have to be executed every crate we insert. To test that I downloaded today's dump, removed version_downloads
as that takes ages and is not relevant and I made a copy of the dump with the changes of this PR (enabling the trigger and removing the column from the dump):
- Status quo: 42.3 seconds (176 MB
crates.csv
) - With this PR: 56.7 seconds (95 MB
crates.csv
)
So, with this importing the dump is 34% slower. We're removing 46% of the data in crates.csv
though, which will definitely help download size and everyone who parses the csv
s manually instead of importing them in PostgreSQL.
Personally I'd say the tradeoff is worth it.
This column is postgres specific and is normally populated via a trigger. The trigger is now enabled during the import so that the column can be dropped from the export.
0e1a3dc
to
8fb7d30
Compare
Thanks for the analysis of the space/time tradeoff. I've pushed a new commit with your consistency suggestion. |
@bors r+ |
📌 Commit 8fb7d30 has been approved by |
☀️ Test successful - checks-actions |
This column is postgres specific and is normally populated via a
trigger. The trigger is now enabled during the import so that the column
can be dropped from the export.
This addresses part of what was raised in bullet point 2 of #2078. The large readme column remains because there could be people using that data, but the text search column is redundant.
r? @smarnach
cc @kornelski