Change the default sorting on search to something less expensive #1809
Merged
bors merged 1 commit into rust-lang:master from sgrif:sg-change-default-search-ordering
Sep 11, 2019
d8a7e76 to dfe3faa
Looks good to me.
☔ The latest upstream changes (presumably #1807) made this pull request unmergeable. Please resolve the merge conflicts.
f01fd19 to 0738c75
This changes the behavior of `/api/v1/crates` if no sort option is provided. This does not change the behavior of the website when accessed through a browser, as the Ember frontend always specifies a sort field.

The query run by search when sorting by recent downloads currently accounts for nearly 40% of our total DB load. It takes ~60ms on average, which makes it an extremely high load. Our logs for traffic to this endpoint with an explicit `sort=recent-downloads` don't match up with this, which suggests that much of the load comes from requests that specify no sort at all. We could fix this by making the query faster, but that has drawbacks that were determined not to be worth it (see rust-lang#1755). So if we can't decrease execution time, the only other option is to decrease execution count.

A significant number of crawlers hit us without specifying any sorting, which should mean they don't care about ordering. I'm sure there's someone out there who is relying on this behavior, but most crawlers I've seen don't specify ordering, and none of them care about the order they get the results in, since they're crawling the whole registry anyway.

What's important is that this lowers the execution time from ~60ms to ~12ms. Sorting on recent_downloads will also grow linearly with the table size, while this can be done on an index alone (which should be log(log(n)) IIRC). More than 90% of the remaining query time is spent on getting the count, which will be going away soon as we change pagination strategies, bringing us to sub-millisecond execution.

I do feel strongly that we either need to make this change or accept the drawbacks in rust-lang#1755, as this will soon become a scaling issue for us.
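
The behavior change described above can be sketched roughly as follows. This is a minimal illustration, not the actual crates.io code: only the `/api/v1/crates` path and the `sort=recent-downloads` value come from the description, while the function name `resolve_sort`, the constant name, and the choice of `"alpha"` as the cheap, index-backed default are assumptions made for the example.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: "alpha" stands in for whatever cheap, index-backed ordering
# is used when the client does not ask for a specific sort.
CHEAP_DEFAULT_SORT = "alpha"


def resolve_sort(url: str) -> str:
    """Return the sort key to use for a request to the search endpoint.

    An explicit `sort` parameter is passed through unchanged, so clients
    such as the frontend that always specify one see no difference. Only
    requests without a `sort` parameter get the cheaper default.
    """
    query = parse_qs(urlsplit(url).query)
    requested = query.get("sort", [None])[0]
    if requested is None:
        return CHEAP_DEFAULT_SORT
    return requested


if __name__ == "__main__":
    print(resolve_sort("/api/v1/crates"))
    print(resolve_sort("/api/v1/crates?sort=recent-downloads"))
```

The point of the sketch is that the expensive ordering still runs whenever it is requested explicitly; only the fallback path changes, which is where the unexplained load appears to come from.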
0738c75 to 75f1b01
📌 Commit 75f1b01 has been approved by |
bors added a commit that referenced this pull request
Sep 11, 2019
…ibel Change the default sorting on search to something less expensive
r? @jtgeibel
☀️ Test successful - checks-travis