-
-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MBS-13801: Remove limits on tag cloud subqueries #3400
base: master
Are you sure you want to change the base?
Conversation
Makes sense to me. I tested the query on hendrix before and after the patch, and the difference was more like 200ms than 1s, so I'm guessing the PG v12 -> v16 upgrade improved things quite a bit. The commit message seems slightly exaggerated to me though. 😄
When ace41c6 was committed, we only had the tag cloud (no list, and no genres either). Any tags that would've been added/removed from the cloud due to the 200-limit would've been in the smallest font size possible: if you compared them side-by-side it'd be difficult to even figure out all the ones that changed. So "the results were reasonable enough" doesn't seem that dodgy to me at the time the commit was made (ignoring the inaccurate counts on hover, which does seem like an oversight). (BTW, prior to that commit, the tag cloud didn't update at all, and was displaying data that was many years old.) And I was curious what the "bunch of entirely new tags" are, and it's five genres and two tags that would be added/removed. It is a bunch, but since the current query "severely undercounts," it sounded like things would be much worse. 😄
|
Huh. It certainly brought more tags in earlier, such as "music video" :) But I see that one grew enough now that it shows up in the current list.
Sure, my point was that even then we showed the exact counts on hover, so it wasn't just size differences :) That's what I meant with the assumption that we don't show counts being wrong. I'm not trying to say that commit wasn't an improvement over the broken tag cloud, please don't take it as such! :) If the commit message seems over the top to you I can always reword it. |
Yeah, I noticed that one shifted, and things may have been worse before given the right conditions.
I agree, but that was the first wrong assumption ("exact numbers weren't too important because we didn't show them"). I was just talking about the second one ("results were reasonable enough"):
The old commit was basically saying it gave reasonably accurate results for the purposes of displaying a tag cloud, which I still agree with. :) My only complaint with the wording above is that the sequence of "dodgy, bunch of entirely new tags, severely undercounts" make it sound like the cloud will be vastly improved, but in reality it's hard to notice unless you're looking at the list UI and the counts, which I think is what you meant. |
These were introduced in ace41c6 under the assumption exact numbers weren't too important because we didn't show them, and the results were reasonable enough. The first was already a wrong assumption at the time, since we did show the counts on hover in the cloud page. It's even worse now, since we have a list version with counts front and center. The second assumption seems also a bit off on testing without the limits; it brings a few entirely new tags to the list, since the current query undercounts tags that are used a bit on many entity types in comparison with tags used a lot on one entity type only. The original worry why these limits were added (a slow query that got a second or so faster with them) still applies, of course. That said, this query runs once, and then is cached for 24h. It seems acceptable to have one person a day wait a bit longer for their results in exchange for actually correct results when the count differences are in the thousands on the lower ranks of the lists (the top is of course used everywhere and already has the right counts mostly).
7ab39b3
to
81b2a59
Compare
I reworded the commit message to be more chill about it, since it was a bit alarmist in hindsight 😅 |
Fix MBS-13801
Problem
The counts on the tag list / cloud do not adequately represent reality, except for the top genres. That's because each subquery is limited, so any genres not on the top 200 and tags not on the top 50 for any specific entity are just ignored entirely for that entity.
These limits were introduced in ace41c6 under the assumption exact numbers weren't too important because we didn't show them, and the results were reasonable enough.
The first was already a wrong assumption at the time, since we did show the counts on hover in the cloud page. It's even worse now, since we have a list version with counts front and center.
The second assumption seems also a bit off on testing without the limits; it brings a few entirely new tags to the list, since the current query undercounts tags that are used a bit on many entity types in comparison with tags used a lot on one entity type only.
Solution
This just removes the per-subquery limits, leaving only a limit on the full query to return the top 200 genres and 50 tags. The original worry why these limits were added (a slow query that got a second or so faster with them) still applies, of course. That said, this query runs once, and then is cached for 24h. It seems acceptable to have one person a day wait a bit longer for their results in exchange for actually correct results when the count differences are in the thousands on the lower ranks of the lists (the top is of course used everywhere and already has the right counts mostly).
Testing
Manually, checking that the lower ranks of the genres list and especially a big chunk of the tags list change significantly for the better without the limits (the "music video" tag, which was missing and led to the report, was on position 51 before - it jumps around 2k uses and 12 positions with the unabridged data).