247 notes, 140 questions, 53 wikis (note one more question was shown since Jeanette's screenshot)
Exact discrepancy
A full-date-range CSV I got showed:
303 notes | 97 questions | 42 wikis
That means we are showing discrepancies of:
-56 notes | 42 questions | 11 wikis (positive numbers mean the /tag page shows that many MORE than the stats CSV)
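For anyone wanting to re-check the subtraction, here is a minimal Ruby sketch. The counts are copied from this issue (139 is the question count at the time of the screenshot); the variable names are just for illustration.

```ruby
# Counts quoted in this issue; variable names are illustrative only.
tag_page  = { notes: 247, questions: 139, wikis: 53 }
stats_csv = { notes: 303, questions: 97,  wikis: 42 }

# Positive values mean the /tag page shows more than the stats CSV.
discrepancy = tag_page.to_h { |type, count| [type, count - stats_csv[type]] }

discrepancy.each { |type, diff| puts "#{type}: #{diff}" }
# prints:
# notes: -56
# questions: 42
# wikis: 11
```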
Known sources of discrepancy
First, some of the questions are notes tagged with question:air-quality but lacking air-quality itself; this accounts for some or all of the 139 - 97 = 42 questions discrepancy.
Second, the stats pages do not count notes, questions, or wikis that bear tags whose parent tag is air-quality (parent tags are a system we are trying to phase out). The last line of this section of code shows those extra nodes getting included for the /tag/air-quality page.
I was able to find 61 notes and 11 wikis that bear a child tag of air-quality, which has affected this count. That seems to account for the wikis discrepancy.
After accounting for 61 extra notes, we actually have 61 + 56 = 117 notes shown on the CSV which were not shown on the /tag page.
But, according to these lines, we exclude all questions of any kind from this note count. Let's see how that affects the count:
irb(main):035:0> Node.where(status: 1, type: 'note').includes(:revision, :tag).references(:term_data, :node_revisions).where('term_data.name = ? OR term_data.parent = ?', 'air-quality', 'air-quality').where('node.nid NOT IN (?)', @qids).size
=> 247
So, that took us from 365 to 247, if we are including parent tags. That's the number shown on /tag/air-quality.
Without counting parent tags OR questions, we get 206 notes - that's vs. 303 in the CSV.
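A quick sanity check on those two figures, as a sketch only: the gap between the CSV's note count and the 206 figure can be compared against the CSV's question count. All three numbers are quoted in this issue; whether the match fully explains the gap is an assumption, not something the queries here confirm.

```ruby
# Figures quoted in this issue; names are illustrative only.
notes_without_children_or_questions = 206
csv_notes = 303
csv_questions = 97

# How many extra "notes" the CSV reports beyond the 206 figure.
gap = csv_notes - notes_without_children_or_questions

puts "gap: #{gap}"
puts "matches CSV question count: #{gap == csv_questions}"
# prints:
# gap: 97
# matches CSV question count: true
```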
That's for the same nids collection as we got for the tags page - with parent tags, and excluding questions. Let's try running it without the parent tags, but leaving the questions in...
OK, so the discrepancy seems to be (within an error of 2 notes) that the stats are excluding parent tags and including questions.
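To make the two counting rules concrete, here is a toy sketch in plain Ruby. It does not use the plots2 models or schema: ToyNode, PARENT_OF, the 'child-tag' name, and the four sample nodes are all hypothetical, made up just to show how the two rules select different sets.

```ruby
# Hypothetical in-memory stand-in for nodes; NOT the plots2 schema.
ToyNode = Struct.new(:nid, :tags, :question)

# Hypothetical parent-tag mapping: tag name => parent tag name.
PARENT_OF = { 'child-tag' => 'air-quality' }.freeze

NODES = [
  ToyNode.new(1, ['air-quality'], false), # plain note with the base tag
  ToyNode.new(2, ['air-quality'], true),  # question that also has the base tag
  ToyNode.new(3, ['child-tag'], false),   # note carrying only a child tag
  ToyNode.new(4, ['unrelated'], false)    # unrelated to the tag
].freeze

def base_tag?(node, tag)
  node.tags.include?(tag)
end

def child_tag?(node, tag)
  node.tags.any? { |name| PARENT_OF[name] == tag }
end

# /tag page rule as described in this issue: base or child tags, questions excluded.
def tag_page_notes(nodes, tag)
  nodes.select { |n| (base_tag?(n, tag) || child_tag?(n, tag)) && !n.question }
end

# Stats CSV rule as described in this issue: base tag only, questions included.
def stats_notes(nodes, tag)
  nodes.select { |n| base_tag?(n, tag) }
end

tag_page_nids = tag_page_notes(NODES, 'air-quality').map(&:nid)
stats_nids = stats_notes(NODES, 'air-quality').map(&:nid)

puts "tag page: #{tag_page_nids.inspect}" # tag page: [1, 3]
puts "stats:    #{stats_nids.inspect}"    # stats:    [1, 2]
```

In this toy data both rules return two notes, but only node 1 is common to both, which is the same kind of mismatch seen between the tab labels and the CSV.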
Takeaway
I believe this means that we don't need to change any queries, but we should add some of these caveats to the stats pages for those wondering. I can make an FTO once we settle on explanatory text!
Linking this thread to this explanation of questions counts on tag pages: #8246
jywarren changed the title from "Discrepancy in tag stats CSV download counts and tab labels on /tag/air-quality" to "Explainable discrepancy in tag stats CSV download counts and tab labels on /tag/air-quality" on Oct 20, 2020
The graphs above are stacked, and questions are counted both on their own as well as part of the tally for notes (because they are a form of note).
So the text could be expanded to:
The graphs above are stacked, and questions are counted both on their own as well as part of the tally for notes (because they are a form of note). Additional discrepancies may come from the tag page also listing questions tagged with "question:_____" but lacking the base tag, and also listing notes with only "child tags" of the base tag, in a system we are planning to slowly deprecate.
Jeanette from the PL staff noted a discrepancy - when downloading a CSV and summing notes, questions, and wikis, the totals Jeanette got are:
From /stats: notes = 206; questions = 97; wikis = 42
However this was for a range of: https://publiclab.org/tag/air-quality/stats?utf8=%E2%9C%93&start=01-01-2010&end=14-10-2020&commit=Go
These don't match the tab totals shown at https://publiclab.org/tag/air-quality, of:
247 notes, 140 questions, 53 wikis
Let's look at where the CSV is being compiled:
plots2/app/models/tag.rb, lines 216 to 239 in 27a3839
This is a little convoluted, but I traced through it and it seems OK.
Running Tag.nodes_for_period() on the whole 10-year span returned 248, which is only 1 off.
That's for the same nids collection as we got for the tags page: with parent tags, and excluding questions.