Look into analytics discrepancy on 2014-11-02 #147
Comments
So this is slightly less of an issue than I first thought: all the data is there, there are just 2 points showing up for the Sunday date in the "Filter Logs" view (the real metric for Sunday is the higher point to the left). The underlying issue appears to be a bug in Elasticsearch's date histogram aggregations that leads to an extra "0" result for a day when daylight saving time rolls around. It seems related to one of these two issues: elastic/elasticsearch#8209 There are ways we could work around this on the display side of things, but since the data actually is present, this doesn't seem like a huge deal, and I'm somewhat inclined to wait until Elasticsearch fixes this on their end.
Elasticsearch appears to have fixed things on their end, and I believe this will be fixed by the Elasticsearch 1.3.6 or 1.4.1 updates (released Nov. 26). However, since this issue isn't super critical (the data's all there, it just adds this extra "0" result, and we're now further away from this DST oddity), I'll probably wait to do the Elasticsearch upgrade. There is some other new stuff in the pipeline that will also require lower-level server upgrades, so I'll probably wait until all that goes live and we can test all the upgraded components together (which I think should happen in the month-ish timeframe and definitely before we get hit by DST again in March).
Hm, I thought the Elasticsearch 1.4.2 upgrade would fix this, but apparently not. It seems to have changed the bucketing behavior slightly (I think the first Sunday Nov. 2 bucket now holds the hits from midnight to 2 AM, rather than just reporting 0), but there are still two buckets for that date. I believe there may still be issues on Elasticsearch's end related to DST (it may be related to our specific use of Since I don't think this is a super-critical issue, just a little odd and annoying twice a year, I'm going to remove this from the milestone and hope that a future Elasticsearch update fixes this more completely. But if anyone feels this is more important, let us know, and there are probably workarounds we could do on our end.
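To make the "two buckets for one date" symptom concrete, here is a minimal Python sketch. It is not Elasticsearch's actual implementation, just one hypothetical way a day-rounding step can go wrong: it reuses each hit's own UTC offset to locate local midnight, so hits before and after the DST change land in different buckets that both display as Nov. 2. The time zone (`America/Denver`), the function names, and the synthetic hourly hits are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumed display time zone; the issue does not say which one is configured.
TZ = ZoneInfo("America/Denver")


def naive_bucket_start(ts_utc, tz):
    """Hypothetical buggy rounding: reuse the hit's own UTC offset for midnight."""
    offset = ts_utc.astimezone(tz).utcoffset()
    wall = ts_utc.replace(tzinfo=None) + offset  # local wall-clock time
    midnight = wall.replace(hour=0, minute=0, second=0, microsecond=0)
    return (midnight - offset).replace(tzinfo=timezone.utc)


def local_date_bucket(ts_utc, tz):
    """DST-safe alternative: key each hit by its local calendar date."""
    return ts_utc.astimezone(tz).date()


# One synthetic hit per hour across local Sunday 2014-11-02, which lasts 25 hours.
first_hit = datetime(2014, 11, 2, 6, 0, tzinfo=timezone.utc)  # 00:00 MDT
hits = [first_hit + timedelta(hours=h) for h in range(25)]

naive = Counter(naive_bucket_start(h, TZ) for h in hits)
by_date = Counter(local_date_bucket(h, TZ) for h in hits)

# Show each naive bucket's UTC start, the local date it displays as, and its size.
for start, count in sorted(naive.items()):
    print(start.isoformat(), "->", start.astimezone(TZ).date(), count)
print(dict(by_date))
```

With these assumptions the naive rounding yields a small bucket for the hits before the clocks change and a second, larger bucket for the rest of the day, while keying by local calendar date keeps all 25 hourly hits in one bucket.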
This underlying bug in Elasticsearch got fixed by our recent upgrade to Elasticsearch 1.5. But let's add some tests to ensure we continue to get the expected behavior around daylight saving time. See 18F/api.data.gov#147 This also uncovered a slight issue, in that the API Drilldown graphs weren't taking time zones into account, so the daily totals might have appeared different from the "Filter Logs" charts (since the hours for each day were shifted around).
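The time zone mismatch between the two views can be illustrated with a small Python sketch: the same hits produce different daily totals depending on whether days are cut in UTC or in a local zone. The zone (`America/Denver`), the `daily_totals` helper, and the sample timestamps are assumptions made up for this example, not values from the actual logs.

```python
from collections import Counter
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LOCAL_TZ = ZoneInfo("America/Denver")  # assumed; not stated in the issue

# Three synthetic hits, stored in UTC.
hits = [
    datetime(2014, 11, 3, 1, 30, tzinfo=timezone.utc),  # Nov 2, 18:30 local
    datetime(2014, 11, 3, 5, 0, tzinfo=timezone.utc),   # Nov 2, 22:00 local
    datetime(2014, 11, 3, 15, 0, tzinfo=timezone.utc),  # Nov 3, 08:00 local
]


def daily_totals(hits, tz):
    """Count hits per calendar day as seen in the given time zone."""
    return dict(Counter(h.astimezone(tz).date().isoformat() for h in hits))


utc_totals = daily_totals(hits, timezone.utc)
local_totals = daily_totals(hits, LOCAL_TZ)
print(utc_totals)
print(local_totals)
```

Cut in UTC, all three hits fall on Nov. 3; cut in the assumed local zone, two of them belong to Nov. 2. Two charts that disagree on the zone will therefore disagree on daily totals even when the underlying data is identical.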
It turns out this was fixed last week when we upgraded to Elasticsearch 1.5. To ensure this doesn't unexpectedly change in future Elasticsearch upgrades, I've added more specific tests surrounding the beginning and ending of daylight saving time in NREL/api-umbrella-web@981e31d
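For a sense of what tests around the DST boundaries might check, here is a rough Python sketch. It is not the test code from the referenced commit. It only asserts the expected shape of the result: one bucket per local date, with 23 hourly hits on the spring-forward day and 25 on the fall-back day. The time zone, the helper names, and the synthetic hourly hits are assumptions.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

TZ = ZoneInfo("America/Denver")  # assumed display time zone


def hourly_hits(first_day, last_day, tz):
    """Synthetic data: one UTC hit per hour from local midnight of first_day
    up to (not including) local midnight after last_day."""
    after_last = last_day + timedelta(days=1)
    start = datetime(first_day.year, first_day.month, first_day.day, tzinfo=tz)
    end = datetime(after_last.year, after_last.month, after_last.day, tzinfo=tz)
    t, stop = start.astimezone(timezone.utc), end.astimezone(timezone.utc)
    hits = []
    while t < stop:
        hits.append(t)
        t += timedelta(hours=1)
    return hits


def daily_histogram(hits, tz):
    """Stand-in for the query under test: hit counts keyed by local date."""
    return dict(Counter(h.astimezone(tz).date() for h in hits))


def test_dst_start_day_is_one_23_hour_bucket():
    hits = hourly_hits(date(2014, 3, 8), date(2014, 3, 10), TZ)
    assert daily_histogram(hits, TZ) == {
        date(2014, 3, 8): 24,
        date(2014, 3, 9): 23,  # clocks spring forward
        date(2014, 3, 10): 24,
    }


def test_dst_end_day_is_one_25_hour_bucket():
    hits = hourly_hits(date(2014, 11, 1), date(2014, 11, 3), TZ)
    assert daily_histogram(hits, TZ) == {
        date(2014, 11, 1): 24,
        date(2014, 11, 2): 25,  # clocks fall back
        date(2014, 11, 3): 24,
    }
```

In a real suite, `daily_histogram` would be replaced by the actual analytics query, so that a regression in a future Elasticsearch upgrade shows up as an extra or missing bucket.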
There are currently no logs showing up for this past Sunday in the "Filter Logs" view:
Oddly, there are hits showing up in the "API Drilldown" view:
So there's either something weird going on with the "Filter Logs" query, or we are missing data. Some current suspicions: