Fix missing archived grapher datasets in version tracker #3397
Quick links (staging server):
Login:
chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/health/latest/global_health_mpox
= Table global_health_mpox
= Dataset garden/who/2024-09-09/flu_test
= Table flu_test
~ Dim country
- - Removed values: 30 / 71625 (0.04%)
date country
2024-09-30 Bhutan
2024-09-30 Indonesia
2024-09-23 Kenya
2024-09-30 Slovakia
2024-09-30 Switzerland
~ Dim date
- - Removed values: 30 / 71625 (0.04%)
country date
Bhutan 2024-09-30
Indonesia 2024-09-30
Kenya 2024-09-23
Slovakia 2024-09-30
Switzerland 2024-09-30
~ Column denomcombined (changed data)
- - Removed values: 30 / 71625 (0.04%)
country date denomcombined
Bhutan 2024-09-30 55
Indonesia 2024-09-30 21
Kenya 2024-09-23 64
Slovakia 2024-09-30 14
Switzerland 2024-09-30 37
~ Changed values: 29 / 71625 (0.04%)
country date denomcombined - denomcombined +
China 2024-09-23 15701 21567
Indonesia 2024-07-08 52 53
Indonesia 2024-08-19 70 63
Malaysia 2024-08-26 930 928
Malaysia 2024-09-23 924 918
~ Column pcnt_poscombined (changed data)
- - Removed values: 30 / 71625 (0.04%)
country date pcnt_poscombined
Bhutan 2024-09-30 9.090909
Indonesia 2024-09-30 28.571428
Kenya 2024-09-23 21.875000
Slovakia 2024-09-30 14.285714
Switzerland 2024-09-30 24.324324
~ Changed values: 31 / 71625 (0.04%)
country date pcnt_poscombined - pcnt_poscombined +
France 2024-06-17 0.427727 0.452887
Indonesia 2024-06-24 25.423729 25.000000
Indonesia 2024-08-12 12.500000 12.195122
Slovenia 2024-09-16 0.085470 0.085543
Switzerland 2024-09-16 86.666664 76.666664
= Dataset garden/who/latest/monkeypox
= Table monkeypox
Legend: + New, ~ Modified, - Removed, = Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet
Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included.
Edited: 2024-10-11 08:20:34 UTC
LGTM, thanks Pablo!
I've added some brief docs.
Also, a reminder that we should probably gradually stop using db_conn in favour of OWIDEnv objects. This basically means refactoring some functions in etl.grapher_io and their usage (for the future).
Some archived grapher datasets were missing in VersionTracker.steps_df. The reason was that, when extracting the relevant info from the database, steps were assigned either data:// or data-private:// prefixes based on isPrivate. But there were grapher datasets that were manually archived (by setting isArchived and isPrivate to 1) which came from non-private ETL steps, so they were given the wrong prefix.
This PR fixes the issue, and those archived grapher datasets now appear in steps_df.
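The prefix mismatch described above can be sketched as follows. This is an illustration only, not the actual change in this PR: the GrapherDataset record, both function names, and the example path are hypothetical, and checking candidates against a set of known step URIs is just one assumed way to recover the right prefix.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class GrapherDataset:
    # Hypothetical record; the fields mirror the isPrivate / isArchived flags.
    catalog_path: str
    is_private: bool
    is_archived: bool


def naive_step_uri(ds: GrapherDataset) -> str:
    """Pick the prefix from the privacy flag alone (the behaviour that lost steps)."""
    prefix = "data-private://" if ds.is_private else "data://"
    return prefix + ds.catalog_path


def resolve_step_uri(ds: GrapherDataset, known_steps: Set[str]) -> Optional[str]:
    """Try the flag-based URI first, then fall back to the public prefix
    for datasets that were archived by setting both flags to 1."""
    candidates = [naive_step_uri(ds)]
    if ds.is_archived and ds.is_private:
        # Assumption: a manually archived dataset may come from a public step.
        candidates.append("data://" + ds.catalog_path)
    for uri in candidates:
        if uri in known_steps:
            return uri
    return None
```

With a manually archived dataset whose step is public, naive_step_uri yields a data-private:// URI that matches no known step, while resolve_step_uri falls back to the data:// URI that does.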