Fix missing archived grapher datasets in version tracker #3397

Merged
merged 2 commits into from
Oct 11, 2024

pabloarosado
Contributor

Some archived grapher datasets were missing from VersionTracker.steps_df. The reason was that, when extracting the relevant info from the database, steps were assigned either a data:// or a data-private:// prefix based on isPrivate. But some grapher datasets had been archived manually (by setting both isArchived and isPrivate to 1) even though they came from non-private ETL steps, so they were given the data-private:// prefix despite originating from data:// steps.
This PR fixes the issue, and those archived grapher datasets now appear in steps_df.
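
To illustrate the mismatch described above, here is a minimal sketch. It is not the code merged in this PR: the function names, the `known_steps` set and the example path are all hypothetical, and the fallback shown is only one plausible way to handle manually archived datasets.

```python
from typing import Set


def naive_step_uri(path: str, is_private: bool) -> str:
    """Build a step URI from isPrivate alone (the behaviour described as problematic)."""
    prefix = "data-private" if is_private else "data"
    return f"{prefix}://{path}"


def resolve_step_uri(path: str, is_private: bool, is_archived: bool, known_steps: Set[str]) -> str:
    """Hypothetical resolution: archived datasets fall back to the public prefix
    when only the public URI matches a known ETL step."""
    candidate = naive_step_uri(path, is_private)
    if candidate in known_steps:
        return candidate
    if is_archived and is_private:
        public = naive_step_uri(path, is_private=False)
        if public in known_steps:
            return public
    # Nothing matched: keep the URI implied by isPrivate.
    return candidate


# Hypothetical example: a public ETL step whose grapher dataset was archived
# manually, i.e. with isArchived = 1 and isPrivate = 1.
path = "grapher/example/2020-01-01/some_dataset"
known_steps = {f"data://{path}"}

naive = naive_step_uri(path, is_private=True)
resolved = resolve_step_uri(path, is_private=True, is_archived=True, known_steps=known_steps)

print(naive in known_steps)     # False: the data-private:// URI matches no step
print(resolved in known_steps)  # True: the fallback recovers the data:// step
```

With the naive prefixing, the archived dataset maps to a data-private:// URI that does not correspond to any step, which is consistent with it going missing from steps_df.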

@pabloarosado pabloarosado self-assigned this Oct 10, 2024
@pabloarosado pabloarosado marked this pull request as ready for review October 10, 2024 21:03
@owidbot
Contributor

owidbot commented Oct 10, 2024

Quick links (staging server):

Site | Admin | Wizard

Login: ssh owid@staging-site-fix-missing-datasets-in-version-tracker

chart-diff: ✅ No charts for review.
data-diff: ❌ Found differences
= Dataset garden/health/latest/global_health_mpox
  = Table global_health_mpox
= Dataset garden/who/2024-09-09/flu_test
  = Table flu_test
    ~ Dim country
-       - Removed values: 30 / 71625 (0.04%)
                date     country
          2024-09-30      Bhutan
          2024-09-30   Indonesia
          2024-09-23       Kenya
          2024-09-30    Slovakia
          2024-09-30 Switzerland
    ~ Dim date
-       - Removed values: 30 / 71625 (0.04%)
              country       date
               Bhutan 2024-09-30
            Indonesia 2024-09-30
                Kenya 2024-09-23
             Slovakia 2024-09-30
          Switzerland 2024-09-30
    ~ Column denomcombined (changed data)
-       - Removed values: 30 / 71625 (0.04%)
              country       date  denomcombined
               Bhutan 2024-09-30             55
            Indonesia 2024-09-30             21
                Kenya 2024-09-23             64
             Slovakia 2024-09-30             14
          Switzerland 2024-09-30             37
        ~ Changed values: 29 / 71625 (0.04%)
            country       date  denomcombined -  denomcombined +
              China 2024-09-23            15701            21567
          Indonesia 2024-07-08               52               53
          Indonesia 2024-08-19               70               63
           Malaysia 2024-08-26              930              928
           Malaysia 2024-09-23              924              918
    ~ Column pcnt_poscombined (changed data)
-       - Removed values: 30 / 71625 (0.04%)
              country       date  pcnt_poscombined
               Bhutan 2024-09-30          9.090909
            Indonesia 2024-09-30         28.571428
                Kenya 2024-09-23         21.875000
             Slovakia 2024-09-30         14.285714
          Switzerland 2024-09-30         24.324324
        ~ Changed values: 31 / 71625 (0.04%)
              country       date  pcnt_poscombined -  pcnt_poscombined +
               France 2024-06-17            0.427727            0.452887
            Indonesia 2024-06-24           25.423729           25.000000
            Indonesia 2024-08-12           12.500000           12.195122
             Slovenia 2024-09-16            0.085470            0.085543
          Switzerland 2024-09-16           86.666664           76.666664
= Dataset garden/who/latest/monkeypox
  = Table monkeypox


Legend: +New  ~Modified  -Removed  =Identical
Hint: Run this locally with etl diff REMOTE data/ --include yourdataset --verbose --snippet

Automatically updated datasets matching weekly_wildfires|excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included

Edited: 2024-10-11 08:20:34 UTC
Execution time: 14.00 seconds

Member

@lucasrodes lucasrodes left a comment

LGTM, thanks Pablo!

I've added some brief docs.

Also, a reminder that we should probably gradually stop using db_conn in favour of OWIDEnv objects. This basically means refactoring some functions in etl.grapher_io and their usages (for the future).

@pabloarosado pabloarosado merged commit d5c62c8 into master Oct 11, 2024
8 checks passed
@pabloarosado pabloarosado deleted the fix-missing-datasets-in-version-tracker branch October 11, 2024 08:27
3 participants