Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to processing of external data about vaccines #1104

Merged
merged 20 commits into from
Mar 30, 2022

Conversation

rando2
Copy link
Contributor

@rando2 rando2 commented Mar 23, 2022

Description of the proposed additions or changes

This PR updates the OWID external analysis to include data about each vaccine scraped from https://covid19.trackvaccines.org/
This will allow us to avoid needing to manually update the types of each vaccine
Additionally, I added in the code needed to store the URL associated with each vaccine in case we want to pull other data from that site down the road. (Since most of the data is on a separate page where the url contains an arbitrary number assigned to each vaccine)
I also added some new outputs to the json file, including markdown tables listing approved vaccines in each category

The map file names for the inactivated whole virus vaccines has changed. The incorrect reference is removed in #1105

One other concern: I manually added the packages I used to environment.yml. Checking the environment in conda generates a huge list because it's the project as a whole. I think it will be clear whether this worked based on whether it works OK, but I would guess there's likely a better way to do this!

Related issues

closes #1101

Suggested reviewers (optional)

Checklist

  • Text is formatted so that each sentence is on its own line.
  • Pre-prints cited in this pull request have a GitHub issue opened so that they can be reviewed.

@rando2 rando2 requested a review from agitter March 23, 2022 20:36
@rando2 rando2 changed the title Owid update Update to processing of external data about vaccines Mar 23, 2022
@rando2 rando2 linked an issue Mar 23, 2022 that may be closed by this pull request
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me overall. I didn't try running the code. You could add the .json file to the commit so we can inspect it before merging.

Are you concerned the that the conda environment may not build now that you added new packages? We could wait to see what happens in GitHub Actions or run a local dry run to check whether the versions can all be resolved.

# If they add a new category (which seems unlikely), handle & throw error
if vaxtype not in types.keys():
print("Unknown vaccine platform:", vaxtype)
return vaxtype
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this to return or exit with an error? Exiting is extreme, but we don't check the log files for error messages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should ever happen because I think it handles all the major categories of vaccines that currently exist? I just wrote it in as a just-in-case. So I think either would be fine!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I prefer we exit with an error. That should happen rarely if ever, so we would want to know about the new major vaccine category.

owiddata/generate-owiddata-stats.py Outdated Show resolved Hide resolved
vaccines = vaccine_df[vaccine_df["Category"] == category]
numTypes = len(set(vaccines["Type"].to_list()))
if numTypes > 1:
return vaccines[["Company", "Type"]].to_markdown()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how the JSON will store the markdown. We can see what happens and adjust if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if this will work!!! In theory I think the JSON looked ok but there are some special characters that could make things interesting

owiddata/generate-owiddata-stats.sh Show resolved Hide resolved
@agitter
Copy link
Collaborator

agitter commented Mar 28, 2022

This is looking good. Once the code is stable, could you please run it locally and commit the output images and json file so we can inspect them before merging?

@rando2
Copy link
Contributor Author

rando2 commented Mar 29, 2022

Thank you @agitter for the feedback! This runs in conda and I believe all the components are here now!

I will assume we want to keep the standardized color-coding across all figures, but I'm happy to change this if the shading is too faint for some of the newer methods.

environment.yml Outdated Show resolved Hide resolved
Copy link
Collaborator

@agitter agitter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this looks ready to merge.

@rando2
Copy link
Contributor Author

rando2 commented Mar 30, 2022

@agitter I think this aligns the datasets well now. I did write in a fairly bulky output so that we can investigate things that seem weird (above). It prints all of the info for vaccines in the VIPER dataset that aren't found in the OWID data. Logically that is the only direction we need (since VIPER tracks candidates and OWID only tracks shots in arms) but I'm also happy to add in another check if we'd feel better checking both directions!

@agitter
Copy link
Collaborator

agitter commented Mar 30, 2022

I trust your judgement about how to best confirm the matching across data sources worked well. This seems good enough to me. The country counts in the json output have been restored to their expected numbers.

@rando2
Copy link
Contributor Author

rando2 commented Mar 30, 2022

@agitter thank you so much for all of the help with getting this to work! It turned out way more complicated than I initially thought. I'll merge now, fingers crossed it integrates well.

@rando2 rando2 merged commit 974c756 into greenelab:external-resources Mar 30, 2022
@rando2 rando2 deleted the owid-update branch March 30, 2022 15:35
@agitter
Copy link
Collaborator

agitter commented Mar 30, 2022

I manually triggered the external resources build from here so that the figure URLs in the json file get updated with actual commits. That will also let us confirm everything runs in the updated environment.

@agitter
Copy link
Collaborator

agitter commented Mar 31, 2022

The workflow ran successfully, but it took 5 hours (!) for the conda environment to install. I'm not sure what would cause it to be so slow.

We could switch to mamba at some point.

@rando2
Copy link
Contributor Author

rando2 commented Mar 31, 2022

@agitter Oh my gosh, five hours??? When I generate my conda environment.yml file locally, there are a lot of other dependencies, so it does seem like a complex environment. Sorry if these changes are slowing things down!!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

External resources workflow failing due to missing vaccine platforms
2 participants