Update sars-cov-2 search regularly #230
Conversation
This is a GitHub action which runs scripts/collect-search-results.js every weekday, as close as possible to the time by which we have usually published new sars-cov-2 (ncov) builds to S3. More jobs in this workflow, or separate workflows with separate schedules, can be added for other pathogens we want to maintain search pages for. If we want to make this 'smarter', we could use something like AWS Lambda, which could run automatically when certain files in an S3 bucket are updated.
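For concreteness, a scheduled workflow of this shape might look roughly like the sketch below. The file name, cron expression, output file, and bucket path are all placeholders rather than the actual workflow contents:

```yaml
# .github/workflows/update-search.yml (illustrative name and contents)
name: Update search results

on:
  schedule:
    # Placeholder: weekdays at 19:00 UTC, roughly after new ncov builds
    # have usually been published to S3.
    - cron: '0 19 * * 1-5'

jobs:
  update-sars-cov-2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-node@v2
        with:
          node-version: '14'
      # Install dependencies, collect search results, and upload to S3.
      # The aws CLI is preinstalled on GitHub-hosted Ubuntu runners.
      - run: |
          npm ci
          node scripts/collect-search-results.js
          aws s3 cp search.json s3://example-bucket/search/sars-cov-2.json
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```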
I'm not really qualified to review the code for this, but the functionality sounds great. This seems like a good solution to me, as far as being (relatively) easy to get up and running, and it will greatly improve keeping that search function up to date!
You need to explicitly carry changes to the environment from one step to the next within a GitHub Actions job, using the PATH variable if I remember correctly. I'm avoiding dealing with this for now by sticking it all in the same step: if we want to update other search databases in addition to sars-cov-2, they'll either be on the same timeline and can be tacked onto this step, or on a separate timeline and will therefore need a separate workflow. Unless we want some updates to be able to fail within a workflow without preventing the rest (which seems smart).
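For reference, the mechanism I was half-remembering: on newer runners, environment and PATH changes persist to later steps only if you append them to the special $GITHUB_ENV and $GITHUB_PATH files. A minimal sketch; the variable name, bucket, and path below are made up:

```yaml
steps:
  - name: Set up environment
    run: |
      # A plain `export` only lasts for this step's shell. To persist
      # values into later steps, append to the special files instead.
      echo "SEARCH_BUCKET=s3://example-bucket" >> "$GITHUB_ENV"
      echo "$HOME/.local/bin" >> "$GITHUB_PATH"
  - name: Use it in a later step
    run: echo "Will upload to $SEARCH_BUCKET"
```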
This basically works now. Still to do:
These only work on the default branch of a repository, so we won't be able to test until we merge. How often would we like to update https://nextstrain.org/search/seasonal-flu, and are there other search pages we have, or would like to have, that we should include here? Once those are added, I think it's OK to merge and then begin testing the different ways to run this workflow mentioned above.
This removes two ways of triggering the workflow:
1. On push (this was for testing purposes).
2. On repository_dispatch. This only works on master, so it couldn't be tested on this branch. It could also be seen as redundant with just running the update-search script locally (although that assumes you have the dependencies installed).
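Concretely, the on: block ends up as just the schedule; this is a sketch of the resulting shape, not the literal diff:

```yaml
on:
  # Removed triggers:
  #   push                - was only there for testing
  #   repository_dispatch - only works on master, and is roughly equivalent
  #                         to running the update script locally
  schedule:
    - cron: '0 19 * * 1-5'   # placeholder time
```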
Looks good Eli! Thanks for making those recent changes. Let's merge at a time when we can monitor that the schedule works as expected.
Looking like the schedule is working now. I let Jover know this exists in case we want to use it for seasonal-flu. I also realized the only difference between
and running this locally as we have done in the past would be that it happens on GitHub Actions, which has pros and cons. I would imagine
is a bigger priority. @jameshadfield, did you have specific improvements in mind, or did you just want to generally fetch things more quickly/efficiently?
The script as written was almost a proof-of-principle approach. Each time it runs, it downloads every JSON; a better approach would download only new/changed JSONs and store the results somewhere for access next time. Not a huge issue right now, as S3 access is cheap, but something to consider in the future as needed. I would place other things (SARS front-page UI, etc.) as a higher priority.
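One possible shape for that incremental approach, sketched with the AWS SDK for JavaScript. The bucket name, key prefix, and manifest location are all hypothetical, and pagination of the listing is omitted for brevity:

```js
// Sketch only: fetch just the JSONs whose ETag changed since the last run.
const fs = require("fs");
const AWS = require("aws-sdk");

const s3 = new AWS.S3();
const BUCKET = "example-nextstrain-data"; // hypothetical bucket
const MANIFEST = "./last-run-etags.json"; // hypothetical cache of per-key ETags

async function fetchChangedJsons() {
  const seen = fs.existsSync(MANIFEST)
    ? JSON.parse(fs.readFileSync(MANIFEST, "utf8"))
    : {};

  // NB: listObjectsV2 returns at most 1000 keys per call; real code
  // would paginate with ContinuationToken.
  const { Contents = [] } = await s3
    .listObjectsV2({ Bucket: BUCKET, Prefix: "ncov_" })
    .promise();

  for (const obj of Contents.filter((o) => o.Key.endsWith(".json"))) {
    if (seen[obj.Key] === obj.ETag) continue; // unchanged since last run
    const { Body } = await s3
      .getObject({ Bucket: BUCKET, Key: obj.Key })
      .promise();
    // ...pull whatever the search page needs out of JSON.parse(Body)...
    seen[obj.Key] = obj.ETag;
  }

  // Remember what we saw so the next run can skip unchanged files.
  fs.writeFileSync(MANIFEST, JSON.stringify(seen, null, 2));
}

fetchChangedJsons().catch(console.error);
```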
Description of proposed changes
This is a GitHub action which runs scripts/collect-search-results.js and uploads the resulting data to S3 in order to update https://nextstrain.org/search/sars-cov-2. It is scheduled to run every weekday, as close as possible to the time by which we have usually published new sars-cov-2 (ncov) builds to S3; as I wrote in a comment in the workflow itself, the timing is imperfect for sars-cov-2 (ncov) builds because they are run in different time zones on alternating days. More jobs in this workflow, or separate workflows with separate schedules, can be added for other pathogens we want to maintain search pages for. If we want to make this 'smarter', we could use something like AWS Lambda, which could run automatically when certain files in an S3 bucket are updated, but that doesn't seem like a big enough improvement over this approach to justify the time it would take (at least for me, given my lack of Lambda experience).
Related issue(s)
Slack thread
#187
#192
#196
#182
Testing
I will trigger this manually now to test that it works, and also monitor it to see that the cron schedule behaves as desired. (I think the schedule will apply even though it's on a branch, since I didn't specify which branch it runs on, but I need to check what the default is.)
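If we also add a workflow_dispatch trigger, the workflow can be run manually from the Actions tab (or with something like `gh workflow run update-search.yml` from the GitHub CLI), independent of the schedule. A sketch of that trigger block, with a placeholder cron time:

```yaml
on:
  workflow_dispatch:   # enables manual runs from the Actions UI / gh CLI
  schedule:
    # Note: scheduled runs only fire from the workflow on the default branch.
    - cron: '0 19 * * 1-5'
```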
Thank you for contributing to Nextstrain!