-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: update local dev setup #253
Conversation
bin/docker-run
Outdated
declare -a config | ||
config+=( | ||
database_name="gisaid" | ||
s3_src="s3://nextstrain-ncov-private/branch/${GITHUB_BRANCH}" | ||
s3_dst="s3://nextstrain-ncov-private/branch/${GITHUB_BRANCH}" | ||
fetch_from_database=false | ||
trigger_preprocessing=false | ||
keep_all_files: true | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that, as before, because workflow interacts with prod data, it's extremely easy to mess up the daily ingest by just setting a wrong S3 url here. So be careful.
# version becomes the new default, if this ncov-ingest image itself is still | ||
# relevant at that time. | ||
# -trs, 19 Jan 2020 | ||
FROM nextstrain/base:branch-python-base | ||
|
||
# Install Python package for which Python 3.7 wheels do not yet exist on PyPI. | ||
RUN apt-get update && apt-get install -y --no-install-recommends \ | ||
python3-netifaces \ | ||
time\ | ||
xz-utils | ||
# Let's drop dependency on an image of unknown origin and start over. | ||
# -ivan.aksamentov 2021-12-13 | ||
FROM python:3.7-buster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused on historical context here, but maybe @tsibley's TODO comment can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The python-base has been merged, so I think we can just use nextstrain/base:latest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A-ha! The mystery of a missing container image solved!
We don't need anything from the base image though, and pretty much anything other than python and pipenv.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for prompting this update Ivan! It's been a ~while since anyone has touched this Dockerfile.
Revamp the Dockerfile. It is no longer needed for production, so I think it is dev-only. I could not find the source for the base image and we don't need it anyways.
The GitHub actions use this docker image so it's not just dev-only, but I definitely agree it should be updated. I think we can use nextstrain/base:latest
with additional dependencies on top.
Add helper script to run the container
For the bin/docker-run
script, I think it would be best to mimic how the pipeline runs in production by using the nextstrain cli (this might also just be me not wanting to deal directly with docker 😅 )
Add missing snakemake package to Pipfile
The pipeline has been running fine without explicitly adding snakemake here...(Aha, snakemake is included in the nextstrain base image)
@joverlee521 Oh no! I overlooked that it's used in prod. In this case, nothing here makes sense. I deal with ingest a lot lately and it's not a great dev experience, so I sketched this minimal thing that runs and a more convenient dev flow, where I can modify things quickly. I could not even build the old image, because it uses apt repos from Debian version that is too old. I also don't know how the docker option of nextstrain cli works. Only used the AWS Batch path so far. I might try it sometimes and see how it goes. |
Still worthwhile to change |
@victorlin Probably. Might also make dockerignore more restrictive like here. It is currently copying the stuff from the repo into the build context, including |
This: - Removes duplicated nextclade downloads. Nobody knows wha version they are and when they are updated. Latest version of nextclade is already downloaded in the `./bin/run-nextclade` and `./bin/run-nextclade-fill` scripts. We want it to always be cutting edge. Nextclade releases are frequent. We may consider bringing it back into the image later if we setup automated rebuild of the image on Nextclade release. However I also often play with different custom versions of Nextclade in full runs and having other versions duplicated versions may cause mistakes. Mixed feelings here. - Adds various tools we might want to play with on branches (related to #247 #256): - aria2 - for downloading on multiple connections - lbzip2 - for parallel unbzipping - pigz - for parallel gz/ungz - pixz - for parallel xz/unxz - Changes base image to as suggested in #253. We don't use much out of it, so I am not super convinced it's needed, but not a big deal. Current inconvenience is that for any changes in deps on branches (e.g. if I want to play with aria2c and lbzip2) we need to change the container on master and/or publish an image and change the tag in the GitHub Action on master. Which both disrupt production. We might consider installing deps dynamically on AWS batch instead. I tired and it failed because currently the debian repo cache in the image is too old. Or maybe there's a better way to play with new tools right now? In any case, mixing prod and dev as it is now seems weird and unproductive, we might think about a dedicated, simpler dev flow.
This refreshes the local dev setup now that we use snakemake (#231)
snakemake
package toPipfile
This setups is what I used for #237
I cancelled GHA because this PR only affects local dev flow.