Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Batch runner does not support runtime-configurable Docker images #57

Closed
tsibley opened this issue Nov 7, 2019 · 4 comments
Closed
Labels
enhancement New feature or request

Comments

@tsibley
Copy link
Member

tsibley commented Nov 7, 2019

This limitation is because the --aws-batch runner (nextstrain.cli.runner.aws_batch) relies on a static job definition created manually during AWS Batch setup (as documented). The job definition hardcodes an image, and individual job instances based on that definition are not allowed to override the image (unlike other job properties like CPU, memory, command line, etc).

The --aws-batch runner should allow a Docker image to be selected by the user at runtime, not just setup, using the same mechanisms supported by the --docker runner. Specifically, --aws-batch should use the image passed in via --image or the default Docker image (configured via a config file or environment var).

To accomplish this, the --aws-batch runner will have to create a new job definition before job submission if no job definition already exists for the desired image. The new job definition should be based on the default job definition, using it as a template for all properties except the image. The help message for --aws-batch-job should be updated to document that it may be used this way.

As a rough map of where to start for the person taking on this issue, I annotated the places which need new code to support this functionality.

@jstoja
Copy link

jstoja commented Apr 4, 2020

If I understand correctly:

  • The job definition nextstrain-job would be the main definition for every job using the latest image.
  • A new job definition should be created before the job submission if a new image is specified.

This raise a few questions:

  • The naming convention of the job definition. Something like nextstrain-job_<image_tag> would probably work fine. For example nextstrain-job_branch-python-base.
  • How should the accumulation of job definition be handled? Since it'd be only linked to an image, it's not clear how/when a job definition could/should be erased. After some time, you could have a lot of definitions created, but only a few used.

@tsibley
Copy link
Member Author

tsibley commented Apr 7, 2020

Thanks for thinking about this, @jstoja! Here are my thoughts, would appreciate feedback.

The job definition nextstrain-job would be the main definition for every job using the latest image.

Yes, assuming the requested job/image are default as shipped. Any given build job should use the requested job description as-is it uses the requested image.

Note that the name of the job definition can be controlled by config, env, or option and may not be nextstrain-job.

Also note that the requested image name will be nextstrain/base (implicitly :latest) by the shipped default, but it can be set by config, env, or option to something else, like nextstrain/base:branch-python-base or even nextstrain/seasonal-flu:latest (hypothetical, doesn't exist).

A new job definition should be created before the job submission if a new image is specified.

Correct.

This raise a few questions:

The naming convention of the job definition. Something like nextstrain-job_<image_tag> would probably work fine. For example nextstrain-job_branch-python-base.

The image part will have to be the fully-specified name, not just the tag of nextstrain/base since that may not always be the image name. I think some suffixing scheme will be good enough, though. I'm not sure what characters AWS Batch allows in job definition names, but I'd prefer a less common delimiter char than underscore. Perhaps nextstrain-job|nextstrain/base:branch-python-base or nextstrain-job:nextstrain/seasonal-flu:latest or nextstrain-job~nextstrain/base:branch-python-base.

How should the accumulation of job definition be handled? Since it'd be only linked to an image, it's not clear how/when a job definition could/should be erased. After some time, you could have a lot of definitions created, but only a few used.

This is a reasonable concern, but I think for now this issue can be punted. Unless you have some clever ideas here?

@jstoja
Copy link

jstoja commented Apr 7, 2020

Happy to contribute, awesome project guys, the amount of work and the quality of the software is incredible!

I'm not sure what characters AWS Batch allows in job definition names, but I'd prefer a less common delimiter char than underscore.

Documentation lists explicitly hyphens and underscores:

jobDefinitionName
    The name of the job definition to register. Up to 128 letters (uppercase and lowercase), numbers, hyphens, and underscores are allowed.

I've tested with / and ~ in the AWS interface: /, :, | and ~. None of them work with the following message: Error executing request, Exception : Job definition name should match valid pattern.

Therefore, I would suggest:

  • We use nexstrain-job_xxx
  • xxx would be the docker image name and tag, but with the characters /, and _ replaced to -, and : with _.

With this proposal a job definition with image nextstrain/seasonal-flu:latest would have a job like: nextrain-job_nextstrain-seasonal-flu_latest. This allows to list jobs and split by _ to have the name of the job definition, the image and the tag (if we agree that there shouldn't be _ in the job definition name).

This is a reasonable concern, but I think for now this issue can be punted. Unless you have some clever ideas here?

No I don't.

@tsibley
Copy link
Member Author

tsibley commented Apr 7, 2020

Documentation lists explicitly hyphens and underscores:

Ah, well, underscores it is! I think your proposal for substitutions makes sense.

jstoja pushed a commit to jstoja/cli that referenced this issue Apr 7, 2020
To support runtime-configurable Docker image in AWS Batch,
the creation of a Batch Job also created a Job Definition with the
configured image.

Fixes nextstrain#57
jstoja pushed a commit to jstoja/cli that referenced this issue Apr 7, 2020
To support runtime-configurable Docker image in AWS Batch,
the creation of a Batch Job also created a Job Definition with the
configured image.

Fixes nextstrain#57
jstoja pushed a commit to jstoja/cli that referenced this issue May 10, 2020
To support runtime-configurable Docker image in AWS Batch,
the creation of a Batch Job also created a Job Definition with the
configured image.

Fixes nextstrain#57
@tsibley tsibley closed this as completed in 05ed802 Jun 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants