V2 #15
@natefoo this doesn't build properly, and I'm not sure where the bug is. Just a reminder to check this before the branch gets merged; no action needed right now.
- support changing version
- use 'default' recipes
- fixes for 'image' metadata
- fixes for `${version}`
This is an EXTREMELY experimental tool. Checking for updates by hand is a waste of time, but we should keep up with updates to the software we care about. Thus, `check-for-updates` was born. It inspects the 'default' entry of every package with a `build.yml` file. If that's found, it extracts the urls from the urls section and checks the ones containing a `${version}` placeholder. For those, it uses some hardcoded logic to look for more recent releases. When a more recent release is found, it copies the build.yml file across and generates a new folder/version with the updated version number value. This should let the docker image for the original version be built once and that archive generated; later, updates can be checked for, and any missing packages can be generated.
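The URL-templating part of that logic can be sketched roughly as follows. The function names and the sample metadata are illustrative, not taken from the actual script:

```python
def versioned_urls(build_meta):
    """Return only the URLs containing a ${version} placeholder.

    Only these can be checked/bumped automatically; fixed URLs are skipped.
    """
    urls = build_meta.get('build', {}).get('urls', [])
    return [u for u in urls if '${version}' in u]


def bump_url(url, new_version):
    """Substitute a concrete release number into a ${version} template."""
    return url.replace('${version}', new_version)


# Hypothetical parsed 'default' entry of a build.yml:
meta = {'build': {'urls': [
    'https://example.org/pkg/pkg-${version}.tar.gz',
    'https://example.org/pkg/helper-data.tar.gz',
]}}

versioned_urls(meta)                        # only the templated URL qualifies
bump_url(versioned_urls(meta)[0], '1.2')    # concrete URL for the new version
```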
- Apparently ca-certs is included when you install wget from the command line, but not in the automated way we do it.
TODO (not necessarily as part of this PR):
I have a few problems; I need to investigate this further ...
@bgruening okay. I don't want to force my view of "v2" on anyone; it didn't take that long to hack this out, and anything can be changed if you have needs.
```python
prebuild_packages = ['wget', 'openssl', 'ca-certificates', 'build-essential']
if 'prebuild' in image_data and 'packages' in image_data['prebuild']:
    prebuild_packages.extend(image_data['prebuild']['packages'].strip().split())
```
I suppose it's easier for cuttin' and pastin' if the packages are a blob, but being part of structured data it feels somewhat icky that a list is a string.
Agreed. I'll make that accept a list or a string, for those who want to put effort into refactoring their package lists.
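A sketch of that change, building on the snippet above (the helper name here is mine, not from the codebase):

```python
def normalize_packages(value):
    """Accept either a whitespace-separated string or a proper YAML list."""
    if isinstance(value, str):
        return value.strip().split()
    return list(value)


prebuild_packages = ['wget', 'openssl', 'ca-certificates', 'build-essential']

# Both spellings in build.yml now yield the same result:
prebuild_packages + normalize_packages('samtools  zlib1g-dev')
prebuild_packages + normalize_packages(['samtools', 'zlib1g-dev'])
```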
Eric: Dockerfiles are simple enough to code by hand, at least for the target users we are aiming for (us? :)). So why do we need the yaml abstraction at all? If I understood your code correctly, you are creating a shell script out of the yaml definitions and using a reusable/cached Dockerfile with build-essentials. Wouldn't it be easier to transfer ... Am I missing something obvious?
What am I missing? (Other than the --nocache flag)
Well, some of the build stuff from the yaml goes into the image. The image isn't just build-essential, it's that plus all listed packages (e.g. see chado image) and any commands you want in the image. That's why I calculate an image name in the code, as all of the images are different due to installing different packages.
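One way such an image name could be derived is by hashing exactly the inputs that shape the image, so identical package sets share a cached image. This is a sketch of the idea, not necessarily what build.py actually does:

```python
import hashlib


def image_name(prefix, packages, commands=()):
    """Derive a stable tag from the packages/commands baked into the image."""
    # Sort packages so the tag is independent of listing order in build.yml.
    recipe = '\n'.join(sorted(packages)) + '\0' + '\n'.join(commands)
    digest = hashlib.sha256(recipe.encode()).hexdigest()[:12]
    return '{}:{}'.format(prefix, digest)


# Order of packages doesn't matter; contents do.
image_name('builder', ['wget', 'zlib1g-dev'])
image_name('builder', ['zlib1g-dev', 'wget'])   # same tag as above
```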
We probably don't. I like the cleanliness of writing a pure metadata file rather than tying us to docker, in case someone wants to reuse the metadata with some other build system. You prefer thinking about docker layers and stringing dozens of commands together. E.g. I'll be able to patch in a command to store the downloaded URLs, so we have the original .tar.gz (for reproducibility) in one place, in build.py, and then rerun/regenerate all of the docker images. If we hardcode the images, we either are merging two dockerfiles together, which is non-trivial (as the urls would be in the 'custom' part of the image), or I'm patching that into N images and every new one we generate.
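For instance, the "store the downloaded URLs" idea could be patched into the generated download step in one place. A hypothetical helper (the `/archive` path is illustrative):

```python
def wget_with_archive(urls, archive_dir='/archive'):
    """Emit shell lines that fetch each URL and stash a pristine copy.

    Because every image's download step would be generated from build.py,
    this single change applies to all regenerated images at once.
    """
    lines = []
    for url in urls:
        name = url.rsplit('/', 1)[-1]
        lines.append('wget -O {} {}'.format(name, url))
        lines.append('cp {} {}/{}'.format(name, archive_dir, name))
    return lines


wget_with_archive(['https://example.org/pkg-1.2.tar.gz'])
```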
OpenMS for example downloads more than 200MB, and the first commands take ages (compiling boost etc...). Would be nice to have this cached, or at least to have control over it.
Indeed, and this is a very nice improvement.
The ...
I don't see where yaml has an advantage over a plain Dockerfile here.
Yes, I understand this, but the question is: what do we want to archive here?
hmm. Agreed. For this, you could place the wgets in the
fair point.
you do! Definitely. I had trouble converting your packages because you use the squeeze backports, and there wasn't an easy way to stick that in before the package fetching.
it always was, still is.
definitely not! @jmchilton teased me about this in IRC, and I agree. I don't want another packaging format. Since this can ONLY be used to build docker images of packages right now... it feels like we're not trying to do that.
I agree. It's definitely over-engineered compared to writing Dockerfiles, but I'd argue that this is the cleaner, easier-to-refactor approach should we decide to add features in the future. Given how the feature scope has crept in the past, I expect that this will be an easier format to work with going forward, given that we can regenerate ALL of the dockerfiles at once, from metadata. The crux of my argument is that if we decide to do different things with the dockerfiles (e.g. my storing of packages), all of our code stays DRY.
(We=me) want to archive all the artifacts of the build process. Inputs and outputs. This happens on a VM somewhere, so I don't care about space/runtime/etc. I just don't like having bedtools fail with the version specified in build.sh because they removed the file, so now we can never redo that build. I really understand that we probably won't ever want to redo that build, but I don't see that as a reason not to store artifacts.
@erasche can you come up with a different use case for what we need?
Yes, an old problem, but it's not a problem we can fix here. I'm more worried about the debian team disabling the archive :) My main argument is: this is a hack, so let's use hacky scripts to get our binaries, and not split the logic into (wonderful) yaml + a backend that needs to be constantly updated whenever we need a new feature.
@bgruening I can't come up with another use case for the metadata. All of my use cases are theoretical "someone might want to do this", plus that it's prettier/cleaner. I understand what you're saying, and I don't have anything to refute it with. It's a hack; it doesn't need to be over-engineered. If you'd prefer, I'll back out all of my changes to other packages and will settle for just adding this in as another, optional build system.
@natefoo any comments?
Feel free to merge as it is, this was not a
okay! Less work! Yay! :P
yeah. Well, let's keep all of the build scripts (for the time being) and call it good?
FYI, I've been building against this branch in jenkins and we had our first completely successful build!! https://gx.hx42.org/job/Docker-Build/18/ Clicking "Expand all" will show all of the build files and build logs. Build logs/outputs will need to be checked manually at some point, but most things seem to be building at least semi-correctly. I would not automatically pull yet, lest any 100% real/known-correct artifacts get overwritten. NB: It's surprising that it's completely successful, because ugly hacks were needed to get there. With 20 GB of disk space, jenkins crashes and burns after about 4-5 image builds due to the disk space required for layers. Thus, I had to insert some steps to regularly wipe out all unused images and remove them from disk, in between the build.py steps.
Any -1s, or should this be merged now that it's ~3 weeks old?
@erasche you mentioned keeping the build scripts, should I add them back?
@natefoo you can add them back if you wish, but most everything converted builds properly. You're probably right; I'll re-add the original build.sh files and they can all co-exist until
@natefoo okay, this is ready to merge.
👍 from me. @bgruening, any comments?
Let's merge this! Awesome work Eric.
I recognise that this is a huge PR, so please take your time to review and comment. :)
Everything is restructured into `<package>/<version>`, where the version is usually 'default' for the base recipe; subsequent/specific versions of recipes will be added in their own folders. This also adds:

- A `check-for-updates.py` script which should automatically update certain pieces of software.
- `.yaml` format image/build metadata:
  - An `ENV` section (probably overkill, but logically separate from `meta`).
  - `prebuild`: commands that will occur in the Dockerfile, allowing you to extract all of the packages/dependencies into that section. The `packages` key is a text field of packages, while the `commands` section allows you to specify a list of commands to run. Useful for things like `cpan`ing Perl dependencies.
  - `build`: concerns itself with things occurring during the build phase. `urls` are a list of URLs to download via wget (`wget`, `ca-certificates` (so `--no-check-certificate` isn't needed), and `build-essential` are provided by default). `commands` are a list of commands to execute; they start execution from the `/build/` directory. Downloaded archives are not extracted for you (wanted in select cases, e.g. atlas).

This probably shouldn't be merged as-is, as there are some packages that fail to build/aren't functional. E.g. bcftools removed version 1.0 from their github, so I was forced to update to 1.2. Atlas has some build issues, but those are beyond my knowledge. All packages that people added should be manually built once just to verify :)
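Putting the description above together, a minimal `build.yml` might look like the following. The package name, version, URL, and dependency names are invented for illustration:

```yaml
meta:
  version: "1.2"

env:
  BUILD_DIR: /build

prebuild:
  # Extra apt packages, as a blob (or a list); wget, ca-certificates,
  # and build-essential are already provided by default.
  packages: zlib1g-dev libncurses5-dev
  commands:
    - cpan -i Try::Tiny

build:
  urls:
    - https://example.org/pkg/pkg-${version}.tar.gz
  commands:
    # Commands start from /build/; downloaded archives are not auto-extracted.
    - tar xf pkg-${version}.tar.gz
    - cd pkg-${version} && make && make install
```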