CI for HTCondor using Docker #247
Comments
I'd be happy to take a stab at this if no one else is working on it. I haven't used HTCondor before, but it looks pretty cool. |
Sure, that would be great! Let me know if you need some guidance to begin this. |
@kaelancotter FWIW the HTMap project within HTCondor uses a |
@kaelancotter, still motivated? |
Still motivated indeed! Unfortunately I've been unexpectedly low on available bandwidth lately. If someone else wants to step up while I continue to putter along, by all means! |
Looks like the |
Note the docker image they are using for their CI (in .travis.yml) is this one: Just a quick note before I forget: our current test infrastructure for SGE, SLURM and PBS uses a docker-compose setup (this way it looks more like a real cluster, where you have the master node and some compute nodes). If that's easier to set up for |
Here is a proof of concept that shows that a single Dockerfile setup seems promising:
git clone https://github.com/htcondor/htmap
cd htmap
docker build -t htmap-test --file tests/_inf/Dockerfile --build-arg HTCONDOR_VERSION=8.9 \
--build-arg PYTHON_VERSION=3.7 .
docker run -it htmap-test bash -c 'git clone https://github.com/dask/dask-jobqueue;\
cd dask-jobqueue;\
pip install -e .;\
pytest dask_jobqueue/tests/test_htcondor.py --verbose -E htcondor'
The pytest output shows that
For integrating
If anything needs clarification, let me know! |
I think the |
Sorry for the silence on this. I did try to get a Docker image setup with Condor running but so far I've been running into difficulties, namely the script used as an entry point to start Condor hangs indefinitely waiting for everything to come up. I'll keep working at it as I find time. |
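One possible way to avoid an entry point that waits forever is to poll with a deadline and fail loudly once it passes. The helper below is only a sketch: `wait_for` is a hypothetical name, not something from HTCondor or htmap, and the readiness command to pass to it is an assumption that would need checking against the actual image.

```shell
#!/usr/bin/env bash
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: polls COMMAND until it exits with status 0.
# Returns 1 (instead of hanging) once TIMEOUT_SECONDS have elapsed.
wait_for() {
    local timeout="$1" interval="$2"
    shift 2
    # SECONDS is a bash built-in counter of seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            echo "timed out after ${timeout}s waiting for: $*" >&2
            return 1
        fi
        sleep "$interval"
    done
}
```

For example, `wait_for 120 5 condor_status` would retry every 5 seconds for up to 2 minutes, assuming `condor_status` exits non-zero until the pool is reachable. The failure message at least shows which step never came up.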
I am more than willing to help you on this if you provide a bit more information:
If you provide a branch with your WIP, I will try my best to have a look at it next week. |
My branch is located here. Rather than using the Dockerfile from htmap directly, I attempted to slim it down to only include what was needed to get Condor installed and running (note it doesn't even install Dask into the image yet). When I tried to run it using the |
I probably won't have time to look at this before next week. If I were to look at it I would do it in the reverse order:
|
Did you see that, in the meantime, @matyasselmeci has created a whole set of Docker containers (https://github.com/htcondor/htcondor/tree/master/build/docker/services)? In particular, he seems to maintain a base image for different versions (it would be great to have a historical one for the CI). However, I could not find a base image pushed to Docker Hub yet. IMHO these would be the easiest way to go if available (no need to maintain them here). I guess some documentation is still missing and the work is not finished yet. I didn't find a compose file yet, but I might give it a try (the docs for the execute node give the necessary hints, I guess). |
Great to hear! Last time I tried, it seemed like adding HTCondor to the CI was definitely within reach. About docker-compose: this is not a requirement at all. If you can get it to work with a single Docker image, this would probably be simpler. We use docker-compose for historical reasons and also because it allows us to test some edge cases (recently I added a test in #400 for when you have to use a different interface on the worker and on the scheduler). For example, Dask-Gateway uses a single Docker image in their CI. |
We put images for htcondor/mini and htcondor/execute up on Dockerhub but they are very much in the "technology preview" stage. htcondor/mini is a single-machine all-in-one image, so if you don't need to test multi-machine support, you could use that. We welcome comments and suggestions on how to improve those images -- there's a lot of room for improvement and we'd like to know what direction to go in. |
I know this is a bit much to ask, but given that I am unlikely to be able to look at this in the near future, I would encourage one of the people involved to have a go at it. Here are some steps to help you get started (maybe #247 (comment) can also help fill in the blanks); do let me know if you get stuck:
#!/usr/bin/env bash
function jobqueue_before_install {
# start the docker container in the background probably need to give it a nice name
docker run -d -t --name htcondor-container-name htcondor/mini
}
function jobqueue_install {
docker exec -it htcondor-container-name /bin/bash -c "cd /dask-jobqueue; pip install -e ."
}
function jobqueue_script {
docker exec -it htcondor-container-name /bin/bash -c "pytest /dask-jobqueue/dask_jobqueue --verbose -E htcondor -s"
}
function jobqueue_after_script {
# do something useful for debugging here if you think it is worth it
:  # no-op placeholder: bash does not allow an empty function body
}
|
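The repeated `docker` calls in the steps above could be factored into two small helpers, roughly as sketched here. The container name `htcondor-ci`, the `DOCKER`, `CONTAINER` and `SRC_DIR` variables, and the bind mount to `/dask-jobqueue` are all assumptions added for illustration and are not from the thread. Whether `htcondor/mini` ships `bash`, Python and `pip` is also unverified.

```shell
#!/usr/bin/env bash
# Sketch only: every name and path below is an assumption, not taken from the thread.
DOCKER="${DOCKER:-docker}"             # set DOCKER=echo for a dry run that prints the commands
CONTAINER="${CONTAINER:-htcondor-ci}"  # hypothetical container name
SRC_DIR="${SRC_DIR:-$PWD}"             # host checkout of dask-jobqueue

start_container() {
    # --name gives the container a stable name for later "docker exec" calls;
    # -v bind-mounts the checkout so /dask-jobqueue exists inside the container.
    "$DOCKER" run -d --name "$CONTAINER" -v "$SRC_DIR":/dask-jobqueue htcondor/mini
}

run_in_container() {
    # Run a single shell command line inside the already-running container.
    "$DOCKER" exec "$CONTAINER" /bin/bash -c "$1"
}
```

With these in place, the install step would read `run_in_container "cd /dask-jobqueue && pip install -e ."`. Running with `DOCKER=echo` prints the exact commands without needing a Docker daemon, which can help when debugging the CI script locally.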
This is not an unreasonable request at all! The problem is mostly my personal availability. I think it could be done really quickly (I need to get familiar with the testing anyway if I finally want to get the stuff from #411 into the code base, since maintaining a fork is more work). I will also try to get a student worker at our lab to support the work, which could accelerate things a lot, but it takes a bit of time. We are really grateful for your efforts and happy to support. |
using minicondor image and single docker Signed-off-by: Till Riedel <riedel@teco.edu>
I guess this can be closed for now with #420 merged. Will open a few other issues in order to improve/align the support (Dockerfile) |
Great to see that the Triage permissions work! |
Now that #245 is in, we need some CI testing similar to what is done for PBS, SGE and Slurm. If someone has any hints or interest in doing this, please come help!