Jenkins CI Infrastructure on ARM for Kata-container #516
Thanks @kalyxin02 First thing we probably need to work out is the connection method between the master (http://jenkins.katacontainers.io/ - which is an always-on cloud hosted VM I believe, with access to 'the internet') and your slave machine. Let us know your slave connectivity setup, and then we can provide/swap the appropriate keys etc. and get that slave online. We'll also have to work out how we filter/assign jobs to just that slave (and stop it trying to process all/any jobs for now) - previously I have used Jenkins labels for this. @chavafg, do we have any job/slave allocation filtering in place at present? I think @kalyxin02 and @chavafg can work together to set this up? thx!
Hi @grahamwhaley @chavafg, thanks for the scenario. Our server is on packet.net, which is publicly visible I believe, so we can use an SSH connection. Currently we can provide you the following information about the slave machine: IP address, username, and the private key for SSH access as that username. I have sent an email with this info to you all already. Hope this helps; if anything else is needed, please let us know. Thanks!

For the job-allocation filter, @Pennyzct has a pull request at #514 to add a relationship between the architecture and the filtered jobs. Maybe you could have a look and comment. Thanks!

One more thing we want to know about is the software that needs to be installed on the slave. Any configuration files? Tools - Java, Ant, Maven, or something else? Thanks!
Thanks for the email - got it. Hopefully @chavafg can get the key loaded and we can test the slave/agent launch connection. It will not surprise me if the distro-specific install script might need tweaking for the addition of an architecture - we'll find that out the first time we launch a proper job (although, it can probably be tested and debugged by hand on the slave, which might be more efficient). For the job allocation/launching - yes, there are two 'parts' we need to work out.
As I think I mentioned above, there are two ways to configure Jenkins to do this:
I would suggest we investigate the 'matrix' abilities of Jenkins, and set up a test Job for trialling this. Once we are happy that it works, we could roll it out to all the other CI jobs. I'm sort of hoping that the
Hi @grahamwhaley @chavafg, thanks for the connection. I can see "arm01_slave" on the Jenkins master page now. Great, the first step! We installed a JDK on the slave, and we will try the https://github.com/kata-containers/tests/blob/master/.ci/setup.sh you pointed out. We have tested the scripts before, and we needed to work around some of the environment setup. We don't have nested virtualization support now, so we can't run jobs in a virtual machine - we have to run the jobs on bare metal, and we are trying to figure out whether we could have the jobs run in containers. Let's see how we can progress with setup.sh.

For the job allocation and launching:

> 1. The CI jobs need to work on all arch's - that is what #514 is enabling I believe

I guess not all the CI jobs need to work on all arches - it depends on the test cases themselves. What #514 wants to achieve is to make a different set of test cases run on each arch; in the current case, it can make the ARM CI test only the jobs we have already verified, so they won't block the current PR from being merged. Then we can add more and more jobs into the successful list. The benefit is that we save time and can first figure out how to set up a suitable Jenkins build for each platform.

For the "matrix" build, I agree with you about the usage. Per my understanding, the hard part is to define suitable and efficient "axes" to generate the test matrix, and sometimes to add the necessary filters; the configuration itself is comprehensible. For parallel, do you mean to maintain the order of execution of different jobs that have relationships between them? Then it doesn't relate to the current problem?
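The per-arch filtering that #514 aims at can be sketched as a simple mapping from machine architecture to the jobs already verified there. This is only an illustration of the idea; the job names below are placeholders, not the real Jenkins job list.

```shell
#!/bin/bash
# Sketch of per-arch CI job filtering in the spirit of #514.
# Job names are illustrative placeholders, not the real Jenkins jobs.
supported_jobs_for_arch() {
    local arch="$1"
    case "$arch" in
        x86_64)  echo "static-checks unit-tests integration-tests" ;;
        aarch64) echo "static-checks unit-tests" ;;  # verified set; expand over time
        *)       echo "static-checks" ;;             # safe default for new arches
    esac
}

# Example: decide what to run on the current machine.
echo "jobs for $(uname -m): $(supported_jobs_for_arch "$(uname -m)")"
```

Starting from a small verified set and growing it matches the "add more and more jobs into the successful list" plan above.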
And the slave is online! Which means the agent has been injected/launched and connected back to the master!
As you don't have nested VM support, you will be able to run the static checks and, I'm hoping, also most of the unit tests ('go test'). I do have a feeling though that the virtcontainers unit tests may not work without nesting - if that is the case, we will have to adapt the test scripts to take that into account. It would be great if you can run all the other unit tests on ARM :-)

Running on bare metal/in containers - have a look at kata-containers/ci#39 if you have not seen it already - I tried to detail where all the pain points might be (the ones I ran into when trying to do the same for the metrics CI, where I ended up running nested VMs at present).

#514 should fix how we can run the same scripts (https://github.com/kata-containers/tests/blob/master/.ci/jenkins_job_build.sh) on multiple architectures. Yes, we can start by having that only run a minimal set of tests for ARM and then expand it later.

My point around setting up matrix and parallel jobs is about how we get Jenkins configured to launch ARM as well as x86 jobs when a PR lands or changes. At present when a PR lands/changes, Jenkins launches a number of parallel builds - Fedora, Ubuntu, CentOS etc. - all on x86. What we need to configure is for the Arm variations to be launched as well. The quick/dirty way is to just add another set of runtime/agent/proxy/shim jobs (to the list of jobs you can see on the CI homepage). That does not scale very well though. If we can set up Jenkins to have a 'matrix of parallel jobs to run', then this should scale better (and maybe ultimately be more maintainable for us) - but it is a larger up-front cost, as we do not have Jenkins configured that way at present.

I'd suggest we add a single new ARM job that targets one of our low-bandwidth repos (the proxy maybe?), make that a 'non-required' CI item on GitHub, and then we can test it out with you firing a test PR at that repo.
Alternatively, we can tie that to a user repo where you can test locally without injecting noise into the main repo until it is up and running. Let us know how you are progressing. I do expect we may have to modify and add a new feature or two to the Jenkins script and infrastructure, to only run the unit tests and skip the virtcontainers tests if we find we are on a system that does not support nesting.
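The "skip virtcontainers tests without nesting" idea could look something like the sketch below. The sysfs paths are the standard x86 KVM module knobs; on a host without those modules (such as this ARM box) the check simply reports no nesting. The "plan" labels are assumptions, not names from the real CI scripts.

```shell
#!/bin/bash
# Sketch: skip nesting-dependent tests when the host cannot do
# nested virtualization.
have_nested_virt() {
    local p
    # kvm_intel/kvm_amd expose a 'nested' parameter ("1"/"Y" when enabled).
    for p in /sys/module/kvm_intel/parameters/nested \
             /sys/module/kvm_amd/parameters/nested; do
        [ -f "$p" ] && grep -qiE '^(1|y)' "$p" && return 0
    done
    return 1
}

plan_unit_tests() {
    if have_nested_virt; then
        echo "run-all"               # includes virtcontainers tests
    else
        echo "skip-virtcontainers"   # bare metal / no nesting
    fi
}

plan_unit_tests
```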
Agree with @grahamwhaley about adding a test job for the proxy repo, which does not have too much activity - just like we did when experimenting with the zuul job. @kalyxin02 once you think
@grahamwhaley @chavafg Thanks for the connection. It's so great to see the Arm slave server online, even if it's idle for now. :-) I'm happy to hear that you all agree to run a minimal set of tests for ARM and then expand later. Thanks! We do pass the static checks and most of the unit tests, except for the issue we raised against the runtime repository unit tests at kata-containers/runtime#403. Not sure about the current status of that issue, although it was marked as "closed" - it needs modifications on x86 as well.

> At present when a PR lands/changes Jenkins launches a number of parallel builds - Fedora, Ubuntu, Centos etc. - all on x86. What we need to configure is to add to that list the Arm variations are also launched.

Currently we only have one bare metal server, which is Ubuntu based, so this is the only build we can have now, unless we get multiple servers or find some way to run builds in containers.

For kata-containers/ci#39 that you pointed out, I don't quite understand why the problem happens, as it looks like something goes wrong when adding a new PR to a branch... we're looking at it to find out what happens there.

static-checks.sh is no problem on Arm. For setup.sh, at this stage we have to skip the installation scripts for CNI, CRI-O, Kubernetes and OpenShift, and comment out the nested-virt support; then it can run on the Arm server. These modifications are actually included in the 3 commits of #514, and based on the current upstream feedback we will push a new version today or tomorrow.

What should we do to add a test job for the proxy repo? Please let us know if you need something from us on the Arm side. Thanks!
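Rather than commenting steps out by hand, the setup.sh workaround could be expressed as an arch gate. A minimal sketch, assuming a step-per-function layout; the step names mirror the components mentioned above (CNI, CRI-O, Kubernetes, OpenShift) but are placeholders, not the real function names in setup.sh.

```shell
#!/bin/bash
# Sketch: gate setup steps by architecture instead of editing the
# script per platform. Step names are hypothetical.
should_run_step() {
    local step="$1" arch="$2"
    if [ "$arch" = "aarch64" ]; then
        case "$step" in
            install_cni|install_crio|install_kubernetes|install_openshift)
                return 1 ;;   # not yet working on ARM; skip for now
        esac
    fi
    return 0   # everything else runs on all arches
}
```

A caller could then do `should_run_step install_cni "$(uname -m)" && install_cni`, keeping one script for all platforms.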
Hi @kalyxin02 - thanks for the update! The PR associated with kata-containers/runtime#403 (kata-containers/runtime#414) has been merged.
I'm not clear what you mean here?
Is this environment using an LTS release (16.04 or 18.04)?
Great!
All you need to do is get your Jenkins to call the following scripts, I think (@chavafg and @grahamwhaley might have more info though):
Hi @kalyxin02 For Jenkins, let me go set up a proxy build job for Arm and tie it to your server (using a Jenkins label I expect). Then we can fire a test job/PR at the proxy repo and see how the Arm build goes. I think that is going to be the quickest way for us to see what works/fails and to push this forwards. I will set the job up to look exactly like the x86 jobs - that is basically they call https://github.com/kata-containers/tests/blob/master/.ci/jenkins_job_build.sh. If you know where to look ;-), then you can see the actual commands we run stored in the CI repo:
As you say, since you are running on a bare metal machine without the builds contained inside a VM or another container type, you may run into the issues I detailed in kata-containers/ci#39. This normally only happens when you get a particularly 'bad build' - say a PR that leaves a QEMU lying around, or corrupts docker or the runtime somehow. It is fairly rare, but it does happen. So, we will start with your server as bare metal and see how it goes. I'll let you know here once I've set up the Jenkins job.
@kalyxin02 - I have set up a proxy ARM build job on the Jenkins master, and filed a test PR over on the proxy. Let's keep an eye on that (I think we are expecting it to fail right now) - see what the first failure point is, and then I think it will be best for you to open a test PR on the proxy repo and start chasing down the failures - OK?
OK, build failed as expected, but I suspect not in the way we expected. Here is the output in the console log:
I think the crucial line:
is going to relate to the fact our Jenkins kickoff script (configured inside the Jenkins job) already did a git clone of the 'tests' repo before invoking the script - BUT - I thought the job itself would then run inside the Jenkins WORKDIR or somewhere. This is probably not hard to fix. I'm not quite sure which line bombed out though... Actually, @kalyxin02 - does your build machine happen to have a 'tests' directory in the jenkins user homedir that is not a git checkout of the kata tests repo? That might be making that initial git clone of the tests repo fail?
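That suspected failure mode (git clone refusing a destination that already exists and is not empty) can be detected up front. A hedged sketch; the predicate and the directory handling are assumptions about how the job script could guard itself, not code from the CI repo.

```shell
#!/bin/bash
# Sketch: spot a leftover 'tests' dir that is not a git checkout,
# i.e. the state that makes the job's initial 'git clone' fail with
# "destination path already exists and is not an empty directory".
is_stale_checkout() {
    local dir="$1"
    [ -d "$dir" ] && [ ! -d "$dir/.git" ]
}

# A job script could then do (sketch):
#   if is_stale_checkout "$dest"; then rm -rf "$dest"; fi
#   git clone https://github.com/kata-containers/tests.git "$dest"
```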
A note for @chavafg - I had configured the ARM slave node with the labels 'arm_node' and 'ubuntu-1604' - but - then realised that all the x86 jobs only need the label 'ubuntu-1604', so Jenkins would start scheduling x86 jobs on the ARM node...
And just one more thought. On the metrics CI (where we used to run on bare metal) my startup script is slightly different, which will probably avoid this 'tests dir exists' issue. I effectively have:
So, run each new build within the 'clean' |
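The 'clean dir per build' approach from the metrics CI can be sketched roughly as below. The directory layout and naming are assumptions based on this discussion, not the actual metrics startup script.

```shell
#!/bin/bash
# Sketch: give each build a fresh temporary GOPATH, and remove it when
# the job's shell exits, so no 'tests' clone survives into the next run.
setup_clean_workdir() {
    BUILDDIR=$(mktemp -d -t katabuild.XXXXXX)
    export GOPATH="$BUILDDIR/go"
    mkdir -p "$GOPATH/src/github.com/kata-containers"
    # Everything, including any leftover checkout, vanishes on exit.
    trap 'rm -rf "$BUILDDIR"' EXIT
}
```

With this in place, the "destination path already exists" clone failure cannot recur, because every build starts from an empty directory.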
OK, I updated the job script to be:
kicked off a rebuild, and now in the console logs we have from http://jenkins.katacontainers.io/job/kata-containers-proxy-ARM-16.04-PR/2/console :
That got a bit further. Seems we have some sudo and perms things to sort out. Note @kalyxin02 - Jenkins does not provide a tty! You can replicate this locally by hand by running without a tty. OK, let me know if/when you need anything more from myself or @chavafg
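The no-tty condition is easy to check for in a script. This sketch just detects it; whether the CI scripts should branch on it is a separate question.

```shell
#!/bin/bash
# Sketch: detect the no-tty condition that Jenkins build steps run
# under. Redirecting stdin or capturing stdout reproduces it locally.
tty_mode() {
    if [ -t 0 ] && [ -t 1 ]; then
        echo "interactive"       # a normal terminal session
    else
        echo "non-interactive"   # what a Jenkins build step sees
    fi
}

tty_mode
```

This is why things like `sudo` password prompts or interactive git credential helpers behave differently under Jenkins than in a login shell.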
Hi @jodh-intel, the error we see is `testlog.txt: file already closed`. For your second question, the Arm slave machine is running Ubuntu 18.04 LTS.
OK, at some point either myself or @chavafg will have to rename the '1604' arm slave on the Jenkins QA CI. It won't affect functionality, but we should name it correctly...
Hi @grahamwhaley, thanks for your efforts to start the trial of the CI jobs, and for your updates to the scripts to make a clean build. I'm trying to figure out how to move forward a bit based on your steps, and will update you once I have something. Thanks!
Hi @grahamwhaley, I'm a bit confused. kata-containers/proxy#94 triggered the CI job today, and I still see the error below: `fatal: destination path '/home/jenkins/workspace/kata-containers-proxy-ARM-16.04-PR/go/src/github.com/kata-containers/tests' already exists and is not an empty directory.`
Hi @kalyxin02 - indeed. OK, I see that the build does not happen in the tmpdir, but always happens in the same fixed workspace. Let me go make those changes, and then I will nudge a rebuild and we'll see what happens. |
OK, looks like we are back to where we were before
One suggestion maybe @kalyxin02 - is the
@kalyxin02 the ARM CI is already running - can we close this issue? Thanks
We are working on the public Jenkins CI setup on the ARM platform. The first step should be making one ARM server recognized by the current Jenkins master, so it can be scheduled as one of its Jenkins slaves. Then we can run a number of "safe" CI jobs which we have already tested on it. In parallel, we will continue making more and more CI jobs successful. See #472. But anyway, setting up the physical infrastructure is the first step.
@grahamwhaley @chavafg Thanks for the help.