
Builds are 1200% slower on Openshift 4 #813

Closed

nicolaferraro opened this issue Jul 12, 2019 · 7 comments
Labels: area/build-operator (Related to the internal image build operator), kind/bug (Something isn't working)

@nicolaferraro
Member

It takes 10 seconds for Camel K to build a Kit on OpenShift 3.11 (Minishift). It takes 120 seconds to do the same on OpenShift 4 with CRC.

It may simply be that buildah is slower than Docker, but being 12x slower suggests they are doing something wrong, and we should tell them.

Or it's possible that we're doing something wrong on our side.
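
To reproduce the comparison, something along these lines should work on both clusters (a sketch, assuming the kamel CLI is installed and logged into the target cluster; Sample.java is just a placeholder integration):

# run a trivial integration to trigger a Kit build
kamel run Sample.java

# watch the Kit builds and compare the DURATION column between clusters
kubectl get builds.camel.apache.org -w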

cc: @astefanutti, @aldettinger

@nicolaferraro nicolaferraro added the kind/bug and area/build-operator labels Jul 12, 2019
@nicolaferraro nicolaferraro added this to the 1.0.0 milestone Jul 12, 2019
@oscerd
Contributor

oscerd commented Jul 12, 2019

I would maybe open a discussion issue in the buildah community. They could be interested, I hope.

@nicolaferraro
Member Author

Yes, the main task here is to figure out with them what's happening, but we need to provide them with more info.

@nicolaferraro nicolaferraro modified the milestones: 1.0.0, 1.0.0-M3 Oct 4, 2019
@nicolaferraro nicolaferraro modified the milestones: 1.0.0-M3, 1.0.0-M4 Oct 11, 2019
@dhirajsb dhirajsb self-assigned this Oct 30, 2019
@dhirajsb
Member

@nicolaferraro does this happen with pretty much any Camel K example, or is there a specific use case where it's really evident? And is this issue specific to CRC, or does it affect other OCP 4 installs as well?

@nicolaferraro
Member Author

Every OCP 4 cluster I've tried so far has this issue, both CRC and AWS. We need to understand the source of the problem and fix it (or ask whoever is responsible to fix it).

@nicolaferraro nicolaferraro modified the milestones: 1.0.0-M4, 1.0.0-M5 Nov 12, 2019
@valdar
Member

valdar commented Nov 20, 2019

I think you might be interested in this: containers/buildah#1849 @dhirajsb @nicolaferraro

@nicolaferraro nicolaferraro modified the milestones: 1.0.0-M5, 1.0.0-RC1 Dec 3, 2019
@nicolaferraro nicolaferraro modified the milestones: 1.0.0-RC1, 1.0.0-RC2 Dec 17, 2019
@nicolaferraro nicolaferraro modified the milestones: 1.0.0-RC2, 1.0.0-future Feb 20, 2020
@dhirajsb
Member

@astefanutti IIRC you've looked at this issue. Do you want to assign it to yourself and fill in the details on what's going on in OpenShift?

@dhirajsb dhirajsb removed their assignment Apr 18, 2020
@astefanutti
Member

Multiple tests on the latest OCP 4.3.5 version give much better build times, e.g.:

kubectl get builds.camel.apache.org     
NAME                       PHASE       AGE     STARTED   DURATION          ATTEMPTS
kit-bpofajispg2h46llnmh0   Succeeded   4m59s   4m59s     1m19.909447568s   
kit-bpofcfqspg2h46llnmhg   Succeeded   58s     58s       53.03602298s  

From the containers team:

With each version of OCP there are improvements to buildah and its underlying libraries. [...] between 4.2 and 4.3.5, which upgraded buildah from v1.10.1 to v1.11.6+, respectively.
The main places where we have impact on the speed of a build are when we pull and push layers, during COPY and ADD instructions in Dockerfiles, and when we generate layer diffs after processing instructions.
Between those two buildah versions, it looks like the changes that would affect these were in how we handle COPY and ADD.
There are currently two code paths for handling them: one for when there isn't a .dockerignore file in the build context, and one for when there is. The first one is significantly faster. In a number of cases, though, fixes for other bugs ended up putting us on the second path even when we didn't need to be.
The longer term plan is to speed up the with-.dockerignore path so that it can take the place of both of the current code paths.
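
A quick way to check which path a given build context would hit (a sketch; /tmp/build-context is a hypothetical staging directory, not a path Camel K is known to use):

# buildah takes the faster COPY/ADD code path only when no .dockerignore is present
if [ -f /tmp/build-context/.dockerignore ]; then
  echo "slower with-.dockerignore code path will be used"
else
  echo "faster code path will be used"
fi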

There is still the open question of image layer caching:

- in 3.x the image cache was shared on the node, so any layer previously pulled to or built on that node would get reused in subsequent builds.
- in 4.x no layers are cached between builds (the layers only live within the pod's container, so they get deleted when the build pod terminates), so everything must be re-pulled and rebuilt every time (a sketch of one possible workaround follows the proposal link below).

An enhancement proposal is being worked out in openshift/enhancements#216.
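
For illustration only (a sketch, not the approach in openshift/enhancements#216; all names here are hypothetical): buildah keeps its layer store under /var/lib/containers, so persisting that directory across build pods, e.g. via a PersistentVolumeClaim, would restore 3.x-style layer reuse:

# claim to hold the shared layer store (name and size are illustrative)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: buildah-layer-cache
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi
EOF
# the build pod would then mount this claim at /var/lib/containers,
# so layers pulled or built in one build survive for the next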

I think we can close this issue and follow up as we work on image caching in general.
