Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve build performance #413

Merged
merged 5 commits into from
Jul 2, 2021

Conversation

ap-wtioit
Copy link
Contributor

On some of our build systems (2/3) we seem to hit this issue:
docker/for-linux#388

With this merge request:

  • The number of needed chown and chmod operations is greatly reduced so we can finish our CI Tests on those systems within the timeout (1h)

  • autoaggregate now checks out files as odoo user right away (with the correct umask as default)

  • python pre compile in /opt/odoo/custom/src is now performed as odoo user to reduce the need of chowning the python cache after compilation

  • /opt/odoo/custom/src now belongs to odoo:odoo instead of root:odoo (this seemed to be more correct for development where odoos UID and GID are mapped to the developers UID and GID)

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch 3 times, most recently from 9ce205a to 95741d9 Compare April 12, 2021 08:06
@ap-wtioit
Copy link
Contributor Author

Note: test errors seem unrelated to merge request and could not be reproduced locally:

#20 3.406 Requirement already satisfied: click in /usr/local/lib/python3.5/site-packages
#20 3.430 Collecting coverage
#20 4.008   Downloading https://files.pythonhosted.org/packages/18/97/fda94f1e78b508dd1a493106c236076c4c82a9925c9347211afc4d6bd4e0/coverage-5.5-cp35-cp35m-manylinux1_x86_64.whl (240kB)
#20 4.087 Collecting flake8
#20 4.190   Downloading https://files.pythonhosted.org/packages/2a/cb/cd92e789442e234b8701bf6e886a55fbc83b7fd6e529b047e20b9cf196e8/flake8-3.9.0-py2.py3-none-any.whl (73kB)
#20 4.252 Collecting pylint-odoo
#20 4.321   Downloading https://files.pythonhosted.org/packages/74/ae/9cc3d387291e1e2866e0af906e1b76e766885867fe765a3f296f5879fd42/pylint-odoo-3.7.1.tar.gz (69kB)
#20 5.055     Complete output from command python setup.py egg_info:
#20 5.055     Couldn't find index page for 'pbr' (maybe misspelled?)
#20 5.055     No local packages or working download links found for pbr
#20 5.055     Traceback (most recent call last):
#20 5.055       File "<string>", line 1, in <module>
#20 5.055       File "/tmp/pip-build-9p5nu15_/pylint-odoo/setup.py", line 8, in <module>
#20 5.055         package_data={'': ['*.yaml']})
#20 5.055       File "/usr/local/lib/python3.5/distutils/core.py", line 108, in setup
#20 5.055         _setup_distribution = dist = klass(attrs)
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/setuptools/dist.py", line 315, in __init__
#20 5.055         self.fetch_build_eggs(attrs['setup_requires'])
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/setuptools/dist.py", line 361, in fetch_build_eggs
#20 5.055         replace_conflicting=True,
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 850, in resolve
#20 5.055         dist = best[req.key] = env.best_match(req, ws, installer)
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1122, in best_match
#20 5.055         return self.obtain(req, installer)
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1134, in obtain
#20 5.055         return installer(requirement)
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/setuptools/dist.py", line 429, in fetch_build_egg
#20 5.055         return cmd.easy_install(req)
#20 5.055       File "/qa/venv/lib/python3.5/site-packages/setuptools/command/easy_install.py", line 659, in easy_install
#20 5.055         raise DistutilsError(msg)
#20 5.055     distutils.errors.DistutilsError: Could not find suitable distribution for Requirement.parse('pbr')

@ap-wtioit
Copy link
Contributor Author

ap-wtioit commented Apr 12, 2021

TODOs @ap-wtioit: there seems to be issues with merges defined in scaffoldings

@joao-p-marques
Copy link
Contributor

joao-p-marques commented Apr 12, 2021

@ap-wtioit thanks for your contribution once more 😃

Note: test errors seem unrelated to merge request and could not be reproduced locally:

Indeed, it is an unrelated error. Taking a look at some similar errors, it seems to be an issue with the Pypi mirrors or firewall restrictions in Github Actions, as I am also unable to reproduce this locally.

Maybe let's wait to see if it is fixed in some hours, and possibly file an issue to GH. 🤷‍♂️

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch 2 times, most recently from 153e0f0 to f233d77 Compare April 13, 2021 08:57
Copy link
Contributor

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workaround seems good to me, although conceptually there's one change that we can't merge. It's commented below.

#!/usr/bin/env bash
set -e
# find all files where group or user are not odoo that are not a symlink, and fix permissions for those
# we use find and xargs because chown because chown -R can be very slow on certain systems: https://github.com/docker/for-linux/issues/388
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# we use find and xargs because chown because chown -R can be very slow on certain systems: https://github.com/docker/for-linux/issues/388
# we use find and xargs because chown -R can be very slow on certain systems: https://github.com/docker/for-linux/issues/388

@@ -0,0 +1,14 @@
#!/usr/bin/env bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind converting this to python (at least, the same but with plumbum)? We have a goal here of getting rid of as much bash as possible (see #7).

That way we get more readability (the code is a bit cumbersome.. not that it could be better, bash is this way...), and code formatting, and all that stuff for free.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure plumbum would make this more readable but i can try.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the advantages here is that it could drop the pipelined commands and the usage of xargs and instead use Python to read the output from one command, parse it, and pass it to the other, making it a lot more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well the point of this merge request is to make the build of scaffolding much faster (and therefore the tests). i'm pretty sure inventing our own xargs and find in python will not run as fast as just using the tools other people already worked hard on optimizing. in my experience custom code never runs as fast as the core linux utils.

I'll still try to implement this. But i also then need to verify the speed again (and i already worked a few weeks for this on our systems), so please be patient with me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, of course, if the lost in performance is big, it might not be worth it, but the idea @yajo was mentioning was to use Plumbum to call those Linux tools (find, chown) and just use Python to control the flow and parse the results, that shouldn’t introduce a big overhead...

I'll still try to implement this. But i also then need to verify the speed again (and i already worked a few weeks for this on our systems), so please be patient with me.

Don’t worry, the contribution is very appreciated 🙂

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm pretty sure inventing our own xargs and find in python will not run as fast as just using the tools other people already worked hard on optimizing

You can still do the pipelining in plumbum and use xargs too.

At the end of the day, bash spawns a new subprocess for every command. Replacing that with python shouldn't impact negatively in performance (at least I never noticed anything significant). However, an easier to read syntax makes code easier to reason about. And we have linters, formatters, and (more important) more human experience on that language, so it helps in the long run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well as plumbum is python i wonder if some magic env variables for python have an effect on it. (E.g. -u or PYTHONUNBUFFERED to get stdout and stderr synced in python also seems to affect file io and i would expect also affecting pipelining in plumbum, but that's all on the test plan)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one more detail. I just realized we have the build.d/800-permissions file which was meant to fix permissions.

Probably it would make more sense to move this code there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i was moving it to a seperate bin file because i wanted consistent permissions across all steps (to prevent changing permissions to often) and as bin is present all the way (build + run time) in the container but build.d sounded like someday it could be present just during build time. so i didn't want bin/autoaggregate invoking something in build.d and instead invoke bin/fix_permissions from bin/autoaggregate and build.d/800-fix_permissions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it might just be a java habbit in me to not cross invoke package
s (build.d -> bin -> build.d -> bin ...) but instead have a clean one directional dependency (build.d -> bin but never bin -> build.d)

If that's not an issue for you i can jump over my own shadow and just invoke build.d/800-fix_permissions in bin/autoaggregate. (and it might not be needed after all)

'test "$(stat -c \'%U:%G\' /opt/odoo/custom/src)" == "root:odoo"',
'test "$(stat -c \'%U:%G\' /opt/odoo/custom/src)" == "odoo:odoo"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a safety belt against malicious code: odoo and addons code is unmodifiable by the odoo user itself.

Sorry but this must stay as it was... 🙄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will change that, i was not sure because /opt/odoo/custom/src was chowned to odoo:odoo in autoaggregate and then chowned to root:odoo in 800-permissions

how do root:odoo permissions work with dev mode (/opt/odoo/custom mapped to the users working directory)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do root:odoo permissions work with dev mode (/opt/odoo/custom mapped to the users working directory)?

The chown to root:odoo is done here:

chown -R root:odoo /opt/odoo

That's at build time.

However, in devel, your code is bind-mounted into the container, so whatever was in that directory at build time is overriden, and you can write to it.

Besides, invoke tasks in the template automatically set UID, GID and UMASK to make autoaggregate behave as you would, outside the container, to avoid developer friction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, that's right, somehow i didn't realize the difference between build and runtime mappings. Thanks for the explanation.

'test "$(stat -c \'%U:%g\' /opt/odoo/custom/src)" == "root:20"',
'test "$(stat -c \'%U:%g\' /opt/odoo/custom/src)" == "odoo:20"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also undo.

'test "$(stat -c \'%U:%G\' /opt/odoo/custom/src)" == "root:odoo"',
'test "$(stat -c \'%U:%G\' /opt/odoo/custom/src)" == "odoo:odoo"',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also undo.

# Chown recursively, if UID or GID are specified
# just to make sure everything is owned by odoo:odoo (should run empty)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is probably fake (see below).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by fake comment (so i can improve it)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry for the bare explanation.

I mean that not everything should be owned by odoo:odoo, but by $UID:$GID, in case they're specified. That's what the previous comment line says, and probably you were trying this chown because of #413 (comment).

So probably this comment is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying this chown because that's how it worked before (as long as i didn't missinterprete the code before):

  • checkout everything as root:root in autoaggregate (by using root for gitaggregate), writing the data once
  • chowning to odoo:odoo ($UID:$GID) with python in autoaggregate, writing the data twice
  • later chowning to root:odoo with 800-permissions, writing the data a third time
    And ideally with the fix/workarround the fix_permissions would not be needed anymore in aggregate (when we set the right permissions from the beginning). But unfortunately it still does change the .git (pack if i remeber correctly) readonly files from r--r----- to rw-r-----. Getting rid of this would make the code for fix_permissions even more complicated and unreadable and the performance impact was less than a second on our slow systems so it seemed not worth it.

So i would (if you agree that this is the right way)

  • set the UID to 0 here for gitaggregate (to make everything owned by user root, and using git config from root)
  • keep the GID code (to set the correct group right away, so everything should be assigned to group odoo)
  • adapt the comments (explaining that ideally fix_permissions should do nothing at this point)
  • also i would try to make the doodba DEBUG log level switch own the -c output for chmod and -v output for chown

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok had some time to think the plan i mentioned above does not work:

  • autoaggregate is also invoked with setup-devel.yaml to checkout locally, and permissions there need to match UID:GID of the invoking user, otherwhise the user is not the owner of his own files

How would you feel about another user (having a checkout user that can be the same UID as the developer but a different UID from the runtime odoo user (sharing the odoo group together))?

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from 619eab0 to 98a0f5b Compare April 14, 2021 06:38
logger.debug(
"Error trying to chown on file. Skipping...", exc_info=True
)
check_call(["fix_permissions"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS, you're changing the permissions in the system beforehand to avoid chowning later.

If that's the case, what's the point on doing this call? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to make sure we have the right permissions after autoaggregate to prevent changing it later again. I think i could get rid of it but i was not 100% sure, so i kept fixing the permission here. (currently it only changes some permissions in .git from r--r----- to rw-r-----)

@yajo
Copy link
Contributor

yajo commented Apr 15, 2021

I'll answer to #413 (comment) here because I think it can solve more questions than just that one, and can be helpful as a reference on why things are as they are. Maybe this will light some 💡 around 😄

As you know, we support devel, test and prod environments.

  • checkout everything as root:root in autoaggregate (by using root for gitaggregate), writing the data once

  • chowning to odoo:odoo ($UID:$GID) with python in autoaggregate, writing the data twice

The needs for devel are a bit specific: the volume must be bind-mounted in the container to develop comfortably. To make that possible, the developer needs RW access to all the files. And to make development faster, we don't want to force him to rebuild the image every time.

Thus, the invoke git-aggregate task calls the autoaggregate binary inside the container, with the specific UID, GID and UMASK required for it to work. Another gory detail is that invoke img-build builds the image assigning to the inner odoo user the same UID and GID as yourself outside of the container. Hopefully some day we'll be able to move development to podman and forget about all these things, but with docker it's what we have.

later chowning to root:odoo with 800-permissions, writing the data a third time

The important bit here is that 800-permissions runs on builds, while autoaggregate can run when a container was already built (in devel only, of course).

When we're building for test or prod, changing odoo user UID/GID is a mess because it would conflict with the UID/GID inside the backup container (for example), and possibly other things, so it's always 1000 (although it can be changed, but it's not really recommended).

As such, when we're building test or prod, those env variables don't really matter for autoaggregate, because anyways we're gonna chown it later in the same build (on 800-permissions).

Finally (this was already told, but it completes the picture), the fact of using root:odoo is a safety belt that effectively forbids a malicious addon to modify odoo code.


So, after putting all of these words together, I think that we can safely get rid of fix_permissions or 800-permissions, because:

  1. For devel, we can just modify UID/GID/UMASK before doing the operations, and that will simply work, without needing to change anything. That is container runtime.
  2. For test/prod, we can do the same as for devel, but exporting UID=root GID=odoo UMASK=u=rwX,g=rX,o= inside 100-repos-aggregate to make sure we aggregate properly in the 1st place. The latter chown in 800-permissions would do nothing, so it could be removed. That is image build time.

Or maybe you have a better idea?

@ap-wtioit
Copy link
Contributor Author

That sound's good, (i already tested a variant of exporting UID=0 inside 100-repos-aggregate before autoaggregate, but the full export including umask sound's much better) i will test that and update the pull request when it works (not if ;-) )

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from 98a0f5b to 74166b8 Compare April 15, 2021 13:45
@ap-wtioit
Copy link
Contributor Author

ap-wtioit commented Apr 15, 2021

TODOs @ap-wtioit:

  • add a test / ensure there is a test for devel autoaggregate permissions (autoaggregate run as root with odoo:odoo UID:GID during runtime)
  • remove fix_permissions calls where not needed
  • move fix_permissions code back into 800-fix_permissions
  • rework 800-fix_permissions to plumbum
  • prepare documentation for "breaking" fix for umask in autoaggregate (umask for autoaggregate needed to be given as decimal integer, now it needs to be given as octal integer which is way more common)

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch 4 times, most recently from 2a5da68 to 28b3521 Compare April 21, 2021 06:54
@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch 10 times, most recently from 74d43f5 to 33764d7 Compare June 16, 2021 12:52
@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch 4 times, most recently from 679ead6 to 0c44361 Compare June 16, 2021 15:05
@ap-wtioit
Copy link
Contributor Author

ap-wtioit commented Jun 22, 2021

Template for change announcement in #67:

# Change of UMASK integer format to use common octal integers
## When will this happen?

Once #413 is merged, UMASK needs to be given as an octal integer.


## Why will this happen?

Umasks in linux usually are given as an octal integer when in integer form. Also all documentation mentioned a UMASK that pointed to octal integer form. But during autoaggregate the string given as UMASK was read a decimal integer which would give unexpected results.

## What does it break?

Users specifying a UMASK as decimal integer may get the wrong permissions after this change. Users following the documentation and specifying the umask as octal integer (e.g. `UMASK=$(umask)`, `UMASK=27` or using https://github.com/Tecnativa/doodba-copier-template/blob/main/tasks_downstream.py) should get the correct permissions with this change.


Previously linux umask 0027 (permit nothing for others, do not permit write for group) needed to be given as `UMASK=23`. Whereas `UMASK=27` would have been linux umask 0033 (do not permit read and write for group and other (but allow execute for group and others))

## Last non-breaking image/template version

TBD

@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from 0c44361 to 0f5d59e Compare June 23, 2021 09:56
@ap-wtioit
Copy link
Contributor Author

@yajo @joao-p-marques sorry it took so long. from my point of view this would now be ready.

@joao-p-marques
Copy link
Contributor

Hi @ap-wtioit 👋

Thanks again for the work and contribution 👍
Let's just wait a few days until @yajo gets a chance to review again. I'll also try to review and test in the next days.

Copy link
Contributor

@yajo yajo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All seems pretty good, thanks!

I only saw one thing that I think must have been forgotten in the dev loop, please check below.

@joao-p-marques please review.

13.0.Dockerfile Outdated Show resolved Hide resolved
@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from 0f5d59e to c3359b6 Compare June 28, 2021 09:58
Copy link
Contributor

@joao-p-marques joao-p-marques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for the contribution @ap-wtioit 👍

Just some minor comments (mostly doubts 😄). Otherwise, tested locally and everything looks good, it is taking nearly 1 minute less to build in my tests for v14 👍 🎉

bin/autoaggregate Show resolved Hide resolved
bin/autoaggregate Outdated Show resolved Hide resolved
@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from c3359b6 to b14a1df Compare June 29, 2021 07:32
Copy link
Contributor

@joao-p-marques joao-p-marques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍

bin/autoaggregate Outdated Show resolved Hide resolved
by executing checkout with correct permissions and using consistent
fix permissions script
we force a non fast forward merge of ocb and ocb^1 to test
if git is correctly configured for autoaggregate
@ap-wtioit ap-wtioit force-pushed the master-improve_build_performance branch from b14a1df to d0c400a Compare June 29, 2021 07:59
Copy link

@wtaferner wtaferner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

@yajo yajo merged commit 7a414a6 into Tecnativa:master Jul 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants