@potiuk
Member

@potiuk potiuk commented Dec 2, 2025

Removing the .pyc files after compilation saves very little space. Uncompressed sizes of the regular Airflow image are:

Before (.pyc files removed): 7.63GB
After (.pyc files kept): 7.66GB

So the images grow by less than 0.5%.
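
For anyone who wants to check the figure, here is the arithmetic as a quick sketch. The variable names are only illustrative; the two sizes are the ones quoted above.

```python
# Uncompressed image sizes quoted in this PR description, in GB.
before_gb = 7.63  # .pyc files removed after building Python
after_gb = 7.66   # .pyc files kept

# Relative growth of the image, as a percentage of the old size.
increase_pct = (after_gb - before_gb) / before_gb * 100
print(f"{increase_pct:.2f}%")  # prints "0.39%"
```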

It also seems that long-running containers without those files can suffer from repeated attempts to recreate the .pyc files. Those attempts fail due to lack of permissions and cause negative dentries to be created continuously:

https://lwn.net/Articles/814535/

Negative dentries are created by the kernel to cache the fact that a file does not exist, which speeds up lookups but also takes a bit of memory. It seems that when the compiled Python has its .pyc files removed, it tries to recreate them with timestamped entries every time a new interpreter is started.
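
One way to watch this happening is to read the kernel's dentry counters before and after a batch of health checks. The sketch below is not part of this PR; it assumes a Linux host that exposes `/proc/sys/fs/dentry-state` with six integer fields, where the fifth field is the negative dentry count (on kernels older than 5.0 that field is unused and reads as 0). The counters are system-wide, not per container, and the `DentryState`, `parse_dentry_state` and `read_dentry_state` names are made up for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class DentryState(NamedTuple):
    """Fields of /proc/sys/fs/dentry-state, in the order the kernel prints them."""

    nr_dentry: int    # total allocated dentries
    nr_unused: int    # dentries not actively used
    age_limit: int    # age in seconds after which entries may be reclaimed
    want_pages: int   # pages requested by the system
    nr_negative: int  # unused negative dentries (0 on kernels before 5.0)
    dummy: int        # padding, always 0


def parse_dentry_state(text: str) -> DentryState:
    """Parse the whitespace-separated contents of dentry-state."""
    fields = [int(value) for value in text.split()]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return DentryState(*fields)


def read_dentry_state(path: str = "/proc/sys/fs/dentry-state") -> DentryState:
    """Read the current counters (Linux only)."""
    return parse_dentry_state(Path(path).read_text())
```

Calling `read_dentry_state().nr_negative` twice, with a number of `exec`-based health checks in between, should show whether the count keeps growing on an affected image.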

This is not a problem for long-running processes, because those interpreters are started exactly once per container, but it is a problem if you use `exec` in containers to run health checks.

Every health check starts a new interpreter, and each new interpreter creates new negative dentries that take up kernel memory.

By not removing the .pyc files we slightly increase the size of the image, but we slightly improve startup time (no need to compile Python's internal .py files) and get rid of the negative dentries problem.
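
To check whether a given image still has its bytecode files, something like the following sketch can be run inside the container. It is not part of this PR. It only uses the standard library (`os.walk` and `importlib.util.cache_from_source`), and the `find_sources_without_pyc` helper name is hypothetical.

```python
import importlib.util
import os


def find_sources_without_pyc(root: str) -> list[str]:
    """Return the .py files under root that have no cached .pyc file.

    The expected .pyc location is computed the same way the import
    system does it, for the interpreter running this function.
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the cache directories themselves.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in filenames:
            if not name.endswith(".py"):
                continue
            source = os.path.join(dirpath, name)
            if not os.path.exists(importlib.util.cache_from_source(source)):
                missing.append(source)
    return sorted(missing)
```

For example, `find_sources_without_pyc(sysconfig.get_paths()["stdlib"])` (after `import sysconfig`) lists standard library modules that the interpreter would try to compile on import. An empty or short list suggests the .pyc files were kept.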

This PR likely:

Fixes: #58509
Fixes: #42195



@potiuk potiuk added this to the Airflow 3.1.4 milestone Dec 2, 2025
@potiuk potiuk added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Dec 2, 2025
@boring-cyborg boring-cyborg bot added area:dev-tools area:production-image Production image improvements and fixes kind:documentation labels Dec 2, 2025
@potiuk potiuk changed the title from "Do note remove .pyc and .pyo files after building Python" to "Do not remove .pyc and .pyo files after building Python" Dec 2, 2025
@potiuk
Member Author

potiuk commented Dec 2, 2025

The failing mypy checks are unrelated and are being fixed independently.

@potiuk potiuk merged commit bcda508 into apache:main Dec 2, 2025
97 of 98 checks passed
@potiuk potiuk deleted the do-not-remove-pyc-files branch December 2, 2025 15:04
github-actions bot pushed a commit that referenced this pull request Dec 2, 2025
…58944)

(cherry picked from commit bcda508)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Fixes: #58509
Fixes: #42195
@github-actions

github-actions bot commented Dec 2, 2025

Backport successfully created: v3-1-test

Status Branch Result
v3-1-test PR Link

@potiuk potiuk linked an issue Dec 2, 2025 that may be closed by this pull request
2 tasks
github-actions bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Dec 2, 2025
…pache#58944)

(cherry picked from commit bcda508)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Fixes: apache#58509
Fixes: apache#42195
ephraimbuddy pushed a commit that referenced this pull request Dec 2, 2025
…58944) (#58947)

(cherry picked from commit bcda508)


Fixes: #58509
Fixes: #42195

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
ephraimbuddy pushed a commit that referenced this pull request Dec 3, 2025
…58944) (#58947)

(cherry picked from commit bcda508)


Fixes: #58509
Fixes: #42195

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
RoyLee1224 pushed a commit to RoyLee1224/airflow that referenced this pull request Dec 3, 2025
itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
@raphaelauv
Contributor

thanks @potiuk for the explanation, that's really interesting 👍

@potiuk
Member Author

potiuk commented Dec 9, 2025

> thanks @potiuk for the explanation, that's really interesting 👍

The credit goes to @arkadiuszbach who has done a fantastic analysis and found the reason :) #42195 (comment)
