Do not remove .pyc and .pyo files after building Python #58944
Merged
+15
−3
Removing the .pyc files after compilation saves very little space. Uncompressed sizes of the regular Airflow image are:

Before (.pyc files removed): 7.63GB
After (.pyc files kept): 7.66GB

So keeping the files makes the image bigger by < 0.5%.

It also seems that long-running containers without those files can suffer from continuous attempts to recreate the .pyc files. Those attempts fail due to lack of permissions and cause negative dentries to be created continuously: https://lwn.net/Articles/814535/

Negative dentries are created by the kernel to cache the fact that a file was not available, which speeds up lookups but also takes a bit of memory. It seems that when a compiled Python has its .pyc files removed, it tries to recreate them with timestamped entries every time a new interpreter is started.

This is not a problem for long-running processes, because those interpreters are started exactly once per container. It is a problem if you use `exec` in containers to run health checks: every health check starts a new interpreter, and each new interpreter creates new negative dentries that take up kernel memory.

By not removing the .pyc files we slightly increase the size of the image, but we slightly improve startup time (no need to compile Python's internal .py files) and get rid of the negative dentries problem.

This PR likely:

Fixes: apache#58509
Fixes: apache#42195
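To check whether a given image is in the state described above, a small diagnostic along these lines may help. It is only a sketch and not part of this PR: `bytecode_status` and its `limit` parameter are hypothetical names, and it assumes the default `__pycache__` layout. It walks a source directory, uses `importlib.util.cache_from_source` to find where the interpreter would look for each cached .pyc, and records sources whose cache file is missing, along with the directories the current user could not write a new one into.

```python
import importlib.util
import os
import sysconfig
from pathlib import Path


def bytecode_status(source_dir, limit=None):
    """Report .py files under source_dir that have no cached .pyc file.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    missing = []
    unwritable_dirs = set()
    checked = 0
    for source in Path(source_dir).rglob("*.py"):
        if limit is not None and checked >= limit:
            break
        checked += 1
        # Path where the interpreter would look for (and write) the .pyc.
        cached = Path(importlib.util.cache_from_source(str(source)))
        if not cached.exists():
            missing.append(source)
            # If __pycache__ does not exist yet, it would be created
            # inside the directory that holds the source file.
            target = cached.parent if cached.parent.exists() else source.parent
            if not os.access(target, os.W_OK):
                unwritable_dirs.add(target)
    return {
        "checked": checked,
        "missing": missing,
        "unwritable_dirs": sorted(unwritable_dirs),
    }


if __name__ == "__main__":
    # Inspect a sample of the standard library of the running interpreter.
    stdlib = sysconfig.get_paths()["stdlib"]
    report = bytecode_status(stdlib, limit=200)
    print(f"checked={report['checked']} missing={len(report['missing'])}")
    print(f"unwritable directories={len(report['unwritable_dirs'])}")
```

A large `missing` count combined with a non-empty `unwritable_dirs` list would match the situation described here, where every new interpreter tries and fails to write the cache. Note that `os.access` reports directories as writable when run as root, so the check is most meaningful when run as the container's regular user.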
7bd2319 to dc83db4
This was referenced Dec 2, 2025
Lee-W
approved these changes
Dec 2, 2025
Member
Author
The mypy check failures are unrelated and are being fixed independently.
github-actions bot
pushed a commit
that referenced
this pull request
Dec 2, 2025
…58944) (cherry picked from commit bcda508) Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Fixes: #58509 Fixes: #42195
github-actions bot
pushed a commit
to aws-mwaa/upstream-to-airflow
that referenced
this pull request
Dec 2, 2025
…pache#58944) (cherry picked from commit bcda508) Co-authored-by: Jarek Potiuk <jarek@potiuk.com> Fixes: apache#58509 Fixes: apache#42195
ephraimbuddy
pushed a commit
that referenced
this pull request
Dec 2, 2025
…58944) (#58947) (cherry picked from commit bcda508) Fixes: #58509 Fixes: #42195 Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
ephraimbuddy
pushed a commit
that referenced
this pull request
Dec 3, 2025
…58944) (#58947) (cherry picked from commit bcda508) Fixes: #58509 Fixes: #42195 Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
RoyLee1224
pushed a commit
to RoyLee1224/airflow
that referenced
this pull request
Dec 3, 2025
itayweb
pushed a commit
to itayweb/airflow
that referenced
this pull request
Dec 6, 2025
Contributor
Thanks @potiuk for the explanation, that's really interesting 👍
Member
Author
The credit goes to @arkadiuszbach, who did a fantastic analysis and found the reason :) #42195 (comment)
Labels
area:dev-tools
area:production-image
Production image improvements and fixes
backport-to-v3-1-test
Mark PR with this label to backport to v3-1-test branch
kind:documentation