Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate linter.config.jobs in cgroupsv2 environments #10089

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

DominicLavery
Copy link

@DominicLavery DominicLavery commented Nov 21, 2024

Type of Changes

Type
βœ“ πŸ› Bug fix
βœ“ ✨ New feature

Description

In containers running on cgroupv2 systems _query_cpu currently returns None. This results in sched_getaffinity being used, which will normally return all installed CPUs of the host. This can result in crashes with the error:

concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

The changes here use the CPU quota in v2 systems. max 100000 is the default value, and will continue to result in the hosts CPU count being used.

cpu.weight (the replacement of cpu shares from v1) could also be used, but as it's impact on CPU scheduling is relative to the rest of the cgroup hierarchy it isn't possible to get an accurate value on all systems where pylint may be run. Whereas this method is reliable on any container with a CPU quota

Closes #10103

This comment has been minimized.

Copy link

codecov bot commented Nov 23, 2024

Codecov Report

All modified and coverable lines are covered by tests βœ…

Project coverage is 95.80%. Comparing base (55098c7) to head (972f45e).

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main   #10089   +/-   ##
=======================================
  Coverage   95.80%   95.80%           
=======================================
  Files         174      174           
  Lines       18973    18992   +19     
=======================================
+ Hits        18177    18196   +19     
  Misses        796      796           
Files with missing lines Coverage Ξ”
pylint/lint/run.py 89.18% <100.00%> (+1.59%) ⬆️

This comment has been minimized.

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks for the PR. I just have some questions about structure and placement of the tests, but the code LGTM!

@@ -65,6 +65,18 @@ def _query_cpu() -> int | None:
cpu_shares = int(file.read().rstrip())
# For AWS, gives correct value * 1024.
avail_cpu = int(cpu_shares / 1024)
elif Path("/sys/fs/cgroup/cpu.max").is_file():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this take precedence over cpu.shares? If the new file is present it should probably be preferred over the old one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't expect them to co-exist so just went with a quick grouping around v1 vs v2 but it probably makes sense just in case.
I've pushed a new commit which prefers the v2 files and I think also makes the logic a bit clearer

from pylint.testutils.utils import _test_cwd


@pytest.mark.parametrize(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this grouped with the other tests? I don't think we need a separate file as we don't follow the "file + test_file" structure

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a bit of confusion around using protected members in tests python. Figured it out and merged the tests back into the other file

@DominicLavery
Copy link
Author

Hey @DanielNoord! Thanks so much for the review. I've replied to your comments and pushed some fixes

This comment has been minimized.

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @DominicLavery thank you for the PR ! would you mind adding a new changelog please ? (By adding a file here: https://github.com/pylint-dev/pylint/tree/main/doc/whatsnew/fragments)

@DominicLavery
Copy link
Author

Hey @DominicLavery thank you for the PR ! would you mind adding a new changelog please ? (By adding a file here: https://github.com/pylint-dev/pylint/tree/main/doc/whatsnew/fragments)

Thanks @Pierre-Sassoulas! Absolutely :) I've pushed that change

@DominicLavery
Copy link
Author

Ah I suspect now that the v2 checks are happening first, I need to add a mock along the lines of

        if args[0] == "/sys/fs/cgroup/cpu.max":
            return MagicMock(is_file=lambda: False)

to test_pylint_run_jobs_equal_zero_dont_crash_with_cpu_fraction
to make sure the v1 path is still tested

This comment has been minimized.

This comment has been minimized.

@Pierre-Sassoulas Pierre-Sassoulas added this to the 3.3.3 milestone Dec 2, 2024
@Pierre-Sassoulas
Copy link
Member

I moved some code around to minimize the diff and make more apparent that the existing code was not modified. This should help with the coverage job being unhappy.

@Pierre-Sassoulas
Copy link
Member

(But better test coverage would be appreciated if you feel inclined to test the existing code better than it was originally πŸ˜„ )

Copy link
Contributor

github-actions bot commented Dec 2, 2024

πŸ€– According to the primer, this change has no effect on the checked open source code. πŸ€–πŸŽ‰

This comment was generated for commit 972f45e

@DanielNoord
Copy link
Collaborator

Sorry @DominicLavery, life got very busy all of a sudden and I didn't have time to finish this review. If @Pierre-Sassoulas approves the PR I'm happy as well! Thanks for your contribution! :)

@DominicLavery
Copy link
Author

No worries @DanielNoord!

@Pierre-Sassoulas I've added some extra tests and created shared mock set ups to dedupe a bit. Hope this helps :)

@Pierre-Sassoulas
Copy link
Member

Thank you @DominicLavery appreciated. It seems some tests are failing now, not sure about why in the pylint job, it's a strange fail. Let me know if I need to approve the pipeline again so it runs. Or we might merge the previous version so you can run pipelines automatically on the refactor.

@jacobtylerwalls
Copy link
Member

jacobtylerwalls commented Dec 7, 2024

I think both failures are explained by #10114.

@DominicLavery
Copy link
Author

Thanks jacobtylerwalls! I've subscribed to the issues and will rebase when I see them closed

@jacobtylerwalls
Copy link
Member

Pardon my pushing to your branch: I just want to test out the proposed fix in the astroid repo and see how far that gets us.

@jacobtylerwalls

This comment was marked as resolved.

@jacobtylerwalls jacobtylerwalls modified the milestones: 3.3.3, 3.3.4 Dec 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crashes in containers on Cgroupsv2 based hosts
4 participants