ModuleNotFoundError: No module named 'pkg_resources' #4756

Open
jkotas opened this issue Jan 8, 2025 · 25 comments

@jkotas
Member

jkotas commented Jan 8, 2025

Build

https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=910866

Build leg reported

ComInterfaceGenerator.Tests.WorkItemExecution

Pull Request

dotnet/runtime#110558

Known issue core information

Fill out the known issue JSON section by following the step-by-step documentation on how to create a known issue

{
    "ErrorMessage": "ModuleNotFoundError: No module named 'pkg_resources'",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
}
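For readers unfamiliar with the known-issue JSON above, the sketch below shows one plausible way such an entry could be matched against a console log. It assumes, without confirmation from this thread, that `ErrorMessage` is a plain substring match and that a non-empty `ErrorPattern` is treated as a regular expression. `matches_known_issue` is a hypothetical helper, not the Build Analysis implementation.

```python
import json
import re

# Hypothetical matcher. Assumption: ErrorMessage is a substring test and a
# non-empty ErrorPattern is a regex. This is NOT the real Build Analysis code.
KNOWN_ISSUE = json.loads("""
{
    "ErrorMessage": "ModuleNotFoundError: No module named 'pkg_resources'",
    "BuildRetry": false,
    "ErrorPattern": "",
    "ExcludeConsoleLog": false
}
""")


def matches_known_issue(issue: dict, log_text: str) -> bool:
    """Return True if log_text contains the message or matches the pattern."""
    message = issue.get("ErrorMessage", "")
    pattern = issue.get("ErrorPattern", "")
    if message and message in log_text:
        return True
    # An empty ErrorPattern is simply skipped in this sketch.
    if pattern and re.search(pattern, log_text, re.MULTILINE):
        return True
    return False


if __name__ == "__main__":
    log = (
        '  File "/etc/helix/scripts/azure/__init__.py", line 5, in <module>\n'
        "    import pkg_resources\n"
        "ModuleNotFoundError: No module named 'pkg_resources'\n"
    )
    print(matches_known_issue(KNOWN_ISSUE, log))  # True
```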

@dotnet/dnceng

Release Note Category

  • Feature changes/additions
  • Bug fixes
  • Internal Infrastructure Improvements

Release Note Description

Additional information about the issue reported

No response

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=910866
Error message validated: [ModuleNotFoundError: No module named 'pkg_resources']
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 1/8/2025 10:20:43 PM UTC

Report

Build Definition Test Pull Request
912698 dotnet/runtime System.Formats.Tar.Manual.Tests.WorkItemExecution dotnet/runtime#111259
912093 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#110736
912683 dotnet/runtime System.Diagnostics.TraceSource.Config.Tests.WorkItemExecution dotnet/runtime#111136
912677 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#111257
912500 dotnet/runtime System.Xml.Schema.Extensions.Tests.WorkItemExecution dotnet/runtime#108542
912659 dotnet/runtime System.Diagnostics.Debug.Tests.WorkItemExecution dotnet/runtime#110676
912635 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#109378
912630 dotnet/runtime System.Formats.Asn1.Tests.WorkItemExecution dotnet/runtime#105004
2616941 dotnet-dotnet-monitor Microsoft.Diagnostics.Monitoring.WebApi.UnitTests-net8.0-arm64.WorkItemExecution
912604 dotnet/runtime System.Data.DataSetExtensions.Tests.WorkItemExecution
912586 dotnet/runtime System.Dynamic.Runtime.Tests.WorkItemExecution dotnet/runtime#107115
912573 dotnet/runtime System.Diagnostics.Tracing.Tests.WorkItemExecution
912570 dotnet/runtime System.Globalization.Calendars.Tests.WorkItemExecution
912534 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#111255
912516 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#111254
912509 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution dotnet/runtime#111253
912442 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#111108
911419 dotnet/runtime System.Diagnostics.DiagnosticSource.Switches.Tests.WorkItemExecution dotnet/runtime#110688
912421 dotnet/runtime System.Diagnostics.Tracing.Tests.WorkItemExecution
912412 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution dotnet/runtime#108799
912382 dotnet/runtime System.Diagnostics.Contracts.Tests.WorkItemExecution dotnet/runtime#111170
912373 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution dotnet/runtime#111214
912360 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution dotnet/runtime#111245
912354 dotnet/runtime System.Formats.Tar.Manual.Tests.WorkItemExecution dotnet/runtime#104906
912090 dotnet/runtime jit64_2.WorkItemExecution dotnet/runtime#109493
912311 dotnet/runtime System.Diagnostics.TraceSource.Config.Tests.WorkItemExecution dotnet/runtime#111136
912307 dotnet/runtime System.Diagnostics.FileVersionInfo.Tests.WorkItemExecution dotnet/runtime#110676
912238 dotnet/runtime System.Diagnostics.Contracts.Tests.WorkItemExecution dotnet/runtime#111247
910866 dotnet/runtime System.Formats.Nrbf.Tests.WorkItemExecution dotnet/runtime#110558
912285 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#111218
912281 dotnet/runtime System.Diagnostics.TextWriterTraceListener.Tests.WorkItemExecution dotnet/runtime#111093
912271 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#110818
912245 dotnet/runtime System.Diagnostics.TextWriterTraceListener.Tests.WorkItemExecution dotnet/runtime#111215
912224 dotnet/runtime System.Diagnostics.DiagnosticSource.Tests.WorkItemExecution dotnet/runtime#106309
912248 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#109530
912330 dotnet/runtime Microsoft.XmlSerializer.Generator.Tests.WorkItemExecution dotnet/runtime#111179
912206 dotnet/runtime System.Diagnostics.DiagnosticSource.Tests.WorkItemExecution dotnet/runtime#111244
912191 dotnet/runtime System.Diagnostics.StackTrace.Tests.WorkItemExecution dotnet/runtime#101264
912186 dotnet/runtime System.Diagnostics.DiagnosticSource.Tests.WorkItemExecution dotnet/runtime#101264
912163 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#109091
912141 dotnet/runtime Regressions.WorkItemExecution dotnet/runtime#110268
912147 dotnet/runtime Methodical_d1.WorkItemExecution dotnet/runtime#110269
912144 dotnet/runtime Regression_2.WorkItemExecution dotnet/runtime#110271
912107 dotnet/runtime System.Diagnostics.TraceSource.Tests.WorkItemExecution dotnet/runtime#105004
912096 dotnet/runtime System.Formats.Tar.Manual.Tests.WorkItemExecution dotnet/runtime#111108
912074 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#110281
911863 dotnet/runtime System.Xml.Linq.Streaming.Tests.WorkItemExecution dotnet/runtime#106309
912039 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution dotnet/runtime#111237
912035 dotnet/runtime System.Diagnostics.Tools.Tests.WorkItemExecution dotnet/runtime#111236
911999 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#111235
911922 dotnet/runtime System.Globalization.CalendarsWithConfigSwitch.Tests.WorkItemExecution dotnet/runtime#111209
911878 dotnet/runtime System.Formats.Tar.Manual.Tests.WorkItemExecution dotnet/runtime#110856
911835 dotnet/runtime System.Text.Json.SourceGeneration.Roslyn4.4.Unit.Tests.WorkItemExecution
911855 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#107766
911813 dotnet/runtime System.Linq.Parallel.Tests.WorkItemExecution
911799 dotnet/runtime System.Net.Sockets.Tests.WorkItemExecution dotnet/runtime#110692
911851 dotnet/runtime System.Runtime.ReflectionInvokeEmit.Tests.WorkItemExecution dotnet/runtime#111213
911830 dotnet/runtime System.Formats.Asn1.Tests.WorkItemExecution
911816 dotnet/runtime System.Formats.Cbor.Tests.WorkItemExecution
911754 dotnet/runtime System.Formats.Asn1.Tests.WorkItemExecution dotnet/runtime#111229
911757 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#111230
911742 dotnet/runtime System.Diagnostics.StackTrace.Tests.WorkItemExecution dotnet/runtime#107283
911722 dotnet/runtime System.IO.FileSystem.DriveInfo.Tests.WorkItemExecution dotnet/runtime#107118
911717 dotnet/runtime System.Configuration.ConfigurationManager.Tests.WorkItemExecution dotnet/runtime#111178
911709 dotnet/runtime System.IO.UnmanagedMemoryStream.Tests.WorkItemExecution dotnet/runtime#111227
911613 dotnet/runtime System.IO.FileSystem.DriveInfo.Tests.WorkItemExecution dotnet/runtime#111149
911700 dotnet/runtime System.Net.Primitives.Functional.Tests.WorkItemExecution dotnet/runtime#111222
910999 dotnet/runtime System.Net.NameResolution.Functional.Tests.WorkItemExecution dotnet/runtime#111213
911665 dotnet/runtime System.Net.Primitives.Functional.Tests.WorkItemExecution dotnet/runtime#111220
911636 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#111226
911627 dotnet/runtime System.IO.Hashing.Tests.WorkItemExecution
911619 dotnet/runtime System.Diagnostics.Tools.Tests.WorkItemExecution dotnet/runtime#111225
911601 dotnet/runtime System.Diagnostics.Tools.Tests.WorkItemExecution dotnet/runtime#110780
911600 dotnet/runtime System.Diagnostics.DiagnosticSource.Tests.WorkItemExecution
911590 dotnet/runtime System.IO.FileSystem.DriveInfo.Tests.WorkItemExecution dotnet/runtime#106309
911561 dotnet/runtime System.Globalization.Calendars.Tests.WorkItemExecution dotnet/runtime#111223
911564 dotnet/runtime GC-scenarios1.WorkItemExecution dotnet/runtime#111145
911557 dotnet/runtime System.IO.UnmanagedMemoryStream.Tests.WorkItemExecution dotnet/runtime#111222
911501 dotnet/runtime System.Diagnostics.Tracing.Tests.WorkItemExecution dotnet/runtime#105004
911496 dotnet/runtime System.DirectoryServices.Protocols.Tests.WorkItemExecution dotnet/runtime#110935
911284 dotnet/runtime System.ComponentModel.TypeConverter.Tests.WorkItemExecution dotnet/runtime#111179
911179 dotnet/runtime JIT_ro.WorkItemExecution dotnet/runtime#109493
2616242 dotnet-dotnet-monitor Microsoft.Diagnostics.Monitoring.WebApi.UnitTests-net8.0-x64.WorkItemExecution
2616241 dotnet-dotnet-monitor Microsoft.Diagnostics.Monitoring.WebApi.UnitTests-net9.0-x64.WorkItemExecution
911297 dotnet/runtime System.Diagnostics.TraceSource.Config.Tests.WorkItemExecution dotnet/runtime#105004
911248 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#111218
911210 dotnet/runtime System.Diagnostics.Tracing.Tests.WorkItemExecution dotnet/runtime#111220
911182 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#110736
911186 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#111108
911131 dotnet/runtime System.Runtime.Extensions.Tests.WorkItemExecution
911128 dotnet/runtime System.Formats.Tar.Manual.Tests.WorkItemExecution dotnet/runtime#109207
2616084 dotnet-dotnet-monitor Microsoft.Diagnostics.Monitoring.WebApi.UnitTests-net9.0-x64.WorkItemExecution
911117 dotnet/runtime System.Drawing.Primitives.Tests.WorkItemExecution dotnet/runtime#111219
911114 dotnet/runtime System.Formats.Tar.Tests.WorkItemExecution dotnet/runtime#109378
911029 dotnet/runtime JIT.performance.WorkItemExecution
911052 dotnet/runtime System.Threading.Timer.Tests.WorkItemExecution dotnet/runtime#109364
911033 dotnet/runtime iOS.Simulator.Aot-Llvm.Test.WorkItemExecution
911068 dotnet/runtime System.Diagnostics.Tools.Tests.WorkItemExecution dotnet/runtime#111204
911046 dotnet/runtime System.Diagnostics.TraceSource.Tests.WorkItemExecution dotnet/runtime#111209
911020 dotnet/runtime System.Diagnostics.Tracing.Tests.WorkItemExecution dotnet/runtime#111215
Displaying 100 of 106 results

Summary

24-Hour Hit Count: 0, 7-Day Hit Count: 0, 1-Month Count: 106
@dougbu
Member

dougbu commented Jan 8, 2025

we're tracking work on this in dotnet/runtime#4751. we found the problem another way just yesterday and noticed it seems to be specific to ArmArch Linux Docker containers. we're hoping to move away from the deprecated (and now, apparently, sometimes unavailable) pkg_resources package in time to include the fix in our next rollout

am I correct that the containers failing in your build are quite infrequently used❓

@jkotas
Member Author

jkotas commented Jan 8, 2025

This is failing on many PRs in dotnet/runtime. The affected containers are used by default dotnet/runtime CI configuration.

@jkotas
Member Author

jkotas commented Jan 8, 2025

You can see it in the stats in the top post.

@jkotas
Member Author

jkotas commented Jan 9, 2025

Regarding "it seems to be specific to ArmArch Linux": this affects a number of Linux and macOS variants.

For example, here is a log from macOS x64: https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-111218-merge-42cc8e62a78b4da997/Microsoft.Extensions.Configuration.Tests/1/console.47b8555d.log?helixlogtype=result

@akoeplinger
Member

akoeplinger commented Jan 9, 2025

I think the common factor is having Python 3.12. For example, https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-111218-merge-42cc8e62a78b4da997/ComInterfaceGenerator.Tests/1/console.aa96da09.log?helixlogtype=result is from the same job Jan posted above, and it worked because the dci-mac-build-133 macOS machine is not using Python 3.12 (I guess because it wasn't updated yet).

@dougbu
Member

dougbu commented Jan 9, 2025

I agree. it also seems like the problem is only partially under our direct control. we use pkg_resources in a couple of places, but so does the version of the azure package we rely on. other dependencies seem to have the right try/except setup to fall back when pkg_resources isn't available.
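The fallback pattern mentioned above usually looks something like the sketch below: prefer the standard-library `importlib.metadata` (Python 3.8+) and only fall back to the deprecated `pkg_resources` from setuptools when needed, so a missing setuptools does not raise `ModuleNotFoundError` at import time. The helper name `package_version` is hypothetical; this is not the code in the azure package or in helix-scripts.

```python
from typing import Optional


def package_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if not installed."""
    try:
        from importlib import metadata  # stdlib since Python 3.8
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Legacy fallback only; pkg_resources ships with setuptools and is deprecated.
    try:
        import pkg_resources
    except ImportError:
        return None
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


if __name__ == "__main__":
    print(package_version("pip"))
    print(package_version("surely-not-an-installed-distribution"))  # None
```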


unless we missed some older failing builds, the problems started just after our rollout began updating queues yesterday. the earliest error I saw had timestamp 2025-01-08T21:20:05.2091700Z, and the queue rollout started real work at 2025-01-08T20:45:17.1221544Z (slightly late due to retries of earlier jobs in our pipeline). the rollout picked up new OS packages for Python 3 on Linux machines as well as slightly changed Python requirements for our helix-scripts/ code. the second part might have impacted OSX machines
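As a quick check of that timeline, the gap between the rollout start and the earliest error can be computed directly; it comes out just under 35 minutes. The timestamps carry seven fractional digits, more than `datetime.strptime`'s `%f` accepts (at most six), so this sketch truncates them to microseconds first. `parse_azdo_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_azdo_timestamp(stamp: str) -> datetime:
    """Parse e.g. '2025-01-08T21:20:05.2091700Z' as a UTC datetime.

    %f accepts at most six fractional digits, so extra digits (and the
    trailing 'Z') are dropped before parsing.
    """
    date_part, _, fraction = stamp.rstrip("Z").partition(".")
    fraction = (fraction + "000000")[:6]
    parsed = datetime.strptime(f"{date_part}.{fraction}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


rollout_start = parse_azdo_timestamp("2025-01-08T20:45:17.1221544Z")
first_error = parse_azdo_timestamp("2025-01-08T21:20:05.2091700Z")
gap = first_error - rollout_start
print(gap)  # 0:34:48.087016
```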

@dougbu
Member

dougbu commented Jan 9, 2025

Python 3.12.8 was released on 2024-12-03. it contains a bunch of fixes and dependency updates, though nothing obviously linked to pkg_resources. it did upgrade its bundled pip to 24.3.1, but the pip changelog doesn't include an obvious smoking gun either. it also upgraded its libexpat dependency to 2.6.3, but that's written in C

I checked the setuptools changelog as well b/c we let that float as much as the Python and pip versions allow. nothing obvious there either

@dougbu
Member

dougbu commented Jan 10, 2025

we're reverting yesterday's rollout due to the problems discussed in this issue. queues are getting updated as I type this note

@dougbu
Member

dougbu commented Jan 10, 2025

revert is now complete but I don't see a clear signal that this particular problem has been resolved. please let us know

@akoeplinger
Member

Seems to be working again, though I noticed that, e.g., a build on dci-mac-build-108 that failed before is still printing No module named 'pkg_resources'; it just no longer seems to be an error.

https://helixr1107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-111257-merge-ab67e12681af4b7b92/System.Collections.NonGeneric.Tests/1/console.05a00ae9.log?helixlogtype=result

@dougbu
Member

dougbu commented Jan 10, 2025

that's very interesting. I can only guess why an exception turned into a simple message

@garath garath self-assigned this Jan 13, 2025
@ilyas1974
Contributor

@jkotas with the rollout revert we performed, it appears that the issue has been mitigated (per the telemetry above, no new build failures have been detected in over a week). Is there any reason to continue to keep this issue open?

@jkotas
Member Author

jkotas commented Jan 21, 2025

I agree - this can be closed.

@jkotas jkotas closed this as completed Jan 21, 2025
@garath
Member

garath commented Jan 21, 2025

We're using this issue to track a proper fix to the deployments. (Though of course I may have missed a conversation, so let me know if this was resolved elsewhere.)

@garath garath reopened this Jan 21, 2025
@dougbu
Member

dougbu commented Jan 22, 2025

totally agree. the revert seemed to avoid the problem but this issue tracks making sure our next rollout doesn't break Python scenarios again. we're close but not done

@dougbu
Member

dougbu commented Jan 22, 2025

if you care about the details, our Helix wrapping code for Linux machines hit some issues using sudo python:

  1. sudo isn't available in all Docker images
  2. our invocation lost track of the venv most Docker images set up to isolate installations from the system environment, so we weren't installing the components we needed or expected, including setuptools (which contains pkg_resources)
  3. !46915 avoids the two issues above
  4. however, I need to test on a recent macOS machine to confirm there isn't another problem lurking in our code.
  5. in addition, it's difficult to tell what changed in the affected Docker images and machines to cause the problems. I haven't found anything in the Python changelog that should be related. put another way, the notes about 3.12.8 look innocuous, but obviously something somewhere changed, likely a set of overlapping changes, since our revert helped
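Item 2 above can be checked from inside a script. A Python process runs in a venv when `sys.prefix` differs from `sys.base_prefix`. The sketch below reports that and shows that a child launched via the absolute `sys.executable` path stays in the same environment, whereas a bare `python` (for example under `sudo`, which on many systems resets `PATH`) may resolve to a different interpreter. The helper names are hypothetical and not part of the Helix scripts.

```python
import subprocess
import sys


def in_venv() -> bool:
    """True when this interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def child_in_venv(python: str) -> bool:
    """Ask another interpreter (by path or name) whether it runs in a venv."""
    out = subprocess.run(
        [python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return out == "True"


if __name__ == "__main__":
    print("this process in venv:", in_venv())
    # The absolute interpreter path keeps the child in the same environment.
    print("child via sys.executable in venv:", child_in_venv(sys.executable))
```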

@dougbu
Member

dougbu commented Jan 22, 2025

Machine dci-mac-build-294 in osx.1200.amd64.open has been disabled for this testing. will re-enable it when I'm done

@dougbu
Member

dougbu commented Jan 23, 2025

that machine was unreachable. switched to using dci-macpro-20 in the staging osx.1200.amd64 queue. it's temporarily disabled…

@dougbu
Member

dougbu commented Jan 25, 2025

not quite done w/ testing, but I'm pretty sure the remaining No module named 'pkg_resources' message is completely unrelated to the original problem. that run.py file is part of the dotnet/runtime test infrastructure and seems to depend on something that conditionally uses pkg_resources, emitting a message when it's not found. I suspect the message started showing up around when we began using a venv for our own use, an effort to isolate our Python usage from both the system Python environment and any test actions

overlapping this is the fact that our venv use is incomplete b/c we don't re-image our on-premises machines that often

to avoid such messages about pkg_resources (if they indicate an actual problem), you probably need to bump your Python package versions or perhaps make sure you're not relying on our pip installations to provide your dependencies. a venv is probably a great idea for your use case too
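A minimal sketch of that suggestion using the standard-library `venv` module: create an isolated environment, then install dependencies with the environment's own interpreter (`<env>/bin/python -m pip ...`) rather than relying on whatever `pip` happens to be on `PATH`. The function name and the example requirement are illustrative assumptions.

```python
import os
import subprocess
import venv
from pathlib import Path
from typing import Sequence


def create_isolated_env(env_dir: Path, requirements: Sequence[str] = ()) -> Path:
    """Create a venv at env_dir and return the path of its Python interpreter.

    Packages in `requirements` (e.g. ["setuptools"]) are installed with the
    venv's own interpreter, so they never touch the system environment.
    """
    # Bootstrap pip (via ensurepip) only when we actually need to install.
    venv.create(env_dir, with_pip=bool(requirements), symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    python = env_dir / bin_dir / ("python.exe" if os.name == "nt" else "python")
    if requirements:
        subprocess.run([str(python), "-m", "pip", "install", *requirements], check=True)
    return python


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        python = create_isolated_env(Path(tmp) / "env")
        print(python.exists())  # True
```

Installing packages this way needs access to a package index; creating the empty environment does not.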

@dougbu
Member

dougbu commented Jan 25, 2025

ugh, I was wrong. reporter/run.py sometimes shows up in other Helix console logs

@dougbu
Member

dougbu commented Jan 27, 2025

builds are expiring and getting deleted. here's an example of the original osx.1200.amd64 failure:

+ /usr/local/bin/python3.12 -u /tmp/helix/working/AF510960/w/AAD7090B/u/xharness-event-processor.py
Traceback (most recent call last):
  File "/tmp/helix/working/AF510960/w/AAD7090B/u/xharness-event-processor.py", line 8, in <module>
    from helix.public import request_reboot, request_infra_retry, send_metric, send_metrics
  File "/etc/helix/scripts/helix/public/__init__.py", line 5, in <module>
    import helix.event
  File "/etc/helix/scripts/helix/event.py", line 7, in <module>
    import helix.logs
  File "/etc/helix/scripts/helix/logs.py", line 11, in <module>
    from helix.azure_utils import get_auth_credential
  File "/etc/helix/scripts/helix/azure_utils.py", line 1, in <module>
    from azure.identity import ManagedIdentityCredential, CredentialUnavailableError
  File "/etc/helix/scripts/azure/__init__.py", line 5, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
+ /usr/local/bin/python3.12 /tmp/helix/working/AF510960/p/reporter/run.py https://dev.azure.com/dnceng-public/ public 24025496 eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6IjdSd2F5dmlYRHFoZnN6MTZSNmxPbXNXWWxTQSJ9.eyJuYW1laWQiOiJjNzczZjJjMi01MTIwLTQyMDctYWZlMi1hZmFmMzVhOGJjMGEiLCJzY3AiOiJhcHBfdG9rZW4iLCJhdWkiOiI2OTY3ODM3OC0yYjMxLTQwZjAtYTZiYi0zMmViOGRkNzMyZTUiLCJzaWQiOiI2YzZlNjYyOC05OTg1LTQ2OTYtYjA1Ny1kMjAxYjhjYjQ1MjkiLCJCdWlsZElkIjoiY2JiMTgyNjEtYzQ4Zi00YWJiLTg2NTEtOGNkY2I1NDc0NjQ5OzkxMTAzMyIsIkRlZklkIjoiMTU0Iiwiam9icmVmIjoiYWY1MjgyMTEtNmMzZi00MDdjLTg1ZTctNDg1ZTA3MjU1YzJkOmEyYTM5ZDZiLTcwYzUtNTA0MC05MzNhLTRiOGI2ZTUyYmYxOSIsInBwaWQiOiJ2c3RmczovLy9CdWlsZC9CdWlsZC85MTEwMzMiLCJvcmNoaWQiOiJhZjUyODIxMS02YzNmLTQwN2MtODVlNy00ODVlMDcyNTVjMmQuYnVpbGQuYnVpbGRfbWFjY2F0YWx5c3RfeDY0X3JlbGVhc2VfYWxsc3Vic2V0c19tb25vLl9fZGVmYXVsdCIsInJlcG9JZHMiOiIiLCJpc3MiOiJhcHAudnN0b2tlbi52aXN1YWxzdHVkaW8uY29tIiwiYXVkIjoiYXBwLnZzdG9rZW4udmlzdWFsc3R1ZGlvLmNvbXx2c286NmZjYzkyZTUtNzNhNy00Zjg4LThkMTMtZDkwNDViNDVmYjI3IiwibmJmIjoxNzM2MzcxNjU2LCJleHAiOjE3MzYzODM2NTZ9.WM0rVtEnKSyuVNaconMyilHoFd8xtqB3880x3PMgRyiWPqZ13fRQHH_R18dT4wos1Wt2625WIcrr1nda_05HUljQ8HV4iHt6Xrd4oEO28KH4RJXOyTobdr7mVov_T0y_D3hMRfB1X4hEcMJZxW7BOpitv49L0PDVgOu2AXAAWlwGoIbibHGYFR5OCJ5RQnanGpSpyrCqe0Ky6fmjwvgCTNuwTdTDfZkRN3JQeLM1573AKBTQ7GDauNV79wHk_DHNUCKT9OfPGsVQM6VA5Z_RG_g-zTIR7a2-B7IhwEtezBYRhDgHsG7KWAevo1CQHqBZZ3byB3PwuxWHmfc4mLHz4g
Traceback (most recent call last):
  File "/tmp/helix/working/AF510960/p/reporter/run.py", line 13, in <module>
    from test_results_reader import read_results
  File "/private/tmp/helix/working/AF510960/p/reporter/test_results_reader/__init__.py", line 3, in <module>
    from helix.public import TestResult, TestResultAttachment
  File "/etc/helix/scripts/helix/public/__init__.py", line 5, in <module>
    import helix.event
  File "/etc/helix/scripts/helix/event.py", line 7, in <module>
    import helix.logs
  File "/etc/helix/scripts/helix/logs.py", line 11, in <module>
    from helix.azure_utils import get_auth_credential
  File "/etc/helix/scripts/helix/azure_utils.py", line 1, in <module>
    from azure.identity import ManagedIdentityCredential, CredentialUnavailableError
  File "/etc/helix/scripts/azure/__init__.py", line 5, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

reporter/run.py and xharness-event-processor.py come from dotnet/arcade. run.py executes unconditionally as the last command in $(HelixPostCommands). xharness-event-processor.py is embedded in the Helix SDK and added to $(HelixPostCommands) when '$(EnableXHarnessTelemetry)' == 'true'. both are run using $HELIX_PYTHON_PATH, which will be the most recent python3.* available on the machine. these commands execute in a slightly different Python environment than the background python3.* processes. those background processes are from dotnet-helix-machines and are not associated with a particular work item
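The thread does not show how `$HELIX_PYTHON_PATH` is resolved. As an illustration only, the sketch below picks "the most recent python3.*" from a list of executable names by comparing numeric minor versions; a plain string sort would wrongly rank `python3.9` above `python3.12`. `newest_python3` is a hypothetical helper, not the Helix implementation.

```python
import re
from typing import Iterable, Optional

# Matches names like "python3.12" but not "python3" or "python3.13t".
_PY3_NAME = re.compile(r"^python3\.(\d+)$")


def newest_python3(names: Iterable[str]) -> Optional[str]:
    """Return the python3.N name with the highest minor version, if any."""
    best: Optional[str] = None
    best_minor = -1
    for name in names:
        match = _PY3_NAME.match(name)
        if match and int(match.group(1)) > best_minor:
            best, best_minor = name, int(match.group(1))
    return best


if __name__ == "__main__":
    candidates = ["python3", "python3.9", "python3.12", "python3.11"]
    print(sorted(candidates)[-1])      # python3.9 (string order is wrong)
    print(newest_python3(candidates))  # python3.12
```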

I suspect something is causing the user Python environment to be empty (lacking any of our usual dependencies); e.g., Python packages are installed per version, so a version bump can break things. however, I searched through all of the logs above to find the macOS failures among the builds that remain and could only find three such logs, all running on just two machines: dci-mac-build-108 and dci-mac-build-147. dci-mac-build-147 seems to be in a sorry state at the moment, and I need to file an ICM about it

I'm trying to work w/ dci-mac-build-108 today…

dougbu added a commit to dougbu/arcade that referenced this issue Jan 28, 2025
- see dotnet/dnceng#4756
- with this, executing `python3` should not result in `pkg_resources not found` error
  - note direct Python dependencies in dotnet/arcade include only default modules
  - change in Microsoft.DotNet.Build.Tasks.Installers likely less important
@dougbu
Member

dougbu commented Jan 31, 2025

sorry for not reporting back here. we had a total of three problems leading to the No module named 'pkg_resources' failures: the two sudo problems listed near the top of #4756 (comment), plus the macOS issues caused by having Python 3.12 or 3.13 on those machines.

the reason for the problems w/ Python 3.12+ was PEP 668 enforcement, which is particularly stringent on macOS. modifying the system or user Python environments was strongly discouraged and failed unless you removed some files that really shouldn't be touched (the EXTERNALLY-MANAGED marker files). we stopped trying to fight Python and switched to brew install python-setuptools when configuring macOS machines
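For context, PEP 668 describes two checks an installer makes by default: is it running outside a virtual environment (`sys.prefix == sys.base_prefix`), and does an `EXTERNALLY-MANAGED` file exist in the interpreter's stdlib directory? The sketch below mirrors those two checks in simplified form; it is an illustration, not pip's code, and the helper names are hypothetical.

```python
import sys
import sysconfig
from pathlib import Path

MARKER_NAME = "EXTERNALLY-MANAGED"


def is_externally_managed(stdlib_dir: Path, in_venv: bool) -> bool:
    """Apply the two PEP 668 checks in simplified form.

    An environment counts as externally managed only when we are NOT in a
    virtual environment and the marker file exists in the stdlib directory.
    """
    return (not in_venv) and (stdlib_dir / MARKER_NAME).is_file()


def current_interpreter_is_externally_managed() -> bool:
    # get_path("stdlib") uses the default install scheme for this interpreter.
    stdlib_dir = Path(sysconfig.get_path("stdlib"))
    return is_externally_managed(stdlib_dir, sys.prefix != sys.base_prefix)


if __name__ == "__main__":
    print(current_interpreter_is_externally_managed())
```

Because the marker is ignored inside a venv, creating one avoids the need to delete the marker file.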

you'll notice these issues are all long-standing problems that went unnoticed. it's as if something introduced set -e somewhere. I haven't found that but continue to look…
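To illustrate why previously unnoticed failures would become fatal if something switched `set -e` on, the sketch below runs the same shell snippet with and without `set -e`. It assumes a POSIX `sh` on `PATH`; `run_sh` is a hypothetical helper.

```python
import subprocess


def run_sh(script: str) -> subprocess.CompletedProcess:
    """Run a snippet with the POSIX `sh` found on PATH, capturing stdout."""
    return subprocess.run(["sh", "-c", script], capture_output=True, text=True)


# A failing command followed by more work, like a setup step whose
# failure used to be silently ignored.
SNIPPET = "false; echo still running"

lenient = run_sh(SNIPPET)
strict = run_sh("set -e; " + SNIPPET)

print(lenient.returncode, lenient.stdout.strip())  # 0 still running
print(strict.returncode, repr(strict.stdout))      # 1 ''
```

Without `set -e` the script's exit status is that of the last command, so the earlier failure is invisible; with it, the shell stops at the first failing command.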

@jkotas
Member Author

jkotas commented Jan 31, 2025

@dougbu These failures started hitting dotnet/runtime CI heavily again earlier today. dotnet/runtime#112083 has the details. Could you please take a look?

@dougbu
Member

dougbu commented Jan 31, 2025

the new issue likely has a similar root cause to this problem, i.e., something I can't find is switching set -e on. I see errors about failing chmod commands. is it possible to check whether similar runs yesterday showed the same warnings in that test run❓

regardless, I believe we (DNCEng) need to revert our rollout again 😦

@dougbu
Member

dougbu commented Feb 3, 2025

using this issue to track the problems addressed to date. will track the latest problems in dotnet/runtime#112083

note we need another small PR to include the macOS configuration changes (adding the python-setuptools Homebrew package) in future imaging procedures

5 participants