flaky test: bzlmod integration test on MacOS #1261

Closed
rickeylev opened this issue Jun 8, 2023 · 4 comments · Fixed by #1266 or #1541
Labels
cleanup Tech debt, resolving it improves our own velocity

Comments

@rickeylev
Contributor

rickeylev commented Jun 8, 2023

This is a tracking bug to collect info on the "bzlmod integration test on MacOS" failures, to help figure out why the test is failing. The test usually passes after a couple of retries.

The error is typically something about the "Middleman" missing pyc files.

Example error:

ERROR: /Users/<snip>/rules-python-python/examples/<snip>: Middleman _middlemen/requirements_Utest-runfiles failed: missing input file '<snip>/big5freq.cpython-39.pyc.4316963824'

Current theories:

  • Issue with the build machines (wouldn't be the first time)

  • hosts with failures:

    • bk-imacpro-2 (x2)
    • bk-imacpro-3
    • mk-imacpro-9
    • bk-imacpro-11 (x3)
    • bk-imacpro-12
    • bk-imacpro-18
  • hosts with successes:

    • bk-imacpro-2
    • mk-imacpro-6
    • bk-imacpro-15 (x2)
    • bk-imacpro-18

Failing builds

https://buildkite.com/bazel/rules-python-python/builds/5036#01889b48-56c1-448c-bbac-cf37dad80b84

https://buildkite.com/bazel/rules-python-python/builds/5037#01889b60-d830-4827-ae25-a9c37e7c8ce8

https://buildkite.com/bazel/rules-python-python/builds/5029#

https://buildkite.com/bazel/rules-python-python/builds/5047#01889c11-eae2-4402-9ba5-81620511ea18

https://buildkite.com/bazel/rules-python-python/builds/5050#01889c3a-1c95-4f53-a538-0c7434773a5b

https://buildkite.com/bazel/rules-python-python/builds/5047#01889c11-eae2-4402-9ba5-81620511ea18

https://buildkite.com/bazel/rules-python-python/builds/5066#0188a254-986a-4bdc-a972-220424b5349a

https://buildkite.com/bazel/rules-python-python/builds/5067#0188a258-ed2a-455a-b5fe-3148f93d1231

https://buildkite.com/bazel/rules-python-python/builds/5084#0188b31f-dbab-4e48-9726-fe872304785c

Successful builds

https://buildkite.com/bazel/rules-python-python/builds/5039#01889b82-d0cc-4133-8712-cc11392043d0

https://buildkite.com/bazel/rules-python-python/builds/5029#01889bb5-4ed4-42c9-9816-4be86d9da129

https://buildkite.com/bazel/rules-python-python/builds/5067#_

https://buildkite.com/bazel/rules-python-python/builds/5084#0188b327-c85e-4f27-ac14-db9697815997

@rickeylev added the "type: bug" and "cleanup" (Tech debt, resolving it improves our own velocity) labels, then removed the "type: bug" label, on Jun 8, 2023
@rickeylev
Contributor Author

It looks like all the files that are reported missing have the format:

@rules_python~override~internal_deps~pypi__pip//:pip/_vendor/<somepath>/__pycache__/<somename>.cpython-39.pyc.<someinteger>

i.e. it's always something with a file in pip's pycache directory, and always some integer as the suffix.

I can't tell what the integer is, though. Perhaps a hash? Some sample numbers are 4313755696, 4422117616, 4409426928, 4422117616, 4571366992, 4325443600, 4494442544. They aren't timestamps: read as Unix timestamps, they would fall around the year 2112.
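The timestamp check can be reproduced with a few lines of Python. This is only a sketch: it assumes the suffixes would be seconds since the Unix epoch, and the helper name `year_if_unix_timestamp` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Sample suffixes quoted in the comment above (duplicate omitted).
SUFFIXES = [4313755696, 4422117616, 4409426928, 4571366992, 4325443600, 4494442544]

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year_if_unix_timestamp(seconds: int) -> int:
    """Return the calendar year the value would map to as a Unix timestamp."""
    # Adding a timedelta avoids platform limits in datetime.fromtimestamp().
    return (EPOCH + timedelta(seconds=seconds)).year


years = sorted({year_if_unix_timestamp(s) for s in SUFFIXES})
print(years)  # every value lands in the 22nd century, so these are not timestamps
```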

This also only happens on Macs.

@rickeylev
Contributor Author

All the failures are for "Middleman _middlemen/requirements.update-runfiles", too. I think this is the {name}.update target that compile_pip_requirements generates.

So maybe this isn't flaky hosts after all. The existence of a pycache directory strikes me as odd; maybe some Python is running and adding files, and then a glob is picking them up? And maybe it's happening during a build, or only turns into an error due to timing or ordering?

@rickeylev
Contributor Author

rickeylev commented Jun 13, 2023

New theory: the root bug (the temp pyc.N files being inputs) has been present for a while, but it only started triggering regularly once Bazel CI enabled remote caching back in May, which caused the machine-specific pyc.N files to be listed as inputs that other machines see.

I wouldn't be surprised if there was a race bug, too -- that something creates pyc.N files, meanwhile something else globs the directory, then the pyc.N creator cleans up, and the globber has one of its inputs go missing.
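The suspected race can be illustrated with a deterministic, single-process simulation. This is a sketch only: it replays the steps in a fixed order rather than racing real processes, and it does not involve Bazel. The file name is taken from the example error above; the function name is hypothetical.

```python
import glob
import os
import tempfile


def simulate_stale_glob() -> list[str]:
    """Return globbed paths that no longer exist after the temp file is renamed."""
    with tempfile.TemporaryDirectory() as root:
        cache_dir = os.path.join(root, "__pycache__")
        os.makedirs(cache_dir)

        # Step 1: the pyc creator is mid-write, so only the temp file exists.
        tmp_pyc = os.path.join(cache_dir, "big5freq.cpython-39.pyc.4316963824")
        with open(tmp_pyc, "wb") as f:
            f.write(b"partial bytecode")

        # Step 2: the globber snapshots the directory and records the temp file.
        snapshot = glob.glob(os.path.join(cache_dir, "*"))

        # Step 3: the creator finishes and renames the temp file into place.
        os.replace(tmp_pyc, os.path.join(cache_dir, "big5freq.cpython-39.pyc"))

        # Step 4: anything in the snapshot that is now gone is a "missing input".
        return [os.path.relpath(p, root) for p in snapshot if not os.path.exists(p)]


missing = simulate_stale_glob()
print(missing)
```

The snapshot taken in step 2 still names the temp file, which is the shape of the "missing input file" error reported by the middleman action.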

I think the fix is to modify the files that are excluded from the whl_library directories. It already excludes *.pyc; it just needs to also exclude *.pyc.*

I think these spots:

https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/tools/wheel_installer/wheel_installer.py#L228

https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/repositories.bzl#L108

And probably also this one, for good measure:

https://github.com/bazelbuild/rules_python/blob/main/python/repositories.bzl#L221
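The effect of the extra exclusion can be sketched in Python. This is an approximation: it uses `fnmatch` on basenames rather than Bazel's `glob()` semantics, and the paths and the `keep` helper are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# "*.pyc" is already excluded today; "*.pyc.*" is the proposed addition.
EXCLUDED_BASENAMES = ["*.pyc", "*.pyc.*"]


def keep(path: str) -> bool:
    """Return True if the file should remain an input after exclusions."""
    name = basename(path)
    return not any(fnmatchcase(name, pattern) for pattern in EXCLUDED_BASENAMES)


# Hypothetical file listing for an extracted wheel.
candidates = [
    "pkg/mod.py",
    "pkg/__pycache__/mod.cpython-39.pyc",
    "pkg/__pycache__/mod.cpython-39.pyc.4316963824",
]
kept = [p for p in candidates if keep(p)]
print(kept)
```

With only `*.pyc` in the list, the third path would survive the filter, because its basename ends in the integer suffix rather than `.pyc`.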

The pyc.N files are somewhat expected. They're temp files created by python during pyc creation and use id(path) as a suffix: https://github.com/python/cpython/blob/main/Lib/importlib/_bootstrap_external.py#L195
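A simplified sketch of that write-then-rename pattern, modeled on the linked CPython helper but not a copy of it. The function name `write_atomic` and the demo file name are hypothetical.

```python
import os
import tempfile


def write_atomic(path: str, data: bytes) -> str:
    """Write data to path via a temp file; return the temp name that was used."""
    # The suffix is id(path): an in-process integer, different on every run.
    tmp_path = f"{path}.{id(path)}"
    fd = os.open(tmp_path, os.O_EXCL | os.O_CREAT | os.O_WRONLY, 0o666)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # The temp file is only visible between os.open() and os.replace().
        os.replace(tmp_path, path)
    except OSError:
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise
    return tmp_path


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "mod.cpython-39.pyc")
    tmp_used = write_atomic(target, b"bytecode")
    final_exists = os.path.exists(target)
    tmp_exists = os.path.exists(tmp_used)
    suffix = tmp_used.rsplit(".", 1)[1]

print(final_exists, tmp_exists, suffix.isdigit())
```

Because the suffix comes from `id()`, it is specific to one process on one machine, which fits the observation that the integers look neither like hashes nor like timestamps.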

rickeylev added a commit to rickeylev/rules_python that referenced this issue Jun 13, 2023
We ignore pyc files most everywhere (because they aren't
deterministic), but part of the pyc creation process involves creating
temporary files named `*.pyc.NNN`. Though these are supposed to be
temporary files nobody sees, they seem to get picked up by a glob
somewhere, somehow. I'm unable to figure out how that is happening, but
ignoring them in the glob expressions should also suffice.

Fixes bazelbuild#1261
@aignas
Collaborator

aignas commented Jun 13, 2023

Thanks for detailing the investigation here.

gitlab-dfinity pushed a commit to dfinity/ic that referenced this issue Sep 22, 2023
… 'master'

Update rules_python to the latest version.

The version contains a fix for
bazelbuild/rules_python#1261, which we also
observe from time to time: https://dash.sf1-idx1.dfinity.network/invocation/8093f9cb-633f-4c6f-a9d4-a8f899dd47bc

See merge request dfinity-lab/public/ic!14948
rickeylev added a commit to rickeylev/rules_python that referenced this issue Nov 6, 2023
Part of the pyc compilation process is to create a temporary file named
`<name>.pyc.NNNN`, where `NNNN` is an integer derived from `id(path)`. Once
the pyc is entirely written, this file is renamed to the regular pyc file
name. These files only exist for brief periods of time, but it's possible
for different threads/processes to see the temporary files when computing
the glob() values. Later, since the file is gone, an error is raised about
the file missing.

PR bazelbuild#1266 mostly fixed this issue, except that the glob exclude for an
interpreter runtime's files was behind the `ignore_root_user_error`
flag, which meant it wasn't always applied. This changes it to always be
applied, which should eliminate the failures.

Fixes bazelbuild#1261

Work towards bazelbuild#1520
github-merge-queue bot pushed a commit that referenced this issue Nov 7, 2023
…1541)

Part of the pyc compilation process is to create a temporary file named
`<name>.pyc.NNNN`, where `NNNN` is an integer derived from `id(path)`. Once
the pyc is entirely written, this file is renamed to the regular pyc file
name. These files only exist for brief periods of time, but it's possible
for different threads/processes to see the temporary files when computing
the glob() values. Later, since the file is gone, an error is raised about
the file missing.

PR #1266 mostly fixed this issue, except that the exclude for the
`.pyc.NNNN` files for an interpreter runtime's files was behind the
`ignore_root_user_error` flag, which meant it wasn't always applied. This
changes it to always be applied, which should eliminate the failures due
to the missing NNNN files.

Fixes #1261

Work towards #1520
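The flag-gating problem described in the commit message can be sketched as follows. This is a hypothetical Python model, not the actual Starlark from rules_python: the function names and the exact glob patterns are assumptions made for illustration.

```python
# Hypothetical patterns standing in for the real exclude list.
PYC_TEMP_EXCLUDES = ["**/*.pyc", "**/*.pyc.*"]


def excludes_before(ignore_root_user_error: bool) -> list[str]:
    """Model of the old behavior: excludes applied only when the flag is set."""
    excludes: list[str] = []
    if ignore_root_user_error:
        excludes += PYC_TEMP_EXCLUDES
    return excludes


def excludes_after(ignore_root_user_error: bool) -> list[str]:
    """Model of the fixed behavior: excludes applied regardless of the flag."""
    return list(PYC_TEMP_EXCLUDES)


print(excludes_before(False))  # temp pyc files can still be globbed
print(excludes_after(False))   # temp pyc files are always excluded
```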
2 participants