LLVM lit sharding is non-deterministic

Repros with `main` as of ea32e86deed6e8a8cd6116dc275e358a901d2b50, where our submodule is https://github.com/llvm/llvm-project/commit/b8d38e8b4fcab071c5c4cb698e154023d06de69e.

We use LLVM lit's "sharding" feature to split our large test suite across 8 VMs per architecture:

https://github.com/microsoft/STL/blob/ea32e86deed6e8a8cd6116dc275e358a901d2b50/azure-devops/native-build-test.yml#L21

We expected that for a given set of tests, this would exactly partition them across the VMs, with no duplicates and no missed tests. That is, we expected the sharding algorithm to be *deterministic*, even across different machines (because these VMs run independently), as long as the set of tests is the same on each machine (which it is, because they've checked out the same commit). There should be no sensitivity to filesystem enumeration order, time of day, or anything else. (However, it's okay if adding/removing a single test radically changes the subset shards, and it's okay for each subset shard to be run in a totally randomized/shuffled order.)
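To make the expected property concrete, here is a minimal sketch of a partition check. It is not lit's code: `select_shard` is a hypothetical stand-in modelled on the "tests #(8*k)+1" note that lit prints, and the test names and the forward/reversed orderings are invented for illustration.

```python
from collections import Counter


def select_shard(tests, run_shard, num_shards):
    """Hypothetical every-Nth selection; run_shard is 1-based."""
    return tests[run_shard - 1 :: num_shards]


def check_partition(all_tests, shards):
    """Return (missed, duplicated) test names across all shards."""
    counts = Counter(t for shard in shards for t in shard)
    missed = sorted(set(all_tests) - set(counts))
    duplicated = sorted(t for t, n in counts.items() if n > 1)
    return missed, duplicated


all_tests = [f"t{i:03d}" for i in range(114)]  # made-up test names
num_shards = 8

# Every VM sees the same order: the shards form an exact partition.
same_order = [
    select_shard(all_tests, k, num_shards) for k in range(1, num_shards + 1)
]
missed_same, dup_same = check_partition(all_tests, same_order)

# Assumed failure mode: half of the VMs see the list in reversed order.
mixed_order = [
    select_shard(all_tests if k % 2 else all_tests[::-1], k, num_shards)
    for k in range(1, num_shards + 1)
]
missed_mixed, dup_mixed = check_partition(all_tests, mixed_order)

print(len(missed_same), len(dup_same))
print(len(missed_mixed), len(dup_mixed))
```

In the second case, some tests run on two VMs and others run on none, which matches the symptom of tests not appearing in any shard.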

We're observing non-deterministic behavior from lit. This was originally observed in #2793, whose initial version mistakenly had 2 tests that were XPASSing - but the XPASSes showed up only for x86, when nothing was architecture-sensitive. Looking at x64, we didn't see the affected tests running in *any* of the 8 shards, even though other tests from that subdirectory ran.

Eventually, we found that this repros locally! No machine variation is needed - simply two consecutive runs on the same machine, at the same commit.

<details>
<summary>Click to expand example:</summary>

```
D:\GitHub\STL\out\build\x64>python tests\utils\stl-lit\stl-lit.py ..\..\..\llvm-project\libcxx\test\std\language.support\support.limits\support.limits.general --num-shards=8 --run-shard=1 -o testing_x64.log
stl-lit.py: D:\GitHub\STL\llvm-project\llvm\utils\lit\lit\main.py:193: note: Selecting shard 1/8 = size 15/114 = tests #(8*k)+1 = [1, 9, 17, ...]
-- Testing: 15 of 114 tests, 15 workers --
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/version.version.pass.cpp:0 (1 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/tuple.version.pass.cpp:0 (2 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/algorithm.version.pass.cpp:0 (3 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/functional.version.pass.cpp:0 (4 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/numbers.version.pass.cpp:0 (5 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/latch.version.pass.cpp:0 (6 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/coroutine.version.pass.cpp:0 (7 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/cmath.version.pass.cpp:0 (8 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/semaphore.version.pass.cpp:0 (9 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/stack.version.pass.cpp:0 (10 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/map.version.pass.cpp:0 (11 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/barrier.version.pass.cpp:0 (12 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/queue.version.pass.cpp:0 (13 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/unordered_set.version.pass.cpp:0 (14 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/execution.version.pass.cpp:0 (15 of 15)

Testing Time: 2.70s
  Excluded         : 99
  Passed           : 11
  Expectedly Failed:  4

D:\GitHub\STL\out\build\x64>python tests\utils\stl-lit\stl-lit.py ..\..\..\llvm-project\libcxx\test\std\language.support\support.limits\support.limits.general --num-shards=8 --run-shard=1 -o testing_x64.log
stl-lit.py: D:\GitHub\STL\llvm-project\llvm\utils\lit\lit\main.py:193: note: Selecting shard 1/8 = size 15/114 = tests #(8*k)+1 = [1, 9, 17, ...]
-- Testing: 15 of 114 tests, 15 workers --
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/typeinfo.version.pass.cpp:0 (1 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/iterator.version.pass.cpp:0 (2 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/optional.version.pass.cpp:0 (3 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/string.version.pass.cpp:0 (4 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/algorithm.version.pass.cpp:0 (5 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/memory.version.pass.cpp:0 (6 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/cstddef.version.pass.cpp:0 (7 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/format.version.pass.cpp:0 (8 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/coroutine.version.pass.cpp:0 (9 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/array.version.pass.cpp:0 (10 of 15)
XFAIL: libc++ :: std/language.support/support.limits/support.limits.general/chrono.version.pass.cpp:0 (11 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/scoped_allocator.version.pass.cpp:0 (12 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/map.version.pass.cpp:0 (13 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/vector.version.pass.cpp:0 (14 of 15)
PASS: libc++ :: std/language.support/support.limits/support.limits.general/execution.version.pass.cpp:0 (15 of 15)

Testing Time: 2.69s
  Excluded         : 99
  Passed           :  7
  Expectedly Failed:  8
```
</details>

Note that each time, this is "Selecting shard 1/8", in a directory with variously PASSing and XFAILing tests. Yet the number of PASSes and XFAILs varies between the two runs. Sorting and diffing the tests reveals that these are partially overlapping subsets (e.g. both of them ran `execution.version.pass.cpp`, but `array.version.pass.cpp` and `barrier.version.pass.cpp` were each run in only one of the subsets).

I haven't located exactly where in lit's Python implementation this is happening, but I would expect that after enumerating all available tests and before running the "select every Nth test" shard algorithm (I do see the latter in the code), there should be a step that sorts the available tests, so that the result is always deterministic.
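A rough sketch of the step I have in mind, again not lit's actual implementation: `select_shard` is a hypothetical every-Nth selection, the test names are invented, and sorting by plain test name is only one possible choice of stable key.

```python
def select_shard(tests, run_shard, num_shards):
    """Hypothetical every-Nth selection; run_shard is 1-based."""
    return tests[run_shard - 1 :: num_shards]


def select_shard_sorted(tests, run_shard, num_shards):
    """Sort first, so the result no longer depends on discovery order."""
    return select_shard(sorted(tests), run_shard, num_shards)


# Two discovery orders for the same 114 (made-up) tests.
order_a = [f"t{i:03d}.pass.cpp" for i in range(114)]
order_b = order_a[::-1]

# Without sorting, shard 1/8 differs between the two orders.
unsorted_a = select_shard(order_a, 1, 8)
unsorted_b = select_shard(order_b, 1, 8)

# With sorting, shard 1/8 is identical regardless of discovery order.
sorted_a = select_shard_sorted(order_a, 1, 8)
sorted_b = select_shard_sorted(order_b, 1, 8)

print(set(unsorted_a) == set(unsorted_b))
print(sorted_a == sorted_b)
```

Shuffling for execution order could still happen afterwards, within each selected shard, without affecting which tests the shard contains.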
