
[CI Failure]: Distributed tests fail comparing -O1 and -O0 in compile/test_basic_correctness.py::test_compile_correctness[test_setting5] #26454

@ProExpertProg

Name of failing test

tests/compile/test_basic_correctness.py::test_compile_correctness[test_setting5]

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

It seems like the top-5 logprobs no longer match between compilation levels (modes) 0 and 1, i.e. -O0 vs -O1, for microsoft/Phi-3.5-vision-instruct with pipeline parallelism (-pp 2):

[2025-10-08T05:24:19Z] FAILED compile/test_basic_correctness.py::test_compile_correctness[test_setting5] - AssertionError: Results for model='microsoft/Phi-3.5-vision-instruct' are not the same.
[2025-10-08T05:24:19Z] ref_args=['--enforce-eager', '--trust-remote-code', '--max-model-len', '2048', '-pp', '2', '-tp', '1', '-O0'] ref_envs={}
[2025-10-08T05:24:19Z] compare_args=['--enforce-eager', '--trust-remote-code', '--max-model-len', '2048', '-pp', '2', '-tp', '1', '-O1'] compare_envs={}
[2025-10-08T05:24:19Z] ref_result={'test': 'text_image', 'logprobs': [TopLogprob(token='reichen', bytes=[114, 101, 105, 99, 104, 101, 110], logprob=-10.37547492980957), TopLogprob(token='Serv', bytes=[83, 101, 114, 118], logprob=-10.375476837158203), TopLogprob(token='жи', bytes=[208, 182, 208, 184], logprob=-10.375476837158203), TopLogprob(token='Wars', bytes=[87, 97, 114, 115], logprob=-10.375476837158203), TopLogprob(token='grow', bytes=[103, 114, 111, 119], logprob=-10.375476837158203)]}
[2025-10-08T05:24:19Z] compare_result={'test': 'text_image', 'logprobs': [TopLogprob(token='reichen', bytes=[114, 101, 105, 99, 104, 101, 110], logprob=-10.375475883483887), TopLogprob(token='vars', bytes=[118, 97, 114, 115], logprob=-10.375476837158203), TopLogprob(token='Serv', bytes=[83, 101, 114, 118], logprob=-10.37547779083252), TopLogprob(token='grow', bytes=[103, 114, 111, 119], logprob=-10.37547779083252), TopLogprob(token='жи', bytes=[208, 182, 208, 184], logprob=-10.37547779083252)]}
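A minimal Python sketch of why a strict comparison trips on this output, using only the values from the log above. The helpers (exact_match, tolerant_match, max_spread) are hypothetical and are not the check test_compile_correctness actually performs; the 32064-token vocabulary size is also an assumption. All ten logprobs sit within about 3e-6 of each other, so float-level noise between -O0 and -O1 is enough to reorder the top-5 and even swap the token at the cutoff ('Wars' vs 'vars').

```python
import math

# Top-5 (token, logprob) pairs copied from the CI log, in rank order.
ref_top5 = [
    ("reichen", -10.37547492980957),
    ("Serv", -10.375476837158203),
    ("жи", -10.375476837158203),
    ("Wars", -10.375476837158203),
    ("grow", -10.375476837158203),
]
compare_top5 = [
    ("reichen", -10.375475883483887),
    ("vars", -10.375476837158203),
    ("Serv", -10.37547779083252),
    ("grow", -10.37547779083252),
    ("жи", -10.37547779083252),
]


def exact_match(a, b):
    """Hypothetical strict check: same tokens in the same rank order."""
    return [tok for tok, _ in a] == [tok for tok, _ in b]


def max_spread(*results):
    """Largest gap between any two logprobs across all given results."""
    values = [lp for result in results for _, lp in result]
    return max(values) - min(values)


def tolerant_match(a, b, atol):
    """Hypothetical looser check: only tokens present in BOTH lists are
    compared, and their logprobs must agree within atol."""
    da, db = dict(a), dict(b)
    shared = da.keys() & db.keys()
    return all(abs(da[tok] - db[tok]) <= atol for tok in shared)


print(exact_match(ref_top5, compare_top5))            # False
print(f"{max_spread(ref_top5, compare_top5):.2e}")    # 2.86e-06
print(tolerant_match(ref_top5, compare_top5, 1e-5))   # True

# Assumption: a 32064-token vocabulary. A perfectly uniform distribution
# over V tokens gives logprob -ln(V) for every token.
VOCAB_SIZE = 32064
print(f"{-math.log(VOCAB_SIZE):.4f}")                 # -10.3755
```

If the vocabulary size assumption holds, the observed logprobs are consistent with a nearly uniform output distribution, where near-ties at the top-k boundary are expected and rank order is not a stable thing to assert on.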

📝 History of failing test

It seems like this started appearing this week. I can try to bisect tomorrow.

CC List.

@zou3519 @houseroad

Metadata

Status

Done
