Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASSERT tool.drcachesim.scattergather test: tracer.cpp:394: towrite <= ipc_pipe.get_atomic_write_size() && towrite > 0 #5329

Closed
derekbruening opened this issue Feb 5, 2022 · 9 comments · Fixed by #6313

Comments

@derekbruening
Copy link
Contributor

tool.drcachesim.scattergather failed:

https://github.com/DynamoRIO/dynamorio/runs/5077131960?check_suite_focus=true

ASSERT FAILURE: /home/runner/work/dynamorio/dynamorio/clients/drcachesim/tracer/tracer.cpp:394: towrite <= ipc_pipe.get_atomic_write_size() && towrite > 0 ()

I don't remember ever seeing that assert before on any drcachesim test.

@derekbruening
Copy link
Contributor Author

This was the 32-bit x86 test

@derekbruening
Copy link
Contributor Author

@abhinav92003
Copy link
Contributor

I ran the 32-bit test 1000 times on my machine, and couldn't reproduce the failure:

$ ctest -VV -R 'tool.drcachesim.scattergather' --repeat-until-fail 1000
...
1/1 Test #284: code_api|tool.drcachesim.scattergather ...   Passed    1.51 sec
 
The following tests passed:
	code_api|tool.drcachesim.scattergather
 
100% tests passed, 0 tests failed out of 1
 
Total Test time (real) = 1547.39 sec

I can try running it on the Github Actions runner using tmate. In the past, we've had failures which reproduce only on GA.

abhinav92003 added a commit that referenced this issue Feb 12, 2022
Add tmate to help reproduce this failure on a GA runner.

Issue: #5329
@abhinav92003
Copy link
Contributor

I ran this 1000 times on a GA runner, no failure encountered. Maybe it reproduces on only some runners? If that is so, I'll need to try this multiple times.

ctest -VV -R 'tool.drcachesim.scattergather' --repeat-until-fail 1000
...
1/1 Test #286: code_api|tool.drcachesim.scattergather ...   Passed    2.06 sec

The following tests passed:
        code_api|tool.drcachesim.scattergather

100% tests passed, 0 tests failed out of 1

Total Test time (real) = 2024.27 sec

@dolanzhao
Copy link
Contributor

My PR #5515 failed to pass test 286 multiple times on the x86_32 action.
https://github.com/DynamoRIO/dynamorio/runs/7048743130?check_suite_focus=true

@abhinav92003
Copy link
Contributor

Looks like the following tests are failing deterministically on my machine (and Derek's) now:

	221 - code_api|client.drx-scattergather (Failed)
	248 - code_api|sample.memval_simple_scattergather (Failed)
	307 - code_api|tool.drcachesim.scattergather (Failed)

Would make it easier to debug! Will take it up sometime soon.

@derekbruening
Copy link
Contributor Author

Xref a different failure in the same test: #5747

@abhinav92003
Copy link
Contributor

Looks like the following tests are failing deterministically on my machine (and Derek's) now:

The failure I found on my machine was due to a different error, which I fix in #5764. The original assert failure in this issue shows up on our Ubuntu22 CI though.

@derekbruening
Copy link
Contributor Author

This test can fail with this pipe assert 3x in a row, failing even with the retry-3x feature from #2204 for #5873

@abhinav92003 abhinav92003 self-assigned this Sep 18, 2023
abhinav92003 added a commit that referenced this issue Sep 18, 2023
Avoids going over the atomic write size while waiting for enough
entries to write to IPC pipe. If we find we went over, we just
write till the last ok-to-split ref.

Fixes: #5329
abhinav92003 added a commit that referenced this issue Sep 19, 2023
Avoids going over the atomic write size while waiting for enough entries
to write to IPC pipe. If we find we went over, we just write till the
last ok-to-split ref.

tool.drcachesim.scattergather-x86 worked fine 100x on local workstation
with this fix.

This issue manifests on this test because some scatter-gather
instruction has eight consecutive memrefs which puts us over the
threshold suddenly without any opportunity to write the buffer before
all the memrefs.

Fixes: #5329
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants