fabtests: Disable fi_rdm_tagged_peek for cleanup failure for psm3 and ucx #10124
Merged
fi_rdm_tagged_peek fails to clean up with "munmap_chunk(): invalid pointer" when trying to free hfi_nids in psm_ep.c:1161. The test succeeds when FI_PROVIDER is unset and fails when it is set to "psm3" or "PSM3". There is an open issue in ofiwg/libfabric to track this bug. When it is resolved we can re-enable this test.
Issue opened: 10123
Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>
fi_rdm_tagged_peek is failing on the cleanup path:
    ft_free_res() -> ft_close_fids() -> fi_close() -> ucx_ep_close() -> ucp_worker_destroy() -> ucp_worker_discard_uct_ep_progress() -> ucp_ep_destroy_base() -> __funlockfile()
The reported error is: "Segmentation fault: address not mapped to object at address 0x8"
This is a race condition and does not occur on every run. To reproduce, run:
    server: fi_rdm_tagged_peek -p ucx -E
    client: fi_rdm_tagged_peek -p ucx -E server_address
Issue 10126 is tracking this bug. Re-enable this test when it is resolved.
Signed-off-by: Zach Dworkin <zachary.dworkin@intel.com>
zachdworkin changed the title from "fabtests/psm3: Disable fi_rdm_tagged_peek for cleanup failure" to "fabtests: Disable fi_rdm_tagged_peek for cleanup failure for psm3 and ucx" on Jun 25, 2024
bot:aws:retest
@zachdworkin AWS CI is currently broken due to a dependency issue. I will fix it shortly.
@shijin-aws Thanks for the heads-up! Can you please replay this PR when it's fixed?
Yep, will do.
@shijin-aws Since these changes are only to the .exclude files for fabtests, do we need to wait for AWS CI?
Yeah, I think you can feel free to merge it.
AWS CI doesn't run psm3 and ucx tests.
fi_rdm_tagged_peek fails to clean up with "munmap_chunk(): invalid pointer" when trying to free hfi_nids in psm_ep.c:1161.
The test succeeds when FI_PROVIDER is unset and fails when it is set to "psm3" or "PSM3". There is an open issue in ofiwg/libfabric to track this bug. When it is resolved we can re-enable this test.
Issue opened: #10123
fi_rdm_tagged_peek fails with a segmentation fault while cleaning up the endpoint with the ucx provider.
This failure is a race condition, and there is no known way to reproduce it on every run.
Issue opened: #10126
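Because the ucx failure is intermittent, one way to catch it is to repeat the run until it fails. The helper below is a minimal sketch, not part of fabtests: `repeat_until_fail` is a hypothetical name, and the commented example assumes a matching server is started for each client run and that `server_address` is replaced with the real address.

```shell
#!/bin/sh
# Hypothetical helper (not part of fabtests): run a command up to
# <max_runs> times and stop at the first non-zero exit status.
# Usage: repeat_until_fail <max_runs> <command> [args...]
repeat_until_fail() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        if ! "$@"; then
            # A crash such as a segmentation fault also shows up here
            # as a non-zero exit status.
            echo "run $i failed"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example (client side), assuming a server is listening for each run:
# repeat_until_fail 100 fi_rdm_tagged_peek -p ucx -E server_address
```

The helper only reports which run failed first; it does not make the race more likely, so a clean pass over many runs is not evidence that the bug is gone.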