Skip to content

Conversation

@Naghasan
Copy link

The KernelFusionCmd keeps a references to the queue on which it was submitted. However these commands are also kept on the side in order to ensure event validity and are only reset when start fusion is called again on the same queue.
This forces queue_impl and events to be kept alive as the command keeps the last reference until the sycl shutdown procedure.

The patch forces 1) the command to reset it pointers to the queue once enqueued and 2) to clean up the KernelFusionCmd attached to the queue to free up events.

Note: there is no new test as kernel fusion already cover this and I'm not too sure on how to approach it. I'm happy to try out any suggestion.

The KernelFusionCmd keeps a references to the queue on which it was submitted.
However these commands are also kept on the side in order
to ensure event validity and are only reset when start fusion is called
again on the same queue.
This forces queue_impl and events to be kept alive as the command keeps
the last reference until the sycl shutdown procedure.

The patch forces 1) the command to reset it pointers to the queue
once enqueued and 2) to clean up the KernelFusionCmd
attached to the queue to free up events.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
@Naghasan Naghasan requested a review from a team as a code owner August 23, 2023 21:27
@Naghasan Naghasan requested a review from cperkinsintel August 23, 2023 21:27
@Naghasan Naghasan changed the title [SYCL][KernelFusion] Ensure queue_impl is not unnecessary referenced [SYCL][KernelFusion] Ensure queue_impl is not unnecessarily referenced Aug 24, 2023
Copy link
Contributor

@cperkinsintel cperkinsintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks good.

Question: mightn't it be better to hold weak pointers to the queue_impls? Then you might avoid the whole rigamarole.

@Naghasan
Copy link
Author

I don't think you can, the command needs the queue for its execution so it needs to stick around. This actually issue is the need for MFusionMap, but that should hopefully go away when we switch to the graph based version.

@Naghasan
Copy link
Author

@intel/llvm-gatekeepers this is now approved, can we have it merged ? thanks

@steffenlarsen steffenlarsen merged commit a33b0a9 into intel:sycl Sep 18, 2023
@jsji
Copy link
Contributor

jsji commented Sep 22, 2023

@Naghasan This is causing a memory corruption in ExtensionsTests.

tools/sycl/unittests/Extensions/./ExtensionsTests': **corrupted double-linked list:** 0x00000000011aa760 ***

It fails a lot in our internal test env, but can also be reproduced in ubuntu 22, just harder to reproduce.
But should be able to reproduce by running it around 100.

for i in {1..100} ; do echo "Running test $i... "; tools/sycl/unittests/Extensions/ExtensionsTests ; done

againull pushed a commit that referenced this pull request Sep 26, 2023
Test is a scheduler instance is alive before before attempting to
cleanup fusion commands.

Hopefully addresses
#10948 (comment) (I
cannot locally reproduce the issue)

Signed-off-by: Victor Lomuller <victor@codeplay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants