[EventPipe] Optimize ThreadSessionState Management and Cleanup #118029

mdh1418 · 2025-07-24T17:25:27Z

This PR implements a comprehensive cleanup and optimization of EventPipe's ThreadSessionState management to address performance issues in scenarios with many short-lived threads.

Background

Throughout an EventPipe Session's lifetime, threads that write events induce allocations of EventPipeThread, EventPipeThreadSessionState, EventPipeBufferList, EventPipeBuffer, and EventPipeSequencePoint objects. Many of these objects were held throughout the entire session lifetime instead of just their scope of utility, leading to:

Performance degradation due to bloated thread_session_state_list containing exhausted instances
Memory inefficiency from retaining objects for threads that have exited
Suboptimal event discovery as the buffer manager holds instances whose buffers are exhausted and will never receive new events

This PR does the following:

ThreadSessionState Cleanup: Added cleanup measures to remove ThreadSessionState instances whose buffer_list is empty and the corresponding thread has been unregistered, indicating the thread has exited and will not write new events.
Sequence Point Map Optimization: Moved sequence point map updates to occur in tandem with event processing rather than relying on the potentially trimmed thread_session_state_list. The update now happens after processing all available events from a thread.
Direct Thread ID Storage: Simplified sequence point handling by directly storing native thread IDs in the sequence point map instead of dynamically retrieving them through EventPipeThreadSessionState, eliminating associated ref counting complexity.
Buffer Manager State Tracking: Enhanced the buffer manager to track the current ThreadSessionState being processed, enabling proper cleanup while maintaining the ability to re-add instances that receive new events.
Code Quality Improvements: Updated loops to use foreach macros, consolidated sequence point logic, and improved code organization.

Testing

Running the repro provided in the original issue,

Before

Q:\source\mdh1418\TestApp> corerun .\bin\Debug\net10.0\win-x64\publish\TestApp.dll 10000 10000
Press any key to start spawning threads and throwing exceptions
Enabling CLR EventPipe

Throwing and catching 1000 exceptions (warm up JIT)

Throwing and catching 10000 exceptions
RSS: 79.5MiB, user CPU: 96.9%
40000 EventPipe events received, 10000 exceptions thrown

Spawning 10000 short-lived threads
RSS: 118.7MiB, user CPU: 195.8%
32485 EventPipe events received, 10000 exceptions thrown

Throwing and catching 10000 exceptions
RSS: 126.5MiB, user CPU: 199.7%
7128 EventPipe events received, 10000 exceptions thrown

Reconfiguring EventPipe with identical settings

Throwing and catching 10000 exceptions
RSS: 116.0MiB, user CPU: 104.1%
40014 EventPipe events received, 10000 exceptions thrown

After

Q:\source\mdh1418\TestApp> corerun .\bin\Debug\net10.0\win-x64\publish\TestApp.dll 10000 10000
Press any key to start spawning threads and throwing exceptions
Enabling CLR EventPipe

Throwing and catching 1000 exceptions (warm up JIT)

Throwing and catching 10000 exceptions
RSS: 79.3MiB, user CPU: 97.1%
40000 EventPipe events received, 10000 exceptions thrown

Spawning 10000 short-lived threads
RSS: 117.1MiB, user CPU: 121.9%
40000 EventPipe events received, 10000 exceptions thrown

Throwing and catching 10000 exceptions
RSS: 118.2MiB, user CPU: 103.6%
40000 EventPipe events received, 10000 exceptions thrown

Reconfiguring EventPipe with identical settings

Throwing and catching 10000 exceptions
RSS: 116.9MiB, user CPU: 104.7%
40000 EventPipe events received, 10000 exceptions thrown

Update thread_session_state_list loop to use foreach macro Unindent sequence point reloop block Consolidate sequence point loop timestamp boundary update

The event consuming thread advances to the next event using the buffer_manager's thread_session_state_list, which signals when an event has written to a buffer. Adding the capability to trim the list when tss instances have exhausted their events and buffers also requires re-adding a tss if it receives new events, which is demarcated by allocating a new buffer_list. To facilitate freeing and NULLing out buffer_lists, we need to pivot from logic that expects buffer_lists to persist. ep_buffer_manager_write_all_buffers_to_file_v4 adjusts sequence points tss:sequence_number mappings whenever events have been read from the tss. Instead of having the tss' underlying buffer_list track the last read sequence point, move the counter to the tss itself to allow buffer_lists to be NULL'ed. In order to associate tss with their last_read_sequence_number, the buffer_manager needs to keep track of the "current" tss from which events are being processed.

EventPipeSequencePoints store all alive EventPipeThreads IDs and their corresponding latest read sequence number in a map which is ultimately used to write the sequence point block. Previously, the native thread ID was dynamically retrieved through the EventPipeThreadSessionState, requiring ref counting the EventPipeThread so that it would not be freed before the immutable native thread id was written. To simplify, directly map the native thread ids and remove associated ref counting.

The delayed EventPipeThreadSessionState cleanup added to ep_buffer_manager_write_all_buffers_to_file_v4 does not address the in-proc EventListener scenario, which is also susceptible to performance issues should many short-lived threads emit any events. In addition, the ThreadSessionState cleanup induces a leakage of buffer_lists, which previously could only be removed by iterating over the buffer_manager's thread_session_state_list. In preparation for an alternative cleanup mechanism, revert the former cleanup logic.

mdh1418 · 2025-07-24T17:29:51Z

src/native/eventpipe/ep-buffer-manager.c

 	EventPipeEventInstance *next_event;
-
-	for (uint32_t i = 0; i < dn_vector_ptr_size (&buffer_array) && i < dn_vector_ptr_size (&buffer_list_array); ++i) {
+	for (uint32_t i = 0; i < dn_vector_ptr_size (&thread_session_state_array) && i < dn_vector_ptr_size (&buffer_list_array); ++i) {


I wanted to use the FOREACH macro with only tracking the thread_session_state_array and retrieving the associated buffer_list, but ep_thread_session_state_get_buffer_list is protected by the thread lock.

Sequence Points hold a mapping of native thread ids to the threads last read event sequence number. Previously, the buffer_manager's thread_session_state_list was used to update the last read event sequence number right before the sequence point is written. With plans to cleanup the thread_session_state_list, the invariant that all EventPipeThreadSessionStates will still be in the list after processing events is no longer guaranteed. Instead, move the update logic to be in tandem with processing events on a thread. ep_buffer_manager_write_all_buffers_to_file_v4 processes events by first finding the earliest event in all buffers and subsequently processing all events from that thread. Once all available events have been read from that thread, update the sequence point mapping.

Throughout an EventPipe Session's lifetime, any new thread that writes an Event will induce a sequence of allocations including EventPipeThread, EventPipeThreadSessionStates, EventPipeBufferList, EventPipeBuffer, EventPipeSequencePoint, etc. Some of these objects are held throughout the lifetime of the session instead of the scope of their utility. Such behavior leads to a performance hit. For example, the buffer_manager's thread_session_state_list is used to discover new events, and without the proper cleanup, it will hold instances whose buffers have been exhausted and may never received a new event. To address a bloated thread_session_state_list, we add cleanup measures to remove instances whose buffer_list is both empty and the corresponding thread has been unregistered, indicating that the thread has exited and will not write new events.

Copilot

Pull Request Overview

This PR optimizes EventPipe's ThreadSessionState management to address performance issues in scenarios with many short-lived threads. The changes focus on cleaning up exhausted thread session states and optimizing sequence point handling.

Key changes:

Implements cleanup of ThreadSessionState instances when their buffer_list is empty and the thread has exited
Simplifies sequence point handling by storing native thread IDs directly instead of using EventPipeThreadSessionState references
Enhances buffer manager to track current thread session state being processed for proper cleanup

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/native/eventpipe/ep-thread.h	Updates thread ID type and exposes buffer_manager getter for ThreadSessionState
src/native/eventpipe/ep-thread.c	Adds cleanup logic in thread unregistration and fixes session state memory management
src/native/eventpipe/ep-event-instance.c	Removes reference counting complexity from sequence point cleanup
src/native/eventpipe/ep-buffer-manager.h	Adds current_thread_session_state tracking and cleanup function declaration
src/native/eventpipe/ep-buffer-manager.c	Implements comprehensive cleanup logic and optimizes sequence point handling
src/native/eventpipe/ep-block.c	Updates sequence point serialization to use thread IDs directly
src/native/containers/dn-umap-t.h	Adds thread ID to uint32 map type for sequence point storage

Copilot · 2025-07-24T22:04:55Z

src/native/eventpipe/ep-buffer-manager.c

 				// get the next buffer
 				current_buffer = buffer_list->head_buffer;
+
+				if (!current_buffer && (ep_rt_volatile_load_uint32_t_without_barrier (ep_thread_get_unregistered_ref (ep_thread_session_state_get_thread (thread_session_state))) > 0))


[nitpick] This condition check is complex and hard to read. Consider extracting it into a helper function like is_thread_unregistered_and_buffer_list_empty to improve readability and maintainability.

Suggested change

if (!current_buffer && (ep_rt_volatile_load_uint32_t_without_barrier (ep_thread_get_unregistered_ref (ep_thread_session_state_get_thread (thread_session_state))) > 0))

if (is_thread_unregistered_and_buffer_list_empty(current_buffer, thread_session_state))

Copilot · 2025-07-24T22:04:56Z

src/native/eventpipe/ep-buffer-manager.c

+			// If a thread is able to drop more than 0x80000000 events in between sequence points then we will
+			// miscategorize it, but that seems unlikely.
+			uint32_t last_read_delta = sequence_number - thread_sequence_number;
+			if (0 < last_read_delta && last_read_delta < 0x80000000)


The magic number 0x80000000 should be defined as a named constant to clarify its purpose in sequence number overflow detection.

Suggested change

if (0 < last_read_delta && last_read_delta < 0x80000000)

if (0 < last_read_delta && last_read_delta < SEQUENCE_NUMBER_OVERFLOW_THRESHOLD)

noahfalk · 2025-07-25T07:56:43Z

src/native/eventpipe/ep-thread.h

 	// the buffer allocation logic is estimating how many
 	// buffers a given thread has used (see: ep_thread_session_state_get_buffer_count_estimate and its uses).
 	EventPipeBufferList *buffer_list;
-#ifdef EP_CHECKED_BUILD


I don't think we need to store this field? You can get the same value by calling ep_session_get_buffer_manager (tss->session)

noahfalk · 2025-07-25T07:57:19Z

src/native/eventpipe/ep-thread.h

 	// This field can be read outside the lock when
 	// the buffer allocation logic is estimating how many
 	// buffers a given thread has used (see: ep_thread_session_state_get_buffer_count_estimate and its uses).
 	EventPipeBufferList *buffer_list;


The comment on the field needs to be updated, its description of the lifetime is out-of-date.

noahfalk · 2025-07-25T08:11:10Z

src/native/eventpipe/ep-buffer-manager.c

+/*
+ *  buffer_manager_remove_thread_if_buffer_list_empty
+ *
+ *  This function should only be called when the thread is unregistered and the buffer_list is empty.


When we call this from thread_unregister the buffer list may not be empty and that should be fine. Part of the function's responsibility is to check.

noahfalk · 2025-07-25T08:59:37Z

src/native/eventpipe/ep-buffer-manager.c

+	EP_SPIN_LOCK_ENTER (&buffer_manager->rt_lock, section1)
+		buffer_manager_remove_thread_if_buffer_list_empty (buffer_manager, thread_session_state);
+	EP_SPIN_LOCK_EXIT (&buffer_manager->rt_lock, section1)
+


I think we should also be calling ep_thread_delete_session_state() here to remove the state cached on the thread.

Allowing it to linger until the end of a session creates potential for an unbounded memory leak if an app rapidly creates threads that each write at least one event to the session and then exit.

Annoyingly the session reader thread and the writer thread might be racing to delete it and they can't synchronize using the thread lock because whichever thread loses the race will AV trying to access the lock through a now deleted TSS pointer. I'm thinking about how to solve it.

noahfalk · 2025-07-25T09:08:20Z

src/native/eventpipe/ep-thread.c

-#ifdef EP_CHECKED_BUILD
 	for (uint32_t i = 0; i < EP_MAX_NUMBER_OF_SESSIONS; ++i) {
-		EP_ASSERT (thread->session_state [i] == NULL);
+		ep_thread_session_state_free (thread->session_state [i]);


I don't think thread->session_state[i] can ever be non-NULL here. The TSS has a ThreadHolder inside of it which maintains a ref count on the thread. As long as any TSS exists the ref count is > 0 and ep_thread_free() won't get called.

TSS objects are currently getting cleaned up during session cleanup and I proposed in another comment that we should also be doing it in remove_thread_if_buffer_list_empty().

noahfalk · 2025-07-25T10:38:05Z

src/native/eventpipe/ep-buffer-manager.c

-
-			// the sequence point captured a lower bound for sequence number on each thread, but iterating
-			// through the events we may have observed that a higher numbered event was recorded. If so we
-			// should adjust the sequence numbers upwards to ensure the data in the stream is consistent.


Lets preserve this comment by moving it to the same place you moved the sequence point update code above.

noahfalk · 2025-07-25T10:46:35Z

src/native/eventpipe/ep-buffer-manager.c

 				current_buffer = buffer_list->head_buffer;
+
+				if (!current_buffer && (ep_rt_volatile_load_uint32_t_without_barrier (ep_thread_get_unregistered_ref (ep_thread_session_state_get_thread (thread_session_state))) > 0))
+					buffer_manager_remove_thread_if_buffer_list_empty (buffer_manager, thread_session_state);


Lets delay this call until after exiting the buffer_manager lock below, then you can call the ep_ version of function which I suggested in another comment should also take the thread lock and cleanup the TSS state.

noahfalk · 2025-07-25T11:50:32Z

src/native/eventpipe/ep-buffer-manager.c

 			*events_written = events_written_for_thread || *events_written;
+
+			ep_rt_thread_id_t thread_id = ep_thread_get_os_thread_id (ep_thread_session_state_get_thread (current_thread_session_state));
+			dn_umap_it_t found = dn_umap_threadid_uint32_find (ep_sequence_point_get_thread_sequence_numbers (sequence_point), thread_id);


What if buffer_manager_try_peek_sequence_point() above returns NULL?

Previously last_read_sequence_number provided a place to record our reads that was always available. I suspect to make this work you will need to change the lifetime of sequence points so that there is always at least one of them in the list. This means you may need to allocate them and enqueue them before its time to timestamp them.

Also we need to preserve the invariant that sequence numbers for the same thread only go up (not counting integer overflow). Imagine if there were two sequence points in the queue both initialized with thread A=0 in the sequence number map. Now on the read thread we read two events from thread A's buffer and increase the current sequence point to report thread A=2. However when we get to the 2nd sequence point it is still storing thread A=0. If there are no events written on thread A while flushing the 2nd sequence point the code right here isn't going to run. Something else needs to. Perhaps when you write each sequence point you can forward any needed updates to the next one.

Thinking on this some more I think its doable, but the sequence point management and updates might get messy. It might be simpler to leave the sequence points and last_read_sequence_number field how they were before this PR and just accept that TSS lifetimes with sequence points will be a little different than without sequence points but I haven't worked through the details to get a good comparison.

Ah, I didn't realize that we don't have any EventPipeSequencePoints until the first remaining_sequence_point_alloc_budget has been exhausted. Would there be a problem with initializing the remaining_sequence_point_alloc_budget to 0 in ep_buffer_manager_alloc, so we guarantee there is one SequencePoint when the first event is written/buffer is allocated? I now realize that sequence points are intended to only be allocated after sufficient events have been written to buffers, and that the timestamp of the sequence point dictates which events can be processed.

I'm also realizing, could that be the reason why the SequencePoint's maps held a reference to the thread?
Right now, if we exhaust buffers and the thread is unregistered, if we delete the ThreadSessionStates in ep_buffer_manager_remove_thread_if_buffer_list_empty, that means we can hit this last_read_sequence_number update logic after the EPTSS is deleted.
Thats resolvable by replacing EventPipeThreadSessionState *current_thread_session_state = buffer_manager->current_thread_session_state; with ep_rt_thread_id_t current_thread_id = ep_thread_get_os_thread_id (ep_thread_session_state_get_thread (buffer_manager->current_thread_session_state) because we just need the "value" of the thread id to update the map, instead of the EPTSS itself.

Still, to properly update entries in the map, it sounds like there is an inherent requirement that ThreadSessionStates must live beyond all SequencePoints mapping the corresponding thread_ids in order to be able to update each entry. Unless we introduce another struct to track current_thread_id and last_read_sequence_number it doesn't seem like we can remove the thread ref-counting from the map.

lateralusX · 2025-07-28T08:14:35Z

src/native/containers/dn-umap-t.h

 }

 DN_UMAP_IT_KEY_T (ptr, void *)
+DN_UMAP_IT_KEY_T (threadid, ep_rt_thread_id_t)


You need to do this in ep code, since ep types are not shared between libraries that could use the container headers.

lateralusX · 2025-07-28T08:22:56Z

src/native/containers/dn-umap-t.h

 DN_UMAP_T (ptr, void *, int32, int32_t)
 DN_UMAP_T (ptr, void *, uint32, uint32_t)
 DN_UMAP_T (uint32, uint32_t, ptr, void *)
+DN_UMAP_T (threadid, ep_rt_thread_id_t, uint32, uint32_t)


Same as above, no ep types in container headers.

lateralusX · 2025-07-28T08:43:08Z

src/native/eventpipe/ep-buffer-manager.c

-		ep_thread_addref (ep_thread_holder_get_thread (ep_thread_session_state_get_thread_holder_ref (thread_session_state)));
+		ep_rt_thread_id_t os_thread_id = ep_thread_get_os_thread_id (ep_thread_session_state_get_thread (thread_session_state));
+
+		dn_umap_threadid_uint32_insert (ep_sequence_point_get_thread_sequence_numbers (sequence_point), os_thread_id, sequence_number);


What happens if a thread might die when being on this map? If that happens, the old code would at least preserve the internal representation of TSS through the addref and then access it through that data structure as a representation of the thread, but now when we use thread id as key, that could be reused by OS, so if everything related to the thread TSS has been freed while we hold the thread id -> sequence point representation in this map, then there might be duplicates. In theory I believe this could happen in the past as well, but we would end up with different TSS instance but with the same internal thread id, so not sure what consequences that would introduce later, just wanted to highlight this potential scenario when we introduce thread id is identification instead of TSS.

lateralusX · 2025-07-28T10:10:07Z

Taking a step back to understand why we need a bigger refactor/cleanup. Why couldn't we just detect and cleanup consumed TSS on the consumer threads, like we already do in ep_buffer_manager_write_all_buffers_to_file_v4? TSS and buffers need to be kept alive until consumed by threads, so putting the cleanup logic on the consumer, like we already do in ep_buffer_manager_write_all_buffers_to_file_v4, feels straightforward. Any reason why we couldn't simplify the fix to just do that?

The following should hold:

Everything gets removed when a session gets deleted.
EP needs to keep TSS and buffers for threads that get unregistered but not yet consumed.
When EP have consumed all buffers for an unregistered thread, it should remove all state.
There is only one parallel consumer thread per session, but many writers. Consumer thread can cleanup at select locations, like ep_buffer_manager_write_all_buffers_to_file_v4.
Original issue was with ep_buffer_manager_get_next_event since it doesn't include any capabilities to do cleanup when things have been consumed.

Could cleanup consumed state in ep_buffer_manager_get_next_event when detecting that there are empty buffers, like you do in ep_buffer_manager_remove_thread_if_buffer_list_empty, but call it in ep_buffer_manager_get_next_event and only when the call have detected empty buffers and maybe seen that the TSS thread is unregistered be an alternative? That way we shouldn't need to change ep_buffer_manager_write_all_buffers_to_file_v4 implementation and could probably fix the memory leak without additional regression risks.

mdh1418 · 2025-08-05T22:00:53Z

Closed in favor of #118415

mdh1418 added 4 commits July 22, 2025 16:03

[EventPipe] Cleanup logic

f1b7a75

Update thread_session_state_list loop to use foreach macro Unindent sequence point reloop block Consolidate sequence point loop timestamp boundary update

github-actions bot added the area-Tracing-coreclr label Jul 24, 2025

dotnet-policy-service bot assigned mdh1418 Jul 24, 2025

mdh1418 commented Jul 24, 2025

View reviewed changes

mdh1418 added 2 commits July 24, 2025 17:01

mdh1418 force-pushed the ep_cleanup_thread_session_states branch from 463e6f5 to 1fedd96 Compare July 24, 2025 21:01

mdh1418 marked this pull request as ready for review July 24, 2025 22:04

mdh1418 requested review from Copilot, lateralusX and noahfalk July 24, 2025 22:04

Copilot AI reviewed Jul 24, 2025

View reviewed changes

noahfalk reviewed Jul 25, 2025

View reviewed changes

lateralusX reviewed Jul 28, 2025

View reviewed changes

mdh1418 marked this pull request as draft July 29, 2025 20:32

mdh1418 closed this Aug 5, 2025

github-actions bot locked and limited conversation to collaborators Sep 5, 2025

	if (!current_buffer && (ep_rt_volatile_load_uint32_t_without_barrier (ep_thread_get_unregistered_ref (ep_thread_session_state_get_thread (thread_session_state))) > 0))
	if (is_thread_unregistered_and_buffer_list_empty(current_buffer, thread_session_state))

	if (0 < last_read_delta && last_read_delta < 0x80000000)
	if (0 < last_read_delta && last_read_delta < SEQUENCE_NUMBER_OVERFLOW_THRESHOLD)

[EventPipe] Optimize ThreadSessionState Management and Cleanup #118029

[EventPipe] Optimize ThreadSessionState Management and Cleanup #118029

Uh oh!

Conversation

mdh1418 commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Testing

Before

After

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lateralusX Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lateralusX commented Jul 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mdh1418 commented Aug 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mdh1418 commented Jul 24, 2025 •

edited

Loading

lateralusX Jul 28, 2025 •

edited

Loading

lateralusX commented Jul 28, 2025 •

edited

Loading