Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide thread name (and when possible gem name) on output #12

Merged
merged 18 commits into from
Jun 6, 2023

Conversation

benoittgt
Copy link
Contributor

@benoittgt benoittgt commented Apr 5, 2023

gvl-tracing handles properly handles thread name for basic usage like:

  Thread.new do
    Thread.current.name = "thread name"
  end

But on more complicated examples with external libraries, it is not able to properly handles thread name.
Jean had a good idea in #9 with using Thread.list.

This commit does this:

  • Store as list of json line, events we want to monitor
  • Store Thread.list before stopping gvl tracing
  • Iterate of events stored in json file and modify entries where we can find the id of a thread
  • Go a little bit further when we can and provide the gem name related to the thread

This is a very naive approach and probably not super efficient. But this is a first try.

Screenshot with the new examples file:
Screenshot 2023-04-05 at 21 23 45

Screenshot of a gvl-tracing on a Rails app with some multihread:

Screenshot 2023-04-05 at 21 45 01

Opening as a draft because code need probably some rework.

lib/gvl-tracing.rb Outdated Show resolved Hide resolved
lib/gvl-tracing.rb Outdated Show resolved Hide resolved
@benoittgt benoittgt requested a review from casperisfine April 6, 2023 10:28
lib/gvl-tracing.rb Show resolved Hide resolved
lib/gvl-tracing.rb Outdated Show resolved Hide resolved
lib/gvl-tracing.rb Outdated Show resolved Hide resolved

REGEX = /lib(?!.*lib)\/([a-zA-Z-]+)/
def thread_label(thread)
lib_name = thread.to_s.match(REGEX)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If your goal is to exact a gem name or something, then thread.backtrace_locations.last.path might be cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at it a little bit. But bracktrace is an array of strings.
The last element looks very similar to Thread instance to_s

thread.backtrace.last
=> "/Users/benoit.tigeot/.rbenv/versions/3.2.0/lib/ruby/gems/3.2.0/gems/concurrent-ruby-1.2.2/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333:in `block in create_worker'"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pointing at bracktrace_locations, not backtrace.

Also to avoid allocation too much useless stuff: t.backtrace_locations(1, 1).first.path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried an the main issue is we may do not have path.

So diff is

--- a/lib/gvl-tracing.rb
+++ b/lib/gvl-tracing.rb
@@ -81,7 +81,7 @@ module GvlTracing

     REGEX = /lib(?!.*lib)\/([a-zA-Z-]+)/
     def thread_label(thread)
-      lib_name = thread.to_s.match(REGEX)
+      lib_name = thread.backtrace_locations(1, 1).first&.path&.match(REGEX)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, could be a native method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should still change it to backtrace_locations? From outside I see limited gains.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this one (yet? xD), so let's go with what you picked for now :)


def aggreate_thread_list(list)
list.each_with_object({}) do |t, acc|
next unless t.name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip if it doesn't have a name?

The vast majority of threads aren't named. I think a good thing would be to use the name is available, and if not fallback to parsing its backtrace.

Copy link
Contributor Author

@benoittgt benoittgt Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing that surprise me is that
https://github.com/ivoanjo/gvl-tracing/blob/master/examples/example4.rb#L13
Is properly handled by the C code that look for thread name. But thread list is not able to get it. But in the scenario of the Rails app (like in my initial example), it works properly with Thread.list.

That why I skip Thread without name here. I think your proposal is a good idea, but Thread.list needs to report properly to Thread#name like for example4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you provide the answer bellow, and it is also mentioned in doc.

it may set the name to pthread and/or kernel.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, example4.rb doesn't work with the new code because the threads die before .stop -- thus they won't be in Thread.list by the time we call it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when t == Thread.main it's worth printing "Main Thread" or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

@@ -91,7 +91,7 @@ static inline void render_thread_metadata(void) {
#endif

fprintf(output_file,
" {\"ph\": \"M\", \"pid\": %u, \"tid\": %u, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
" {\"ph\": \"M\", \"pid\": %u, \"tid\": %u, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}}\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we still need to provide thread name here.

Copy link
Contributor Author

@benoittgt benoittgt Apr 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required. For example the example4.rb works with this code. We are not able to get Thread#name via Thread.list

@benoittgt benoittgt marked this pull request as ready for review April 6, 2023 13:44
@benoittgt benoittgt force-pushed the provide-thread-name branch 3 times, most recently from d391de4 to 64494bd Compare April 6, 2023 15:44
@ivoanjo
Copy link
Owner

ivoanjo commented Apr 10, 2023

Hey there!

Big thanks for working on this 🎉🎉🎉

I was a bit slow to respond over Easter weekend but I'll make an effort to be more responsive to this PR from now on :) :)

I was doing a few experiments with the perfetto UI and I think there's a path to avoiding post-processing the output with the following insight: the perfetto UI doesn't mind where in the event sequence the thread_name appears.

It turns out the metadata event doesn't have to show up when we first see the thread (as I initially implemented). Furthermore, I just experimented a bit and it turns out that if we repeat the metadata events for a thread more than once, and perfetto only displays the last one.

Thus, I think we don't need to do the reparsing; instead, as part of stop, we could re-print the names of the threads to the trace file, e.g. by perhaps building up the thread names in Ruby as you're doing in the PR, and then passing it as a string that gets appended to the trace file before it gets closed. And for perfetto it seems like this is OK -- we don't even need to omit the duplicate names at the beginning.

What do you think? I'm suggesting this approach because it means that even if we run the tracing for a long time, the overhead when stopping would always be O(number of threads) which seems quite reasonable.


There is another downside with this approach of getting the thread names at the end, which I am totally ok with if we don't solve yet -- I didn't solve it in my first implementation either. I'm sharing it here more for discussion :)

The issue is because Ruby reuses native threads, it means that using only the native thread id to identify a thread and match it to a Ruby Thread object may not be enough -- two Ruby Thread objects can actually share the same native thread id.

This means that the name can be completely wrong, if we hit one of these cases :)

@casperisfine
Copy link

This means that the name can be completely wrong, if we hit one of these cases :)

Yeah, but it's hard to solve this. We'd need a callback on thread start to eagerly assign thread names.

Overall, code that spawn short lived threads are rare, so maybe it's an OK limitation?

@ivoanjo
Copy link
Owner

ivoanjo commented Apr 11, 2023

This means that the name can be completely wrong, if we hit one of these cases :)

Overall, code that spawn short lived threads are rare, so maybe it's an OK limitation?

Definitely it's OK to not solve it in this PR! I brought it up since it would be nice to fix at some point -- as it's a bit of a sharp edge that can confuse users -- and maybe the discussion here would shine a light on what we could do for that one :)

Yeah, but it's hard to solve this. We'd need a callback on thread start to eagerly assign thread names.

I've been wondering if we could approximate it during RUBY_INTERNAL_THREAD_EVENT_READY/RUBY_INTERNAL_THREAD_EVENT_RESUMED/RUBY_INTERNAL_THREAD_EVENT_SUSPENDED by checking if rb_thread_current() is still the last one we saw for this thread. Something like this?

@casperisfine
Copy link

Something like this?

The problem is that you can only generate the name when you hold the GVL. So you could detect that the name need to be generated again, but couldn't do it from these callbacks (except for RESUMED).

@ivoanjo
Copy link
Owner

ivoanjo commented Apr 11, 2023

The problem is that you can only generate the name when you hold the GVL. So you could detect that the name need to be generated again, but couldn't do it from these callbacks (except for RESUMED).

Right! I guess there's two parts of the problem:

  • Detecting if it is the same thread, giving it a new "tid" as needed (or current_thread_serial as it's currently called)
  • Getting the name for a given thread

The name could still come as a best-effort thing done later (as this PR does) -- worst case, we don't get a name.

@benoittgt
Copy link
Contributor Author

Thus, I think we don't need to do the reparsing; instead, as part of stop, we could re-print the names of the threads to the trace file, e.g. by perhaps building up the thread names in Ruby as you're doing in the PR, and then passing it as a string that gets appended to the trace file before it gets closed. And for perfetto it seems like this is OK -- we don't even need to omit the duplicate names at the beginning.

That's interesting. I'm gonna have a look at what is possible with metadatas and perfetto UI.

@benoittgt benoittgt force-pushed the provide-thread-name branch 2 times, most recently from df185f0 to 7b079a0 Compare April 14, 2023 14:55
@benoittgt
Copy link
Contributor Author

benoittgt commented Apr 14, 2023

@ivoanjo I had a quick look of your proposal and pushed a commit. It is much better. Good catch with perfetto trace format. The only issue I have is the thread_id provided by the C extension is different from the one I get in thread list. I didn't look a it yet.

Edit

Maybe going in this direction may be a good idea? It is working properly. What was the need of using serial instead of a native thread id?

--- a/ext/gvl_tracing_native_extension/gvl_tracing.c
+++ b/ext/gvl_tracing_native_extension/gvl_tracing.c
@@ -76,10 +76,11 @@ static inline void initialize_thread_id(void) {

 static inline void render_thread_metadata(void) {
   uint64_t native_thread_id = 0;
-  #ifdef HAVE_GETTID
-    native_thread_id = gettid();
-  #elif HAVE_PTHREAD_THREADID_NP
+
+  #ifdef HAVE_PTHREAD_THREADID_NP
     pthread_threadid_np(pthread_self(), &native_thread_id);
+  #elif HAVE_GETTID
+    native_thread_id = gettid();
   #else
     native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
   #endif
@@ -91,9 +92,8 @@ static inline void render_thread_metadata(void) {
   #endif

   fprintf(output_file,
-    "  {\"ph\": \"M\", \"pid\": %u, \"tid\": %u, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
-    process_id, current_thread_serial, native_thread_id, native_thread_name_buffer
-  );
+    "  {\"ph\": \"M\", \"pid\": %u, \"tid\": %llu, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
+    process_id, native_thread_id, native_thread_id, native_thread_name_buffer);
 }

 static VALUE tracing_start(VALUE _self, VALUE output_path) {
@@ -170,6 +170,15 @@ static void render_event(const char *event_name) {
   }

   unsigned int thread_id = current_thread_serial;
+  uint64_t native_thread_id = 0;
+
+  #ifdef HAVE_PTHREAD_THREADID_NP
+    pthread_threadid_np(pthread_self(), &native_thread_id);
+  #elif HAVE_GETTID
+    native_thread_id = gettid();
+  #else
+    native_thread_id = current_thread_serial; // TODO: Better fallback for Windows?
+  #endif

   // Each event is converted into two events in the output: one that signals the end of the previous event
   // (whatever it was), and one that signals the start of the actual event we're processing.
@@ -180,13 +189,13 @@ static void render_event(const char *event_name) {

   fprintf(output_file,
     // Finish previous duration
-    "  {\"ph\": \"E\", \"pid\": %u, \"tid\": %u, \"ts\": %f},\n" \
+    "  {\"ph\": \"E\", \"pid\": %llu, \"tid\": %u, \"ts\": %f},\n" \
     // Current event
-    "  {\"ph\": \"B\", \"pid\": %u, \"tid\": %u, \"ts\": %f, \"name\": \"%s\"},\n",
+    "  {\"ph\": \"B\", \"pid\": %llu, \"tid\": %u, \"ts\": %f, \"name\": \"%s\"},\n",
     // Args for first line
-    process_id, thread_id, now_microseconds,
+    process_id, native_thread_id , now_microseconds,
     // Args for second line
-    process_id, thread_id, now_microseconds, event_name
+    process_id, native_thread_id , now_microseconds, event_name
   );
 }

Copy link
Owner

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, the sequential tid! On hindsight, I did totally gloss over that in my suggestion, making it seem easier than it was -- sorry!

You can actually see the story of its introduction in #4 .
TL;DR:

  • gettid() is not supported on macOS
  • We discussed replacing it by rb_nativethread_self() but that got us long ids which the perfetto UI didn't like at the time
  • The sequential id ended up being adopted instead

It was only in a later change that I introduced the pthread_threadid_np for macOS as the tid.

So yes, I think at this point it's perfectly reasonable to switch back to using the thread_id as the tid, instead of the sequential value.

(At some point something like the sequential value may need to make a return to allow distinguishing thread reuse, but let's put that aside for now.)

lib/gvl-tracing.rb Outdated Show resolved Hide resolved
lib/gvl-tracing.rb Outdated Show resolved Hide resolved
list.each_with_object([]) do |t, acc|
next unless t.name

acc << {"ph": "M", "pid": Process.pid, "tid": t.native_thread_id, "name": "thread_name", "args": {"name": thread_label(t)}}.to_json
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: To be honest, this line is so close to the final output JSON, that I'm not sure it's worth using to_json ;)


def aggreate_thread_list(list)
list.each_with_object({}) do |t, acc|
next unless t.name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note, example4.rb doesn't work with the new code because the threads die before .stop -- thus they won't be in Thread.list by the time we call it.

@benoittgt
Copy link
Contributor Author

benoittgt commented Apr 24, 2023

So yes, I think at this point it's perfectly reasonable to switch back to using the thread_id as the tid, instead of the sequential value.

I am wondering if my change properly address your suggestion.

I tested the last code change on macOS with success with example4.rb (we fallback on thread name via native_thread_name_buffer), thread_name_with_extension.rb and Rails app code.

Sorry for the late reply. I was on vacations. 😊

@benoittgt benoittgt force-pushed the provide-thread-name branch from 647f086 to 7c4ae7c Compare April 26, 2023 15:15
gvl-tracing handles properly thread name for basic usage like:
```ruby
  Thread.new do
    Thread.current.name = "thread name
  end
```

But on more complicated with external libraries it is not able to
properly handles thread name.
Jean had a good idea in ivoanjo#9
about using `Thread.list`.

This commit does this:
- Store as list of json line, events we want to monitor
- Store `Thread.list` before stopping saving event related to the GVL
- Iterate of events store in json file and modify entries where we can
  find the name of the thread
- Go a little bit further when we can provide the gem name

This is a very naive approach and probably not super efficient. But this
is a first try.
To display the proper thread name in some situation we store Thread.list
before stoping the profiling and append thread name to the events file
for perfetto tool.
@benoittgt benoittgt force-pushed the provide-thread-name branch from 7c4ae7c to c1baf71 Compare April 26, 2023 15:15
Copy link
Owner

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work so far! I've left a bunch more suggestions, but I think after the next round I'll be more than happy to merge and put out a release :)

P.s.: I may be a bit slow to respond on the next few days, taking a bit of time off before RubyKaigi! ;)

ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
ext/gvl_tracing_native_extension/gvl_tracing.c Outdated Show resolved Hide resolved
Comment on lines 86 to 89
fprintf(output_file,
" {\"ph\": \"M\", \"pid\": %u, \"tid\": %u, \"name\": \"thread_name\", \"args\": {\"name\": \"%lu %s\"}},\n",
process_id, current_thread_serial, native_thread_id, native_thread_name_buffer
);
" {\"ph\": \"M\", \"pid\": %u, \"tid\": %llu, \"name\": \"thread_name\", \"args\": {\"name\": \"%llu %s\"}},\n",
process_id, thread_id, thread_id, native_thread_name_buffer);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to use %llu is actually not correct for Linux -- if you look at the CI runs, it complains that it's not correct of a uint64_t. (But it is correct for macOS -- going back to %lu makes the macOS build complain).

I suggest using PRIu64, which should work correctly for both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to long int. Maybe it's better?


def aggreate_thread_list(list)
list.each_with_object({}) do |t, acc|
next unless t.name
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe when t == Thread.main it's worth printing "Main Thread" or something?


REGEX = /lib(?!.*lib)\/([a-zA-Z-]+)/
def thread_label(thread)
lib_name = thread.to_s.match(REGEX)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this one (yet? xD), so let's go with what you picked for now :)

examples/thread_name_with_extension.rb Outdated Show resolved Hide resolved
Comment on lines 5 to 17
{"ph": "M", "pid": 44888, "tid": 476842, "name": "thread_name", "args": {"name": "476842 "}},
{"ph": "E", "pid": 44888, "tid": 476842, "ts": 66.000000},
{"ph": "B", "pid": 44888, "tid": 476842, "ts": 66.000000, "name": "started"},
{"ph": "E", "pid": 44888, "tid": 476842, "ts": 69.000000},
{"ph": "B", "pid": 44888, "tid": 476842, "ts": 69.000000, "name": "wants_gvl"},
{"ph": "M", "pid": 44888, "tid": 476843, "name": "thread_name", "args": {"name": "476843 "}},
{"ph": "E", "pid": 44888, "tid": 476843, "ts": 82.000000},
{"ph": "B", "pid": 44888, "tid": 476843, "ts": 82.000000, "name": "started"},
{"ph": "E", "pid": 44888, "tid": 476843, "ts": 85.000000},
{"ph": "B", "pid": 44888, "tid": 476843, "ts": 85.000000, "name": "wants_gvl"},
{"ph": "M", "pid": 44888, "tid": 476844, "name": "thread_name", "args": {"name": "476844 "}},
{"ph": "E", "pid": 44888, "tid": 476844, "ts": 93.000000},
{"ph": "B", "pid": 44888, "tid": 476844, "ts": 93.000000, "name": "started"},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not anything this PR touches or changes, but it's interesting that we're not getting beyond microsecond precision on macOS, but we do so on Linux. At some point I need to perhaps look into the other options beyond CLOCK_MONOTONIC to see if we can make it tighter (I think there's a _RAW variant....)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.

There is probably a more elegant way of writing it. I had to use two
types because `pthread_self` doesn't like a `long int`.
Not visible on MacOS but we have a warning.

../../../../ext/gvl_tracing_native_extension/gvl_tracing.c:153:17: warning: old-style function definition [-Wold-style-definition]
  153 | static uint64_t native_thread_id() {
      |                 ^~~~~~~~~~~~~~~~
@benoittgt
Copy link
Contributor Author

benoittgt commented Apr 27, 2023

Thanks @ivoanjo for your review. Here is few screenshots of the new output. I have similar output on a debian 11 too.

I am not sure for the C changes. You will tell me. :)

Screenshot 2023-04-27 at 17 48 29

Screenshot 2023-04-27 at 17 48 24

@ivoanjo
Copy link
Owner

ivoanjo commented Jun 6, 2023

This looks great! I hope you don't take my long delay the wrong way -- I definitely I am very grateful for the contribution. It's just been an intense few weeks :)

And feel free to send more things this way :)

@ivoanjo
Copy link
Owner

ivoanjo commented Jun 6, 2023

@benoittgt I've just released this improvement as version 1.2.0 on rubygems 🎉

@benoittgt benoittgt deleted the provide-thread-name branch June 21, 2023 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants