Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add profiler helper for detecting if we're on the main Ractor #2350

Merged
merged 2 commits into from
Nov 9, 2022

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Nov 3, 2022

What does this PR do?:

Add a new helper to private_vm_access.c that allows the caller thread to ask if it's running as part of the main Ractor.

On Rubies without Ractor support, as a simplification, we always return true so that callers don't need to probe for Ractor support.

Motivation:

This helper will be used to make sure the profiler code does not run on the wrong Ractor, since the profiler code is currently not built to be called in parallel from multiple Ractors.

Additional Notes:

Hey, I added the first spec that uses Ractors to our test suite!

How to test the change?:

Test coverage included.

**What does this PR do?**:

Add a new helper to `private_vm_access.c` that allows the caller
thread to ask if it's running as part of the main Ractor.

On Rubies without Ractor support, as a simplification, we always
return `true` so that callers don't need to probe for Ractor support.

**Motivation**:

This helper will be used to make sure the profiler code does not
run on the wrong Ractor, since the profiler code is currently not
built to be called in parallel from multiple Ractors.

**Additional Notes**:

Hey, I added the first spec that uses Ractors to our test suite!

**How to test the change?**:

Test coverage included.
@ivoanjo ivoanjo requested a review from a team November 3, 2022 11:01
@github-actions github-actions bot added the profiling Involves Datadog profiling label Nov 3, 2022
@ivoanjo ivoanjo changed the title Ivoanjo/add support for detecting ractors Add profiler helper for detecting if we're on the main Ractor Nov 3, 2022
@ivoanjo
Copy link
Member Author

ivoanjo commented Nov 3, 2022

Note: I may have found another Ruby bug (or at least something strange) that happens after the first Ractor gets created in a Ruby app. I suspect this PR will cause flaky tests because of that, so I'm moving it back to draft until I can ensure this will not be the case.

@ivoanjo ivoanjo marked this pull request as draft November 3, 2022 14:22
@ivoanjo
Copy link
Member Author

ivoanjo commented Nov 8, 2022

Update: I was able to track down the flake issue to a Ruby VM bug, which I've reported upstream https://bugs.ruby-lang.org/issues/19112 .

I've also deployed a workaround in #2354, and changed this PR to be on top of that one, to underline the implicit dependency -- the workaround needs to go in first so as to avoid introducing a flaky test.

Marking this PR as "ready for review" once more.

@ivoanjo ivoanjo changed the base branch from master to ivoanjo/workaround-ractor-tracepoint-issue November 8, 2022 12:31
@ivoanjo ivoanjo marked this pull request as ready for review November 8, 2022 12:32
Base automatically changed from ivoanjo/workaround-ractor-tracepoint-issue to master November 8, 2022 13:47
@@ -125,6 +125,9 @@ def add_compiler_flag(flag)
# On older Rubies, we need to use a backported version of this function. See private_vm_api_access.h for details.
$defs << '-DUSE_BACKPORTED_RB_PROFILE_FRAME_METHOD_NAME' if RUBY_VERSION < '3'

# On older Rubies, there are no Ractors
$defs << '-DNO_RACTORS' if RUBY_VERSION < '3'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no built in define around this? Interesting...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick search and there isn't any -- the closest I found has HAVE_RUBY_RACTOR_H which can maybe thought of as a way of doing that but... I decided to keep this one for now ;) (Unless you think it's a really bad idea)

I think the reason why there is no define is because you can't disable Ractors (that I know of), and thus you can always probe them by looking at the Ruby version, so that's probably why they don't have a more specific #define.

On our side, as you can kinda tell by this file, I've been somewhat avoiding doing something like $defs << "-DRUBY_#{RUBY_VERSION}" and then doing #ifdef RUBY_3.1 in the code, in favor of being a bit more semantic -- e.g. rather than saying "this is the codepath for Ruby 3.1", say "this is the codepath for when this structure has a member called foo".

Comment on lines +695 to +696
// This API and definition are exported as a public symbol by the VM BUT the function header is not defined in any public header, so we
// repeat it here to be able to use in our code.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to request/perform this change upstream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this "public-ish" is actually on purpose. That is, the missing header is not a "oops we forgot", it's a "this isn't really a public API, but some things outside of Ruby make good use of it, and we don't mind exposing it for now but you're on your own".

For instance in ruby/ruby@15476c6 you see a similar API that was published so that fiddle could use it:

https://github.com/ruby/ruby/blob/c3de7a3c58bf9a138ff8720ed56c0045d2b8e01d/internal/thread.h#L48

(Out of curiosity, there was some openness during a few chats with core team members recently to perhaps promote a few more APIs that would be really useful for profilers to this "public-ish" state).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... maybe I'm just assuming too much. I've been hesitating a bit on asking for these kinds of things from upstream because I wanted to first come up with our most important asks and then go down the list from there, rather than asking for all of them.

@ivoanjo ivoanjo merged commit 97885d7 into master Nov 9, 2022
@ivoanjo ivoanjo deleted the ivoanjo/add-support-for-detecting-ractors branch November 9, 2022 09:14
@github-actions github-actions bot added this to the 1.6.0 milestone Nov 9, 2022
ivoanjo added a commit that referenced this pull request Nov 9, 2022
**What does this PR do?**:

This PR builds on top of #2350 and changes the
`Collectors::CpuAndWallTimeWorker` to never trigger sampling if any
of its methods get called from outside the main ractor.

This means that the profiler will not have visibility beyond the
main ractor. Building support for that will come later, as we'll need
to deal with the complexity of the profiler potentially being called
from multiple ractors.

**Motivation**:

We don't yet support Ractors, but on an application using Ractors the
profiler should not fail (it just won't show the Ractors).

**Additional Notes**:

(Nothing)

**How to test the change?**:

I've included partial coverage, but testing the GC-related behavior
was harder than I expected, so for now I didn't add it.

To test this feature, you can enable profiling on a Ruby app using
ractors and confirm that no samples appear from Ractors, but otherwise
profiling works correctly (for threads in the main Ractor).

Here's a test app I've been using:

```ruby
puts RUBY_DESCRIPTION

def fib(n)
  return n if n <= 1
  fib(n-1) + fib(n-2)
end

Thread.current.name = "#{Thread.current.object_id} main"

ractor1 = Ractor.new do
  Thread.current.name = "#{Thread.current.object_id} ractor1 main"

  Thread.new do
    Thread.current.name = "#{Thread.current.object_id} ractor1 background"

    loop { fib(40) }
  end

  sleep 1
  puts "ractor1 => #{Thread.list}"

  loop { fib(40) }
end

ractor2 = Ractor.new do
  Thread.current.name = "#{Thread.current.object_id} ractor2 main"

  Thread.new do
    Thread.current.name = "#{Thread.current.object_id} ractor2 background"

    loop { fib(40) }
  end

  sleep 1
  puts "ractor2 => #{Thread.list}"

  loop { fib(40) }
end

puts "main => #{Thread.list}"

sleep
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants