Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signal-based profiling produces 502 Bad Gateway response from Passenger #2976

Closed
elliterate opened this issue Jul 18, 2023 · 4 comments · Fixed by #3280
Closed

Signal-based profiling produces 502 Bad Gateway response from Passenger #2976

elliterate opened this issue Jul 18, 2023 · 4 comments · Fixed by #3280
Assignees
Labels
bug Involves a bug community Was opened by a community member profiling Involves Datadog profiling

Comments

@elliterate
Copy link
Contributor

Current behavior

When signal-based profiling is enabled (the default), Passenger returns a 502 Bad Gateway, claiming to have received an incomplete response from the application.

Setting DD_PROFILING_NO_SIGNALS_WORKAROUND_ENABLED=true fixes the application.

Expected behavior

Passenger should return the complete application response without setting DD_PROFILING_NO_SIGNALS_WORKAROUND_ENABLED=true.

Steps to reproduce

  1. Run a Ruby web application with Passenger.
  2. Install ddtrace.
  3. Enable profiling.

Example

See: https://github.com/elliterate/ddtrace-passenger-example

Usage

docker-compose start
curl -v http://localhost:8080

Results

@elliterate elliterate added bug Involves a bug community Was opened by a community member labels Jul 18, 2023
@ivoanjo ivoanjo self-assigned this Jul 18, 2023
@ivoanjo ivoanjo added the profiling Involves Datadog profiling label Jul 18, 2023
@ivoanjo
Copy link
Member

ivoanjo commented Jul 18, 2023

Hey 👋! Thanks for the report -- I can definitely reproduce it.

Indeed enabling the no signals workaround is the correct solution for this issue.

Here's the next steps from our side:

  1. Make sure to update the documentation at https://docs.datadoghq.com/profiler/profiler_troubleshooting/ruby/#unexpected-failures-or-errors-from-ruby-gems-that-use-native-extensions-in-dd-trace-rb-1110 [PROF-7966] Document Ruby profiler incompatibility with passenger documentation#19036
  2. Update the auto-detection of gems that need the workaround to include passenger. Future versions of dd-trace-rb will automatically enable the workaround when passenger is detected. Automatically enable profiler "no signals" workaround for passenger web server #2978
  3. Report upstream issue to passenger folks, try to get fix merged on their side so we can work without the workaround on passenger-based apps Passenger resets SIGPROF signal handlers causing profilers to crash application phusion/passenger#2489
  4. Add integration test coverage for passenger [PROF-7966] Add passenger testing to rack integration app #2998

ivoanjo added a commit that referenced this issue Jul 18, 2023
**What does this PR do?**:

This PR adds one more case where we automatically need to apply the
profiler "no signals" workaround (added in #2873): when the Ruby app
is running inside the passenger web server.

**Motivation**:

This makes sure the profiler does not cause issues out-of-the-box
for customers running passenger.

**Additional Notes**:

Passenger detection was inspired on
https://stackoverflow.com/a/12246652/4432562 .

This was reported in #2976
and I suspect at least one other customer has ran into this issue.

Separate from this PR, I plan to improve our documentation, as well as
try to get in touch with the passenger folks to get a fix upstream.
I'm tracking that work in the mentioned issue.

I'm also considering updating the integration apps to include passenger,
but I wanted to get in the fix asap, and will circle back to that one in
a separate PR if I can get it to work sanely.

**How to test the change?**:

Running the test-case submitted for #2976:
https://github.com/elliterate/ddtrace-passenger-example can be used
to validate that the profiler auto-enables the no signals workaround.
ivoanjo added a commit to DataDog/documentation that referenced this issue Jul 20, 2023
**What does this PR do?**:

This PR extends the Ruby profiler incompatibility list added in
#17370
with a new known issue with the Phusion Passenger web server, reported
by a customer in DataDog/dd-trace-rb#2976 .

**Motivation**:

Make sure the list is kept up-to-date, so customers can reference it.

**Additional Notes**:

Like the `rugged` gem and unlike `mysql2`, there's currently no known
versions of Phusion Passenger without this issue.
alai97 added a commit to DataDog/documentation that referenced this issue Jul 20, 2023
…9036)

* [PROF-7966] Document Ruby profiler incompatibility with passenger

**What does this PR do?**:

This PR extends the Ruby profiler incompatibility list added in
#17370
with a new known issue with the Phusion Passenger web server, reported
by a customer in DataDog/dd-trace-rb#2976 .

**Motivation**:

Make sure the list is kept up-to-date, so customers can reference it.

**Additional Notes**:

Like the `rugged` gem and unlike `mysql2`, there's currently no known
versions of Phusion Passenger without this issue.

* Copy Nit

---------

Co-authored-by: Austin Lai <76412946+alai97@users.noreply.github.com>
@ivoanjo
Copy link
Member

ivoanjo commented Jul 25, 2023

So surprisingly enough it turns out that unlike the previously-known incompatible gems because of signals, the issue with passenger is different!

I was able to pin it down to passenger resetting a bunch of signal handlers, including the one we use for profiling when it starts. Thus, the following happens:

  1. Profiler starts running, installing a SIGPROF signal handler used for data collection.
  2. Passenger resets signal handlers
  3. Profiler sends a SIGPROF signal, causing the Ruby app to crash because it no longer has a signal handler for this signal.

I've reported my findings in phusion/passenger#2489, hopefully we'll be able to integrate a fix in Passenger itself.

ivoanjo added a commit that referenced this issue Jul 25, 2023
**What does this PR do?**:

This PR adds one more webserver to the rack integration testing app:
Phusion Passenger.

**Motivation**:

In #2976 a customer reported an issue with profiling apps running on
passenger. I'm adding passenger to our test suite to avoid
any future regressions.

**Additional Notes**:

I've actually tested that the customer issue does show up on our
integration app if I revert the changes from #2978.

**How to test the change?**:

Validate that the rack CI app is running with passenger as well.
@ivoanjo
Copy link
Member

ivoanjo commented Jul 27, 2023

Btw @elliterate if you happen to have an enterprise support contract with Passenger, it may be worth dropping them a note that you're interested in that ticket. Perhaps that can help speed up stuff on their side :)

@ivoanjo
Copy link
Member

ivoanjo commented Sep 19, 2023

Update: I've submitted a fix to upstream passenger, and the fix has been accepted. I've opened a ticket asking them if they could release a 6.0.19 version including them ( phusion/passenger#2500 ).

After that version is released, I'll be able to update the profiler to auto-detect fixed passenger versions, and avoid the use of the "no signals workaround" for them 🎉

pull bot pushed a commit to astradot/dd-trace-rb that referenced this issue Nov 27, 2023
…0.19+

**What does this PR do?**

This PR is the last step to fully fixing the issue reported in DataDog#2976.

In that issue, we discovered that he passenger web server had an
incompatibility with the profiler. At the time, we added code to
detect when passenger was in use and apply the "no signals" workaround
to avoid this issue out-of-the-box.

Upstream passenger has since accepted our PR to fix the issue, allowing
us in this PR to change our detection logic to:

a) Not enable the "no signals" workaround on passenger 6.0.19+
b) Provide an actionable error message to customers on older passenger
   versions to tell them that the best option is to upgrade

**Motivation:**

Improve the out-of-the-box experience of profiler users that use the
passenger web server.

**Additional Notes:**

N/A

**How to test the change?**

I used the following `config.ru`:

```ruby
def self.fib(n)
  n <= 1 ? n : fib(n-1) + fib(n-2)
end

app = ->(env) {
  puts " ** Got request!"

  [200, {}, ["Hello, World! #{fib(30)}"]]
}
run app
```

and `Gemfile`:

```ruby
source 'https://rubygems.org'

gem 'ddtrace', path: 'path/to/local/repo'
 #gem 'passenger', '= 6.0.18'
gem 'passenger', '= 6.0.19'
gem 'puma'
```

to validate that the passenger versions are correctly detected +
request processing is not impacted on either.

Fixes DataDog#2976
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Involves a bug community Was opened by a community member profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants