-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Very high performance overhead on JRuby #640
Comments
Thanks for the report @guizmaii! Right now our JRuby support is experimental, and doesn't receive routine attention. Before we'd move it to full support, we'll want to tackle this issue and confirm it's meeting performance expectations. That said, we're more than happy to tackle any specific performance issues you might be able to pinpoint in your own tests. Are there any particular integrations or combination of integrations that you've seen affect performance the most? Any issues you can zero in will most likely be investigated & fixed sooner, rather than later. Feel free to open any GitHub issues for each specific issue you find, and I'll work with you to figure them out! |
Hi @delner, Thanks for your answer. I'll answer your question later. Now I want to give more results of my different tests. As indicated in my initial message, one thing I still had to test was:
So, I've done these tests and here are the results ! First, Here's how I launch a Datadog OS agent on my computer: #!/usr/bin/env bash
API_KEY=<the_api_key>
docker run -d -v /var/run/docker.sock:/var/run/docker.sock:ro \
-v /proc/:/host/proc/:ro \
-v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
-e DD_API_KEY=${API_KEY} \
-e DD_APM_ENABLED=true \
-e DD_TAGS=env:jules_jruby_benchs \
-e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true \
-e SD_JMX_ENABLE=true \
-p 8126:8126/tcp \
-p 8125:8125/udp \
--name datadog \
datadog/agent:latest-jmx I'm sure that this config is correct because I can see data in my Datadog dashboard 😉 Now the results of these tests with the OS agent launched ! Because I saw that with 16 Puma threads (the config I used in the initial message of this issue) the CPU wasn't used at 100% with
We can see that performances here are comparable to the ones reported in the initial message of this issue. It's important because I talked with the creators and maintainers of JRuby and their first guess was that the I'll continue to look for the reasons of these problems. Hopefully, I'll succeed to have an open-source project that reproduces these problems. Raw benchmark results for JRuby/Puma 16 threads with Raw benchmark results for JRuby/Puma 8 threads with Raw benchmark results for JRuby/Puma 4 threads with Raw benchmark results for JRuby/Puma 16 threads with See first message of this issue: #640 (comment) Raw benchmark results for JRuby/Puma 8 threads with |
It's possible there are still exceptions or Kernel#caller being used. Did the backtrace stuff go away in These numbers are rough...other than exception traces, maybe excessive locking? |
We do use a lot of locks, and there might be some places where it might not be needed. Maybe this doesn't affect MRI as much as it affects JRuby, but it might be worth experimenting with that locking to see if it makes a difference in throughput while maintaining thread safety. |
I didn't have the time today to test that. I'll do it tomorrow ! 🙂 |
@guizmaii Just glancing at datadog and I see in case of a log message with severity of ERROR it will call caller to provide more context. This will have same overhead of generating an exception (cost of exceptions in JRuby are based on having to construct a backtrace -- which is same basic cost for caller). Not saying it is this but replacing |
"Not saying it is this but replacing c = caller with c=1 in the gem and rerunning may be a super quick test"
Here are some answers I can give you after my day of work: @enebo "replacing c = caller with c=1" doesn't change anything @delner It seems that it's the When I remove the I have an open-source project where I try to reproduce the problem here: https://github.com/guizmaii/jruby_rails_datadog For now, it seems that I'm not able to reproduce it. The example is maybe too simplistic. I still have to work on that. I'm aslo experimenting https://github.com/jvm-profiling-tools/async-profiler to profile the execution with the following script: #!/usr/bin/env bash
trap "exit" INT # https://stackoverflow.com/a/32146079/2431728
timestamp() {
date +%Y-%m-%d_%H-%M-%S
}
echo "WARM UP"
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
echo
echo "*** Launch the profiler"
~/tools/async-profiler-1.4-macos-x64/profiler.sh start -b 50000000 $(cat /tmp/web_server.pid)
echo
echo "*** Profiler status:"
~/tools/async-profiler-1.4-macos-x64/profiler.sh status $(cat /tmp/web_server.pid)
echo
echo "========="
echo "BENCH 1"
echo "========="
echo
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
echo
echo "========="
echo "BENCH 2"
echo "========="
echo
wrk2 -t2 -c100 -d1m -R2000 -L http://localhost:3000/route/endpoint.json
echo
echo "========="
echo "BENCH 3"
echo "========="
echo
echo
echo "*** stop the profiler"
~/tools/async-profiler-1.4-macos-x64/profiler.sh stop -o summary,flat=200 -f traces-$(timestamp).txt $(cat /tmp/web_server.pid) |
Okay good to know. The Rails integration itself consists of a few different distinct parts, which we might want to activate independently from one another, to see if performance issues comes from one particular component or not. You should be able to activate Rack in one test using If we can figure out how each component contributes to the performance problems, it might help us identify some specific area we can improve upon. |
Hey, Thanks for the info @delner. I'll work on that today ! @enebo In fact, you were right about exceptions. Here's the result of a profiled execution with Show data
and here the result of a profiled execution with Show data
If we compare the two traces, here's what we immediately observe (the trace where |
There's a chance that I've found the culprit ! The line of code responsible: # Attempt to retrieve the connection from an object ID.
def self.connection_by_id(object_id)
return nil if object_id.nil?
ObjectSpace._id2ref(object_id)
rescue StandardError
nil
end |
@guizmaii Mmmm, interesting. Does this raise a lot of exceptions in JRuby? If so, what kind of exception? Maybe this isn't JRuby compatible and we should either short circuit Also, does it raise often (or at all) in MRI? |
I'm currently looking how all of this works by debugging my app. A thing I observe is that at the app boot, this function is called: def self.connection_config(object_id = nil)
object_id.nil? ? default_connection_config : connection_config_by_id(object_id)
end with def self.default_connection_config
return @default_connection_config if instance_variable_defined?(:@default_connection_config)
current_connection_name = if ::ActiveRecord::Base.respond_to?(:connection_specification_name)
::ActiveRecord::Base.connection_specification_name
else
::ActiveRecord::Base
end
connection_pool = ::ActiveRecord::Base.connection_handler.retrieve_connection_pool(current_connection_name)
connection_pool.nil? ? {} : (@default_connection_config = connection_pool.spec.config)
rescue StandardError
{}
end And this call works fine. The {:adapter=>"mysql2", :host=>"127.0.0.1", :database=>"mydb", :username=>"root", :password=>nil} But then, when an HTTP call arrives, the What is this |
Yeah, so you've encountered the challenge with ActiveRecord notifications, haha. This ActiveRecord code that we wrote is attempting to accomplish two major things:
I have to look at it again, but if I remember correctly, this If this always fails in JRuby because |
FYI, @headius said this:
Here: https://twitter.com/headius/status/1068188351012966402 Another information, thanks to this ugly fix ee61d24 and some additional tests, I can confirm that it's only the previously cited line of code that is problematic. |
Great, this has been really helpful. I'm glad we've been able to narrow this down to something very specific. Can I suggest you try forking |
Already done with the ugly fix I talked about. The performances are great ! Just the same as without |
Awesome! Sorry if I misunderstood that. Sounds like this is something we can/should fix right away then. |
I proposed an evolution to Rails here: rails/rails#34568. Maybe there's another way to solve that but I'm not an expert in Ruby nor in Rails to know this other way. 😕 |
At the very least, if the Rails guys can't/won't make a change like this, I could probably try introducing a monkey-patch that injects this behavior into Rails. That would be very sub-optimal though, given it'd probably be very brittle between Rails versions. |
IMO, Rails 3 and 4 can be monkey-patched without risks because they don't evolve anymore. For Rails >4, we could use the solution the Rails guys will propose us. Let's see what they propose. |
I wanted to clarify my statement about id2ref above... The problem with it in MRI is that the object_id is basically just the pointer to the object on the heap. Using a pointer to recover the object can fail spectacularly if the object has been collected and that pointer address now contains a new object. In addition, using object_id makes objects "shady" (I believe...@tenderlove or @ko1 can confirm) which interferes with generational garbage collection (but this is AR's fault for using it, not yours). The right fix is most definitely to move away from object_id as a key for anything ever and provide a proper weak map or registry for these sorts of things. And I have no idea what that would require in Rails 4 😀 |
Hopefully we'll not need to monkey-patch 🙂 |
I found a solution ! (See rails/rails#34568 (comment)) I'll implement the fix 😉 |
The fix is implemented. See #646. But tests fail for strange reasons: 1) Failure:
TracerActiveRecordTest#test_instantiation_tracing [/app/test/contrib/sinatra/tracer_activerecord_test.rb:145]:
--- expected
+++ actual
@@ -1,2 +1,2 @@
# encoding: UTF-8
-"sinatra"
+"defaultdb"
2) Failure:
TracerActiveRecordTest#test_request [/app/test/contrib/sinatra/tracer_activerecord_test.rb:86]:
--- expected
+++ actual
@@ -1,2 +1,2 @@
# encoding: UTF-8
-"sqlite"
+"sinatra" |
@guizmaii I don't think that's going to work as its currently implemented. Unfortunately it's not just a matter of grabbing The Active record tests fail accordingly:
I would recommend re-opening that Rails PR, since we do need access to the connection object or name, and looping through the list of all connections everytime the user does a query could be a sizeable impact on performance (probably more expensive than what we already have for MRI.) |
On the current MRI, |
@ko1 Thank you for clarifying! JRuby also sees little impact but only because we do not normally support _id2ref without a command-line flag. With id2ref enabled, however, the impact is very large. In the future, object_id needs to either go away (best) or become and opaque, idempotent value (very complicated). In any case...nobody should use object_id or id2ref ever. |
@delner I proposed a new, more conservative, fix here: #647 I still have the same errors in CI that I don't understand: 1) Failure:
TracerActiveRecordTest#test_instantiation_tracing [/app/test/contrib/sinatra/tracer_activerecord_test.rb:145]:
--- expected
+++ actual
@@ -1,2 +1,2 @@
# encoding: UTF-8
-"sinatra"
+"sqlite"
2) Failure:
TracerActiveRecordTest#test_request [/app/test/contrib/sinatra/tracer_activerecord_test.rb:86]:
--- expected
+++ actual
@@ -1,2 +1,2 @@
# encoding: UTF-8
-"sqlite"
+"sinatra"
18 runs, 140 assertions, 2 failures, 0 errors, 0 skips
``` |
@guizmaii Left a comment on your PR for this CI issue. |
Your solution is fun. Way too complex for a Scala dev like me to fully understand but I think I see what you're doing in the #649 patch. 🙂 I just ended something else I was working on this week and it's friday evening here. So, I'll test you patch on monday morning ! Thanks for your help and responsiveness 🙂 @headius does this kind of meta programming has a cost with JRuby ? |
Yeah, its slightly more complicated (not that complexity is ever preferable), but its meant to mimic that Rails PR behavior which adds Still, I'll be interested to see what kind of improvement you find it makes on your end, and for JRuby. |
Merged your PR to |
Likewise @guizmaii, you were extraordinarily helpful! |
Hi Datadog,
I'm migrating a Rails (4.2.10) app from MRI Ruby 2.3.8 and Unicorn to JRuby 9.1.17.0 and Puma.
The
dd-trace-rb
version used is0.17.2
While doing some performance tests, I saw a big difference between my benchmarks in dev mode and prod mode with JRuby:
I found that disabling the Datadog agent (by surrounding the
Datadog.configure do |c| ...
code in the Rails initializer with anif false
) restored the prod mode performances (~320req/s) !I also tested the differences in prod mode with MRI Ruby and Unicorn between Datadog agent enabled vs disabled.
While there's a difference, it's not as huge as with JRuby/ Puma:
So, let's summarise these numbers:
Another interesting thing to notice is that CPU usage with JRuby/Puma is not at 100% when
dd-trace-rb
is disabled while it's full 100% when enabled. 🤔I should precise that the Datadog agent is not installed in my OS, so all calls from the
dd-trace-rb
in my app to this OS agent failed.Maybe, this could be the reason of the problem. I don't know for now. I still have to test that.
Here's how I configure
dd-trace-rb
:Gemfile
myapp/config/initializers/datadog-tracer.rb
Here's the script I use to benchmark.
It uses this tool https://github.com/giltene/wrk2
Raw benchmark results for JRuby/Puma with
dd-trace-rb
disabled (sorry it's small)Show data
Raw benchmark results for JRuby/Puma with
dd-trace-rb
enabled (sorry it's small)Show data
Raw benchmark results for MRI/Unicorn with
dd-trace-rb
disabled (sorry it's small)Show data
Raw benchmark results for MRI/Unicorn with
dd-trace-rb
enabled (sorry it's small)Show data
The text was updated successfully, but these errors were encountered: