Configuring tracing on specific Mongoid client leads to clients not reconnecting on a stepdown #1211
Comments
👋 Hey @kitop, thanks for the report here. Sorry to hear that this is causing issues. I have a few quick questions to clarify things so that I can try to reproduce this on my end.
Just following up here. I'm having a bit of trouble reproducing this, so any of the above info you can provide would be helpful. It's also occurred to me that any error messages or stacktraces here may contain sensitive info, so it may be best to open up a support ticket.
Hi @ericmustin, thanks for the comments!
Correct. It only happens when both are enabled. We've used just
Sure! So, when a stepdown happens we usually see some connection errors for any MongoDB queries that are in flight at that moment, but the client reconnects on error and subsequent queries succeed. Some of the errors we saw are (hostnames and replica set names changed):
and
The stacktraces we have are from the web server timeout and not so much from ddtrace/mongo. The MongoDB client mostly logged these kinds of errors.
I'm working on a minimal reproduction case, but having trouble putting one together. We can consistently reproduce this in our staging and production environments, but not locally so far. I'll keep trying and post an update if I find anything. We're going to try with 0.38.0 and report back. We had upgraded to 0.41 this week but had to roll back due to memory leaks (will report that in a different issue).
@kitop Ok, that's very helpful. Thanks for the patience here. I will try again given the above and see if I can narrow down what's causing this for Mongo. In the meantime, if you want to give a specific client a different service name, I'd suggest using a span processor as a workaround (see the sketch after this comment). Regarding the memory leak, that's definitely not good and something we want to fix. The same suggestion as above applies here.
Additionally, any info about the leak, as well as your dd-trace-rb config, would be helpful.
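
(For reference, a minimal sketch of the span processor workaround mentioned above, assuming the 0.x processing pipeline API. The span name, tag check, host substring, and service name are placeholder assumptions to adapt to your own spans, not anything confirmed in this thread.)

```ruby
require 'ddtrace'

# Rename the service on Mongo command spans that target a particular host.
# 'mongo.cmd' and 'out.host' reflect the 0.x Mongo integration as documented;
# adjust if your spans are named or tagged differently.
Datadog::Pipeline.before_flush(
  Datadog::Pipeline::SpanProcessor.new do |span|
    if span.name == 'mongo.cmd' && span.get_tag('out.host').to_s.include?('secondary')
      span.service = 'mongodb-secondary' # placeholder service name
    end
  end
)
```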
Oh that's interesting! Thanks for the pointer! Will share with the team.
👋 @kitop, I did some investigation on this issue today, and I couldn't find any particular difference between the two setups. The only thing I can think of is that our instrumentation is throwing errors, which could cause commands to fail. I believe you already confirmed above that there are no log lines that trace back to ddtrace. In that case, we can try one thing: could you tell me what version of the tracer you have deployed? I can create a custom pre-release build with additional logging or changes to the Mongo instrumentation. If you'd be able to install that, at least in your staging environment, it would be very helpful.
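
(For anyone who wants to gather more data without waiting for a custom tracer build, here is a minimal sketch of diagnostic logging around Mongo commands using the Ruby driver's command monitoring API. The subscriber class, logger, and message formats are assumptions for illustration; only the `Mongo::Monitoring` subscription mechanism comes from the driver itself.)

```ruby
require 'mongo'
require 'logger'

LOGGER = Logger.new($stdout)

# Logs every command start, success, and failure seen by the Mongo driver,
# independently of any tracing instrumentation.
class CommandLogger
  def started(event)
    LOGGER.info("mongo started #{event.command_name} on #{event.address}")
  end

  def succeeded(event)
    LOGGER.info("mongo succeeded #{event.command_name} in #{event.duration}s")
  end

  def failed(event)
    LOGGER.warn("mongo failed #{event.command_name}: #{event.message}")
  end
end

Mongo::Monitoring::Global.subscribe(Mongo::Monitoring::COMMAND, CommandLogger.new)
```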
Hi @marcotc, thanks for looking at this! I can't commit to doing that right now, as we don't have the time. The way we've fixed it was by doing something like this:

```ruby
begin
  # Route queries through the secondary-preferred client for this block of work.
  Mongoid.override_client(:secondary_preferred)
  client = Mongoid.client(:secondary_preferred)
  # Re-point the Datadog pin on this client at its own service name.
  pin = Datadog::Pin.get_from(client)
  pin.service_name = "mongodb-secondary"
  client.datadog_pin = pin
# do the work
ensure
Mongoid.override_client(nil)
end
```

It's quite hacky, but it works. We suspected it could have something to do with how the pin is set for replicas, but we couldn't really come to a conclusion. Thanks!
hi guys, can this approach work? #1423 |
I believe this was fixed by #1423. Please do let us know if there's still something to be improved or fixed :)
When using Mongoid (v7.1.2) and overriding the global configuration for a specific client instance as detailed in the docs, by doing `Datadog.configure(client, options)`, the Mongo clients don't reconnect after a stepdown happens in MongoDB. This leads to downtime in the application, since it can no longer communicate with the database until the process is restarted. We have the following configuration with Mongoid:
The tracing works fine this way, but it's causing reconnection issues when there's a stepdown. It looks like all clients have this issue, not just the one configured with `Datadog.configure`. This might be due to how Mongoid performs the connection, but tracing should not affect that at all. This was observed with dd-trace-rb 0.37.0, Mongoid 7.1.2, Rails 6.0.3.3 and Ruby 2.6.6.
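
(For clarity, a sketch of the configuration pattern being described, based on the 0.x docs for the Mongo integration. The service names and the `:default` client alias are illustrative assumptions, not the actual configuration used in this report.)

```ruby
require 'ddtrace'

Datadog.configure do |c|
  # Global Mongo integration for all clients.
  c.use :mongo, service_name: 'mongodb'
end

# Per-client override, as described in the docs: give one Mongoid client
# (a Mongo::Client under the hood) its own service name.
client = Mongoid.client(:default)
Datadog.configure(client, service_name: 'mongodb-default')
```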
This looks somewhat related to #729