Skip to content
This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Fix 16550 "not authorized", TypeError, and others #324

Merged
merged 5 commits into from
Oct 16, 2014

Conversation

zarqman
Copy link
Contributor

@zarqman zarqman commented Oct 4, 2014

This set of commits fixes a number of outstanding errors, including 16550: "not authorized for query" and TypeError: no implicit conversion of nil into String.

I believe this addresses #288, #278, #252, #247, #275, #290, and others.

Nearly everything here is triggered by unstable network environments, hence the difficulty in testing them. Thanks to MongoLab for their flipflop server which was a great help. I also injected random intentional DNS failures. The combination of the two, along with the helpful script in #323, were the basis for discovery. On to the issues:

DNS address resolution did not handle the case of DNS failure on the initial call to Address#resolve. This caused node.address.ip and node.address.resolved to be invalid which had a variety of cascading effects, including a memory leak in the seeds array and the dreaded "TypeError: no implicit conversion of nil into String" error.

There are multiple codepaths to trigger Connection#connect. Only some of them actually call apply_credentials(). Further, it was possible to call #connect without resetting @credentials, causing apply_credentials() to no-op. Each of these could individually trigger the 16550 errors. All of this has been addressed.

Using the flipflop server + random DNS failures and the script noted above, the only remaining error I can cause is '13435: "not master and slaveOk=false"'. The patches in #320 fix those as well and I recommend that set of patches. I have tested the below set of patches against both moped/master as well as the tree from #320.

Call to disconnect triggers call to pool, which uses address.resolved
as a hash key. This in turn causes pollution of the connection pool.
Test for equality of Node (==,#eql?) depends on node.address.resolved.
Therefore avoid such tests (such as .include?() ) until address is
resolved. This fixes unbounded growth (read: memory leak) of the seeds
array in unstable network environments.
If existing socket is reset, perhaps due to mongod failure,
credentials hash isn't reset. This causes them to not be sent upon
reconnection. Changed to reset at connect() instead of relying on a
disconnect() that may or may not happen.

Also cache last set of credentials so they can be reused when connection
loss is detected by with_connection().
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.11%) when pulling 695971d on zarqman:fix-auth-errors into 274610f on mongoid:master.

@leemhenson
Copy link

We just ran a test against this version, and we still received the "TypeError: no implicit conversion of nil into String" (#290).

/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/tcp.rb:20 in "initialize"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/tcp.rb:20 in "block in initialize"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:119 in "handle_socket_errors"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/tcp.rb:20 in "initialize"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:145 in "new"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:145 in "block in connect"
/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:91 in "block in timeout"
/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35 in "block in catch"
/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35 in "catch"
/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:35 in "catch"
/vendor/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb:106 in "timeout"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection/socket/connectable.rb:144 in "connect"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/connection.rb:52 in "connect"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:531 in "connect"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:178 in "block in ensure_connected"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:115 in "block in connection"
/vendor/bundle/ruby/2.1.0/gems/connection_pool-2.0.0/lib/connection_pool.rb:58 in "with"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:114 in "connection"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:176 in "ensure_connected"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:589 in "block in flush"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:617 in "block in logging"
/vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications.rb:159 in "block in instrument"
/vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications/instrumenter.rb:20 in "instrument"
/vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications.rb:159 in "instrument"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/instrumentable.rb:31 in "instrument"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:616 in "logging"
/engines/music_glue/lib/newrelic_moped/instrumentation.rb:39 in "block in logging_with_newrelic_trace"
/engines/music_glue/lib/newrelic_moped/instrumentation.rb:56 in "call"
/engines/music_glue/lib/newrelic_moped/instrumentation.rb:56 in "logging_with_newrelic_trace"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:588 in "flush"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:391 in "process"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/operation/read.rb:48 in "execute"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:648 in "read"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:90 in "command"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/node.rb:432 in "refresh"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:182 in "block in refresh"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:396 in "call"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:396 in "block in refresh_peers"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:395 in "each"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:395 in "refresh_peers"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:187 in "block in refresh"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:194 in "each"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:194 in "refresh"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:151 in "nodes"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:264 in "with_secondary"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/read_preference/secondary_preferred.rb:45 in "block in with_node"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:292 in "call"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/cluster.rb:292 in "with_retry"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/read_preference/secondary_preferred.rb:43 in "with_node"
/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e/lib/moped/query.rb:127 in "first"

@mhoran
Copy link

mhoran commented Oct 6, 2014

Testing this branch with the MongoidRnR script has shown the issue to be resolved. We're about to push this out to production to see what happens.

While this does seem to resolve the eventual failure of DNS resolution (our connection works for some time, but eventually fails), there still does seem to be some low level Resolv failure which is leading to the trouble. We added some debugging and found that in Address, resolution was failing due to a timeout. DNS seems fine for us so there could be some other lingering issue.

@zarqman
Copy link
Contributor Author

zarqman commented Oct 6, 2014

@leemhenson Thanks for testing. Your bundle path (/vendor/bundle/ruby/2.1.0/bundler/gems/moped-4c2a0a29122e) shows a commit for the simplybusiness/moped/fix-retries-and-failover tree, not mine (zarqman/moped/fix-auth-errors). Of course, you may have manually applied one patchset on top of the other, so could you confirm which patches you're running before we dig in further?

@zarqman
Copy link
Contributor Author

zarqman commented Oct 6, 2014

@mhoran It's possible the issue is with Resolv itself. In the past, I did extensive DNS work and found the Resolv library to be quite unreliable. Most of that was on Ruby 1.8. As I recall, we ended up using an alternate library instead.

My own tests when building the above patches were set to cause DNS to fail 2/3s of the time, at random. I was able to run tests for several hours without it puking, so I'm hopeful these patches at least accommodate Resolv's behavior.

@leemhenson
Copy link

@zarqman whoopsadaisy! we'll give it another go, this time with the right commit! sorry for the noise.

@leemhenson
Copy link

Ok, we've pushed the right commit this time. We'll leave it overnight to see if we get any #290 -s, but one thing I did try was trigger a stepdown on our compose.io replica set. Moped immediately failed with the error below, and did not recover without an app restart:

2014-10-06T17:20:56.554405+00:00 app[web.1]: Moped::Errors::OperationFailure (The operation: #<Moped::Protocol::Command
2014-10-06T17:20:56.554406+00:00 app[web.1]:   @length=89
2014-10-06T17:20:56.554408+00:00 app[web.1]:   @request_id=118
2014-10-06T17:20:56.554409+00:00 app[web.1]:   @response_to=0
2014-10-06T17:20:56.554410+00:00 app[web.1]:   @op_code=2004
2014-10-06T17:20:56.554412+00:00 app[web.1]:   @flags=[]
2014-10-06T17:20:56.554413+00:00 app[web.1]:   @full_collection_name="test-database.$cmd"
2014-10-06T17:20:56.554414+00:00 app[web.1]:   @skip=0
2014-10-06T17:20:56.554416+00:00 app[web.1]:   @limit=-1
2014-10-06T17:20:56.554417+00:00 app[web.1]:   @selector={:getlasterror=>1, "w"=>"majority"}
2014-10-06T17:20:56.554418+00:00 app[web.1]:   @fields=nil>
2014-10-06T17:20:56.554419+00:00 app[web.1]: failed with error 10107: "Not primary while updating test-database.users"
2014-10-06T17:20:56.554421+00:00 app[web.1]:
2014-10-06T17:20:56.554422+00:00 app[web.1]: See https://github.com/mongodb/mongo/blob/master/docs/errors.md
2014-10-06T17:20:56.554423+00:00 app[web.1]: for details about this error.):
2014-10-06T17:20:56.554425+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/operation/read.rb:50:in `block in execute'
2014-10-06T17:20:56.554427+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:594:in `[]'
2014-10-06T17:20:56.554428+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:594:in `block (2 levels) in flush'
2014-10-06T17:20:56.554430+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:593:in `map'
2014-10-06T17:20:56.554431+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:593:in `block in flush'
2014-10-06T17:20:56.554432+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:617:in `block in logging'
2014-10-06T17:20:56.554434+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications.rb:159:in `block in instrument'
2014-10-06T17:20:56.554435+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
2014-10-06T17:20:56.554437+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/activesupport-4.1.6/lib/active_support/notifications.rb:159:in `instrument'
2014-10-06T17:20:56.554438+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/instrumentable.rb:31:in `instrument'
2014-10-06T17:20:56.554439+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:616:in `logging'
2014-10-06T17:20:56.554445+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.5.251/lib/new_relic/agent/method_tracer.rb:69:in `block in trace_execution_scoped'
2014-10-06T17:20:56.554446+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.5.251/lib/new_relic/agent/method_tracer_helpers.rb:80:in `trace_execution_scoped'
2014-10-06T17:20:56.554448+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/newrelic_rpm-3.9.5.251/lib/new_relic/agent/method_tracer.rb:67:in `trace_execution_scoped'
2014-10-06T17:20:56.554450+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:587:in `flush'
2014-10-06T17:20:56.554451+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:358:in `pipeline'
2014-10-06T17:20:56.554453+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/operation/write.rb:47:in `execute'
2014-10-06T17:20:56.554454+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:665:in `write'
2014-10-06T17:20:56.554455+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:502:in `update'
2014-10-06T17:20:56.554456+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/query.rb:427:in `block in update'
2014-10-06T17:20:56.554457+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/cluster.rb:249:in `block in with_primary'
2014-10-06T17:20:56.554458+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:204:in `block in ensure_primary'
2014-10-06T17:20:56.554460+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/executable.rb:25:in `execute'
2014-10-06T17:20:56.554461+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/node.rb:203:in `ensure_primary'
2014-10-06T17:20:56.554462+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/cluster.rb:248:in `with_primary'
2014-10-06T17:20:56.554463+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/bundler/gems/moped-695971d5470c/lib/moped/query.rb:426:in `update'
2014-10-06T17:20:56.554464+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/mongoid-4.0.0/lib/mongoid/query_cache.rb:117:in `update_with_clear_cache'
2014-10-06T17:20:56.554465+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/mongoid-4.0.0/lib/mongoid/persistable.rb:212:in `persist_atomic_operations'
2014-10-06T17:20:56.554467+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/mongoid-4.0.0/lib/mongoid/persistable.rb:195:in `persist_or_delay_atomic_operation'
2014-10-06T17:20:56.554468+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/mongoid-4.0.0/lib/mongoid/persistable.rb:146:in `prepare_atomic_operation'
2014-10-06T17:20:56.554472+00:00 app[web.1]:   vendor/bundle/ruby/2.1.0/gems/mongoid-4.0.0/lib/mongoid/persistable/settable.rb:23:in `set'

@leemhenson
Copy link

Thanks @zarqman, #290 seems fixed for us with this branch. ☀️

@mhoran
Copy link

mhoran commented Oct 7, 2014

After 24 hours and not a single "no implicit conversion of nil into String" error in production, I'd say this is a reasonable fix for #290.

Of course, DNS resolution is still failing sporadically, but that's something else. We tried adding "resolv-replace" to our project, which uses C to re-implement Resolv, but that didn't help. We also raised the problem with Heroku, given there is only a single nameserver in /etc/resolv.conf and their DNS is a black box.

@@ -46,6 +46,7 @@ def alive?
#
# @since 1.0.0
def connect
credentials.clear
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we clear credentials on connect now , instead of on disconnect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arthurnn If the socket disconnects (network issues, mongod failover, etc.), with_connection() can trigger a reconnect (sometimes without disconnect() being called). In this situation, credentials haven't been cleared and so they won't be resent, which causes the 16550 not authorized errors.

Resetting credentials ensures they're are always resent on reconnection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was brought up in my pull #286 which was rejected due to it causing some issues on 2.0. @zarqman hits the nail on the head on the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@arthurnn
Copy link
Contributor

@jonhyman , is this related to your PRs too?

arthurnn added a commit that referenced this pull request Oct 16, 2014
Fix 16550 "not authorized", TypeError, and others
@arthurnn arthurnn merged commit 3285c0a into mongoid:master Oct 16, 2014
@arthurnn
Copy link
Contributor

Thanks... pr looks good

@sschepens
Copy link

This is awesome! @arthurnn is it possible to create a new minor version so we can start deploying this into production without relying on git?

@arthurnn
Copy link
Contributor

http://rubygems.org/gems/moped/versions/2.0.1 . version 2.0.1 released with this fix..

thanks all

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants