Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty channel ID allocator after network failure #533

Closed
ArneZsng opened this issue Dec 8, 2017 · 8 comments
Closed

Empty channel ID allocator after network failure #533

ArneZsng opened this issue Dec 8, 2017 · 8 comments

Comments

@ArneZsng
Copy link

ArneZsng commented Dec 8, 2017

On our development and production servers, we are observing a rare but recurring issue that seems to happen around redeployments when calling session.create_channel:
undefined method "next_channel_id" for nil:NilClass

We are running on bunny v2.7.1, ruby v2.4.2 and RabbitMQ v3.6.8.

Here is the stack trace:

NoMethodError: undefined method `next_channel_id' for nil:NilClass
  from bunny/session.rb:951:in `next_channel_id'
  from bunny/channel.rb:171:in `initialize'
  from bunny/session.rb:357:in `new'
  from bunny/session.rb:357:in `block in create_channel'
  from monitor.rb:214:in `mon_synchronize'
  from bunny/session.rb:353:in `create_channel'
  from lib/message_queue.rb:73:in `channel'

From the stack trace, it appears that @channel_id_allocator in https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L951 is nil.

After a little research, our current hypothesis is that there is a network failure due to the redeployment that causes the @channel_id_allocator to be nil while @channels are not empty. Thus, @channel_id_allocator is not set in https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L1200 when re-opening the connection.

Now here comes the question that is beyond our current understanding of the issue:
It looks like https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L1200 is the only place where @channel_id_allocator would get set. So in case it is nil, it would not be set anywhere else. Based on 6f0cfff, the goal of the if-clause is to not overwrite an existing ChannelIdAllocator. Wouldn't it make sense however to have a check like the following to set a new ChannelIdAllocator in case @channel_id_allocator is not set?

      if @channels.empty? || @channel_id_allocator.nil?
        @channel_id_allocator = ChannelIdAllocator.new(@channel_max)
      end
@michaelklishin
Copy link
Member

I'd accept a PR that makes the call more defensive but in 99% of cases this particular exception is due to Bunny::Session#start not being invoked: #504.

Channel ID allocator and the map of channels should not get out of sync, so we either should dig deeper and provide evidence or consider this to be an app-specific issue (I don't recall it being reported in scenarios other than #504). Simply initializing a new object isn't necessarily going to be safe.

@michaelklishin
Copy link
Member

Alternatively a new channel ID allocator can be, ahem, allocated and "prepopulated" with the channel IDs that are already in the map. This likely means making the entire Bunny::Session#create_channel method a critical section, which given that it goes over the network is not particularly great :/

@ArneZsng
Copy link
Author

ArneZsng commented Dec 8, 2017

@michaelklishin Hmm yeah, I see your point about keeping the channels and the channel ID allocator in sync.
Regarding the 99% of cases: Shouldn't that be prevented as of v2.7.1 through the error message you added in 1c117c5?
Thus, the error above should be much more rare now.

Let me see if we can do anything to get a better picture of what is happening in these cases. I imagine the content of @channels to be valuable. Anything else you would recommend to keep track of?

@michaelklishin
Copy link
Member

Concurrent access to Bunny::Session instances and channels map/channel ID allocator being in sync. That's about it.

@lordofthelake
Copy link

@michaelklishin Is Bunny::Session supposed to be thread-safe or not? Because at the moment our application works under the assumption the a session can be shared among threads, while a channel is thread specific.

Extract of our wrapper:

module MessageQueue
  # Message queue configuration object
  #
  # @!attribute url RabbitMQ url
  #   @return [String] URL to use for the connection. Format: `"amqp://user:password@host"`
  # @!attribute adapter_class Name of the class to use as client (default: `:Bunny`)
  #   @return [Symbol, String] Class name to be used as client
  #
  # @see MessageQueue.configure
  Configuration = Struct.new(:url, :adapter_class)

  class << self
    # Initializes the configuration for the message queue.
    # Must be called at least once before using the MQ. Invoking it multiple times
    # results in the modified keys being overwritten.
    #
    # @example
    #   MessageQueue.configure do |config|
    #     config.url = "amqp://user:password@rabbitmq.local"
    #     config.adapter_class = :Bunny
    #   end
    #
    # @yieldparam config [MessageQueue::Configuration] configuration object where to store the values
    def configure
      @configuration ||= Configuration.new.tap { |c| c .adapter_class = :Bunny }
      yield @configuration

      # Clean the slate, will be reconnected at the next use
      disconnect
    end

    # Returns the current configuration in use.
    # @return [MessageQueue::Configuration] current configuration
    attr_reader :configuration

    # Closes the current session and channel.
    # The next time {#session}, {#channel} or {#queue} get used, a new connection will be enstablished automatically.
    # @return [void]
    def disconnect
      return unless @session
      @session.close if @session.open?
      Thread.current[:rabbitmq_channel] = nil
      @session = nil
    end

    # Opens (if necessary) a new connection to the RabbitMQ server and returns a session object.
    # @return [Bunny::Session] an open session to the RabbitMQ server
    def session
      @session ||= configuration.adapter_class.to_s.constantize.new(
        configuration.url,
        logger: Rails.logger
      )
      unless @session.open?
        @session.start
        Thread.current[:rabbitmq_channel] = nil
      end
      @session
    end

    # Opens (if necessary) a channel for the current thread and returns the channel object.
    # Channels should not be shared between threads (they're not thread-safe).
    #
    # @return [Bunny::Channel] a channel to be used in the current thread
    def channel
      Thread.current[:rabbitmq_channel] ||= session.create_channel
    end
end

Is this assumption correct or we should have one session per thread?

@michaelklishin
Copy link
Member

Thread safety is a property of a workload (what exactly are the operations performed). Bunny::Session#create_channel is supposed to be safe to use from multiple threads. Channels are not supposed to be shared (in user code, consumer dispatch is an internal concern).

Your code naively uses ||=. Even with thread-local variables without synchronization that's not a safe thing to do.

@michaelklishin
Copy link
Member

I'm going to spend some time on this this week but unless I can reproduce the concurrency hazard described or find an obvious case of missing synchronisation by looking at the code, there isn't enough information to work with here. So if there's an isolated way to reproduce this, please post it some time soon.

@michaelklishin
Copy link
Member

I'm afraid I can't reproduce. The change discussed here would be welcome if anyone has a way and would like to contribute a PR but I have to close this due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants