Empty channel ID allocator after network failure #533

ArneZsng · 2017-12-08T15:44:31Z

On our development and production servers, we are observing a rare but recurring issue that seems to happen around redeployments when calling session.create_channel:
undefined method "next_channel_id" for nil:NilClass

We are running on bunny v2.7.1, ruby v2.4.2 and RabbitMQ v3.6.8.

Here is the stack trace:

NoMethodError: undefined method `next_channel_id' for nil:NilClass
  from bunny/session.rb:951:in `next_channel_id'
  from bunny/channel.rb:171:in `initialize'
  from bunny/session.rb:357:in `new'
  from bunny/session.rb:357:in `block in create_channel'
  from monitor.rb:214:in `mon_synchronize'
  from bunny/session.rb:353:in `create_channel'
  from lib/message_queue.rb:73:in `channel'

From the stack trace, it appears that @channel_id_allocator in https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L951 is nil.

After a little research, our current hypothesis is that there is a network failure due to the redeployment that causes the @channel_id_allocator to be nil while @channels are not empty. Thus, @channel_id_allocator is not set in https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L1200 when re-opening the connection.

Now here comes the question that is beyond our current understanding of the issue:
It looks like https://github.com/ruby-amqp/bunny/blob/master/lib/bunny/session.rb#L1200 is the only place where @channel_id_allocator would get set. So in case it is nil, it would not be set anywhere else. Based on 6f0cfff, the goal of the if-clause is to not overwrite an existing ChannelIdAllocator. Wouldn't it make sense however to have a check like the following to set a new ChannelIdAllocator in case @channel_id_allocator is not set?

      if @channels.empty? || @channel_id_allocator.nil?
        @channel_id_allocator = ChannelIdAllocator.new(@channel_max)
      end

The text was updated successfully, but these errors were encountered:

michaelklishin · 2017-12-08T15:48:52Z

I'd accept a PR that makes the call more defensive but in 99% of cases this particular exception is due to Bunny::Session#start not being invoked: #504.

Channel ID allocator and the map of channels should not get out of sync, so we either should dig deeper and provide evidence or consider this to be an app-specific issue (I don't recall it being reported in scenarios other than #504). Simply initializing a new object isn't necessarily going to be safe.

michaelklishin · 2017-12-08T15:50:38Z

Alternatively a new channel ID allocator can be, ahem, allocated and "prepopulated" with the channel IDs that are already in the map. This likely means making the entire Bunny::Session#create_channel method a critical section, which given that it goes over the network is not particularly great :/

ArneZsng · 2017-12-08T16:10:20Z

@michaelklishin Hmm yeah, I see your point about keeping the channels and the channel ID allocator in sync.
Regarding the 99% of cases: Shouldn't that be prevented as of v2.7.1 through the error message you added in 1c117c5?
Thus, the error above should be much more rare now.

Let me see if we can do anything to get a better picture of what is happening in these cases. I imagine the content of @channels to be valuable. Anything else you would recommend to keep track of?

michaelklishin · 2017-12-08T16:15:15Z

Concurrent access to Bunny::Session instances and channels map/channel ID allocator being in sync. That's about it.

lordofthelake · 2017-12-08T16:26:38Z

@michaelklishin Is Bunny::Session supposed to be thread-safe or not? Because at the moment our application works under the assumption the a session can be shared among threads, while a channel is thread specific.

Extract of our wrapper:

module MessageQueue
  # Message queue configuration object
  #
  # @!attribute url RabbitMQ url
  #   @return [String] URL to use for the connection. Format: `"amqp://user:password@host"`
  # @!attribute adapter_class Name of the class to use as client (default: `:Bunny`)
  #   @return [Symbol, String] Class name to be used as client
  #
  # @see MessageQueue.configure
  Configuration = Struct.new(:url, :adapter_class)

  class << self
    # Initializes the configuration for the message queue.
    # Must be called at least once before using the MQ. Invoking it multiple times
    # results in the modified keys being overwritten.
    #
    # @example
    #   MessageQueue.configure do |config|
    #     config.url = "amqp://user:password@rabbitmq.local"
    #     config.adapter_class = :Bunny
    #   end
    #
    # @yieldparam config [MessageQueue::Configuration] configuration object where to store the values
    def configure
      @configuration ||= Configuration.new.tap { |c| c .adapter_class = :Bunny }
      yield @configuration

      # Clean the slate, will be reconnected at the next use
      disconnect
    end

    # Returns the current configuration in use.
    # @return [MessageQueue::Configuration] current configuration
    attr_reader :configuration

    # Closes the current session and channel.
    # The next time {#session}, {#channel} or {#queue} get used, a new connection will be enstablished automatically.
    # @return [void]
    def disconnect
      return unless @session
      @session.close if @session.open?
      Thread.current[:rabbitmq_channel] = nil
      @session = nil
    end

    # Opens (if necessary) a new connection to the RabbitMQ server and returns a session object.
    # @return [Bunny::Session] an open session to the RabbitMQ server
    def session
      @session ||= configuration.adapter_class.to_s.constantize.new(
        configuration.url,
        logger: Rails.logger
      )
      unless @session.open?
        @session.start
        Thread.current[:rabbitmq_channel] = nil
      end
      @session
    end

    # Opens (if necessary) a channel for the current thread and returns the channel object.
    # Channels should not be shared between threads (they're not thread-safe).
    #
    # @return [Bunny::Channel] a channel to be used in the current thread
    def channel
      Thread.current[:rabbitmq_channel] ||= session.create_channel
    end
end

Is this assumption correct or we should have one session per thread?

michaelklishin · 2017-12-08T17:02:50Z

Thread safety is a property of a workload (what exactly are the operations performed). Bunny::Session#create_channel is supposed to be safe to use from multiple threads. Channels are not supposed to be shared (in user code, consumer dispatch is an internal concern).

Your code naively uses ||=. Even with thread-local variables without synchronization that's not a safe thing to do.

michaelklishin · 2017-12-18T08:00:26Z

I'm going to spend some time on this this week but unless I can reproduce the concurrency hazard described or find an obvious case of missing synchronisation by looking at the code, there isn't enough information to work with here. So if there's an isolated way to reproduce this, please post it some time soon.

michaelklishin · 2018-04-05T23:14:26Z

I'm afraid I can't reproduce. The change discussed here would be welcome if anyone has a way and would like to contribute a PR but I have to close this due to inactivity.

michaelklishin mentioned this issue Dec 18, 2017

Connection Parameters Handling ruby-amqp/amq-protocol#70

Closed

michaelklishin closed this as completed Apr 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Empty channel ID allocator after network failure #533

Empty channel ID allocator after network failure #533

ArneZsng commented Dec 8, 2017 •

edited

Loading

michaelklishin commented Dec 8, 2017

michaelklishin commented Dec 8, 2017

ArneZsng commented Dec 8, 2017 •

edited

Loading

michaelklishin commented Dec 8, 2017

lordofthelake commented Dec 8, 2017

michaelklishin commented Dec 8, 2017

michaelklishin commented Dec 18, 2017

michaelklishin commented Apr 5, 2018

Empty channel ID allocator after network failure #533

Empty channel ID allocator after network failure #533

Comments

ArneZsng commented Dec 8, 2017 • edited Loading

michaelklishin commented Dec 8, 2017

michaelklishin commented Dec 8, 2017

ArneZsng commented Dec 8, 2017 • edited Loading

michaelklishin commented Dec 8, 2017

lordofthelake commented Dec 8, 2017

michaelklishin commented Dec 8, 2017

michaelklishin commented Dec 18, 2017

michaelklishin commented Apr 5, 2018

ArneZsng commented Dec 8, 2017 •

edited

Loading

ArneZsng commented Dec 8, 2017 •

edited

Loading