This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

ThreadError: Attempt to unlock a mutex which is locked by another thread/fiber #501

Closed
ioquatix opened this issue Apr 25, 2021 · 22 comments · Fixed by #503 or #504

Comments

@ioquatix

Subject of the issue

(screenshot not preserved)

Your environment

  • Ruby version: 3.0
  • rspec-support version: 3.10.2

Steps to reproduce

This is a tricky issue to reproduce.

Essentially it looks like some of the assumptions made by reentrant_mutex.rb are no longer true in Ruby 3.0. I will review the code and try to give an update, but essentially it looks like trying to lock a ReentrantMutex from a different fiber on the same thread might be a problem.

@ioquatix
Author

On Ruby 3 here is an example of how to produce the issue:

def mutex_failure
	m = Mutex.new
	
	f1 = Fiber.new do
		m.lock
	end
	
	f2 = Fiber.new do
		m.unlock
	end
	
	f1.resume
	f2.resume
end

mutex_failure

# ./test.rb:39:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
#	from ./test.rb:39:in `block in mutex_failure'

@ioquatix
Author

Okay, with that basic semantic, I managed to figure out the sequence of events that causes this to fail:

#!/usr/bin/env ruby

require 'rspec/support/reentrant_mutex'

def rspec_failure
	m = RSpec::Support::ReentrantMutex.new
	
	f1 = Fiber.new do
		m.synchronize do
			puts "f1 A"
			Fiber.yield
			puts "f1 B"
		end
	end
	
	f2 = Fiber.new do
		m.synchronize do
			puts "f2 A"
			# Fiber.yield
			f1.resume
			puts "f2 B"
		end
	end
	
	f1.resume
	f2.resume
	f2.resume
end

rspec_failure

gives

f1 A
f2 A
f1 B
f2 B
/Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:40:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
	from /Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:40:in `exit'
	from /Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:25:in `synchronize'
	from ./test.rb:17:in `block in rspec_failure'

@ioquatix
Author

ioquatix commented Apr 25, 2021

cc @eregon I believe we discussed that there should be no use case where a mutex is locked from one fiber and released on another, but maybe this is a valid use case? Does this mean we should relax the condition, or can you see some other solution?

@ioquatix
Author

#!/usr/bin/env ruby

require 'monitor'

def monitor_failure
	m = Monitor.new
	
	f1 = Fiber.new do
		m.synchronize do
			puts "f1 A"
			Fiber.yield
			puts "f1 B"
		end
	end
	
	f2 = Fiber.new do
		m.synchronize do
			puts "f2 A"
			# Fiber.yield
			f1.resume
			puts "f2 B"
		end
	end
	
	f1.resume
	f2.resume
	f2.resume
end

monitor_failure

This is maybe not a general bug in RSpec; we should consider moving this to bugs.ruby-lang.org.

@eregon
Contributor

eregon commented Apr 25, 2021

The m.synchronize should probably block (forever in this example) because f1 holds the monitor.
This might be a bug in the monitor stdlib.
If you replace Monitor.new with Mutex.new, then CRuby detects a deadlock (correct behavior), and would otherwise wait if it did not detect the deadlock.

The example code here does have an intrinsic problem: a mutex/monitor section is supposed to be atomic, but here the atomicity is broken by switching fibers (specifically, f1 A happens without f1 B; a data structure could become corrupted if those puts were replaced by code mutating state/ivars).
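The atomicity hazard described above can be sketched without any lock at all (a hypothetical illustration, not code from this thread): a fiber suspends halfway through updating an invariant, and whatever runs next observes the half-updated state.

```ruby
# Hypothetical sketch: Fiber.yield splits a "critical section" in two,
# so other code can observe a broken invariant mid-update.
pair = [0, 0] # invariant: both elements should always be equal

f1 = Fiber.new do
  pair[0] = 1
  Fiber.yield   # suspends with the invariant violated
  pair[1] = 1
end

f1.resume
observed = pair.dup # "another fiber" (here: main) runs while f1 is suspended
f1.resume

puts observed.inspect # => [1, 0]: invariant violated mid-section
```

This is exactly the shape of the Monitor/ReentrantMutex examples above, with the puts replaced by state mutation.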

@eregon
Contributor

eregon commented Apr 25, 2021

Fiber.yield inside synchronize is probably best to avoid in general, but I can imagine that's not always possible.

@eregon
Contributor

eregon commented Apr 25, 2021

After more thought, this example does not acquire the lock in a reentrant way (i.e., on the same execution stack; fibers have different stacks), so it should behave exactly like Mutex, and the difference is a bug in Monitor (and ReentrantMutex).

So it should raise a deadlock error, or if there is a Fiber scheduler active then schedule another Fiber.

@eregon
Contributor

eregon commented Apr 25, 2021

@ioquatix Could you create an issue on bugs.ruby-lang.org?

@ioquatix
Author

The solution here is to change Thread.current to Fiber.current, right?
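For context, here is a minimal sketch (not from this thread) of why Thread.current is insufficient as an ownership key: every fiber of a thread shares the same Thread.current, so it cannot distinguish which fiber actually took the lock, while Fiber.current can.

```ruby
# Fiber.current is core from Ruby 3.1; older Rubies need the stdlib require.
require "fiber" unless Fiber.respond_to?(:current)

t = Thread.current
f = Fiber.current
# Run a child fiber on the same thread and compare identities.
result = Fiber.new { [Thread.current.equal?(t), Fiber.current.equal?(f)] }.resume

puts result.inspect # => [true, false]: same thread, but a different fiber
```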

@ioquatix
Author

Okay, so here is the initial problem:

(screenshot not preserved)

I tried adding the following code as a hack:

require 'rspec/support/reentrant_mutex'
require 'fiber'

module RSpec
	module Support
		class ReentrantMutex
			def enter
				@mutex.lock if @owner != Fiber.current
				@owner = Fiber.current
				@count += 1
			end
		end
	end
end

Now I get other issues I need to figure out.

@ioquatix
Author

Okay, I actually think using Fiber.current works correctly.

ioquatix added a commit to ioquatix/rspec-support that referenced this issue Apr 25, 2021
@pirj
Member

pirj commented Apr 26, 2021

By the way, on Ruby 2.7.2 I'm getting a dead fiber called (FiberError) instead of Attempt to unlock a mutex which is locked by another thread/fiber. This is exactly what I would expect from the second f2.resume call. Is the second call necessary? On Ruby 3.0 it fails without the second f2.resume.

And I don't see that this situation changes if we replace Thread.current with Fiber.current in ReentrantMutex.

To illustrate this:

require 'fiber'

class ReentrantMutex
  def initialize
    @owner = nil
    @count = 0
    @mutex = Mutex.new
  end

  def synchronize
    enter
    yield
  ensure
    exit
  end

private

  def enter
    @mutex.lock if @owner != Fiber.current
    @owner = Fiber.current
    @count += 1
  end

  def exit
    @count -= 1
    return unless @count == 0
    @owner = nil
    @mutex.unlock
  end
end

m = ReentrantMutex.new

f1 = Fiber.new do
  m.synchronize do
    puts "f1 A"
    Fiber.yield
    puts "f1 B"
  end
  puts "f1 finish"
end

f2 = Fiber.new do
  m.synchronize do
    puts "f2 A"
    # Fiber.yield
    f1.resume
    puts "f2 B"
  end
  puts "f2 finish"
end

f1.resume
f2.resume

With Fiber.current:

$ ruby --version
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]
$ ruby 1.rb
f1 A
1.rb:29:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
        from 1.rb:29:in `exit'
        from 1.rb:14:in `ensure in synchronize'
        from 1.rb:14:in `synchronize'
        from 1.rb:45:in `block in <main>'
1.rb:20:in `lock': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00007fec1b409480 main thread:0x00007fec1b409480
* #<Thread:0x00007fec1c07c0f0 sleep_forever>
   rb_thread_t:0x00007fec1b409480 native:0x000000010ca9f5c0 int:0

        from 1.rb:20:in `enter'
        from 1.rb:11:in `synchronize'
        from 1.rb:45:in `block in <main>'
$ ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
$ ruby 1.rb
f1 A
Traceback (most recent call last):
        3: from 1.rb:45:in `block in <main>'
        2: from 1.rb:11:in `synchronize'
        1: from 1.rb:20:in `enter'
1.rb:20:in `lock': deadlock; recursive locking (ThreadError)

With Thread.current:

$ ruby --version
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]
$ ruby 1.rb
f1 A
f2 A
f1 B
f1 finish
f2 B
1.rb:29:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
        from 1.rb:29:in `exit'
        from 1.rb:14:in `synchronize'
        from 1.rb:45:in `block in <main>'
$ ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
$ ruby 1.rb
f1 A
f2 A
f1 B
f1 finish
f2 B
f2 finish

So from my point of view, replacing Thread.current with Fiber.current:

  • breaks what used to work in Ruby 2.7.2
  • has no tangible effect in Ruby 3.0.0

Am I missing something?

@ioquatix
Author

The piece you are missing is that with a fiber scheduler, locking a mutex on one fiber will allow other fibers to be scheduled.

@pirj
Member

pirj commented Apr 27, 2021

Good point. I struggle to understand from the rdoc what the default behaviour of the default fiber scheduler is: is it non-blocking or blocking?

@ioquatix
Author

ioquatix commented Apr 27, 2021

There is no default fiber scheduler; trying to lock a mutex, transferring to a different fiber, and then trying to take the lock again is an error without a scheduler in place.

@pirj
Member

pirj commented Apr 27, 2021

Also, speaking of re-entrance, it's puzzling why we nullify @owner. In a scenario like:

m.synchronize do
  m.synchronize { ... } # ok, re-entry by the same fiber
  Fiber.yield # another fiber locks the mutex; if the scheduler and the fiber are non-blocking, we proceed to the next line
  m.synchronize { ... } # BOOM, we lock again, because `@owner` is the other fiber now
end

@pirj
Member

pirj commented Apr 27, 2021

transferring to a different fiber and then trying to take the lock again is an error without a scheduler in place

We don't set_scheduler in the snippet above. Is Attempt to unlock a mutex which is locked by another thread/fiber an expected error in this case?

The sequence is:

  • f1 locks m
  • f1 yields (transfers to f2)
  • f2 locks m

Abstracting out from the ReentrantMutex implementation, I would expect:

  1. for a non-blocking scheduler/fibers, control to be returned to f1. When it releases the lock, f2.resume would resume it. No errors.
  2. for blocking, this is a deadlock, which highlights the point that it's weird to call yield inside synchronize

@eregon
Contributor

eregon commented Apr 27, 2021

The owner is only reset if count == 0, so it's not reset in your example:

return unless @count == 0
@owner = nil

Your example is probably just a variant of @ioquatix's; Attempt to unlock a mutex which is locked by another thread/fiber is a bug in ReentrantMutex if lock/unlock are paired (or synchronize is used).

#502 (comment) is the proper fix.

eregon added a commit to eregon/rspec-support that referenced this issue Apr 27, 2021
…ent Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec#501
eregon added a commit to eregon/rspec-support that referenced this issue Apr 28, 2021
…ent Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec#501
eregon added a commit to eregon/rspec-support that referenced this issue Apr 28, 2021
…ent Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec#501
pirj pushed a commit to eregon/rspec-support that referenced this issue Apr 28, 2021
…ent Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec#501
pirj pushed a commit that referenced this issue Apr 28, 2021
…ent Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes #501
@ioquatix
Author

Just to follow up with everyone here: https://bugs.ruby-lang.org/issues/17827 was backported and should be in the next 3.0.x release.
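The per-fiber ownership noted in the commit messages above ("In Ruby >= 3, Mutex are held per Fiber, not per Thread") can be observed directly. A small sketch (assumes Ruby >= 3.0; on 2.x, ownership was per-thread and the sibling fiber would report true):

```ruby
# Sketch: on Ruby >= 3.0, Mutex ownership is tracked per Fiber.
m = Mutex.new
m.lock
# A sibling fiber on the same thread does NOT own the mutex on Ruby 3:
owned_in_sibling = Fiber.new { m.owned? }.resume
m.unlock

puts owned_in_sibling # Ruby >= 3.0: false; Ruby 2.x would report true
```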

yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this issue Oct 6, 2021
…y the current Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec/rspec-support#501

---
This commit was imported from rspec/rspec-support@0d64895.
yujinakayama pushed a commit to yujinakayama/rspec-monorepo that referenced this issue Oct 19, 2021
…y the current Thread or Fiber

* In Ruby >= 3, Mutex are held per Fiber, not per Thread.
* Fixes rspec/rspec-support#501

---
This commit was imported from rspec/rspec-support@a4be9d5.
@eregon
Contributor

eregon commented Jan 6, 2023

@ioquatix See #553. Maybe you could help write a test, or describe how to write one, that represents the Fiber scheduler case + ReentrantLock? The current test doesn't use a Fiber scheduler and so doesn't really test anything interesting on Ruby 3.2+.

@ioquatix
Author

ioquatix commented Jan 6, 2023

Thanks, I can take a look.
