Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove fastwrite mutex #3643

Closed
wants to merge 1 commit into from
Closed

Remove fastwrite mutex #3643

wants to merge 1 commit into from

Conversation

ryao
Copy link
Contributor

@ryao ryao commented Jul 28, 2015

The fast write mutex is intended to protect accounting, but it
introduces a scaling regression by serializing all metaslab IO behind a
mutex. The accounting done by fast writes is done via atomic operations,
which renders the fast write mutex unnecessary.

Signed-off-by: Richard Yao ryao@gentoo.org

@behlendorf
Copy link
Contributor

@ryao can you share any data on the severity of the contention or how it impacted performance? I agree we should be able to safely drop this lock but it would be good try and quantify any potential gain.

@behlendorf behlendorf added this to the 0.6.5 milestone Jul 28, 2015
@ryao
Copy link
Contributor Author

ryao commented Jul 28, 2015

@behlendorf The data is still being gathered. I pushed this to a pull request so that the buildbot could verify the safety before the person running benchmarks tries it. I will update this with the benchmark data afterward.

@behlendorf behlendorf removed this from the 0.6.5 milestone Jul 29, 2015
@behlendorf
Copy link
Contributor

@ryao and updates on this?

@ryao
Copy link
Contributor Author

ryao commented Aug 30, 2015

@behlendorf The tests did not show any performance benefit. If anything, performance became slightly worse, although it was well within the margin of error. I had thought that locking would pose a problem for concurrent IO regardless of the number of vdevs, but that was wrong.

I had not had much time to think about the results until today. Having thought about them, I think that the tests themselves are not comprehensive enough because they were conducted on a single disk pool on a SSD.

That said, the mutex is unnecessary because we us atomic operations and it has caused merge conflicts when merging code from Illumos, so I would like to see it go. I am going to revise the commit message to reflect this.

The fast write mutex is intended to protect accounting, but it is
redundant because all accounting is performed through atomic operations.
It also serializes all metaslab IO behind a mutex, which introduces a
theoretical scaling regression that the Illumos developers did not like
when we showed this to them. Removing it makes the selection of the
metaslab_group lock free as it is on Illumos. The selection is not quite
the same without the lock because the loop races with IO completions,
but any imbalances caused by this are likely to be corrected by
subsequent metaslab group selections.

Signed-off-by: Richard Yao <ryao@gentoo.org>
goulvenriou pushed a commit to Alyseo/zfs that referenced this pull request Jan 17, 2016
The fast write mutex is intended to protect accounting, but it is
redundant because all accounting is performed through atomic operations.
It also serializes all metaslab IO behind a mutex, which introduces a
theoretical scaling regression that the Illumos developers did not like
when we showed this to them. Removing it makes the selection of the
metaslab_group lock free as it is on Illumos. The selection is not quite
the same without the lock because the loop races with IO completions,
but any imbalances caused by this are likely to be corrected by
subsequent metaslab group selections.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3643
goulvenriou pushed a commit to Alyseo/zfs that referenced this pull request Feb 4, 2016
The fast write mutex is intended to protect accounting, but it is
redundant because all accounting is performed through atomic operations.
It also serializes all metaslab IO behind a mutex, which introduces a
theoretical scaling regression that the Illumos developers did not like
when we showed this to them. Removing it makes the selection of the
metaslab_group lock free as it is on Illumos. The selection is not quite
the same without the lock because the loop races with IO completions,
but any imbalances caused by this are likely to be corrected by
subsequent metaslab group selections.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3643
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants