Remove fastwrite mutex #3643

ryao · 2015-07-28T14:26:26Z

The fast write mutex is intended to protect accounting, but it
introduces a scaling regression by serializing all metaslab IO behind a
mutex. The accounting done by fast writes is done via atomic operations,
which renders the fast write mutex unnecessary.

Signed-off-by: Richard Yao <ryao@gentoo.org>
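To illustrate the claim that atomic accounting does not need an enclosing mutex, here is a minimal sketch in C11 with pthreads. It is not the ZFS code: `inflight_bytes`, `writer`, and `run_writers` are hypothetical names, and the counter only stands in for the kind of per-device accounting the description refers to.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_THREADS 64
#define ITERATIONS  100000
#define IO_SIZE     4096

/* Hypothetical stand-in for a pending-bytes counter (not a ZFS field). */
static _Atomic uint64_t inflight_bytes;

/* Each writer "issues" an IO and then "completes" it, with no mutex held. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        atomic_fetch_add(&inflight_bytes, IO_SIZE);
        atomic_fetch_sub(&inflight_bytes, IO_SIZE);
    }
    return NULL;
}

/* Runs nthreads writers and returns the counter once all have finished. */
uint64_t run_writers(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_store(&inflight_bytes, 0);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, writer, NULL) != 0)
            break;              /* join only the threads that started */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return atomic_load(&inflight_bytes);
}
```

Every increment is paired with a decrement, so `run_writers(8)` should return 0 regardless of how the threads interleave. Wrapping the same updates in a mutex would not change the result; it would only serialize the writers.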

behlendorf · 2015-07-28T17:46:43Z

@ryao can you share any data on the severity of the contention or how it impacted performance? I agree we should be able to safely drop this lock, but it would be good to try to quantify any potential gain.

ryao · 2015-07-28T18:21:30Z

@behlendorf The data is still being gathered. I pushed this to a pull request so that the buildbot could verify the safety before the person running benchmarks tries it. I will update this with the benchmark data afterward.

behlendorf · 2015-08-24T18:43:49Z

@ryao any updates on this?

ryao · 2015-08-30T15:49:46Z

@behlendorf The tests did not show any performance benefit. If anything, performance became slightly worse, although it was well within the margin of error. I had thought that locking would pose a problem for concurrent IO regardless of the number of vdevs, but that was wrong.

I had not had much time to think about the results until today. Having thought about them, I think that the tests themselves are not comprehensive enough because they were conducted on a single-disk pool on an SSD.

That said, the mutex is unnecessary because we use atomic operations, and it has caused merge conflicts when merging code from Illumos, so I would like to see it go. I am going to revise the commit message to reflect this.

The fast write mutex is intended to protect accounting, but it is redundant because all accounting is performed through atomic operations. It also serializes all metaslab IO behind a mutex, which introduces a theoretical scaling regression that the Illumos developers did not like when we showed this to them. Removing it makes the selection of the metaslab_group lock free as it is on Illumos. The selection is not quite the same without the lock because the loop races with IO completions, but any imbalances caused by this are likely to be corrected by subsequent metaslab group selections.

Signed-off-by: Richard Yao <ryao@gentoo.org>

The fast write mutex is intended to protect accounting, but it is redundant because all accounting is performed through atomic operations. It also serializes all metaslab IO behind a mutex, which introduces a theoretical scaling regression that the Illumos developers did not like when we showed this to them. Removing it makes the selection of the metaslab_group lock free as it is on Illumos. The selection is not quite the same without the lock because the loop races with IO completions, but any imbalances caused by this are likely to be corrected by subsequent metaslab group selections.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3643
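The lock-free selection described in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the actual metaslab code: it assumes the selection picks the group with the fewest pending bytes, and `pending`, `select_group`, `complete_io`, and `pending_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NGROUPS 3

/* Hypothetical per-group pending-bytes counters (not ZFS structures). */
static _Atomic uint64_t pending[NGROUPS];

/*
 * Pick the group with the fewest pending bytes without taking a lock.
 * A completion on another thread may change a counter between the load
 * and the add below, so the choice can be based on stale values.
 */
size_t select_group(uint64_t io_size)
{
    size_t best = 0;
    uint64_t best_pending = atomic_load(&pending[0]);

    for (size_t i = 1; i < NGROUPS; i++) {
        uint64_t p = atomic_load(&pending[i]);
        if (p < best_pending) {
            best = i;
            best_pending = p;
        }
    }
    /* Charge the IO to the chosen group atomically. */
    atomic_fetch_add(&pending[best], io_size);
    return best;
}

/* Called when an IO finishes; may race with select_group(). */
void complete_io(size_t group, uint64_t io_size)
{
    assert(group < NGROUPS);
    atomic_fetch_sub(&pending[group], io_size);
}

/* Read the current pending-bytes value for one group. */
uint64_t pending_bytes(size_t group)
{
    assert(group < NGROUPS);
    return atomic_load(&pending[group]);
}
```

Each counter update is atomic, so the totals stay correct, but the scan as a whole is not a snapshot. A stale choice sends one IO to a group that is no longer the least loaded, and the next call sees the updated counters and steers away from it, which is the self-correcting behavior the commit message relies on.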

behlendorf added this to the 0.6.5 milestone Jul 28, 2015

behlendorf removed this from the 0.6.5 milestone Jul 29, 2015

angstymeat mentioned this pull request Jul 31, 2015

Freeze with latest git (night of July 30) #3654

Closed

ryao force-pushed the fastwrite branch from 5dab5fa to 48d2d40 Compare August 30, 2015 16:04

ryao mentioned this pull request Oct 2, 2015

atomic operations hurt performance on large-scale NUMA systems #3752

Closed

behlendorf closed this in b10695c Jan 15, 2016
