
multithreaded buse #11

Open
wants to merge 2 commits into base: master

Conversation

divinity76

It's messy, but it seems to work fine with busexmp.c (not tested extensively, but no problems encountered on ext2 and btrfs).

  • It has no safeguards against creating an excessive number of threads. I'm pretty sure it has over 10 threads running (for a split second) during mkfs.btrfs, but again, no problems encountered: fsck can't detect any problems on ext2, and "btrfs check --repair" can't detect any errors on btrfs.

  • Reading write requests from NBD is still single-threaded (but processing/writing them is multithreaded). I'm not sure how to fix that; I guess I'll need several sockets to NBD.

  • Writing responses to NBD is mutex-locked (effectively single-threaded). Again, not sure how to fix that; I guess I'll need several sockets to NBD.

I totally understand if you don't want to merge this, but I'd like an opinion on it either way ^^

@bandi13

bandi13 commented Nov 4, 2016

I have concerns about your implementation. First, it doesn't ensure that the replies are sent in the order the requests come in; you'll need to store the threads in a queue and handle the responses in order. Second, are you sure that there are actually multiple requests coming in to NBD at once? What kinds of performance improvements did you see with multi- vs. single-threading?

@divinity76
Author

divinity76 commented Nov 4, 2016

thank you for taking a look! :)

> First it doesn't ensure that the replies are sent in the order the requests come in.

No, and I don't need to: the reply.handle is used to identify which request I'm responding to.

> You'll need to store the threads in a queue and handle the responses in order.

No, I don't believe that is the case; see above.

> Second are you sure that there are actually multiple requests coming in to NBD at once

Yes, I am. NBD does not wait for one request to finish before issuing more requests. For instance, mkfs.btrfs will create a lot of simultaneous requests.

> What kinds of performance improvements did you see with multi- vs single-threading?

I need more time to test that, but the theory is that it can be much faster at handling multiple slow read/write requests at once.

@bandi13

bandi13 commented Nov 4, 2016

Cool! You're right. Thanks for explaining. I'll have to play with that too.

@divinity76
Author

divinity76 commented Nov 4, 2016

You asked about performance. Note that I believe several things in the multithreaded code can still be improved. For instance, starting a new thread for every little request is probably crazy; a thread pool would probably be faster. Actually creating a thread isn't free, so threads should probably be reused.

I don't have anything interesting to test with at the moment, and I believe busexmp.c won't benefit much (if at all) from multithreading. Still:

(warning: sorry, I do not have access to a completely quiet system to test on, so there will be some noise. Feel free to do your own tests, of course.)

Creating a btrfs filesystem 100 times, with the single-threaded buse.c running at /dev/nbd1 and the multithreaded busemt.c running at /dev/nbd0:

root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.6004650592804
multi_timeused: 1.7799780368805
winner: single won!
margin: 0.1795129776001
root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.5808670520782
multi_timeused: 1.6301081180573
winner: single won!
margin: 0.049241065979004
root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.5171689987183
multi_timeused: 1.6563220024109
winner: single won!
margin: 0.13915300369263
root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.52512383461
multi_timeused: 1.5709359645844
winner: single won!
margin: 0.045812129974365
root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.538360118866
multi_timeused: 1.5972349643707
winner: single won!
margin: 0.058874845504761
root@Deb9DEtestX:/home/hanshenrik/BUSE# ./bench.php >/dev/null
single_timeused: 1.5681219100952
multi_timeused: 1.7586491107941
winner: single won!
margin: 0.19052720069885

bench.php:

#!/usr/bin/php
<?php
$tests = 100;

$starttime = microtime(true);
for ($i = 0; $i < $tests; ++$i) {
    system("mkfs.btrfs /dev/nbd0 -f");
}
$endtime = microtime(true);
$multi_timeused = $endtime - $starttime;

$starttime = microtime(true);
for ($i = 0; $i < $tests; ++$i) {
    system("mkfs.btrfs /dev/nbd1 -f");
}
$endtime = microtime(true);
$single_timeused = $endtime - $starttime;

fwrite(STDERR, "single_timeused: " . $single_timeused . PHP_EOL);
fwrite(STDERR, "multi_timeused: " . $multi_timeused . PHP_EOL);
fwrite(STDERR, "winner: ");
if ($single_timeused === $multi_timeused) {
    fwrite(STDERR, "It's a draw!" . PHP_EOL);
} elseif ($single_timeused < $multi_timeused) {
    fwrite(STDERR, "single won!" . PHP_EOL);
} else {
    fwrite(STDERR, "multi won!" . PHP_EOL);
}
fwrite(STDERR, "margin: " . abs($multi_timeused - $single_timeused) . PHP_EOL);

(and if you wanna complain about how shitty PHP is, please do it elsewhere, like my email or /r/lolphp )

hdparm -Tt:

root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd0

/dev/nbd0:
 Timing cached reads:   17142 MB in  2.00 seconds = 8577.03 MB/sec
 Timing buffered disk reads: 128 MB in  0.16 seconds = 818.95 MB/sec
root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd0

/dev/nbd0:
 Timing cached reads:   17624 MB in  2.00 seconds = 8818.65 MB/sec
 Timing buffered disk reads: 128 MB in  0.17 seconds = 766.66 MB/sec
root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd0

/dev/nbd0:
 Timing cached reads:   17380 MB in  2.00 seconds = 8696.67 MB/sec
 Timing buffered disk reads: 128 MB in  0.17 seconds = 775.21 MB/sec
root@Deb9DEtestX:/mt# 
root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd1

/dev/nbd1:
 Timing cached reads:   16590 MB in  2.00 seconds = 8301.21 MB/sec
 Timing buffered disk reads: 128 MB in  0.12 seconds = 1071.70 MB/sec
root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd1

/dev/nbd1:
 Timing cached reads:   16882 MB in  2.00 seconds = 8446.80 MB/sec
 Timing buffered disk reads: 128 MB in  0.10 seconds = 1247.83 MB/sec
root@Deb9DEtestX:/mt# hdparm -Tt /dev/nbd1

/dev/nbd1:
 Timing cached reads:   17108 MB in  2.00 seconds = 8560.67 MB/sec
 Timing buffered disk reads: 128 MB in  0.11 seconds = 1168.79 MB/sec

dd WRITE test:

root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd0 bs=1M
dd: error writing '/dev/nbd0': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.08685 s, 1.5 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd0 bs=1M
dd: error writing '/dev/nbd0': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0737198 s, 1.8 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd0 bs=1M
dd: error writing '/dev/nbd0': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0751926 s, 1.8 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd0 bs=1M
dd: error writing '/dev/nbd0': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0694624 s, 1.9 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd1 bs=1M
dd: error writing '/dev/nbd1': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0897694 s, 1.5 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd1 bs=1M
dd: error writing '/dev/nbd1': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0733665 s, 1.8 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd1 bs=1M
dd: error writing '/dev/nbd1': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0672646 s, 2.0 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/zero of=/dev/nbd1 bs=1M
dd: error writing '/dev/nbd1': No space left on device
129+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0733574 s, 1.8 GB/s

dd READ test:

root@Deb9DEtestX:/mt# dd if=/dev/nbd0 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0422518 s, 3.2 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd0 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0493657 s, 2.7 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd0 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0415744 s, 3.2 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd0 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0433726 s, 3.1 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd1 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.041047 s, 3.3 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd1 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0481382 s, 2.8 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd1 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0356341 s, 3.8 GB/s
root@Deb9DEtestX:/mt# dd if=/dev/nbd1 of=/dev/null bs=1M
128+0 records in
128+0 records out
134217728 bytes (134 MB, 128 MiB) copied, 0.0448003 s, 3.0 GB/s

I guess I should test actual mounted-filesystem performance too. Suggestions?

@divinity76
Author

Hmm, it just occurred to me that the kernel's I/O caches may have skewed my tests big time; I'm not sure. Maybe try dd oflag=sync?

@bandi13

bandi13 commented Nov 5, 2016

Nice work on the testing. Yes, you're right about the kernel's I/O caches; they can get pretty big. I've been working on some test programs to validate random accesses as well as filesystem-level tests. Take a look and see if they're of use. They can also help validate your system, to make sure what you write in is what is read back out.

(For the record, the PHP thing didn't even cross my mind. Whatever gets the job done. You can always change it later if it's a problem.)

@bandi13

bandi13 commented Nov 7, 2016

I was thinking about this: what happens if there's a read on a sector followed by a write? Two threads are started, and the write thread is executed first, then the read. You may lose data if it doesn't handle the requests in sequence, no?

@divinity76
Author

divinity76 commented Nov 7, 2016

Yeah, that would be bad. However, I believe the kernel won't do that..?

The data would already be in memory, so the kernel could probably get those bytes from its own I/O caches, which would be much faster. And you know how the kernel devs love to micro-optimize the shit out of everything? (Except, ahem, /proc.)

I asked this question on a Linux support channel (##linux @ freenode); here's what I got (noise removed):

<hanshenrik> may the kernel send a write request to a block device, then send a read request to the same block device BEFORE the write request has finished?  
<hanshenrik> and if so, does the kernel expect the new data (not yet written), or the old data? 
<hanshenrik> or a horribly broken mix of the 2?
<hanshenrik> err, i mean, a read request to the same sectors*
<[R]> hanshenrik: the kenrel has cache
<hanshenrik> [R], im making a block device, it will have different speeds for reads and writes, namely, reads will be much faster. i guess i shouldn't worry about the situation of handling a read request to sectors that are currently being written by another request? 
<hanshenrik> (i won't crash or anything, but the data returned from such a request would be a random-ish combination of both)
<[R]> hanshenrik: the kernel handles all of that
<hanshenrik> thanks [R]

Seems promising :) I should probably ask on the Linux Kernel Mailing List too. (PS: I trust [R]; he's an ##linux old-timer who has proven himself knowledgeable plenty of times over the years.)

@divinity76
Author

divinity76 commented Nov 7, 2016

I'm just GUESSING, and testing needs to be done to be sure.

proc1 wants to read sectors 1-10
the kernel sends a read request to the BD
(request not yet finished) proc2 wants to read sectors 1-10 (or 5-10)
the kernel notices that a request to read those sectors has already been scheduled, and does not send a new request
proc3 wants to read sectors 7-15
the kernel notices that a request to read sectors 7-10 has already started, and will issue a request to read sectors 11-15...

Now, what if, all this time, there was a process 0 already writing to sectors 1-8? I believe the kernel would have just issued a request to read sectors 9-10 instead of 1-10 for proc1-2, and 11-15 for proc3.

But what if there's then a proc4 wanting to write those sectors before the read requests have finished?

Hmmmm, I don't know. Testing should definitely be done.

@divinity76
Author

divinity76 commented Nov 7, 2016

Is it possible the kernel would just have lied to the other programs and given them what proc4 wanted to write, rather than what was actually on the BD? Or would the kernel stall proc4's write? I don't know; my guess is a stall of proc4's write. Should test.

@divinity76
Author

divinity76 commented Nov 7, 2016

(But if the kernel just issues the write request for proc4 instantly, and doesn't want to lie to the other processes about what was actually on the BD at the time of the read request versus what is scheduled to be written, then you're right, we have a problem. Should test to be sure.)

@nixomose

All reads and writes (except for direct I/O) go through the page cache.
So the data from the write request will be in the cache when the read requests come in, and they will be satisfied from the cache. The kernel will send the writes to the block device, and when a read request comes in for blocks it doesn't have in the cache, it will ask the block device for them. There's no overlap; if there were, it would be in the page cache.

@fruffy

fruffy commented Feb 4, 2018

This looks like an interesting contribution, is it still in consideration?
