-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop ReadFromWithConcurrency submitting out of order writes #480
Conversation
Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send. This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file. This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.
I speed tested this over a long wide pipe and I couldn't see any difference in speed before and after the patch, both about 40 MiB/s falling all the way back to 900KiB/s with concurrent writes disabled as you'd expect. |
Thinking about this some more
|
Hi, thanks for the contribution, it seems fine to me, however this code was written by @puellanivis, at the moment she has some personal issues, please wait some time for her to have a chance to have a look to the code, thanks for your patience |
Indeed, sequential ordering of requests is probably a good idea, and I think I was writing it into the client updates. There are some things I think that should be addressed, but as @drakkan mentioned, I'm currently out. (In hospital) and haven't been able to really do anything yet. This also happened as I was moving into a new apartment, and now missed the appointment for connecting my internet. 😫 |
Thanks for taking a look and happy to make changes when you are up to it. No hurry though and look after yourself. |
Oh, I did want to mention, I did similar work already in PR 443, it might help to review that PR, and compare to what you're doing here. That'll end up being one of the first things I do myself. |
Any thoughts on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had recommended reviewing PR-443 and paralleling what was done there, as it would be the first thing I would do. Doing so helped me spot a bug in the existing code, that we can fix as part of this PR.
Also, a number of little “trivial” things could have been caught ahead of review that would nevertheless would make the code inconsistent between two highly similar tasks.
@@ -1424,31 +1424,44 @@ func (f *File) Write(b []byte) (int, error) { | |||
return n, err | |||
} | |||
|
|||
func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) { | |||
typ, data, err := f.c.sendPacket(ch, &sshFxpWritePacket{ | |||
func (f *File) startWriteChunkAt(ch chan result, b []byte, off int64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function no longer provides any further functionality than just calling the f.c.dispatchRequest
itself. Consider how in PR-443, I inlined the calls to dispatchRequest
primarily because I also needed to get the ID
used, in order to pass it into the later unmarshalStatus
.
🤔 Actually, the existing code has a bug, that lead you astray here. See the later comment.
return nil | ||
} | ||
|
||
func (f *File) writeChunkAt(ch chan result, b []byte, off int64) (int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure I see the benefits of reimplementing writeChunkAt
in terms of the new functions defined. Almost all of this is still already handled by sendPacket
.
And, the existing writeChunkAt
does not return len(b)
if err != nil
. This new implementation would return len(b) regardless of if there were an error or not.
This is one of the main reasons I didn’t bother refactoring readChunkAt
in PR-443. Refactoring comes with risks, and it’s easy to introduce bugs accidentally. I prefer better to give a new path, and when all references to the old path are gone, then you can remove it safely. Until then, who knows what functionality people are accidentally depending on…
@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64 | |||
n, err := r.Read(b) | |||
if n > 0 { | |||
read += int64(n) | |||
ch := make(chan result, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use newResChanPool(concurrency)
as shown in PR-443, to avoid unnecessarily allocation of new channels for every write, rather than reusing ones that have already been received on. (The chan result
used by dispatchRequest
is a “guaranteed one receive, and only one receive” design, which allows us pool them.)
@@ -1628,9 +1642,11 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64 | |||
n, err := r.Read(b) | |||
if n > 0 { | |||
read += int64(n) | |||
ch := make(chan result, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch
is a bit too generic of a name. There a number of channels already in scope here, and using “ch” suggests that this is The Channel—the singular one available in scope. Since this is one of many relevant channels in scope, a more specific name is desirable. Since this is the channel we will be receiving our results
on, using res
is a more appropriate name, same as PR-443.
if err != nil { | ||
return 0, err | ||
return err | ||
} | ||
|
||
switch typ { | ||
case sshFxpStatus: | ||
id, _ := unmarshalUint32(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow. There’s a bug in this existing code. (My bad.) This id
value here has to be the same as the ID
field in the packet above. This is why the id
is passed in PR-443 through the work channel, so that it can be properly tested for.
This mostly works because we’re never supposed to get an id mismatch here, but it’s still possible, and we should be handling it correctly if it does. (Basically, everyone has just been lucky so far.)
I noticed this bug while wondering why this PR didn’t also include the id
in the channel, like PR-443
did. Answer: you probably weren’t aware of how important it was, since all you had was this buggy code I wrote. 😆
|
||
select { | ||
case workCh <- work{b, n, off}: | ||
case workCh <- work{b, n, off, ch}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR-443 ordered the parameters here differently. I would like to see code parallel each other as much as possible for things like this. (i.e. having a “standard” order for these fields is good, even if the original choice is entirely arbitrary.)
Also, note as per the bug I discovered in my code, we need to be passing the id
properly as well, the same as PR-443.
Finally, the field n
is no unused, since there is no reference to the field after this assignment. Please remove it the same as we would any other unused variable assignment.
@@ -1601,6 +1614,7 @@ func (f *File) ReadFromWithConcurrency(r io.Reader, concurrency int) (read int64 | |||
b []byte | |||
n int | |||
off int64 | |||
ch chan result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned, ch
here is too non-specific when there are more than one relevant channels in scope.
Functionality was merged in with #482 |
This is for discussion. This is causing a real problem in rclone rclone/rclone#5806 but there may be a better way of fixing.
Before this change, ReadFromWithConcurrency would very often submit writes to the server out of order due to small timing differences when lots of blocks were queued and ready to send.
This causes more work for the server as seeking about the file isn't free. Writing chunks of a file out of order can also cause fragmentation and consequently loss of performance on that file.
This patch causes the write to the channel to be handled synchronously but the acks are received asynchronously allowing for many write packets to be in flight as before.