Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BufRead: Only flush the internal buffer if seeking outside of it. #46832

Merged
merged 2 commits into from
Jan 14, 2018

Conversation

Diggsey
Copy link
Contributor

@Diggsey Diggsey commented Dec 19, 2017

Fixes #31100

r? @dtolnay

@dtolnay dtolnay added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Dec 19, 2017
@Diggsey
Copy link
Contributor Author

Diggsey commented Dec 19, 2017

Actually, don't merge this: there's an issue with the behaviour when seeking within a buffer and then reading past the end.

@dtolnay to fix this, I'd need to store an extra "offset" field, and then when refilling the buffer, it will have to perform an extra seek to get back to the right position if the offset is non-zero.

@kennytm kennytm added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Dec 20, 2017
@Diggsey
Copy link
Contributor Author

Diggsey commented Dec 24, 2017

So I think I'm going to need some input on this: the original code has a comment explaining that BufReader guarantees that if you call seek() followed immediately by unwrap(), the underlying reader will be at that point.

There doesn't seem to be any justification for this guarantee, and it is not clear whether this should apply to get_mut(), get_ref(), etc. which also allow access to the inner reader. If we can just drop the guarantee, the implementation becomes very straightforward. If not, there are a variety of trade-offs to consider.

@kballard it looks like you wrote the original implementation here - can you provide any input?

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have preferred a separate method specifically for seeking the underlying reader to self.pos, but at this point I believe we need to keep the guarantee about seek followed by into_inner. What are the trade-offs to consider?

@Diggsey
Copy link
Contributor Author

Diggsey commented Dec 25, 2017

To start with, all BufRead<T: Seek> will need to store a corrective offset to apply. The question is then when the offset is applied:

  1. When a seek is performed within the buffer, no seek instruction is passed to the inner reader: the offset is stored, and applied only in into_inner - in this case, while the original guarantee technically still holds, accessing the inner reader via get_mut() or get_ref() will not see the reader at the seek location.
  2. The offset is applied before performing any further reads from the inner buffer - in this case, the original guarantee holds, and also get_mut() and get_ref() will see the buffer at the seek location. However, these seeks will be completely redundant, and seek must be called a second time on the inner reader when it must next be read from, and these two extra seeks may eliminate any benefit to this optimisation in the first place.

Option 1 makes more sense to me, except that the original guarantee is not super useful if it only applies to into_inner: why not just seek after unwrapping the reader? It seems more likely that if anyone is relying on this at all, they would be relying on the implicit suggestion that the underlying reader be at the seek position immediately after a seek is performed, regardless of how that reader is accessed. If this assumption is being broken by implementing option 1, it would make more sense to go with option 0 and just drop the guarantee entirely.

@dtolnay dtolnay added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 3, 2018
@dtolnay dtolnay assigned sfackler and unassigned dtolnay Jan 5, 2018
@dtolnay
Copy link
Member

dtolnay commented Jan 5, 2018

Handing this off for a fresh set of eyes, but I think reasonable next steps would be:

  • Agree on what aspects of the behavior are acceptable to be changed -- in particular around get_ref and get_mut seeing the seeked location but also any other observable consequences of BufReader should provide a way to seek without dumping the buffer #31100.

  • Track down some use cases that involve seeking a BufReader in real code and understand how they would be impacted by seeks that do not dump the buffer.

@sfackler
Copy link
Member

sfackler commented Jan 5, 2018

@alexcrichton Do you remember why the original behavior was used?

@dtolnay
Copy link
Member

dtolnay commented Jan 5, 2018

Dumping the buffer on every seek, and guaranteeing that into_inner sees the underlying reader at the seeked position? Doesn't look like there was any discussion about that in #24176.

@alexcrichton
Copy link
Member

Oh dear unfortunately I have no recollection of adding this beyond rubber-stamping it. I don't recall any particular discussion about this particular point.

After trying to understand this for 20 minutes I also had to give up myself. I'm not familiar enough with everything at this point in time to give a recommendation myself.

@carols10cents
Copy link
Member

triage ping for you @sfackler!

@alexcrichton
Copy link
Member

The @rust-lang/libs team discussed this PR during triage today and the conclusion was that we probably can't change the behavior here (as it's explicitly documented as not doing this), but would it be a possibility to perhaps add a new inherent method and mention it in the documentation?

@Diggsey Diggsey force-pushed the bufread-cheaper-seek branch 9 times, most recently from 7119b6d to db78a33 Compare January 13, 2018 16:52
@Diggsey Diggsey force-pushed the bufread-cheaper-seek branch from db78a33 to 562ba04 Compare January 13, 2018 18:39
@Diggsey
Copy link
Contributor Author

Diggsey commented Jan 13, 2018

@alexcrichton I've updated the PR accordingly.

@alexcrichton
Copy link
Member

@bors: r+

@bors
Copy link
Contributor

bors commented Jan 13, 2018

📌 Commit c96f302 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Jan 14, 2018

⌛ Testing commit c96f302 with merge 80e2e67...

bors added a commit that referenced this pull request Jan 14, 2018
BufRead: Only flush the internal buffer if seeking outside of it.

Fixes #31100

r? @dtolnay
@bors
Copy link
Contributor

bors commented Jan 14, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 80e2e67 to master...

@bors bors merged commit c96f302 into rust-lang:master Jan 14, 2018
@Diggsey Diggsey deleted the bufread-cheaper-seek branch October 25, 2020 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants