Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking Issue for io::Cursor::{split, split_mut} #86369

Open
1 of 4 tasks
soerenmeier opened this issue Jun 16, 2021 · 28 comments
Open
1 of 4 tasks

Tracking Issue for io::Cursor::{split, split_mut} #86369

soerenmeier opened this issue Jun 16, 2021 · 28 comments
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@soerenmeier
Copy link
Contributor

soerenmeier commented Jun 16, 2021

Feature gate: #![feature(cursor_split)]

impl<T> Cursor<T>
where
    T: AsRef<[u8]>,
{
    pub fn split(&self) -> (&[u8], &[u8]);
}

impl<T> Cursor<T>
where
    T: AsMut<[u8]>,
{
    pub fn split_mut(&mut self) -> (&mut [u8], &mut [u8]);
}

Steps / History

Unresolved Questions/Issues

@soerenmeier soerenmeier added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 16, 2021
@jonas-schievink jonas-schievink added the A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` label Jun 16, 2021
@yaahc
Copy link
Member

yaahc commented Jun 16, 2021

One additional concern that @mystor pointed out is that stabilizing these APIs might prevent us from adding support to non-contiguous buffers/sets of buffers to Cursor in the future. The remaining method wouldn't translate, since it returns a single slice.

We should be absolutely certain that we don't intend to extend Cursor this way in the future or we may wish to just stick with the usize returning version for now, since that one would have solved the original motivating example and it can be extended to work for multiple non-contiguous buffers.

@soerenmeier
Copy link
Contributor Author

Wouldn't that still work though? Since those contiguous buffers won't implement AsRef<[u8]>.
To make it clear that this function only returns a slice we could rename it as suggested to remaining_slice and then the implementation for contiguous buffers could have a method named for example remaining_slices.

@yaahc
Copy link
Member

yaahc commented Jun 16, 2021

Wouldn't that still work though? Since those contiguous buffers won't implement AsRef<[u8]>.

Oh, I had forgotten that remaining also had the as_ref bound. Yea, that should work fine.

To make it clear that this function only returns a slice we could rename it as suggested to remaining_slice and then the implementation for contiguous buffers could have a method named for example remaining_slices.

Sounds like a plan!

@soerenmeier
Copy link
Contributor Author

I renamed remaining to remaining_slice and added:

fn remaining(&self) -> u64;

Is that ok or should i remove it again?

@yaahc
Copy link
Member

yaahc commented Jun 17, 2021

Is that ok or should i remove it again?

sounds great

@soerenmeier soerenmeier changed the title Tracking Issue for io::Cursor::{remaining, is_empty} Tracking Issue for io::Cursor::{remaining, remaining_slice, is_empty} Jun 17, 2021
JohnTitor added a commit to JohnTitor/rust that referenced this issue Jun 21, 2021
Add `io::Cursor::{remaining, remaining_slice, is_empty}`

Tracking issue: rust-lang#86369

I came across an inconvenience when answering the following [Stack Overflow](https://stackoverflow.com/questions/67831170) question.
To get the remaining slice you have to call `buff.fill_buf().unwrap()`. Which in my opinion doesn't really tell you what is returned (in the context of Cursor). To improve readability and convenience when using Cursor i propose adding the method `remaining`.

The next thing i found inconvenient (unnecessary long) was detecting if the cursor reached the end. There are a few ways this can be achieved right now:
- `buff.fill_buf().unwrap().is_empty()`
- `buff.position() >= buff.get_ref().len()`
- `buff.bytes().next().is_none()`

Which all seem a bit unintuitive, hidden in trait documentations or just a bit long for such a simple task.
Therefor i propose another method called `is_empty`, maybe with another name, since this one may leave room for interpretation on what really is empty (the underlying slice, the remaining slice or maybe the position).

Since it seemed easier to create this PR instead of an RFC i did that, if an RFC is wanted, i can close this PR and write an RFC first.
JohnTitor added a commit to JohnTitor/rust that referenced this issue Jun 21, 2021
Add `io::Cursor::{remaining, remaining_slice, is_empty}`

Tracking issue: rust-lang#86369

I came across an inconvenience when answering the following [Stack Overflow](https://stackoverflow.com/questions/67831170) question.
To get the remaining slice you have to call `buff.fill_buf().unwrap()`. Which in my opinion doesn't really tell you what is returned (in the context of Cursor). To improve readability and convenience when using Cursor i propose adding the method `remaining`.

The next thing i found inconvenient (unnecessary long) was detecting if the cursor reached the end. There are a few ways this can be achieved right now:
- `buff.fill_buf().unwrap().is_empty()`
- `buff.position() >= buff.get_ref().len()`
- `buff.bytes().next().is_none()`

Which all seem a bit unintuitive, hidden in trait documentations or just a bit long for such a simple task.
Therefor i propose another method called `is_empty`, maybe with another name, since this one may leave room for interpretation on what really is empty (the underlying slice, the remaining slice or maybe the position).

Since it seemed easier to create this PR instead of an RFC i did that, if an RFC is wanted, i can close this PR and write an RFC first.
@seanmonstar
Copy link
Contributor

I'll just note that this change conflicts with bytes::Buf that is implemented for Cursor<impl AsRef<[u8]>>. The proposed remaining method here has a different signature, so shadowing won't work. While adding this method may be desired and ultimately the right decision (I haven't thought enough to say myself), it will have an impact on the ecosystem using bytes::Buf.

I've also noticed an inconvenience that won't affect stable: since the method is "unstable" on nightly, it triggers an error, even though the trait is in scope.

@soerenmeier
Copy link
Contributor Author

That seems unfortunate. I'd be okay with changing the signature to return a usize, I chose u64 since position is u64 but since the underlying slice cannot be bigger than usize it would be possible to change the signature. The only point to consider is if support for non-contiguous buffers is added the length could be bigger than usize, but that also could be a problem with u64.

About the error in nightly, I don't know how this is handled normally in std. But the impact should be rather small, since Cursor is only scarcely used, mostly in tests. Am I wrong?

@seanmonstar
Copy link
Contributor

I don't mean an error in libstd or it's test suite. But rather the greater ecosystem, anywhere using bytes::Buf and also running tests on nightly. For example, this PR for h2: hyperium/h2#547

@soerenmeier
Copy link
Contributor Author

I didn't either. I meant the use of Cursor in combination with bytes::Buf on nightly calling remaining. After your first comment i made a few short searches concluding with the statement above, interpreting your reaction, that was probably the wrong assessment. Imho failing nightly tests should be acceptable but addressed (if stable is the target).
Do you have a suggestion to solve this?

@seanmonstar
Copy link
Contributor

Yea, I see two solutions (so far, maybe there's a better): either libstd changes to so that nightly doesn't include the conflict, or the ecosystem that depends on bytes::Buf and runs tests on nightly all need to change.

(I agree failing nightly tests is acceptable to a degree, but at the same time, if they're all set to allow failure, it's very easy to miss that nightly is broken, and if they block the PR, then it disrupts library developers. 🤷 )

@soerenmeier
Copy link
Contributor Author

@yaahc can remaining be removed from nightly without removing it from this proposal? Or can we rename it just for nightly (ex. remaining_len)?

@yaahc
Copy link
Member

yaahc commented Jul 1, 2021

@yaahc can remaining be removed from nightly without removing it from this proposal? Or can we rename it just for nightly (ex. remaining_len)?

Yes, to either option, whichever you prefer. I'm not confident the remaining_len ident will pass muster during stabilization, so if I had to lean one way I'd say just remove it since you only really needed the slice method afaik, and it's not a long jump from getting a slice to getting the len.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jul 12, 2021
Remove unstable `io::Cursor::remaining`

Adding `io::Cursor::remaining` in rust-lang#86037 caused a conflict with the implementation of `bytes::Buf` for `io::Cursor`, leading to an error in nightly, see rust-lang#86369 (comment).

This fixes the error by temporarily removing the `remaining` function.

r? `@yaahc`
@jkugelman
Copy link
Contributor

jkugelman commented Sep 25, 2021

It would be nice if remaining_slice could return a slice with the lifetime of the underlying buffer, rather than of the cursor.

Sometimes I want to avoid copying a large chunk of data and get a slice of the byte buffer instead. remaining_slice would be great for that, except the slice it returns doesn't live long enough.

Let's say I've read a file into memory, and now I'm pulling out individual assets:

#![feature(cursor_remaining)]

use std::io::{self, BufRead, Cursor, Read};

struct Asset<'file> {
    data: &'file [u8],
}

impl<'file> Asset<'file> {
    fn load(cursor: &mut Cursor<&'file [u8]>) -> io::Result<Self> {
        let mut len = [0u8; 2];
        cursor.read_exact(&mut len[..])?;
        let len = u16::from_le_bytes(len) as usize;

        let data = cursor.remaining_slice().get(..len)
            .ok_or(io::ErrorKind::UnexpectedEof)?;
        cursor.consume(len);

        Ok(Self { data })
    }
}

This fails to compile because the slice from remaining_slice has the lifetime of the cursor and the cursor doesn't live long enough:

error[E0621]: explicit lifetime required in the type of `cursor`
  --> src/lib.rs:19:19
   |
10 |     fn load(cursor: &mut Cursor<&'file [u8]>) -> io::Result<Self> {
   |                     ------------------------ help: add explicit lifetime `'file` to the type of `cursor`: `&'file mut std::io::Cursor<&'file [u8]>`
...
19 |         Ok(Self { data })
   |                   ^^^^ lifetime `'file` required

Playground

@soerenmeier soerenmeier changed the title Tracking Issue for io::Cursor::{remaining, remaining_slice, is_empty} Tracking Issue for io::Cursor::{remaining_slice, is_empty} Sep 27, 2021
@soerenmeier
Copy link
Contributor Author

That would be nice, but sadly i see no way to do this. I tried to annotate all lifetimes that exist when T = &'a [u8] to explain my reasoning:

impl<T: AsRef<[u8]>> Cursor<&'a [u8]> {
    pub fn remaining_slice<'b>(&'b self) -> &'b [u8] {

        let slice: &'b &'a [u8] = AsRef::as_ref(&'b self.inner);

        let slice: &'a [u8] = *slice;
        // where 'a: 'b

        slice
    }
}

It is not possible to change the returned lifetime to 'a since 'a needs to outlive 'b because &'a is borrowed with &'b.
i hope this is understandable.

@soerenmeier
Copy link
Contributor Author

@yaahc do you think we can start an FCP? Its been about 3 month since the initial PR.

@yaahc
Copy link
Member

yaahc commented Oct 5, 2021

sure.

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Oct 5, 2021

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Oct 5, 2021
@BurntSushi
Copy link
Member

Hmmm, so I have a couple small thoughts here:

  1. If we have an is_empty method, we probably should have a len method. (And it looks like that's what remaining was before it was removed. So I wonder whether that should be added back as len.)
  2. Alternatively, we might just remove is_empty altogether. Folks can do cursor.remaining_slice().is_empty() (and cursor.remaining_slice().len() for the length).

If we went with (2), I like the name remaining better than remaining_slice, but it sounds like that would cause ecosystem issues with bytes. But remaining_slice is not too bad...

@soerenmeier
Copy link
Contributor Author

My initial PR was with remaining returning a slice, so I'm in favor of that, but it's probably not worth it.
About len, I don't know if that could be ambiguous since it could mean the len of the underlying slice or the remaining len.

What about adding remaining with the same signature as bytes::Buf and making it insta-stable (to not trigger unstable warnings)?

fn remaining(&self) -> usize;

@seanmonstar
Copy link
Contributor

I personally think cursor.remaining_slice().is_empty() is clearer than cursor.is_empty(). With a Cursor<Vec<u8>>, with a bunch of bytes in the vector, it's not immediately clear to me that the cursor itself could be empty...

@m-ou-se
Copy link
Member

m-ou-se commented Oct 27, 2021

@rfcbot concern is_empty

As pointed out above, the meaning of is_empty can be confusing:

#![feature(cursor_remaining)]

use std::io::Write;

fn main() {
    let mut buf = [0u8; 10];
    let mut cursor = std::io::Cursor::new(&mut buf[..]);
    dbg!(cursor.is_empty()); // false
    cursor.write_all(b"1234567").unwrap();
    dbg!(cursor.is_empty()); // false
    cursor.write_all(b"890").unwrap();
    dbg!(cursor.is_empty()); // true
}

Here, is_empty returns false when nothing is written yet, and true when the buffer is completely full.

@m-ou-se
Copy link
Member

m-ou-se commented Oct 27, 2021

This API seems focussed on the Read side of things. When Seeking around and Writeing data, is_empty becomes confusing and remaining_slice is not very useful. When writing, the part of the slice before the cursor is much more interesting. Maybe we should have functions for both? Or a single function that returns both slices? (If we also have a mut version, a single function would be useful to allow mutable access to both sides at once.)

@m-ou-se
Copy link
Member

m-ou-se commented Oct 27, 2021

If we have an is_empty method, we probably should have a len method. (And it looks like that's what remaining was before it was removed. So I wonder whether that should be added back as len.)

Note that the length of the underlying buffer doesn't change when reading or writing or seeking through it. The only thing that changes is the position of the cursor, and the bytes in the buffer (when writing). (E.g. seeking to the beginning will 'unempty' it and produce the already read data again.)

@jkugelman
Copy link
Contributor

jkugelman commented Oct 28, 2021

I really like that, looking at the cursor symmetrically. You're talking about something like this?

pub struct Cursor<T> {
    pub fn remaining(&self) -> &[u8];
    pub fn remaining_mut(&mut self) -> &mut [u8];
    pub fn preceding(&self) -> &[u8];
    pub fn preceding_mut(&mut self) -> &mut [u8];
    pub fn split_at(&self) -> (&[u8], &[u8]);
    pub fn split_at_mut(&mut self) -> (&mut [u8], &mut [u8]);
}

@m-ou-se
Copy link
Member

m-ou-se commented Jan 12, 2022

It doesn't look like this will go through in its current form, so I'm cancelling the proposed FCP.

@rfcbot cancel

@jkugelman Yeah, something like that. Or before_cursor, after_cursor and split_at_cursor or something in that direction. As long as it makes sense from both a Read and Write perspective (+Seek).

@rfcbot
Copy link

rfcbot commented Jan 12, 2022

@m-ou-se proposal cancelled.

@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jan 12, 2022
@glandium
Copy link
Contributor

glandium commented Jul 8, 2022

Stumbled upon these functions today. Considering the concerns, would it be worth removing the current implementation?

@soerenmeier
Copy link
Contributor Author

I once made a commit with possible alternatives but wasn't sure about if the implementation should not be generic over T: AsRef<[u8]>. That would allow to return different lifetimes for example for Cursor<&'a [u8]> so the function could look like fn split(&self) -> (&'a [u8], &'a [u8]);. That was one concern raised in #86369 (comment).

My Implementation was based on these functions:

fn split(&self) -> (&[u8], &[u8]);
fn before(&self) -> &[u8];
fn after(&self) -> &[u8];
fn split_mut(&mut self) -> (&mut [u8], &mut [u8]);
fn before_mut(&mut self) -> &mut [u8];
fn after_mut(&mut self) -> &mut [u8];

Commit: soerenmeier@eab83e3

@dtolnay dtolnay changed the title Tracking Issue for io::Cursor::{remaining_slice, is_empty} Tracking Issue for io::Cursor::{split, split_mut} Jul 28, 2024
tgross35 added a commit to tgross35/rust that referenced this issue Jul 28, 2024
Replace `io::Cursor::{remaining_slice, is_empty}`

This is a late follow up to the concerns raised in rust-lang#86369.

rust-lang#86369 (comment)
> This API seems focussed on the `Read` side of things. When `Seek`ing around and `Write`ing data, `is_empty` becomes confusing and `remaining_slice` is not very useful. When writing, the part of the slice before the cursor is much more interesting. Maybe we should have functions for both? Or a single function that returns both slices? (If we also have a `mut` version, a single function would be useful to allow mutable access to both sides at once.)

New feature name: `cursor_remaining` > `cursor_split`.
Added functions:
```rust
fn split(&self) -> (&[u8], &[u8]);
// fn before(&self) -> &[u8];
// fn after(&self) -> &[u8];
fn split_mut(&mut self) -> (&mut [u8], &mut [u8]);
// fn before_mut(&mut self) -> &mut [u8];
// fn after_mut(&mut self) -> &mut [u8];
```

A question was raised in rust-lang#86369 (comment) about whether to return a lifetime that would reflect the lifetime of the underlying bytes (`impl Cursor<&'a [u8]> { fn after(&self) -> &'a [u8] }`). The downside of doing this would be that it would not be possible to implement these functions generically over `T: AsRef<[u8]>`.

## Update
Based on the review, before* and after* methods where removed.
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jul 29, 2024
Replace `io::Cursor::{remaining_slice, is_empty}`

This is a late follow up to the concerns raised in rust-lang#86369.

rust-lang#86369 (comment)
> This API seems focussed on the `Read` side of things. When `Seek`ing around and `Write`ing data, `is_empty` becomes confusing and `remaining_slice` is not very useful. When writing, the part of the slice before the cursor is much more interesting. Maybe we should have functions for both? Or a single function that returns both slices? (If we also have a `mut` version, a single function would be useful to allow mutable access to both sides at once.)

New feature name: `cursor_remaining` > `cursor_split`.
Added functions:
```rust
fn split(&self) -> (&[u8], &[u8]);
// fn before(&self) -> &[u8];
// fn after(&self) -> &[u8];
fn split_mut(&mut self) -> (&mut [u8], &mut [u8]);
// fn before_mut(&mut self) -> &mut [u8];
// fn after_mut(&mut self) -> &mut [u8];
```

A question was raised in rust-lang#86369 (comment) about whether to return a lifetime that would reflect the lifetime of the underlying bytes (`impl Cursor<&'a [u8]> { fn after(&self) -> &'a [u8] }`). The downside of doing this would be that it would not be possible to implement these functions generically over `T: AsRef<[u8]>`.

## Update
Based on the review, before* and after* methods where removed.
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Jul 29, 2024
Rollup merge of rust-lang#109174 - soerenmeier:cursor_fns, r=dtolnay

Replace `io::Cursor::{remaining_slice, is_empty}`

This is a late follow up to the concerns raised in rust-lang#86369.

rust-lang#86369 (comment)
> This API seems focussed on the `Read` side of things. When `Seek`ing around and `Write`ing data, `is_empty` becomes confusing and `remaining_slice` is not very useful. When writing, the part of the slice before the cursor is much more interesting. Maybe we should have functions for both? Or a single function that returns both slices? (If we also have a `mut` version, a single function would be useful to allow mutable access to both sides at once.)

New feature name: `cursor_remaining` > `cursor_split`.
Added functions:
```rust
fn split(&self) -> (&[u8], &[u8]);
// fn before(&self) -> &[u8];
// fn after(&self) -> &[u8];
fn split_mut(&mut self) -> (&mut [u8], &mut [u8]);
// fn before_mut(&mut self) -> &mut [u8];
// fn after_mut(&mut self) -> &mut [u8];
```

A question was raised in rust-lang#86369 (comment) about whether to return a lifetime that would reflect the lifetime of the underlying bytes (`impl Cursor<&'a [u8]> { fn after(&self) -> &'a [u8] }`). The downside of doing this would be that it would not be possible to implement these functions generically over `T: AsRef<[u8]>`.

## Update
Based on the review, before* and after* methods where removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-io Area: `std::io`, `std::fs`, `std::net` and `std::path` C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants