Feature Request: BufferedReader function for reading without returning delimiter #11404

Closed
WebeWizard opened this issue Jan 8, 2014 · 18 comments

Comments

@WebeWizard
Contributor

I keep running into situations where I need to read (with BufferedReader) from a list of comma-separated or newline-separated values and then store the result. Should I have to trim off the delimiter every time? I feel like this is a common enough use case to be included in libstd.

//reading until newline and trimming off the newline chars.
//probably a better way to do this.
let tempvalue = lineIter.next().unwrap().to_str();
let length = tempvalue.len();
let value = tempvalue.as_slice().slice_to( length - 2 );
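
For comparison, a minimal sketch of the same task on current stable Rust (not the 2014 API used in this thread). As far as the standard library documents it, `BufRead::lines` and `BufRead::split` yield items without the delimiter, while `read_line` and `read_until` keep it. The helper names `lines_without_newline` and `fields_without_delimiter` are hypothetical.

```rust
use std::io::{self, BufRead};

/// Collects all lines; `BufRead::lines` strips the trailing "\n" or "\r\n".
fn lines_without_newline<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}

/// Splits the input on one delimiter byte; `BufRead::split` drops that byte.
fn fields_without_delimiter<R: BufRead>(reader: R, delim: u8) -> io::Result<Vec<Vec<u8>>> {
    reader.split(delim).collect()
}
```

Both helpers accept anything that implements `BufRead`, for example a byte slice such as `"x,y,z".as_bytes()` or a `BufReader` wrapped around a file.
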
@bachm

bachm commented Jan 8, 2014

+1.

@adrientetar
Contributor

I agree. I think this change should be applied at least to read_line(), which just reads a single line. Iterators may be a different story: maybe we want to keep the delimiter for the line iterator so that the concatenation of its items is exactly equal to the original input? I don't know.

The current behavior forces you to trim EOL characters when, for example, parsing stdin input into a numeric type.

We probably need a variant of read_until() (read_line() is built on it, and .lines() is in turn built on read_line()) that pushes everything except the delimiter byte you want to stop at.
I hope this proposal makes sense.

cc @alexcrichton

@alexcrichton
Member

My initial thought in designing the read_until function this way was that I did not want to lose data here and there. Without returning the delimiter, you have no way of knowing whether there actually was a delimiter or not (which may be useful sometimes).

That being said, this is a convenience method, so correctness/completeness may not be paramount. I think I based this off Go's interface, but I would be curious about what other languages do as well.
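
The point about not losing information can be sketched as follows, assuming current stable Rust, where `BufRead::read_until` still appends the delimiter to the buffer. The helper name `read_record` and its `(data, had_delim)` return shape are assumptions made for illustration, not an API proposed in this thread.

```rust
use std::io::{self, BufRead};

/// Reads one record with `read_until` and reports whether the delimiter
/// was actually present (`false` means the input ended first).
fn read_record<R: BufRead>(reader: &mut R, delim: u8) -> io::Result<(Vec<u8>, bool)> {
    let mut buf = Vec::new();
    reader.read_until(delim, &mut buf)?;
    let had_delim = buf.last() == Some(&delim);
    if had_delim {
        // Drop the delimiter only after recording that it was there.
        buf.pop();
    }
    Ok((buf, had_delim))
}
```

Because the delimiter is kept by the underlying call, the caller can still distinguish "record terminated by the delimiter" from "record cut off by end of input" before stripping it.
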

@adrientetar
Contributor

@alexcrichton We could just return an Option with None if the delimiter wasn't found.

steveklabnik/rust_for_rubyists#48 is related; the current read_line must also deal with Windows line endings:

let num = from_str::<int>(input.trim_right_chars(& &['\n', '\r']));
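
A rough equivalent of that line on current stable Rust, where `trim_right_chars` and `from_str::<int>` no longer exist, might look like the sketch below. The function name `parse_line` and the choice of `i64` are assumptions for illustration.

```rust
/// Parses an integer from a raw input line, stripping any trailing
/// '\n' and '\r' characters first. Returns `None` if parsing fails.
fn parse_line(input: &str) -> Option<i64> {
    input.trim_end_matches(&['\n', '\r'][..]).parse().ok()
}
```

This handles both Unix and Windows line endings, at the cost of the caller still having to remember to trim.
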

@davbo
Contributor

davbo commented Feb 4, 2014

Since @alexcrichton seemed interested in what other languages do: in the Python world this would typically be handled by reading the entire file into a string and calling splitlines. This takes advantage of an underlying Python TextIO feature called "universal newlines", in which the file I/O layer hides the different types of newlines from the user; all instances of '\n', '\r' and '\r\n' are returned as '\n'. This was introduced in PEP 3116.

This would be similar to using Rust's AnyLineIterator.
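
As an illustration of the "universal newlines" idea described above, here is a minimal sketch in current stable Rust that normalizes all three newline styles to '\n'. The function name `normalize_newlines` is hypothetical, and this is not how Python or any Rust library necessarily implements it.

```rust
/// Rewrites "\r\n" and lone "\r" as "\n", leaving all other text untouched.
fn normalize_newlines(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' {
            // Swallow the '\n' of a "\r\n" pair so only one newline is emitted.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}
```

Once the text is normalized, splitting on '\n' alone is enough, which is the convenience the Python approach buys.
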

Of course, reading the whole file in as a string isn't always a great idea. Generally I'd guess (as with @WebeWizard here) you'd be dealing with CSVs. In Python's case this is handled by a separate library. I did see one Rust CSV library which looked to be struggling slightly with newlines itself.

I wonder if Rust needs higher-level file I/O libraries (such as csv), or if extending the BufferedReader as suggested here is a good idea in the meantime. It's also worth considering whether introducing something like "universal newlines" would be easier now than later.

@arjantop
Contributor

@alexcrichton The other point is efficiency. If you need an owned pointer, you have to take a slice without the delimiter and then convert that to owned. In Go you can just slice it and you are done.

@sfackler
Member

Why do you have to convert to owned?

@arjantop
Contributor

@sfackler So I can send it to a channel, for example.
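
The channel use case can be sketched on current stable Rust, where `BufRead::lines` already yields owned `String` values with the newline removed, so no extra slice-then-copy step is needed. The function name `send_lines` and the decision to stop at the first error are assumptions for illustration.

```rust
use std::io::BufRead;
use std::sync::mpsc::Sender;

/// Sends each line to `tx` as an owned `String`, newline already removed.
/// Stops at the first read error or when the receiver has hung up.
fn send_lines<R: BufRead>(reader: R, tx: &Sender<String>) {
    for line in reader.lines() {
        match line {
            Ok(owned) => {
                if tx.send(owned).is_err() {
                    break;
                }
            }
            Err(_) => break,
        }
    }
}
```
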

@mneumann
Contributor

Ruby also keeps the newline characters intact:

"abc\ndef".lines # => ["abc\n", "def"]
STDIN.readline # => "the text you enter\n"

But in Ruby you can easily chop them off using String#chomp:

"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
"abc".chomp # => "abc"
"abc\n\n".chomp # => "abc\n" -- Only one newline is chomped off!

I think that we should introduce a convenience function like Ruby's String#chomp.

@mneumann
Contributor

But I would not do this directly in the reader.

@sfackler
Member

The trim, trim_right, and trim_left functions already exist, but return slices.

@sfackler
Member

We could make a variant of trim_right that did an in-place modification of an owned string, but I'd shy away from doing the same for trim and trim_left since that'll be a pretty expensive operation compared to slicing.
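
A minimal sketch of such an in-place right trim, written against current stable Rust, where `trim_right` is spelled `trim_end`. The name `trim_right_in_place` is hypothetical. Trimming on the right only shortens the string, whereas trimming on the left would have to shift the remaining bytes, which is the cost mentioned above.

```rust
/// Removes trailing whitespace from an owned string without reallocating.
fn trim_right_in_place(s: &mut String) {
    // Length of the string once trailing whitespace is ignored.
    let kept = s.trim_end().len();
    s.truncate(kept);
}
```
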

@mneumann
Contributor

@sfackler: Chopping off the newline character is so common that there should be a utility function for this purpose. I expect chomp to also return a slice.

@sfackler
Member

@mneumann
Contributor

@sfackler: Yes, but they also trim whitespace, unless you write input.trim_right_chars(& &['\n', '\r']), which is very verbose and would chop off as many newline characters as there are, regardless of their order ("\r\n" or "\n\r"). Of course the latter cannot happen when using read_line, but I would still prefer a specialized "strip the newline off" method.

@sfackler
Member

Ah, gotcha

@mneumann
Contributor

But I would add a new method neither to BufferedReader nor to StrOwned, only one specialized function to StrSlice, whatever its name may be (something like trim_line_end(), but of course I'd prefer chomp() as it's short and I know it from Ruby :))
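
One way such a method could look on current stable Rust is sketched below as an extension trait on `str`. The trait name `TrimLineEnd` is hypothetical; `trim_line_end` is the name floated above. The sketch strips at most one line ending and returns a slice, matching the four Ruby `chomp` examples earlier in the thread.

```rust
/// Hypothetical extension trait adding a chomp-like method to string slices.
trait TrimLineEnd {
    fn trim_line_end(&self) -> &str;
}

impl TrimLineEnd for str {
    /// Strips at most one trailing "\r\n" or "\n"; everything else is kept.
    fn trim_line_end(&self) -> &str {
        self.strip_suffix("\r\n")
            .or_else(|| self.strip_suffix('\n'))
            .unwrap_or(self)
    }
}
```

Unlike trimming a set of characters from the right, this removes only a single line terminator, so "abc\n\n" keeps one newline.
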

@steveklabnik
Member

We're now using the RFC process to deal with standard library changes, and indeed, the IO RFC is currently active. If anyone still cares about this, that's the right place to get involved.
