Unify "str::lines()" and "str::lines_any()" and make it Unicode-correct #1557

kud1ing · 2012-01-18T14:45:44Z

str::lines() appears to split strings only on \n.
str::lines_any() appears to split on \n and on \r\n.
The Unicode line separator \u2028 is currently not handled.
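
To make the gap concrete, here is a minimal sketch in present-day Rust. The helper `lines_any_sketch` is a hypothetical stand-in that models the behavior described above for str::lines_any(); it is not the 2012 implementation. It shows that a string containing U+2028 is not split at that character.

```rust
/// Hypothetical stand-in for the behavior described for str::lines_any():
/// split on '\n' and treat a directly preceding '\r' as part of the separator.
fn lines_any_sketch(s: &str) -> Vec<&str> {
    s.split('\n')
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}

fn main() {
    // U+2028 LINE SEPARATOR sits between "c" and "d" but is not treated as a break.
    let text = "a\nb\r\nc\u{2028}d";
    let lines = lines_any_sketch(text);
    assert_eq!(lines.len(), 3);
    println!("{:?}", lines);
}
```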

I think we want only one function doing "the right thing (tm)".

See

section 5.8 in http://unicode.org/versions/Unicode5.2.0/ch05.pdf
http://unicode.org/reports/tr14/#Algorithm

marijnh · 2012-01-18T14:50:14Z

There is no 'right thing' when it comes to line splitting, which I assume is exactly the reason Kevin implemented both of these functions. The document you're linking appears to have very little to do with the problem these functions solve -- it relates to breaking lines, not splitting by line.

marijnh · 2012-01-18T15:13:54Z

Feel free to discuss, or even reopen if you feel I missed something. Programming languages cannot solve the problem of there being different line ending conventions, or the fact that programmers will have to think about how they want to split lines. Thus, I think the current provisions in the str module for this are perfectly fine as they stand.

kud1ing · 2012-01-18T15:21:46Z

The document says: "Otherwise, the following recommendations specify how to cope with an NLF [...] when interpreting characters in text". And Haskell's lines does the right thing, imho.

Since I lack the rights to reopen, I leave it up to the others.

marijnh · 2012-01-18T15:24:56Z

Haskell's lines, as far as I can find, does precisely what str::lines does.

kud1ing · 2012-01-18T15:36:01Z

You are correct, lines only knows about \n because:

In Haskell, a newline is always represented by the character '\n'

http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/System-IO.html#25

I'd prefer it if users did not have to care about CR, LF, CRLF and NEL.

Are there use cases when you want to split only on \n but not on \r\n?
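
As a rough sketch of what "not having to care" could look like, the function below splits on CRLF, CR, LF, NEL (U+0085), LS (U+2028) and PS (U+2029). The name `split_any_newline` and the exact set of separators are assumptions made for illustration; nothing like this is claimed to exist in the str module.

```rust
/// Hypothetical helper: splits on CRLF, CR, LF, NEL (U+0085),
/// LS (U+2028) and PS (U+2029). "\r\n" counts as a single separator.
fn split_any_newline(s: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut start = 0;
    let mut chars = s.char_indices();
    while let Some((i, c)) = chars.next() {
        let is_break = matches!(c, '\n' | '\r' | '\u{0085}' | '\u{2028}' | '\u{2029}');
        if !is_break {
            continue;
        }
        // Everything since the previous separator is one line.
        lines.push(&s[start..i]);
        let mut next = i + c.len_utf8();
        // Consume the '\n' of a "\r\n" pair so it is not counted twice.
        if c == '\r' && s[next..].starts_with('\n') {
            chars.next();
            next += 1;
        }
        start = next;
    }
    // The remainder after the last separator (possibly empty).
    lines.push(&s[start..]);
    lines
}

fn main() {
    let text = "a\r\nb\rc\nd\u{0085}e\u{2028}f";
    println!("{:?}", split_any_newline(text));
}
```

Note that in this sketch a trailing separator produces a final empty string, which a real API might prefer to drop.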

jsternberg · 2012-01-18T16:44:43Z

I think it would be worth abstracting away the different types of line endings. At the very least, if you want both functionalities, have the more commonly used function (str::lines) abstract away the line-endings, and then have another function that only splits on newlines. str::lines should split on any line feed character. If you want to split only on newlines, you would likely just use a "split" function and put the '\n' character as the character to split on.
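
For comparison, the split proposed here can be sketched with today's Rust standard library, where `str::lines()` accepts both \n and \r\n and `str::split('\n')` splits on \n alone. This reflects the later standard library, not the 2012 str module discussed in this thread, and the wrapper names `std_lines` and `lf_only` are made up for the example.

```rust
/// Uses the standard library's line iterator, which treats both
/// "\n" and "\r\n" as line endings.
fn std_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

/// Splits strictly on '\n', leaving any '\r' attached to the line.
fn lf_only(text: &str) -> Vec<&str> {
    text.split('\n').collect()
}

fn main() {
    let text = "one\r\ntwo\nthree";
    assert_eq!(std_lines(text), ["one", "two", "three"]);
    assert_eq!(lf_only(text), ["one\r", "two", "three"]);
}
```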

marijnh closed this as completed Jan 18, 2012

kud1ing mentioned this issue Feb 7, 2012

str should support a few other string en/decodings #1771

Closed

Kobzol pushed a commit to Kobzol/rust that referenced this issue Dec 30, 2024

fix examples for rustc 1.68.0-nightly (935dc07 2022-12-19) (rust-lang…

f816635

…#1556) (rust-lang#1557) Co-authored-by: Yuki Okushi <jtitor@2k36.org> Closes rust-lang/rustc-dev-guide#1556

bors pushed a commit to rust-lang-ci/rust that referenced this issue Jan 2, 2025

fix examples for rustc 1.68.0-nightly (935dc07 2022-12-19) (rust-lang…

e04658a

…#1556) (rust-lang#1557) Co-authored-by: Yuki Okushi <jtitor@2k36.org> Closes rust-lang/rustc-dev-guide#1556

bjorn3 added a commit to bjorn3/rust that referenced this issue Mar 25, 2025

Merge pull request rust-lang#1557 from rust-lang/arm64_linux_ci

a26a938

Test and dist for arm64 linux on CI
