Unify "str::lines()" and "str::lines_any()" and make it Unicode-correct #1557

Closed
kud1ing opened this issue Jan 18, 2012 · 6 comments

kud1ing commented Jan 18, 2012

  • str::lines() appears to split strings only on \n
  • str::lines_any() appears to split on \n and on \r\n.
  • The Unicode line separator \u2028 is currently not handled.
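For illustration, the three behaviours listed above can be reproduced in present-day Rust. This is only a sketch: it uses today's standard library, not the 2012 `str` module under discussion, and the helper names `split_lf_only` and `split_std_lines` are made up for this example. Today's `str::lines` treats both `\n` and `\r\n` as line endings, which roughly corresponds to what `str::lines_any()` is described as doing here.

```rust
/// Hypothetical helper: split only on '\n'.
/// A '\r' that precedes the '\n' stays attached to the end of the line.
fn split_lf_only(text: &str) -> Vec<&str> {
    text.split('\n').collect()
}

/// Hypothetical helper: split with the standard library's `str::lines`,
/// which in present-day Rust accepts both "\n" and "\r\n" as line endings.
/// It does not treat U+2028 (LINE SEPARATOR) as a line ending.
fn split_std_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}
```

With the input `"a\r\nb\nc"`, `split_lf_only` yields `"a\r"`, `"b"`, `"c"`, while `split_std_lines` yields `"a"`, `"b"`, `"c"`. With `"a\u{2028}b"`, both return a single line, which is the gap the third bullet points out.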

I think we want only one function doing "the right thing (tm)".

See

Contributor

marijnh commented Jan 18, 2012

There is no 'right thing' when it comes to line splitting, which I assume is exactly the reason Kevin implemented both of these functions. The document you're linking appears to have very little to do with the problem these functions solve -- it relates to breaking lines, not splitting by line.

marijnh closed this as completed Jan 18, 2012

Contributor

marijnh commented Jan 18, 2012

Feel free to discuss, or even reopen if you feel I missed something. Programming languages cannot solve the problem of there being different line ending conventions, or the fact that programmers will have to think about how they want to split lines. Thus, I think the current provisions in the str module for this are perfectly fine as they stand.

Author

kud1ing commented Jan 18, 2012

The document says: "Otherwise, the following recommendations specify how to cope with an NLF [...] when interpreting characters in text". And Haskell's lines does the right thing, imho.

Since I lack the rights to reopen, I leave it up to the others.

Contributor

marijnh commented Jan 18, 2012

Haskell's lines, as far as I can find, does precisely what str::lines does.

Author

kud1ing commented Jan 18, 2012

You are correct, lines only knows about \n because:

In Haskell, a newline is always represented by the character '\n'

http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/System-IO.html#25

I'd prefer it if users did not have to care about CR, LF, CRLF and NEL.

Are there use cases where you want to split only on \n but not on \r\n?

@jsternberg
Contributor

I think it would be worth abstracting away the different types of line endings. At the very least, if you want both functionalities, have the more commonly used function (str::lines) abstract away the line endings, and then have another function that only splits on newlines. str::lines should split on any line-ending sequence. If you want to split only on newlines, you would likely just use a "split" function and pass '\n' as the character to split on.
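A minimal sketch of that proposal in present-day Rust, assuming the set of line endings mentioned in this thread (LF, CRLF, CR, NEL, U+2028) plus U+2029. The function name `lines_unified` is hypothetical and is not part of any Rust standard library, past or present.

```rust
/// Hypothetical helper: split `text` on any of LF, CRLF, CR, NEL (U+0085),
/// LINE SEPARATOR (U+2028) or PARAGRAPH SEPARATOR (U+2029).
/// A final line ending does not produce a trailing empty line.
fn lines_unified(text: &str) -> Vec<&str> {
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    let mut chars = text.char_indices();

    while let Some((idx, ch)) = chars.next() {
        let is_break = matches!(ch, '\n' | '\r' | '\u{0085}' | '\u{2028}' | '\u{2029}');
        if !is_break {
            continue;
        }
        lines.push(&text[start..idx]);

        let mut end = idx + ch.len_utf8();
        // Treat CR immediately followed by LF as a single line ending.
        if ch == '\r' && text[end..].starts_with('\n') {
            chars.next(); // consume the '\n'
            end += 1;
        }
        start = end;
    }

    // Keep any text after the last line ending.
    if start < text.len() {
        lines.push(&text[start..]);
    }
    lines
}
```

For example, `lines_unified("a\r\nb\nc\rd")` returns `"a"`, `"b"`, `"c"`, `"d"`. A caller who wants newline-only behaviour can still write `text.split('\n')`, which leaves any `\r` attached to the preceding line.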

Kobzol pushed a commit to Kobzol/rust that referenced this issue Dec 30, 2024
bors pushed a commit to rust-lang-ci/rust that referenced this issue Jan 2, 2025