Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: make countlines count final non-empty line even without EOL #25845

Merged
merged 1 commit into from
Feb 1, 2018

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Feb 1, 2018

This is a breaking change that makes countlines agree with the length of eachline and readlines when the last non-empty line does not end with EOL.

See the discussion on discourse.

@stevengj stevengj added breaking This change will break code io Involving the I/O subsystem: libuv, read, write, etc. triage This should be discussed on a triage call labels Feb 1, 2018
@stevengj stevengj changed the title countlines now counts final non-empty line even without EOL RFC: make countlines count final non-empty line even without EOL Feb 1, 2018
@StefanKarpinski
Copy link
Member

We might want to have some way of preserving the old behavior. One way would be to consider the default eol to be a regex like r"\r?\n|$" – even if we implement it more efficiently. That would make the old behavior equivalent to having eol = '\n' and so on. We could also have a partial = true keyword option to control whether a trailing partial line is counted or not.

@stevengj
Copy link
Member Author

stevengj commented Feb 1, 2018

When would you ever want countlines(io) to not equal length(readlines(io))?

@StefanKarpinski
Copy link
Member

Note that if we didn't allow specification of what constitutes end-of-line this would be straightforward since it seems pretty clear that countlines should agree with counting lines yielded by eachline. The eol argument complicates matter, however, since we have to decide whether eol = '.' counts a trailing sentence with no period, for example. There are some nice benefits of allowing a regex for eol, such as potentially supporting the old Mac newlines with a regex like r"\r?\n?|$" – that would count \r and \n by themselves as an end of line but count consecutive \r\n as a single line terminator. Now, I'm not sure we should actually make that a default, but at least supporting it is nice.

@stevengj
Copy link
Member Author

stevengj commented Feb 1, 2018

In the rare cases (if any) where someone needs the old behavior, can't they easily get this via:

function countnontrailing(io, eol)
    n = countlines(io)
    return n == 0 ? n : n - (read(skip(seekend(io),-1), UInt8) != eol)
end

That would probably be lot more efficient than using a regex.

@Keno
Copy link
Member

Keno commented Feb 1, 2018

I feel like we could at some point have a Char iterator over files and write the current behavior as:

count(equalto('\n'), eachbyte(file))

so 👍 on this change.

@StefanKarpinski
Copy link
Member

Triage is in favor of accepting this as-is (I agree).

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Feb 1, 2018
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Feb 1, 2018
@stevengj
Copy link
Member Author

stevengj commented Feb 1, 2018

Probably eachread(UInt8, io) would be a more useful file-iterator interface, so that you could read bytes, chars, whatever.

@JeffBezanson JeffBezanson merged commit 0ee7817 into JuliaLang:master Feb 1, 2018
@stevengj stevengj deleted the countlines branch February 2, 2018 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants