-
-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of Windows (and legacy Mac) line endings #47
Conversation
See PEP 278: Universal Newline Support, but in particular want to support Linux/macOS with LF only, and Windows with CR LF.
Streamline slightly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One nit but this otherwise looks sane to me. If my comment is correct, a follow-up change to remove the __init__
would be appreciated 😄
doc8/checks.py
Outdated
if "\r" in line: | ||
yield ("D004", "Found literal carriage return") | ||
def __init__(self, cfg): | ||
super(CheckCarriageReturn, self).__init__(cfg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This __init__
is unnecessary given that it's simply calling the superclass implementation, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. I just followed the pattern of CheckNewlineEndOfFile
which has the same __init__
. I'll check this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd removed the __init__
as suggested.
self._raw_content = fh.read() | ||
self._lines = self._raw_content.splitlines(True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did "++" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just that it was a good improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'm familiar with +1 in that context, this was new to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd asked earlier why change to subclass CheckCarriageReturn(ContentCheck):
and understand that this is essential to get access to the lines including the trailing assorted new line characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not process files as binary and rely on python to deal with newlines. Anything we do at low level would only create a maintenance burden.
@@ -80,17 +80,20 @@ def _read(self): | |||
with self._read_lock: | |||
if not self._has_read: | |||
with open(self.filename, "rb") as fh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are opening text files as binary? Opening as text should allow us to use universal newlines support in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure Markus will correct me if I am wrong, but wouldn't that disable D004 (Literal carriage return)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, otherwise you will lose the original line endings (they are converted to \n
) and D004 becomes obsolete. Would be fine for those who have filed the issue, but probably not for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are to support various newlines, we clearly need a test that assures no regression happens.
@ssbarnea could you expand on what further testing you had in mind (on top of the new tests already included)? |
@ssbarnea Maybe you have missed it, but I included already some tests (added cases to existing tests). |
🎉 |
This PR addresses issue #33.
Files with Windows line endings (CRLF, \r\n) raise
D004
(Literal carriage return) errors (which is OK), but alsoD002
(trailing whitespace) and they can also riseD001
(line too long).With this PR
This allows users that have Windows line endings to turn off detection of
D004
and usedoc8
to check their text files.Rationale:
The standard of a
git
installation on Windows is that line endings in text files are converted to CRLF in the local copy. There are ways, local and repository wide, to prevent that the committed files will contain CRLF (they are back-converted to LF during the commit). Unfortunately, style checkers likedoc8
check the local files, and these still contain the CRLF. This is also true when these style checkers are used within a pre-commit hook or with the softwarepre-commit
.Therefore, it is desirable for Windows users to have a way to check their files without manually converting their line endings. The proposed PR does this (Windows users can turn off the detection of
D004
) but still respects the wish of thedoc8
developers to flag non-Linux line endings in the standard settings.This PR has manually been tested with an
.rst
file which was converted to different line endings (Linux/Window/old Mac). In case of Windows and legacy Mac line endings, each line was flagged with D004 but not other error was raised. Using--ignore D004
let them pass without any error.