Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of Windows (and legacy Mac) line endings #47

Merged
merged 5 commits into from
Sep 2, 2020

Conversation

MarkusPiotrowski
Copy link
Contributor

@MarkusPiotrowski MarkusPiotrowski commented Sep 1, 2020

This PR addresses issue #33.

Files with Windows line endings (CRLF, \r\n) raise D004 (Literal carriage return) errors (which is OK), but also D002 (trailing whitespace) and they can also rise D001 (line too long).

With this PR

  • Windows and legacy Mac line endings (CRLF or CR, respectively) will still be flagged as D004 (Literal carriage return)
  • but they will not be flagged as trailing whitespace (D002)
  • they will not raise a "line too long" error (D001)
  • and they will not raise a "No newline at end of file" (D005)

This allows users that have Windows line endings to turn off detection of D004 and use doc8 to check their text files.

Rationale:
The standard of a git installation on Windows is that line endings in text files are converted to CRLF in the local copy. There are ways, local and repository wide, to prevent that the committed files will contain CRLF (they are back-converted to LF during the commit). Unfortunately, style checkers like doc8 check the local files, and these still contain the CRLF. This is also true when these style checkers are used within a pre-commit hook or with the software pre-commit.
Therefore, it is desirable for Windows users to have a way to check their files without manually converting their line endings. The proposed PR does this (Windows users can turn off the detection of D004) but still respects the wish of the doc8 developers to flag non-Linux line endings in the standard settings.

This PR has manually been tested with an .rst file which was converted to different line endings (Linux/Window/old Mac). In case of Windows and legacy Mac line endings, each line was flagged with D004 but not other error was raised. Using --ignore D004 let them pass without any error.

peterjc and others added 4 commits August 28, 2020 13:41
See PEP 278: Universal Newline Support, but
in particular want to support Linux/macOS
with LF only, and Windows with CR LF.
Copy link
Contributor

@stephenfin stephenfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit but this otherwise looks sane to me. If my comment is correct, a follow-up change to remove the __init__ would be appreciated 😄

doc8/checks.py Outdated
if "\r" in line:
yield ("D004", "Found literal carriage return")
def __init__(self, cfg):
super(CheckCarriageReturn, self).__init__(cfg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This __init__ is unnecessary given that it's simply calling the superclass implementation, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. I just followed the pattern of CheckNewlineEndOfFile which has the same __init__. I'll check this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd removed the __init__ as suggested.

self._raw_content = fh.read()
self._lines = self._raw_content.splitlines(True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did "++" mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that it was a good improvement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'm familiar with +1 in that context, this was new to me.

Copy link
Contributor

@peterjc peterjc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd asked earlier why change to subclass CheckCarriageReturn(ContentCheck): and understand that this is essential to get access to the lines including the trailing assorted new line characters.

Copy link
Member

@ssbarnea ssbarnea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not process files as binary and rely on python to deal with newlines. Anything we do at low level would only create a maintenance burden.

@@ -80,17 +80,20 @@ def _read(self):
with self._read_lock:
if not self._has_read:
with open(self.filename, "rb") as fh:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we are opening text files as binary? Opening as text should allow us to use universal newlines support in python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure Markus will correct me if I am wrong, but wouldn't that disable D004 (Literal carriage return)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, otherwise you will lose the original line endings (they are converted to \n) and D004 becomes obsolete. Would be fine for those who have filed the issue, but probably not for you?

Copy link
Member

@ssbarnea ssbarnea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are to support various newlines, we clearly need a test that assures no regression happens.

@peterjc
Copy link
Contributor

peterjc commented Sep 2, 2020

@ssbarnea could you expand on what further testing you had in mind (on top of the new tests already included)?

@MarkusPiotrowski
Copy link
Contributor Author

If we are to support various newlines, we clearly need a test that assures no regression happens.

@ssbarnea Maybe you have missed it, but I included already some tests (added cases to existing tests).

@ssbarnea ssbarnea dismissed their stale review September 2, 2020 14:52

I missed them, sorry.

@ssbarnea ssbarnea added the bug This issue/PR relates to a bug. label Sep 2, 2020
@ssbarnea ssbarnea merged commit 1fc3c21 into PyCQA:master Sep 2, 2020
@stephenfin
Copy link
Contributor

🎉

@MarkusPiotrowski MarkusPiotrowski deleted the dos_newlines branch September 3, 2020 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue/PR relates to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants