Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make patchcheck should check the whitespace of .c/.h files #53158

Open
brettcannon opened this issue Jun 6, 2010 · 29 comments
Open

make patchcheck should check the whitespace of .c/.h files #53158

brettcannon opened this issue Jun 6, 2010 · 29 comments
Labels
build The build process and cross-build type-feature A feature request or enhancement

Comments

@brettcannon
Copy link
Member

BPO 8912
Nosy @jcea, @abalkin, @pitrou, @vstinner, @ezio-melotti, @merwok, @davidmalcolm, @sandrotosi, @cjerdonek, @serhiy-storchaka
Files
  • refactor.diff: Simple refactoring to support upcoming improvements
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2010-06-06.03:24:37.866>
    labels = ['type-feature']
    title = '`make patchcheck` should check the whitespace of .c/.h files'
    updated_at = <Date 2020-02-03.21:38:44.046>
    user = 'https://github.com/brettcannon'

    bugs.python.org fields:

    activity = <Date 2020-02-03.21:38:44.046>
    actor = 'mark.dickinson'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2010-06-06.03:24:37.866>
    creator = 'brett.cannon'
    dependencies = []
    files = ['17741']
    hgrepos = []
    issue_num = 8912
    keywords = ['patch', 'needs review']
    message_count = 29.0
    messages = ['107178', '107754', '107762', '107817', '107824', '107831', '107832', '107844', '107849', '107850', '107853', '107854', '107856', '107857', '107861', '107862', '107864', '107865', '108362', '108367', '108369', '109269', '109296', '109302', '109312', '113847', '113848', '113851', '169585']
    nosy_count = 10.0
    nosy_names = ['jcea', 'belopolsky', 'pitrou', 'vstinner', 'ezio.melotti', 'eric.araujo', 'dmalcolm', 'sandro.tosi', 'chris.jerdonek', 'serhiy.storchaka']
    pr_nums = []
    priority = 'low'
    resolution = 'accepted'
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue8912'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @brettcannon
    Copy link
    Member Author

    This would be especially useful now that only spaces are used in .c/.h files and not tabs.

    @brettcannon brettcannon added the type-feature A feature request or enhancement label Jun 6, 2010
    @merwok
    Copy link
    Member

    merwok commented Jun 13, 2010

    Here’s a quick crack at it. Conservative approach: Don’t reindent, just complain.

    @merwok
    Copy link
    Member

    merwok commented Jun 13, 2010

    Updating patch to check header files too. Untested.

    @brettcannon
    Copy link
    Member Author

    Thanks for the patch, Éric. I will get to it when I can.

    @brettcannon brettcannon self-assigned this Jun 14, 2010
    @merwok
    Copy link
    Member

    merwok commented Jun 14, 2010

    I may add that although there is no formal test of any kind, I added tabs in two places in one .c file and in one place in another place, and it correctly reported two files. It’s a small function mostly copied from the one above; the only part where I used my brain is the counter (Python did the rest), so it should be ok.

    @abalkin
    Copy link
    Member

    abalkin commented Jun 14, 2010

    Éric,

    It will be helpful if your code could also check for lines longer than 79 characters.

    @abalkin
    Copy link
    Member

    abalkin commented Jun 14, 2010

    .. and trailing white space.

    @merwok
    Copy link
    Member

    merwok commented Jun 14, 2010

    Replacing VCS hooks, are we? :) Updated and attached.

    Ideas for other feature requests:

    • Printing the name of files with issues;
    • Fail-fast mode or full report.

    @mdickinson
    Copy link
    Member

    Nitpick: shouldn't that

    len(line) > 79
    

    be

    len(line) > 80
    

    ? Either that, or strip the line ending from the line before computing its length.

    @mdickinson
    Copy link
    Member

    Similarly, isn't

    line[-1].isspace()

    always going to be true (well, except possibly at the end of the file), thanks to the line-ending character?

    @merwok
    Copy link
    Member

    merwok commented Jun 15, 2010

    Forgot readling kept the line end, thanks for catching. Updated (yay for Mercurial Queues!)

    @merwok
    Copy link
    Member

    merwok commented Jun 15, 2010

    s/readling/readline/
    s/readline/iterating over the file/

    @mdickinson
    Copy link
    Member

    Sorry to be nitpicky, but:

    (a) the line[-2] produces "IndexError: string index out of range" on empty
    lines, and

    (b) this won't detect trailing whitespace on the last line of a file, if there's no newline at the end of the file. Actually, it might also be worth checking that there *is* a newline at the end of the file; this is actually required by the C standards. (E.g., C99 5.1.1.2p2: "A source file that is not empty shall end in a new-line character".)

    @mdickinson
    Copy link
    Member

    s/actually//g

    @merwok
    Copy link
    Member

    merwok commented Jun 15, 2010

    Thanks for the helpful reviews. I have fixed the trailing whitespace check with “line[-2:-1].isspace()”, but I have a bug with my file counting. Before I go further, I’d like feedback from people using patchcheck:

    1. Reindenting Python is a task best handled by reindent.py, but checking for tabs, line length and trailing whitespace is basically a reimplementation of grep and wc. Is it an explicit and important goal that patchcheck.py be independent of external tools?

    2. Do you like the current report format? (“checking for one thing: n files”) It requires you to use grep of your editor’s search after checking. What about printing out file names and line numbers?

    3. What about a function to strip trailing spaces and add a final newline where needed instead of just complaining? Is it okay to replace tabs with spaces in C too?

    @merwok
    Copy link
    Member

    merwok commented Jun 15, 2010

    Another one: I’d like to move the file name filtering from the worker functions into the main function. With that change, looping over all file names to get only the C files would be written and run only once, even if there are four functions that operate on these files. There is not backward compatibility guarantee for Tools/*, right?

    @mdickinson
    Copy link
    Member

    Answering as a rather infrequent user of 'make patchcheck', but someone who vows to use it more often in future... :)

    (1) Well, it would be awkward to use grep or wc on Windows, so it's convenient not to need external tools.

    (2) +1 to a more detailed report, at least giving the file name and what was wrong with it ("trailing whitespace in yourfile.c"); I could imagine that people might care about finding trailing whitespace or tabs, but not about line length, for example. (Many of the existing C files already have lines >= 80 characters.)

    (3) Don't really care: emacs already does both these things for me nicely. I'd rather just have patchcheck do the reporting. And automatic replacement of tabs with spaces in C files might not guess correctly what the author intended.

    @mdickinson
    Copy link
    Member

    and your proposed refactoring sounds fine to me. I can't really see any backwards compatibility concerns.

    @merwok
    Copy link
    Member

    merwok commented Jun 22, 2010

    1. Okay, so I’ll refactor some code in patchcheck into a grep-like function to reduce duplication and support adding new checks.

    2. I’ll add more useful report. Perhaps I’ll ask first on python-dev to see if I should add a command-line option or just improve it. Also, whether I should use logging or just output file names and lines.

    3. Well, if your editor strips trailing spaces and adds final newlines, you won’t ever see make patchcheck complain, will you? Since the report will be helpful only to people with less smart editors, I think I’ll go with replacing instead of just reporting. Tabs will not be replaced, though, just reported.

    4. See attached patch for py3k (no idea if this will go into 2.7 too).

    @mdickinson
    Copy link
    Member

    I think I’ll go with replacing instead of just reporting.

    How about a --report-only (insert better name here) command line option to patchcheck, then?

    My editor doesn't delete trailing whitespace *automatically*, but it's trivial to remove trailing whitespace from a file if you know it's there (M-x delete-trailing-whitespace, I believe); some people (like me) might prefer to make all changes within the editor rather than having an external tool do it for them. This is especially true if you've got the editor open for the relevant file at the time, since you can end up with two different conflicting versions of the file---one in the editor and one on disk; saving in the editor can undo the changes that patchcheck made on disk.

    @merwok
    Copy link
    Member

    merwok commented Jun 22, 2010

    Agreed. reindent.py has a dry-run mode too, so adding such a global flag and making it the default is okay with me.

    @brettcannon
    Copy link
    Member Author

    Committed in r82567. I also cleaned up the output slightly so that "N file(s)" pluralizes "file" properly based on N. Also now list what files have their whitespace fixed.

    @abalkin
    Copy link
    Member

    abalkin commented Jul 5, 2010

    Hmm, it looks like patchcheck does not fix trailing whitespace in c files:

    $ grep ' $' Modules/datetimemodule.c| cat -ve
        $
                       GET_TD_DAYS(offset1) * 86400 - $
                    Py_DECREF(offset);                $
    	    if ((offset1 != offset2) && $
     $
    $ make patchcheck

    Modules/Setup.dist is newer than Modules/Setup;
    check to make sure you have all the updates you
    need in your Modules/Setup file.
    Usually, copying Modules/Setup.dist to Modules/Setup will work.
    -----------------------------------------------
    ./python.exe ./Tools/scripts/patchcheck.py
    Getting the list of files that have been added/changed ... 1 file
    Fixing whitespace ... 0 files
    Docs modified ... NO
    Misc/ACKS updated ... NO
    Misc/NEWS updated ... NO

    Did you run the test suite?

    @abalkin abalkin reopened this Jul 5, 2010
    @merwok
    Copy link
    Member

    merwok commented Jul 5, 2010

    Sorry if I was unclear. Out of the four points I listed, this patch only did the refactoring (4). I have another patch to add and use a check_pep7 function, I’ll refresh it and post it.

    @brettcannon
    Copy link
    Member Author

    On Sun, Jul 4, 2010 at 22:40, Éric Araujo <report@bugs.python.org> wrote:

    Éric Araujo <merwok@netwok.org> added the comment:

    Sorry if I was unclear. Out of the four points I listed, this patch only did the refactoring (4). I have another patch to add and use a check_pep7 function, I’ll refresh it and post it.

    You were clear, I just prematurely closed it by accident.

    @merwok
    Copy link
    Member

    merwok commented Aug 13, 2010

    I wonder how to have sane code reuse between patchcheck and Mercurial hooks. Mark seems to have started seconding Dirkjan with hooks, see e.g.
    http://hg.python.org/hooks/rev/0344575ad60e, so I wonder if we’re going to use only hooks (pro: they will be invoked on every push, whereas patchcheck is opt-in, con: they would require that contributors clone the hooks repo and set them up) or still maintain patchcheck, which has pros (no setup, does not require a VCS) and cons (not automatic, not widely used, whereas hooks can be part of the new python-dev policy).

    @abalkin
    Copy link
    Member

    abalkin commented Aug 13, 2010

    Note directly related to this issue, but untabify.py fails on files that contain non-ascii characters. For example:

    $ ./python.exe Tools/scripts/untabify.py Modules/_heapqmodule.c
    Traceback (most recent call last):
        ...
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe7 in position 173: invalid continuation byte

    I am not sure what relevant C standard has to say about using non-ascii characters in comments, but the checking tool should not fail with a traceback in such situation.

    @abalkin
    Copy link
    Member

    abalkin commented Aug 13, 2010

    I opened bpo-9598 for the untabify bug. For the purposes of source checking, I believe non-ascii characters should be disallowed in python C source code. According to my understanding of C89 standard, interpretation of characters outside of the basic character set is implementation and locale dependent.

    @brettcannon brettcannon removed their assignment Jun 27, 2011
    @cjerdonek
    Copy link
    Member

    The following commit from today is related to this issue: 815b88454e3e

    "Remove trailing whitespace in order to silence warnings on HP-UX."

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @iritkatriel iritkatriel added the build The build process and cross-build label Oct 26, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    build The build process and cross-build type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants