Skip to content
This repository has been archived by the owner on Oct 2, 2024. It is now read-only.

add pylint to test suite #1860

Open
lucaudill opened this issue Mar 12, 2024 · 38 comments
Open

add pylint to test suite #1860

lucaudill opened this issue Mar 12, 2024 · 38 comments

Comments

@lucaudill
Copy link
Collaborator

lucaudill commented Mar 12, 2024

Since I started using VSCode, I've noticed that it's easy to add unused imports to our Python code without noticing. Adding a Python linter (e.g. pylint) to our test suite could help catch those instances and other potentially bad coding practices.

One thing worth noting is that pylint really doesn't like our code, e.g.

$ pylint --indent-string="   " lib/*py | wc -l
1656

Here our code is given a rating of 5.38/10 ☹️

pylint messages have a naming convention based on the following categories (seemingly in descending order of severity): Fatal, Error, Warning, Convention, Refactor, Information. The breakdown of the messages we get is as follows:

  • Fatal: 0
  • Error: 66
  • Warning: 131
  • Convention: 1333
  • Refactor: 111
  • Information: 0

So the vast majority of complaints pylint gives us fall under the "Convention" category. Some common messages that show up for us are:

This is just a sample, there are obviously many more. I think step 1 for whoever takes this on is to go through the list of messages and figure out what we want to ignore and what we don't.

@lucaudill
Copy link
Collaborator Author

lucaudill commented Mar 12, 2024

Here's a text file with the full output of

$ pylint --indent-string="   " lib/*py

pylint.txt

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

I propose we do the following.

General comments / notes

  • We should specify a version of Pylint we’re targeting. Debian Bullseye e.g. has 2.14 but upstream latest is 3.2.

  • Pylint may be able to replace 10_sanity.bats:'trailing whitespace' for Python code, but is the exclusion too messy?

  • The Pylint message docs are overall light on rationale and occasionally contain flat-out errors, e.g. C0209 consider-using-f-string claims that f-strings are faster, citing a tweet whose immediate replies present more detailed contrary analyses. (Contrast e.g. ShellCheck docs such as SC2053.)

  • We should do Python code everywhere, not just lib/, and including *.py.in.

  • We probably want a configuration, including py-version specifying 3.6 (the default is the version of Python doing the analysis).

Dispositions (what to do)

I went through the 1,641 messages in the text file above and ended up boiling it down to 75 groups with the same disposition. (Note: I would call these “warnings”, but Pylint calls them “messages”, with “warning” being a specific category of message.) These dispositions fell into the six classes below.

I’d like the initial PR for this to yield no messages. We’ll do that by fixing the easy stuff in that PR and deferring the rest.

Now

Each of these is described in an individual comment below. These should all be fixed in this issue’s PR.

  1. Ignore globally forever. I just disagree with Pylint, so let’s turn off the warning for the entire project. This configuration happens in the same PR that implements Pylint.

  2. Fix now. Real problems that should be fixed and are important/easy enough to do in the same PR that implements Pylint.

  3. Case-by-case ignore or fix now. Each of the individual messages needs to be examined and either ignored (with a Pylint exclusion comment) or fixed in the original Pylint PR.

Later

Each of these has its own issue, linked below.

  1. Enforce the opposite. I actively want the opposite of Pylint, and because Pylint is extensible, it can enforce that. I suspect that for many of these, we can use the existing inverted checker for reference.

  2. Fix later. Real problems where we should do something, but it’s hard enough that a new PR is needed.

  3. Unclear. I don’t know what to do.

I wonder if a good approach is to disable all the messages we get (I have a spreadsheet that can give a list), e.g. with --disable X001,X002,... or configuration, then one-by-one re-enable the messages below in this issue. After that the disabled list should be exactly what we decided to defer.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Class names (C0103 invalid-name)

  • done

class File_Metadata:

We just use a different naming convention. Pylint doesn’t have a pre-configured style name but it should be easy to configure via regex.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Log function names (C0103 invalid-name)

  • done

def DEBUG(msg, hint=None, **kwargs):
if (log_level >= Log_Level.DEBUG):
log(msg, hint, None, "38;5;6m", "", **kwargs) # dark cyan (same as 36m)

These functions just have special names. Probably just add ignore comments to each def.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Bad None comparison (C0121 singleton-comparison)

  • done

return self.find_commit(git_id)[1] != None

Should be is None.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Bad type check? (C0123 unidiomatic-typecheck)

  • done

def __eq__(self, other):
return (type(self) == type(other))

I think this is OK because this class has a weird notion of equality, but needs a little more though.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Bad range() (C0200 consider-using-enumerate)

  • done

charliecloud/lib/build.py

Lines 799 to 801 in d679373

args = list(self.tree.child_terminals("copy_list", "STRING_QUOTED"))
for i in range(len(args)):
args[i] = args[i][1:-1] # strip quotes

These can be easily refactored to not use indexes.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Non-idiomatic class argument (C0202 bad-classmethod-argument)

  • done

@classmethod
def git_prepare(class_, image_root, large_file_thresh,

We use class_ but it’s easy to just change to the idiomatic cls.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Bad __slots__ (C0205 single-string-used-for-slots)

  • done

These should all be fixed.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Unnecessary splitting (C0207 use-maxsplit-arg)

  • done

return self.__class__.__name__.split("_")[0].upper()

This appears to be a performance thing and we don’t use this in places where it matters. I think what we have is more readable, so let’s global ignore.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Modules too long (C0302 too-many-lines)

This probably right, but I don’t see this changing in the foreseeable future and I don’t think Pylint is the right place to enforce it. If the warnings are all solved, then the most likely way it would re-appear is that a small change pushes a module over the threshold, which is the wrong time to demand a refactor.

Global ignore.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Multiple statements per line (C0321 multiple-statements)

  • done

I don’t mind this and we use it a lot:

class Instruction_Ignored(Exception): pass

but:

charliecloud/lib/image.py

Lines 633 to 635 in d679373

for (i, (lh, (fp, members))) in enumerate(layers.items(), start=1):
if (i > max_i): break
members2 = list(members) # copy b/c we’ll alter members

or:

charliecloud/lib/image.py

Lines 839 to 842 in d679373

if (prefix is not None):
self.path = prefix.split("/") + self.path
if (self.port is None): self.port = 443
if (self.host == "registry-1.docker.io" and len(self.path) == 0):

seem confusion-prone. Let’s just stop using multiple statements per line.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Bad import position (C0413 wrong-import-position)

  • done

# List of dependency problems. This variable needs to be created before we
# import any other Charliecloud stuff to avoid #806.
depfails = [] # 👻
import filesystem as fs
import registry as rg
import version

We have a good reason for this, but probably better to disable on a case-by-case basis.

@reidpr
Copy link
Collaborator

reidpr commented Mar 20, 2024

Inappropriate dunder call (C2801 unnecessary-dunder-call)

  • done

return fnmatch.fnmatchcase(self.__fspath__(), pattern)

False positive; we’re implementing os.PathLike. Suppress case-by-case.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Bad except order (E701 bad-except-order)

  • done

charliecloud/lib/image.py

Lines 751 to 759 in d679373

except lark.exceptions.UnexpectedInput as x:
if (x.column == -1):
ch.FATAL("image ref syntax, at end: %s" % s, hint);
else:
ch.FATAL("image ref syntax, char %d: %s" % (x.column, s), hint)
except lark.exceptions.UnexpectedEOF as x:
# We get UnexpectedEOF because of Lark issue #237. This exception
# doesn’t have a column location.
ch.FATAL("image ref syntax, at end: %s" % s, hint)

This is a real bug that should be fixed.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Nonexistent instance member access in external classes (E1101 no-member)

  • done

This happens twice:

ch.FATAL("unexpected file type in image: %x: %s"
% (stat.IFMT(fm.mode), path))

Genuinely wrong and will crash the error message.

rpu = rg.requests.packages.urllib3

This one seems like if it’s not present it shouldn't work at all? Investigate.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Non-iterable iterated (E1133 not-an-iterable)

  • done

for class_ in self.escalators:

The specific error is a false positive, but it’s a property that need @abc.abstractmethod I think.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Bad % (E1305 too-many-format-args)

  • done

ch.FATAL("can’t copy metadata: %s -> %s" % (self, dst, x.strerror))

Yep, that's wrong.

This was referenced Mar 22, 2024
@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Merge isinstance() (R1701 consider-merging-isintance)

  • done

if (not (isinstance(inst, Directive_G)
or isinstance(inst, From__G)
or isinstance(inst, Instruction_No_Image))):

Should probably fix.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

FP complaints about return on ch.FATAL (R1710 inconsistent-return-statements)

  • done

try:
return d.split(":", maxsplit=1)[1]
except AttributeError:
FATAL("not a string: %s" % repr(d))
except IndexError:
FATAL("no algorithm tag: %s" % d)

This is a FP because ch.FATAL doesn’t return (it goes into an exception procedure that kills the program). Can we teach Pylint that it doesn’t return?

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Useless return complaints (R1711 useless-return)

  • done

def commit(self, path, *args):
self.permissions_fix(path)
return None

The return is indeed useless but in this particular case I like it there because related classes’ commit() does return something meaningful, which also might be None. Perhaps looks at these and ignore on a case-by-case basis?

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Built-in exit() re-defined (R1722 consider-using-sys-exit, W0622 redefined-builtin)

  • done

if (len(depfails) > 0):
exit(1)

def exit(code):
profile_stop()
profile_dump()
sys.exit(code)

This is the same problem. Rename our exit(), and see if we are calling sys.exit() anywhere we should be using our own exit().

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Use error-checking ch.open_() (R1732 consider-using-with)

  • done

fp = open(pid_path, "rt", encoding="UTF-8")

with isn’t a good match for Charliecloud because the key operations are wrapped so they can’t fail. That is, I don’t think we should take the advice, but the flagged code should probably be using fs.Path.open() instead and the warning should be left enabled.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Bad keyword argument defaults (W0102 dangerous-default-value)

  • done

def ch_run_modify(img, args, env, workdir="/", binds=[], ch_run_args=[],
fail_ok=False):

I agree. Fix.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Needless pass (W0107 unnecessary pass)

  • done

charliecloud/lib/build.py

Lines 449 to 452 in d679373

def execute(self):
"""Do what the instruction says. At this point, the unpack directory is
all ready to go. Thus, the method is cache-ignorant."""
pass

IMO clearer this way. Ignore globally.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

FP on self.__class__ (W0212 protected-access)

  • done

self.__class__._gzip_set()

Technically this a false positive — we’re accessing a private class (not instance) member — but going though self.__class__ seems weird. Why not just self?

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Superclass __init__() not called (W0231 super-init-not-called)

  • done

class Disabled_Cache(Rebuild_Cache):
def __init__(self, *args):
pass

This is on purpose b/c the inheritance is a bit strange. I’d like to leave it enabled and justify exceptions on a case-by-base basis.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Superclass argument renamed (W0237 arguments-renamed)

  • done

def add(self, x):

Personally I think value is a bad name but this is probably right, less confusing if we use the same name.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Method that only delegates back to parent (W0246 useless-parent-delegation)

  • done

charliecloud/lib/image.py

Lines 928 to 929 in d679373

def iter_subtrees_topdown(self, *args, **kwargs):
return super().iter_subtrees_topdown(*args, **kwargs)

Indeed this method seems to be doing nothing. Solution is probably delete it regardless, but I really wonder why.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Bad indentation (W0311 bad-indentation)

  • done

def commit_find_deleted(self, git_id):
deleted = self.git(["log", "--format=%h%n%B", "-n", "1",
"&%s" % git_id],fail_ok=True)

This whole method is indented too much, probably due to being moved at some point.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Needless semicolons (W0301 unnecessary-semicolon)

  • done

except ValueError:
ch.FATAL("state ID: malformed hex: %s" % text);

Somebody forgot they were writing Python instead of C. Fix.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Global variable introduced secretly (W0601 global-variable-undefined)

  • done

def init(cli):
# logging
global log_festoon, log_fp, log_level, trace_fatal, xattrs_save

This is where xattrs_save is defined. We should be able to see a complete list of globals at the top of the file.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Remove unused imports (W0611 unused-import)

  • done

import enum

I think unused imports might have been the original motivation for this entire issue.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Remove unused variables (W0612 unused-variable)

  • done

for i in range(timeout*2):

The idiom in this particular case is IIRC for _ in range(timeout*2). In general, I agree, unused variables seems bad.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Undefined loop variables? (W0631 unused-loop-variable)

  • done

self.tag = tag

Here it is defined. Should we simplify the control flow? I don’t know. Probably analyze on a case-by-case basis.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Catch-all *args after keyword args (W1113 keyword-arg-before-vararg)

  • done

def git(self, argv, cwd=None, quiet=True, *args, **kwargs):

This can cause duplicate argument exceptions. Fix.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

Non-string default in os.getenv() (W1508 invalid-envvar-default)

  • done

self.port = int(os.getenv("CH_REGY_DEFAULT_PORT", 443))

In this particular case it’s harmless because we immediately convert to int anyway, but probably an antipattern. Fix.

@reidpr
Copy link
Collaborator

reidpr commented Mar 22, 2024

subprocess.run() with implicit check (W1510 subprocess-run-check)

  • done

cp = subprocess.run(argv, **kwargs)

Sure, fix.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants