Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ruff to ci setup #82

Merged
merged 37 commits into from
Feb 8, 2024
Merged

Add ruff to ci setup #82

merged 37 commits into from
Feb 8, 2024

Conversation

adonath
Copy link
Contributor

@adonath adonath commented Jan 26, 2024

As proposed in #73 I'm adding a CI check that runs ruff as a linter. It currently runs as an "allowed failure". I can adapt the configuration as needed and optionally add ruff as a pre-commit hook. But I think so far array-api-compat does not pre-commit hooks, so I opted for the CI.

@rgommers
Copy link
Member

Thanks for working on this @adonath!

I did a quick browse of the results, and it looks like a few error codes probably should simply be ignored

  • E402 Module level import not at top of file: that can't really be avoided probably, when all dependencies / wrapped libraries are optional
  • F401 numpy.abs imported but unused that's par for the course in __init__.py files when you're curating a public API to expose.

With those two ignored, hopefully it'll be a useful setup to catch other issues that may otherwise slip in.

But I think so far army-api-compat does not pre-commit hooks, so I opted for the CI.

Yes, that sounds like the right choice to me. CI is nice, and a lot less invasive than pre-commit.

@vnmabus
Copy link

vnmabus commented Jan 26, 2024

* `F401 `numpy.abs` imported but unused` that's par for the course in `__init__.py` files when you're curating a public API to expose.

I think the convention in that case is to do explicit re-exports:

from numpy import abs as abs

@rgommers
Copy link
Member

Oh, good point - if that makes the checker happy, then sure seems good to fix up that way.

@adonath
Copy link
Contributor Author

adonath commented Jan 26, 2024

Thanks @rgommers and @vnmabus for the comments! @rgommers do you have any preference whether to add the fixes to this PR or an independent one?

@rgommers
Copy link
Member

Probably this PR, since it's needed to get CI green here? It's still not that much code, so fine to review in one go.

@asmeurer
Copy link
Member

I don't think I've ever seen the import abs as abs pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.

The correct way to be explicit about exports is to define __all__. I don't remember if there was a good reason I didn't define __all__ in __init__.py. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.

@asmeurer
Copy link
Member

And yes, make whatever improvements you need to make in this same PR. The point of this is also to make sure ruff isn't warning about any false positives, and there's no way to tell that unless we see the changes.

@asmeurer
Copy link
Member

The import * error codes also need to be ignored.

@vnmabus
Copy link

vnmabus commented Jan 26, 2024

I don't think I've ever seen the import abs as abs pattern before. If I saw that I would just think there was a typo or someone made a mistake. Let's not do that.

The correct way to be explicit about exports is to define __all__. I don't remember if there was a good reason I didn't define __all__ in __init__.py. I probably just didn't because it would be a pain and unnecessary given the code that is there. But it should be doable. We already do this in _aliases.py and linalg.py, and you'll note ruff doesn't give any such warnings for those files.

Some references:
https://mypy.readthedocs.io/en/stable/command_line.html#cmdoption-mypy-no-implicit-reexport
astral-sh/ruff#717

Also in the typing PEP, considering stub files in this case:
https://peps.python.org/pep-0484/#stub-files

@asmeurer
Copy link
Member

asmeurer commented Jan 26, 2024

import x as x doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand, __all__ actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in _aliases.py).

This is actually a good example of an issue I have with a lot of linters, especially "code quality" linters. In a sane world, a linter would flag import x as x as a probable typo. But instead, they are encouraging it.

@lucascolley
Copy link
Member

In a sane world, a linter would flag import x as x as a probable typo

We can do that here if you want, ruff has it. https://docs.astral.sh/ruff/rules/useless-import-alias/#useless-import-alias-plc0414

@vnmabus
Copy link

vnmabus commented Jan 26, 2024

import x as x doesn't actually do anything. It's just a weird convention someone came up with to silence linters, and in my opinion a quite bad one as it looks distinctly like a mistake. On the other hand, __all__ actually does something, so we should be using it anyways. And like I said, we are already doing this in other files (notice how you don't get any of these errors in _aliases.py).

The "problem" with __all__ is that it is a separate piece of information that we need to update. It adds repetition in a different part of the file and risks that they get desynchronized. Thus, some library authors prefer not to use it. The import foo as foo pattern does not have this problem and, although the first time you see it looks admittedly a bit weird, you quickly learn to recognize it.

That said, it __all__ or implicit re-exports (disabling the error code) work for you, then lets go with that.

return res

@pytest.mark.parametrize("library", ["cupy", "numpy", "torch"])
def test_isdtype_spec_dtypes(library):
xp = import_('array_api_compat.' + library)
xp = pytest.importorskip('array_api_compat.' + library)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this? This is not equivalent to the import_ helper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the same elsewhere. The only tests that should be skipped are cupy tests, because cupy cannot be installed everywhere (and in particular, not on CI).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. I looked at the dependency definitions and could not find torch as required or extra. So I assumed it was optional and test can be skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default those tests failed for me, because torch was not installed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All runtime dependencies are optional. But for the tests, we want to require all optional dependencies so that we make sure the tests are actually run. The exception is cupy, which simply cannot be installed everywhere.

I'm not sure if we can express this in a better way in the package metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could add another entry in the extras_require dict, like {"test-no-cupy": ["numpy", "torch"], "test": ["numpy", "torch", "cupy"]}. But I would suggest to not do this in this PR. If the package grows further, it is probably worth using tox or similar to handle the different environments.

@@ -61,12 +59,12 @@ def isdtype_(dtype_, kind):
res = dtype_categories[kind](dtype_)
else:
res = dtype_ == kind
assert type(res) is bool
assert isinstance(res, bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be changed. This is a test. We want the type to be exactly bool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This is highlighted as an error in ruff, but in this case it is the desired behavior. I will add an ignore comment instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# Note: torch.linalg.cross does not default to axis=-1 (it defaults to the
# first axis with size 3), see https://github.com/pytorch/pytorch/issues/58743
def cross(x1: array, x2: array, /, *, axis: int = -1) -> array:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you move these to this file? These were in linalg.py because they're linalg only functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand. It seemed conceptually simpler in the end to just use linalg.py the same way as __init__.py. So it only contains the explicit imports and the __all__ declarations. This avoids the need for del statements and if something is missing from __all__ it errors as an unused import during the style check. I hope this makes sense. Otherwise I could maybe just introduce _aliases_linalg.py or similar.

exclude : callable, optional
A callable that takes a name and returns True if the name should be
excluded from the list of members.
extend_all : bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is monkeypatching torch.__all__ etc.? We don't want to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this just keeps the current behavior. Take a look at https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/torch/__init__.py#L3

I have not checked whether this is still necessary, but probably we have to keep it this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with the following code:

import torch

torch_all = set(torch.__all__)
public = set([name for name in dir(torch) if not name.startswith("_")])

print(torch_all.difference(public))
print(public.difference(torch_all))

And this gives:

set()
{'complex64', 'eig', 'special', ... , 'QInt8Storage', 'segment_reduce', 'ComplexDoubleStorage'}

So indeed __all__ does not contain multiple members and most importantly it does not contain the dtypes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this just keeps the current behavior. Take a look at main/array_api_compat/torch/init.py#L3

That code does not modify the torch.__all__ list:

>>> import torch
>>> torch_all = list(torch.__all__)
>>> import array_api_compat.torch
>>> torch_all2 = list(torch.__all__)
>>> torch_all == torch_all2
True

Generally speaking, this package should not monkeypatch the underlying libraries.

So indeed all does not contain multiple members and most importantly it does not contain the dtypes.

Yes, that's a known issue. pytorch/pytorch#91908

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I understand. I did not mean to actually modify torch.__all__ in place but copy and extend instead. I'll fix that behavior.

@@ -19,4 +19,18 @@
"""
__version__ = '1.4.1'

from .common import *
from .common import (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ruff help keep these two files in sync?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part I was not happy with either, because the import path is two-level: functions are imported from array_api_compat/common._helpers.py into array_api_compat.common from their they are imported into array_api_compat. So if anything gets added to _helpers.py it has to be updated in two places. Maybe one keeps the * import here and ignores...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it is not important that users can do both from array_api_compat import get_namespace and from array_api_compat.common import get_namespace. So my proposal would be to decide on one instead? However this would be a backwards incompatible change...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could just keep the * import and ignore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ruff can't automatically keep __all__ synced between a module and submodule with explicit imports then using import * and just setting __all__ = common.__all__ is the next best thing.

This would be a useful feature for some linter. We do this sort of thing in SymPy, for instance (e.g., at https://github.com/sympy/sympy/blob/master/sympy/core/__init__.py and https://github.com/sympy/sympy/blob/master/sympy/__init__.py), and we have to manually keep them in sync.

OTOH, having common as a submodule that people can import from isn't that important, unlike the torch, numpy, etc. which are important since those are the actual array API namespaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ruff can't automatically keep all synced between a module and submodule with explicit imports then using import * and just setting all = common.all is the next best thing.

I don't think any linter would do this, because developers can choose to only re-expose a subset. The correct behavior would depend on their intention.

I'll change to the * import now to keep backwards compatibility.

@asmeurer
Copy link
Member

Re-exports are the whole point of this package. Anyway, the duplication of __all__ isn't a big deal especially if we do have a linter to help us keep things consistent.

@adonath
Copy link
Contributor Author

adonath commented Jan 26, 2024

Thanks everyone for the comments. I ended up doing more work than initially anticipated, but it's all related. Let me quickly summarize the related changes:

  • For now I decided to use the __all__ pattern in the public name spaces, such as numpy/__init__.py , numpy/inalg.py etc.
  • I thus removed __all__ definitions in non-public namespaces such as _aliases.py and _linalg.py. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module.
  • I decided to introduce a _get_all_public_members function to get all public members of an imported module in a uniform way. It is either taken from __all__ if present or created from dir(module) with excludes.
  • I decided to move the definitions of the linalg aliases to _aliases.py as well. This has the advantage of just defining the public name space in linalg.py. No need for any del statements or similar. And it works the same as in __init__.py, which also import from .aliases.py
  • The imports and definition of __all__ in the public modules now follows the same scheme everywhere:
    • * import of the main namespace. Such as form torch import * (there we can't avoid the * import)
    • Get all public member of the module as string using _get_all_public_members
    • Explicit imports from ..common._helpers.py
    • Explicit imports from ._aliases.py
    • Explicit imports from .linalg.py
    • The __all__ is build after the same import groups, such that maintenance is a bit easier.

I don't really have a preference of __all__ vs the rename during import. I think any reasonable editor (at least vscode does) highlights inconsistencies between __all__ and the imports. Names missing in __all__ will be shown as unused import, while names declared in __all__ which are not defined in the module are also highlighted.

@asmeurer
Copy link
Member

I thus removed all definitions in non-public namespaces such as _aliases.py and _linalg.py. They were effectively un-used and conceptional it is probably weak to define public members in a non-public module.

For me it's a little easier to remember to update __all__ when it's in the same file. But I guess more ideally would be if we could somehow check this automatically with a linter.
I don't know if that's possible.

Explicit imports from ..common._helpers.py

As I noted in a line comment, it would be great if a linter can help us keep them consistent.

I think any reasonable editor (at least vscode does) highlights inconsistencies between all and the imports.

These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.

@lucascolley
Copy link
Member

Names missing in all will be shown as unused import, while names declared in all which are not defined in the module are also highlighted.
...
If ruff doesn't do this, we should use a linter that does.

ruff does this, indeed the initial CI log caught some of these, as discussed at the start of this thread. (unless you were referring to something else?)

@adonath
Copy link
Contributor Author

adonath commented Jan 26, 2024

For me it's a little easier to remember to update all when it's in the same file.

Yes, but you have to change the import into the public namespace anyway. And if you forget to update __all__ in this file, ruff errors with an unused import. I hope this makes sense.

These editors are using a linter under the hood. If ruff doesn't do this, we should use a linter that does.

In vscode I use pylance, which shows if undefined names are listed in __all__. Ruff by default does not seem to do this, but I can check whether there is a config option.

@adonath
Copy link
Contributor Author

adonath commented Jan 26, 2024

Concerning the type annotations and checking I think there are still two open issues:

  • The Array and Device annotations are undefined in common/_helpers.py. Here it is not 100% clear what to do. Either defined a union of valid arrays, or re-define the type annotations in the get_xp() for the specific array library.
  • There is the np._CopyMode annotation in common/_aliases.py which requires numpy for type checking. But I guess this might be fine...

@lucascolley
Copy link
Member

Ruff by default does not seem to do this, but I can check whether there is a config option.

https://docs.astral.sh/ruff/rules/undefined-export/

@adonath
Copy link
Contributor Author

adonath commented Jan 29, 2024

Thanks @lucascolley! I enabled the corresponding option in ruff. For now I only added it as a command line option in the CI, long term it might be better to have ruff config file.

@lucascolley
Copy link
Member

Above it sounded like Aaron wanted PLC0414 too.

@adonath
Copy link
Contributor Author

adonath commented Jan 30, 2024

There is a test fail in the CI, but it seems unrelated to this PR https://github.com/data-apis/array-api-compat/actions/runs/7702062263/job/20989506932?pr=82#step:6:5798

@asmeurer
Copy link
Member

So just to be clear, is ruff now checking for __all__ consistency across files?

I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to __all__ (it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.

I don't know if there's some way to tell a linter, "warn me if I have a name in this file that isn't exported". Most linters do that for __init__.py but I don't know if there's a way to force it to do that for other files as well (and even better if it can read __all__ from __init__.py but check the names from some other file). Maybe we can just throw together our own custom linter for this. It probably isn't hard, especially if we just import the package and compare dir(submodule) and __all__. I'll admit the use-case of this library is a little niche. Most packages aren't in the business of re-exporting a namespace.

@asmeurer
Copy link
Member

You can ignore the test failures. Looks like the numpy 1.21 failure needs to be added to this list

# NumPy 1.21 doesn't support NPY_PROMOTION_STATE=weak, so many tests fail with
. I don't know where the torch failure is coming from, but since it's in an operator, it should be xfailed too (maybe we should just be skiping all operator tests since we can't do anything about them anyway)

@adonath
Copy link
Contributor Author

adonath commented Feb 1, 2024

I think probably my number 1 development mistake when working on this library is forgetting to add a new wrapper function to all (it's especially confusing when the function is already exported from the base library). Maybe we should just write some code that automatically exports everything in the namespace that doesn't start with an underscore, but I do like the explicitness that avoids accidentally exporting something you don't want.

I see you concern here. The "source of truths" is now the __all__ list and not what is implement in for example _aliases.py. However as a first time contributor to this library I find the explicit declaration easier to grasp.

@adonath
Copy link
Contributor Author

adonath commented Feb 7, 2024

The idea is that _aliases.py would also have all defined, just as it is in main. Basically what I want is a linter that makes sure that any init.py with all always includes everything in that all that is also in the all of each package it imports from. The programmatic way to do this is to only define all at the bottom levels and use all += submodule.all in the init.py files, but the downside of that is that you can't easily see what names are exported just by looking at init.py (which is already sort of true anyway in this package because of the re-exports, so maybe it isn't a big deal).

I'm not aware of any linter that would do this. Playing a bit around I think this is roughly what you aim for:

import importlib
from array_api_compat._internal import _get_all_public_members

EXCLUDE = {
    "numpy": {"get_xp", "annotations", "partial", "np"},
    "torch": {"get_xp", "wraps", "annotations", "builtin_all", "builtin_any", "vecdot_linalg"},
}


def check_aliases_consistency(library):
    """Check that the public API of the library is consistent with the aliases."""

    submodule = importlib.import_module(f"array_api_compat.{library}")

    all_public = set(submodule.__all__)

    if hasattr(submodule, "linalg"):
        all_public_linalg = set(submodule.linalg.__all__)
    else:
        all_public_linalg = set()

    aliases = set(_get_all_public_members(submodule._aliases))

    print(aliases.difference(all_public | all_public_linalg | EXCLUDE[library]))

Which shows for example when calling check_aliases_consistency("torch"):

{'UniqueCountsResult', 'UniqueInverseResult', 'UniqueAllResult'}

So these are forgotten exports. However those should definitely show up in the tests later.

@asmeurer
Copy link
Member

asmeurer commented Feb 7, 2024

That looks like it would be a useful script. That already doesn't show up in the tests because the UniqueAllResult name is not part of the standard (the standard only requires the function to return a namedtuple, but the type of that namedtuple is not in the spec namespace). So indeed the UniqueAllResult should be either not exported at all or exported in both files.

In fact, I would be cautious in general about relying on the test suite to catch these mistakes. There's a dozen reasons the test suite might not catch something. Maybe a function is wrapped to fix some behavior that is missed by the test suite. Or (more likely), maybe the test for a function is xfailed for some other reason.

def import_(library):
if 'cupy' in library:

def import_or_skip_cupy(library):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I'm probably going to rename this back, because I need to add some additional skipping logic for jax as well at #84

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I think it would be good to keep the "or skip" in the name, because from looking at the places where it was used, it was not clear what it does.

@asmeurer asmeurer merged commit dab01be into data-apis:main Feb 8, 2024
@adonath
Copy link
Contributor Author

adonath commented Feb 8, 2024

Thanks @asmeurer, @vnmabus , @rgommers and @lucascolley for the comments and review!

asmeurer added a commit to asmeurer/array-api-compat that referenced this pull request Feb 23, 2024
asmeurer added a commit that referenced this pull request Feb 26, 2024
Revert __all__ related changes from #82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants