Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cython-lint configuration. #5439

Merged
merged 15 commits into from
Jul 5, 2023
Merged

Conversation

bdice
Copy link
Contributor

@bdice bdice commented May 24, 2023

This PR adds a configuration for cython-lint. It eliminates a lot of unused imports and unused variables from the Cython code.

@bdice bdice requested a review from a team as a code owner May 24, 2023 21:57
@github-actions github-actions bot added the Cython / Python Cython or Python issue label May 24, 2023
@@ -9,3 +9,7 @@ ignore-words-list = "inout,numer,startd,couldn,referr"
builtin = "clear"
# disable warnings about binary files and wrong encoding
quiet-level = 3

[tool.cython-lint]
max-line-length = 999
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to re-enable E501 with a reasonable line length limit (following our flake8/black conventions) in a follow-up PR. Wrapping lines takes a little more manual effort than I wanted to put in for this first pass.

@@ -347,7 +345,7 @@ class BaseRandomForestModel(Base):
fil_sparse_format, threshold=0.5,
output_class=False,
predict_proba=False) -> CumlArray:
_, n_rows, n_cols, dtype = \
_, _, _, _ = \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this call to input_to_cuml_array doing anything? Its results are not used but it might be doing some error checking / validation?

Copy link
Contributor

@csadorf csadorf May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will set the classe's input_dtype. Getting rid of unexpected side-effects like that is one of my refactoring goals. Raelized it's not involving the class, might not do anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is doing nothing. The CumlArray conversion happens in the FIL call. I suspect this is an artifact of a time before FIL was used for inference here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wphicks Can you push changes or file an issue if you'd like to see a change here? I don't have enough familiarity with this code, unfortunately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit, removing the affected line.

Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pyproject.toml Show resolved Hide resolved
@bdice
Copy link
Contributor Author

bdice commented May 24, 2023

Style checks are failing because of incorrect dependency versions. I opened #5440 to resolve it.

@bdice bdice self-assigned this Jun 2, 2023
@bdice bdice added non-breaking Non-breaking change improvement Improvement / enhancement to an existing function labels Jun 2, 2023
@bdice
Copy link
Contributor Author

bdice commented Jun 2, 2023

@csadorf I resolved merge conflicts on this PR. Can you re-approve it now that you have Python code owner privileges?

@csadorf
Copy link
Contributor

csadorf commented Jun 7, 2023

/merge

Comment on lines -58 to -59
from cuml.manifold.simpl_set import fuzzy_simplicial_set, \
simplicial_set_embedding
Copy link
Contributor Author

@bdice bdice Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting the unused import of simplicial_set_embedding broke test_simpl_set.py. Is this meant to be a public API exposed from cuml.manifold.umap? Do we need to worry about other unused imports removed in this PR that are meant to be exposed as public APIs? If so, the proper mechanism is to define __all__ = ["simplicial_set_embedding", ...].

cc: @wphicks @dantegd @csadorf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have to perform a more thorough check on which symbols should be included in the public API (and either excluded from linting or explicitly added to __all__). I will take care of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @csadorf!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to merge this PR sooner, we could disable the linter checks for unused Python imports (unused Cython cimports should all be safe to remove).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please go ahead with that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an issue about this: MarcoGorelli/cython-lint#79

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cython-lint also doesn't detect usage in __all__, so that isn't a viable solution here. I filed a second issue: MarcoGorelli/cython-lint#80

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I added these imports back with # no-cython-lint in d773305. That should resolve the problem. I reviewed the changes in this PR for other possible public API breaks but I didn't see any candidates except for #5439 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment that explains the need for the markers there.

@csadorf csadorf assigned csadorf and unassigned bdice Jun 28, 2023
@@ -26,10 +25,8 @@ from cuml.internals.input_utils import input_to_cupy_array
from cuml.explainer.base import SHAPBase
from cuml.explainer.common import get_cai_ptr
from cuml.explainer.common import model_func_call
from cuml.explainer.common import output_list_shap_values
Copy link
Contributor Author

@bdice bdice Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a public API? If so we need to add it back with # no-cython-lint since it is an unused import. Otherwise no action is needed.

This is the only questionable case I saw, all the other unused imports are clearly not meant to be public APIs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is safe to assume that it is not.

Copy link
Contributor

@csadorf csadorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just reviewed this change set once more with special attention to the removed imports. That looks all good to me.

I've also pushed three commits with some changes, but once tests pass, this should be ready for merge.

Comment on lines -58 to -59
from cuml.manifold.simpl_set import fuzzy_simplicial_set, \
simplicial_set_embedding
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment that explains the need for the markers there.

@@ -128,5 +128,5 @@ def find_ab_params(spread, min_dist):
yv = np.zeros(xv.shape)
yv[xv < min_dist] = 1.0
yv[xv >= min_dist] = np.exp(-(xv[xv >= min_dist] - min_dist) / spread)
params, covar = curve_fit(curve, xv, yv)
params, _covar = curve_fit(curve, xv, yv)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
params, _covar = curve_fit(curve, xv, yv)
params, _ = curve_fit(curve, xv, yv)

Should do it, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and applied that change.

@@ -347,7 +345,7 @@ class BaseRandomForestModel(Base):
fil_sparse_format, threshold=0.5,
output_class=False,
predict_proba=False) -> CumlArray:
_, n_rows, n_cols, dtype = \
_, _, _, _ = \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit, removing the affected line.

@bdice
Copy link
Contributor Author

bdice commented Jun 30, 2023

@csadorf Thanks for seeing this through! Your final changes LGTM.

@csadorf
Copy link
Contributor

csadorf commented Jun 30, 2023

Note that the wheeltest failures are a known and unrelated issue. They should be resolved by Monday.

@rapids-bot rapids-bot bot merged commit e350b99 into rapidsai:branch-23.08 Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants