Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] [python-package] tell mypy 'auto' has a special meaning for feature_name and categorical_feature #5874

Merged
merged 5 commits into from
May 10, 2023

Conversation

jameslamb
Copy link
Collaborator

Contributes to #3756.
Contributes to #3867.

Fixes the following error from mypy.

python-package/lightgbm/basic.py:691: error: Incompatible types in assignment (expression has type "List[object]", variable has type "Union[List[str], List[int], str, None]")  [assignment]

For arguments feature_name and categorical_feature, the Python package supports passing 3 possible non-None values:

  • list of strings (column names)
  • list of ints (0-based positional indexes of the columns)
  • string "auto" (interpreted as "figure it out automatically" for pandas inputs, and as None otherwise)

Using the typing.Literal annotation, added in Python 3.8, mypy is able to understand that if these arguments are not None or a list, they must be literally "auto", not any arbitrary string (as the current type hint of str implies).

This helps it to understand that, for example...

if categorical_feature != "auto" and categorical_feature is not None:
    # categorical_feature MUST be a list at this point

Notes for Reviewers

For more details on the Literal type, see https://mypy.readthedocs.io/en/stable/literal_types.html#literal-types.

For an explanation of how string-literal type hints can be used to prevent runtime users, see https://mypy.readthedocs.io/en/stable/runtime_troubles.html.

@jameslamb jameslamb changed the title [ci] [python-package] tell mypy 'auto' has a special meaning for feature_name and categorical_feature WIP: [ci] [python-package] tell mypy 'auto' has a special meaning for feature_name and categorical_feature May 8, 2023
@jameslamb jameslamb marked this pull request as ready for review May 8, 2023 02:23
@jameslamb jameslamb changed the title WIP: [ci] [python-package] tell mypy 'auto' has a special meaning for feature_name and categorical_feature [ci] [python-package] tell mypy 'auto' has a special meaning for feature_name and categorical_feature May 8, 2023
@jameslamb jameslamb requested a review from guolinke May 8, 2023 02:44
Comment on lines 690 to 691
else: # use cat cols specified by user
categorical_feature = list(categorical_feature)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed? I think this is meant to create a copy and avoid errors derived from this value being changed outside of lightgbm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is meant to create a copy and avoid errors derived from this value being changed outside of lightgbm

Looking back through the blame, this line was introduced in #2121. I don't see any PR review discussion about it there or in the comment linked to from there... I assumed this list() was just thrown in to be extra-sure that from this point forward, the value was a list.

If we want protection against things outside of LightGBM having side effects on this value or vice versa (via the fact that a list is mutable and passed by reference), I think it'd be clearer to handle that with copy.deepcopy() explicitly, somewhere further upstream in the code.

Would you like me to try that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using list(collection) is a common way to ensure that:

  1. From this point forward the collection is a list.
  2. The variable now holds a copy of the original collection.

So if the input was already a list calling list() on it only creates a copy and if was some other type of iterable it achieves both.

Since the input is expected to be a list of strings deepcopy might be overkill and won't achieve 2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll put this back for now. I really think it's both unclear that the point is to create a copy and unnecessary for hte purpose "ensure that it's a list", given that by this point in the code we've already checked that categorical_feature is,

not None:

if categorical_feature is not None:

and not the string "auto"

if categorical_feature == 'auto': # use cat cols from DataFrame
categorical_feature = cat_cols_not_ordered

.. so the only other things that attribute permits are List[str] or List[int].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this back in 51ab03a and ceef1f6

It's necessary to add a # type: ignore comment if we're going to keep that code branch, to avoid this from mypy.

python-package/lightgbm/basic.py:694: error: Incompatible types in assignment (expression has type "List[object]", variable has type "Union[List[str], List[int], Literal['auto'], None]")  [assignment]

@jameslamb jameslamb requested a review from jmoralez May 8, 2023 16:55
@jameslamb jameslamb merged commit d8d6fae into master May 10, 2023
@jameslamb jameslamb deleted the ci/mypy-literals branch May 10, 2023 19:35
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants