[SPARK-54544][PYTHON] Enable flake8 F811 check#53253
[SPARK-54544][PYTHON] Enable flake8 F811 check#53253gaogaotiantian wants to merge 2 commits intoapache:masterfrom
Conversation
| :class:`~pyspark.sql.Column` | ||
| natural logarithm of the given value. | ||
|
|
||
| Examples |
There was a problem hiding this comment.
let's move the examples to the remaining log, to keep the doctest coverage
There was a problem hiding this comment.
The remaining log has its examples and I think the coverage is similar (if not more):
Examples
--------
Example 1: Specify both base number and the input value
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(2.0, df.value)).show()
+-----+---------------+
|value|LOG(2.0, value)|
+-----+---------------+
| 1| 0.0|
| 2| 1.0|
| 4| 2.0|
+-----+---------------+
Example 2: Return NULL for invalid input values
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (0), (-1), (NULL) AS t(value)")
>>> df.select("*", sf.log(3.0, df.value)).show()
+-----+------------------+
|value| LOG(3.0, value)|
+-----+------------------+
| 1| 0.0|
| 2|0.6309297535714...|
| 0| NULL|
| -1| NULL|
| NULL| NULL|
+-----+------------------+
Example 3: Specify only the input value (Natural logarithm)
>>> from pyspark.sql import functions as sf
>>> df = spark.sql("SELECT * FROM VALUES (1), (2), (4) AS t(value)")
>>> df.select("*", sf.log(df.value)).show()
+-----+------------------+
|value| ln(value)|
+-----+------------------+
| 1| 0.0|
| 2|0.6931471805599...|
| 4|1.3862943611198...|
+-----+------------------+
| messageParameters={"error_msg": error_msg}, | ||
| ) | ||
|
|
||
| def test_list_row_unequal_schema(self): |
There was a problem hiding this comment.
why removing this test?
There was a problem hiding this comment.
This one is a bit special. There's another test with the same name at line 1593. I believe they are doing very similar things. However, this test is failing (it never shows up because it was shielded by the other one). I would guess the other one is the new one that is supposed to replace this old one.
| expected[r][e] == result[r][e], f"{expected[r][e]} == {result[r][e]}" | ||
| ) | ||
|
|
||
| def test_createDataFrame_pandas_with_struct_type(self): |
There was a problem hiding this comment.
There is another one at line 986 that is exactly the same.
|
Merged to master. |
What changes were proposed in this pull request?
Enabled flake8 F811 check on our repo and fixed reported issues.
Why are the changes needed?
I know upgrading lint system is a pain, but we should not just put it aside forever. Our pinned
flake8version is not even usable on Python3.12+.During this "lint fix", I actually discovered a few real bugs - most of them are silently disabled unittests because there is a test method that has the same name (probably due to copy/paste). I think this result supported the idea that we should take lint more seriously.
About
functions.log, we got it wrong. It's not because@overloaddoes not work properly - it's because we have twologfunction in that gigantic file. The former one is dead. I just removed that one.Again, I really think we should upgrade our lint system. I'm trying to do it slowly - piece by piece, so that people's daily workflow is not impacted too much.
I hope we can eventually move to a place where all linters are updated and people can be more confident about their changes.
Does this PR introduce any user-facing change?
No
How was this patch tested?
flake8test on major directories. CI should give more a comprehensive result.Was this patch authored or co-authored using generative AI tooling?
No