@zhengruifeng
Contributor

What changes were proposed in this pull request?

The sha2 function should raise PySparkValueError when numBits is invalid.

Why are the changes needed?

Vanilla PySpark invokes the Scala side, which raises an IllegalArgumentException:

def sha2(e: Column, numBits: Int): Column = {
  require(
    Seq(0, 224, 256, 384, 512).contains(numBits),
    s"numBits $numBits is not in the permitted values (0, 224, 256, 384, 512)")
  Column.fn("sha2", e, lit(numBits))
}

whereas the Python client does not perform this check, so the invalid value surfaces as an AnalysisException instead.

They should both raise a PySparkValueError for this case.
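
A minimal standalone sketch of such a client-side check follows. The helper name `check_num_bits` is hypothetical, and the built-in `ValueError` stands in for `PySparkValueError` so the snippet runs without Spark. The permitted values are copied from the Scala `require` above.

```python
# Permitted digest sizes, copied from the Scala require() shown above.
ALLOWED_NUM_BITS = (0, 224, 256, 384, 512)


def check_num_bits(num_bits: int) -> int:
    """Hypothetical helper: reject unsupported sizes before any plan is built."""
    if num_bits not in ALLOWED_NUM_BITS:
        # Stand-in for PySparkValueError with error class VALUE_NOT_ALLOWED.
        raise ValueError(
            "[VALUE_NOT_ALLOWED] Value for `numBits` has to be amongst "
            f"the following values: {list(ALLOWED_NUM_BITS)}."
        )
    return num_bits
```

Validating in Python before delegating means both code paths fail at the call site with the same exception type, rather than at different layers with different types.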

Does this PR introduce any user-facing change?

Yes. Passing an invalid numBits now raises PySparkValueError:

In [1]: from pyspark.sql import functions as sf
   ...: spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()
---------------------------------------------------------------------------
PySparkValueError                         Traceback (most recent call last)
<ipython-input-1-1ae9879dcc31> in ?()
      1 from pyspark.sql import functions as sf
----> 2 spark.range(1).select(sf.sha2(sf.col("id"), 1024)).collect()

~/Dev/spark/python/pyspark/sql/utils.py in ?(*args, **kwargs)
    190             from pyspark.sql.connect import functions
    191
    192             return getattr(functions, f.__name__)(*args, **kwargs)
    193         else:
--> 194             return f(*args, **kwargs)

~/Dev/spark/python/pyspark/sql/functions/builtin.py in ?(col, numBits)
   9112     |Bob  |cd9fb1e148ccd8442e5aa74904cc73bf6fb54d1d54d333bd596aa9bb4bb4e961|
   9113     +-----+----------------------------------------------------------------+
   9114     """
   9115     if numBits not in [0, 224, 256, 384, 512]:
-> 9116         raise PySparkValueError(
   9117             error_class="VALUE_NOT_ALLOWED",
   9118             message_parameters={
   9119                 "arg_name": "numBits",

PySparkValueError: [VALUE_NOT_ALLOWED] Value for `numBits` has to be amongst the following values: [0, 224, 256, 384, 512].

How was this patch tested?

Added unit tests.
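
The added tests are not reproduced in this conversation. As a rough illustration only, a test of this shape could cover the behaviour. `sha2_check` is a hypothetical stand-in for the real function, and `ValueError` again substitutes for `PySparkValueError` so the sketch has no Spark dependency.

```python
import unittest

ALLOWED = (0, 224, 256, 384, 512)


def sha2_check(num_bits):
    """Hypothetical stand-in for the numBits validation inside sha2."""
    if num_bits not in ALLOWED:
        raise ValueError(f"Value for `numBits` has to be amongst {list(ALLOWED)}.")
    return num_bits


class Sha2NumBitsTest(unittest.TestCase):
    def test_permitted_values_pass(self):
        # Every permitted size is accepted unchanged.
        for bits in ALLOWED:
            self.assertEqual(sha2_check(bits), bits)

    def test_invalid_value_raises(self):
        # 1024 is the invalid value used in the example traceback above.
        with self.assertRaises(ValueError) as ctx:
            sha2_check(1024)
        self.assertIn("numBits", str(ctx.exception))
```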

Was this patch authored or co-authored using generative AI tooling?

No.

@zhengruifeng
Contributor Author

cc @HyukjinKwon

@zhengruifeng zhengruifeng deleted the py_connect_sha2_check branch December 29, 2023 07:15
@zhengruifeng
Contributor Author

merged to master
