Hi,
Is it possible to pass a custom registered check name and a custom error message to the SchemaError / the dictionary containing the errors, for the pandera.pyspark implementation? Currently it returns the check name as "None" and the error as "Failed Validation None".
The current implementation makes it impossible to map errors back to registered custom checks, and thus to tell which check actually failed. This effectively makes custom checks useless, as you need to verify one by one (manually) which check failed.
Example:
```python
import pandera.pyspark as pa
import pyspark.sql.types as T

from decimal import Decimal

from pandera.extensions import register_check_method
from pandera.pyspark import DataFrameModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Generate dataset
data = [
    (5, "Bread", Decimal("44.4"), ["description of product"], {"product_category": "dairy"}),
    (15, "Butter", Decimal("99.0"), ["more details here"], {"product_category": "bakery"}),
]

spark_schema = T.StructType(
    [
        T.StructField("id", T.IntegerType(), False),
        T.StructField("product", T.StringType(), False),
        T.StructField("price", T.DecimalType(20, 5), False),
        T.StructField("description", T.ArrayType(T.StringType(), False), False),
        T.StructField("meta", T.MapType(T.StringType(), T.StringType(), False), False),
    ],
)

df = spark.createDataFrame(data, spark_schema)

# Register custom check
@register_check_method(statistics=["col"])
def new_pyspark3(pyspark_obj, *, col) -> bool:
    return pyspark_obj.dataframe.select(col).count() > 4

class Schema(DataFrameModel):
    """Schema"""

    product: T.StringType()
    price: T.DecimalType(20, 5) = pa.Field(new_pyspark3={"col": "price"})
```
Running `sdf_out = Schema.validate(df, lazy=False)` raises the SchemaError shown in the attached image. With lazy validation, the error dictionary can be inspected:
```python
sdf_out = Schema.validate(df, lazy=True)
sdf_out.pandera.errors
```
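For reference, the resulting dictionary looks roughly like this (an illustrative sketch reconstructed from the behaviour described above; the surrounding keys follow pandera.pyspark's error-report layout):

```python
# Illustrative sketch only: approximate contents of sdf_out.pandera.errors.
# The custom check is reported with check=None and a generic message, so the
# failure cannot be traced back to new_pyspark3.
{
    "DATA": {
        "DATAFRAME_CHECK": [
            {
                "schema": "Schema",
                "column": "price",
                "check": None,
                "error": "Failed Validation None",
            }
        ]
    }
}
```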
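What one would hope for is a way to attach an explicit name and error message to the registered check. Below is a minimal sketch of that idea using the object-based API, assuming the registered method forwards `Check` keyword arguments such as `error=` the way the pandas backend does (untested here, so this is an assumption, not a confirmed pandera.pyspark feature):

```python
# Hypothetical workaround sketch, not verified against pandera.pyspark:
# registered check methods become available on pa.Check, and in the pandas
# backend Check itself accepts an `error=` message.
schema = pa.DataFrameSchema(
    {
        "price": pa.Column(
            T.DecimalType(20, 5),
            checks=pa.Check.new_pyspark3(
                col="price",
                error="price must have more than 4 rows",  # custom error text
            ),
        ),
    }
)

out = schema.validate(df, lazy=True)
out.pandera.errors  # ideally this would now carry the custom error message
```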