Can't pickle function with @check_types decorator #1846

Open
2 of 3 tasks
bmwilly opened this issue Nov 4, 2024 · 0 comments
Labels
bug Something isn't working

Describe the bug

Functions cannot be pickled if they are decorated with @pa.check_types.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

import pickle
from pathlib import Path

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series


# Print the installed pandera version (a bare expression only displays in a notebook)
print(pa.__version__)


# Define a schema using pandera
class Schema(pa.DataFrameModel):
    column1: Series[int]
    column2: Series[float]
    column3: Series[str]


# Use the check_types decorator to validate input and output
@pa.check_types
def process_data_and_check_types(df: DataFrame[Schema]) -> DataFrame[Schema]:
    # Example processing: add a new column
    df["column4"] = df["column1"] + df["column2"]
    return df


# Create a sample DataFrame
data = {"column1": [1, 2, 3], "column2": [0.1, 0.2, 0.3], "column3": ["a", "b", "c"]}
df = pd.DataFrame(data)

# Process the DataFrame
processed_df = process_data_and_check_types(df)


# Same function without checking types
def process_data(df: DataFrame[Schema]) -> DataFrame[Schema]:
    # Example processing: add a new column
    df["column4"] = df["column1"] + df["column2"]
    return df


# Can only serialize function without decorator
path = Path("/tmp/tmp.pkl")
with path.open("wb") as f:
    pickle.dump(process_data, f)

# This raises NotImplementedError: object proxy must define __reduce_ex__()
with path.open("wb") as f:
    pickle.dump(process_data_and_check_types, f)

Expected behavior

Ability to pickle a function decorated with @pa.check_types.

Desktop (please complete the following information):

  • OS: macOS 15.1
  • Version: pandera 0.20.4

Screenshots

Error from above code:

{
	"name": "NotImplementedError",
	"message": "object proxy must define __reduce_ex__()",
	"stack": "---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[6], line 2
      1 with path.open(\"wb\") as f:
----> 2     pickle.dump(process_data_and_check_types, f)

NotImplementedError: object proxy must define __reduce_ex__()"
}

Additional context

This functionality is required by a variety of ML frameworks (such as MLflow and Metaflow) that pickle artifacts under the hood in order to log them.

bmwilly added the bug label on Nov 4, 2024.