Example for docs for dropping Invalid Rows does not seem obvious #1852

Open
ttimbers opened this issue Nov 12, 2024 · 0 comments
Labels
bug Something isn't working

I tried the example in the docs for dropping invalid rows using DataFrameSchema. It leads one to believe that two rows should be dropped (all those with values less than 3), but the output is the entire data frame. After some digging, I see that the cause is the quotes around the numbers in the construction of the data frame, which give the column the wrong type (object, not int). I don't think that is the intention here. Should those quotes be removed? I think there is a similar issue in the SeriesSchema example. The example given for dropping invalid rows with DataFrameModel doesn't have this issue and works as evidently intended.
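A quick way to confirm the dtype difference is a minimal sketch using only pandas (the variable names `quoted` and `unquoted` are just for illustration):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Same construction as the docs example: the values are strings.
quoted = pd.DataFrame({"counter": ["1", "2", "3"]})
# Same data without the quotes: the values are integers.
unquoted = pd.DataFrame({"counter": [1, 2, 3]})

# is_integer_dtype reports whether a column holds an integer dtype.
quoted_is_int = is_integer_dtype(quoted["counter"])
unquoted_is_int = is_integer_dtype(unquoted["counter"])

print(quoted["counter"].dtype, quoted_is_int)
print(unquoted["counter"].dtype, unquoted_is_int)
```

Only the unquoted column has an integer dtype, so only that one matches a column declared as `Column(int, ...)`.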

Problematic example

import pandas as pd
import pandera as pa

from pandera import Check, Column, DataFrameSchema

df = pd.DataFrame({"counter": ["1", "2", "3"]})
schema = DataFrameSchema(
    {"counter": Column(int, checks=[Check(lambda x: x >= 3)])},
    drop_invalid_rows=True,
)

schema.validate(df, lazy=True)

output:

  counter
0       1
1       2
2       3

Maybe change to:

import pandas as pd
import pandera as pa

from pandera import Check, Column, DataFrameSchema

df = pd.DataFrame({"counter": [1, 2, 3]})
schema = DataFrameSchema(
    {"counter": Column(int, checks=[Check(lambda x: x >= 3)])},
    drop_invalid_rows=True
)

schema.validate(df, lazy=True)

output:

   counter
2        3
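
For anyone whose data really does arrive as strings, here is a rough sketch of what the check is expected to keep, using plain pandas rather than pandera. It assumes every value parses cleanly as an integer; the cast and the mask are my own illustration, not something taken from the docs.

```python
import pandas as pd

# Hypothetical input where the counts arrive as strings, as in the docs example.
df = pd.DataFrame({"counter": ["1", "2", "3"]})

# Cast to int so the column matches the dtype declared as Column(int, ...).
df["counter"] = df["counter"].astype(int)

# The same condition as the Check in the schema, applied as a plain boolean mask.
kept = df[df["counter"] >= 3]
print(kept)
```

After the cast, only the row with value 3 passes the condition, which matches the output of the suggested example.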

ttimbers added the bug label Nov 12, 2024
ttimbers changed the title from "Dropping Invalid Rows does not seem to work with DataFrameSchema" to "Example for docs for dropping Invalid Rows does not seem obvious" Nov 12, 2024