I tried the example in the docs here for dropping invalid rows using DataFrameSchema. It leads one to believe that two rows should be dropped (all those less than 3), but the output is the entire data frame. After some time, I saw that the issue is due to the quotes around the numbers in the construction of the data frame, which give the column the wrong type (object, not int). I don't think that is the intention here. Should those quotes be removed? I think there is a similar issue for SeriesSchema. The example given for dropping invalid rows with DataFrameModel doesn't have this issue and works as intended.
Problematic example
import pandas as pd
import pandera as pa
from pandera import Check, Column, DataFrameSchema
df = pd.DataFrame({"counter": ["1", "2", "3"]})
schema = DataFrameSchema(
    {"counter": Column(int, checks=[Check(lambda x: x >= 3)])},
    drop_invalid_rows=True,
)
schema.validate(df, lazy=True)
output:
   counter
0        1
1        2
2        3
Maybe change to:
import pandas as pd
import pandera as pa
from pandera import Check, Column, DataFrameSchema
df = pd.DataFrame({"counter": [1, 2, 3]})
schema = DataFrameSchema(
    {"counter": Column(int, checks=[Check(lambda x: x >= 3)])},
    drop_invalid_rows=True,
)
schema.validate(df, lazy=True)
output:
   counter
2        3
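For anyone hitting the same confusion, here is a quick sketch (not taken from the docs) that checks the column type of both data frames using plain pandas. The variable names `quoted` and `unquoted` are only illustrative, and the `pd.to_numeric` conversion is one possible workaround for already-quoted data, not something the docs example does.

```python
import pandas as pd

# Same data as the two examples above: quoted vs. unquoted numbers.
quoted = pd.DataFrame({"counter": ["1", "2", "3"]})
unquoted = pd.DataFrame({"counter": [1, 2, 3]})

# Print the inferred dtypes; the exact names can vary by pandas version and platform.
print(quoted["counter"].dtype)
print(unquoted["counter"].dtype)

# Only the unquoted column is an integer column.
is_int_quoted = pd.api.types.is_integer_dtype(quoted["counter"])
is_int_unquoted = pd.api.types.is_integer_dtype(unquoted["counter"])
print(is_int_quoted, is_int_unquoted)

# Possible workaround (an assumption, not from the docs): convert the strings first.
converted = pd.to_numeric(quoted["counter"])
is_int_converted = pd.api.types.is_integer_dtype(converted)
print(is_int_converted)
```

If the quoted column is not an integer column, the `Column(int, ...)` schema cannot match it as written, which is consistent with the unexpected output above.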
ttimbers changed the title from "Dropping Invalid Rows does not seem to work with DataFrameSchema" to "Example for docs for dropping Invalid Rows does not seem obvious" on Nov 12, 2024