Optional doesn't flag column as nullable, when other constraints are added to Field #1800

Open
2 of 3 tasks
antonioalegria opened this issue Sep 4, 2024 · 2 comments
Labels
bug Something isn't working

@antonioalegria

Describe the bug

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.

Code Sample, a copy-pastable example

from typing import Optional

import polars as pl
from pandera.polars import DataFrameModel, Field  # type: ignore


class MyModel(DataFrameModel):
    a: Optional[str] = Field(description="some description", nullable=True)
    b: Optional[str] = Field(description="some description")  # BOOM
    c: Optional[str] = Field(description="some description", str_contains=".", nullable=True)
    d: Optional[str] = Field(description="some description", str_contains=".")  # BOOM


# Every column contains one null value.
df = pl.DataFrame(
    {
        "a": ["a", None],
        "b": ["b.com", None],
        "c": ["c.com", None],
        "d": ["d.com", None],
    }
)
MyModel.validate(df)  # ==> pandera.errors.SchemaError: non-nullable column 'b' contains null values

Expected behavior

The dataframe should have passed validation.

Desktop (please complete the following information):

OS: macOS 14.6.1
Python 3.12.4
polars-lts-cpu 1.6.0
pandera 0.20.3

@antonioalegria antonioalegria added the bug Something isn't working label Sep 4, 2024
@cosmicBboy
Collaborator

Hi @antonioalegria, this is the intended behavior. Optional marks a column as not required to be in the dataframe (see docs). You still have to set nullable=True explicitly in the Field; these are two different behaviors.

@antonioalegria
Author

I see. Then str | None should be equivalent to nullable=True, no? In any case, if Optional means the column can be missing, wouldn't that imply it is also nullable?

I have a workaround that dynamically marks all my Optional columns as nullable as well, but I am wondering whether there is a more natural (i.e. least surprising) behavior.

Thanks!
