json infer_schema drops null-only object keys #11860

Closed
2 tasks done
cmdlineluser opened this issue Oct 19, 2023 · 0 comments · Fixed by #12677
Labels
bug Something isn't working python Related to Python Polars

Comments

@cmdlineluser
Contributor

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

data = b"""[
{"a": null, "b": 1, "c": null},
{"a": null, "b": 2, "c": null},
{"a": null, "b": 3, "c": null}
]"""

pl.read_json(data)
# shape: (3, 1)
# ┌─────┐
# │ b   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 2   │
# │ 3   │
# └─────┘

Log output

No response

Issue description

It seems that an object key needs at least one non-null value to be included in the final output of pl.read_json / .str.json_extract.

From what I can tell, filter_map_nulls skips all nulls:

fn infer_object(inner: &Object) -> PolarsResult<DataType> {
    let fields = inner
        .iter()
        .filter_map(|(key, value)| {
            infer(value)
                .map(|dt| filter_map_nulls(dt).map(|dt| (key, dt)))

This appears to result in the "all-null" keys being dropped.
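
To illustrate the suspected mechanism, here is a simplified pure-Python model of the inference step. It is not the Polars implementation: infer, filter_map_nulls, and infer_object below are hypothetical stand-ins named after the Rust functions, the type names are plain strings, and the keep_nulls flag is an assumption added only to show the contrast.

```python
import json


def infer(value):
    """Map a decoded JSON value to a toy type name (hypothetical stand-in)."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "i64"
    if isinstance(value, float):
        return "f64"
    if isinstance(value, str):
        return "str"
    return "other"


def filter_map_nulls(dtype):
    """Return None for the null type so the caller can skip it (assumed behavior)."""
    return None if dtype == "null" else dtype


def infer_object(obj, keep_nulls=False):
    """Infer a {key: type} mapping for a single JSON object."""
    fields = {}
    for key, value in obj.items():
        dtype = infer(value)
        if not keep_nulls:
            dtype = filter_map_nulls(dtype)
            if dtype is None:
                continue  # a null-typed key is dropped at this point
        fields[key] = dtype
    return fields


rows = json.loads('[{"a": null, "b": 1, "c": null}]')
dropped = infer_object(rows[0])
kept = infer_object(rows[0], keep_nulls=True)
print(dropped)  # {'b': 'i64'}
print(kept)     # {'a': 'null', 'b': 'i64', 'c': 'null'}
```

In this toy model, a key whose value is null in every row never gets a type, so it never reaches the final schema. That matches the observed output where only b survives.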

Expected behavior

Using pandas as a comparison, a and c are present in the result:

import io
import pandas as pd
import polars as pl

data = """[
{"a": null, "b": 1, "c": null},
{"a": null, "b": 2, "c": null},
{"a": null, "b": 3, "c": null}
]"""

pl.from_pandas(pd.read_json(io.StringIO(data)))
# shape: (3, 3)
# ┌──────┬─────┬──────┐
# │ a    ┆ b   ┆ c    │
# │ ---  ┆ --- ┆ ---  │
# │ f64  ┆ i64 ┆ f64  │
# ╞══════╪═════╪══════╡
# │ null ┆ 1   ┆ null │
# │ null ┆ 2   ┆ null │
# │ null ┆ 3   ┆ null │
# └──────┴─────┴──────┘
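
The expected column set can also be checked without pandas by collecting the union of keys with the standard library. A minimal sketch; expected_columns is a hypothetical helper, not part of Polars:

```python
import json


def expected_columns(raw):
    """Return every key seen in a JSON array of objects, in first-seen order."""
    seen = {}
    for row in json.loads(raw):
        for key in row:
            seen.setdefault(key, None)  # dict preserves insertion order
    return list(seen)


data = """[
{"a": null, "b": 1, "c": null},
{"a": null, "b": 2, "c": null},
{"a": null, "b": 3, "c": null}
]"""

cols = expected_columns(data)
print(cols)  # ['a', 'b', 'c']
```

Comparing this list with pl.read_json(data.encode()).columns would flag any key dropped during schema inference.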

Installed versions

--------Version info---------
Polars:              0.19.9
Index type:          UInt32
Platform:            macOS-12.6.7-arm64-arm-64bit
Python:              3.11.6 (main, Oct  2 2023, 20:46:17) [Clang 14.0.0 (clang-1400.0.29.202)]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         <not installed>
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              2023.6.0
gevent:              <not installed>
matplotlib:          <not installed>
numpy:               1.25.0
openpyxl:            <not installed>
pandas:              2.0.3
pyarrow:             12.0.1
pydantic:            <not installed>
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          <not installed>
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>