This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Deserializing lists of dictionary / categorical from parquet #1091

Closed
thobai opened this issue Jun 21, 2022 · 1 comment · Fixed by #1175
Labels
enhancement An improvement to an existing feature no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog

Comments

thobai commented Jun 21, 2022

I'm using polars (version 0.13.49, Python), which uses arrow2. I can write columns containing lists of type Categorical to parquet, but I cannot read them back. It seems this is not yet implemented in arrow2. Polars fails with the error message:

exceptions.ArrowErrorException: NotYetImplemented("Deserializing type Dictionary(UInt32, LargeUtf8, false) from parquet")

The following code reproduces the problem:

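import polars as pl
from polars import col
print(pl.version())

# Small frame with a string column and a grouping key
df = pl.DataFrame(
    {
        'str': ['A', 'B', 'A', 'B', 'C'],
        'group': [1, 1, 2, 1, 2]
    }
).lazy()
# Cast the string column to Categorical (dictionary-encoded)
df = df.with_column(col('str').cast(pl.Categorical))

# Aggregate into one list of Categorical values per group
df_groups = df.groupby('group').agg([col('str').list().alias('str_list')])
# Writing succeeds
df_groups.collect().write_parquet('test.parquet')
# Reading back fails with the NotYetImplemented error above
df_groups_import = pl.scan_parquet('test.parquet').collect()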

Is there any plan to support this soon?
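
For context on the type named in the error: Dictionary(UInt32, LargeUtf8, false) describes a dictionary-encoded column, i.e. unsigned 32-bit integer keys that index into a set of string values. The snippet below is only a conceptual, pure-Python sketch of that encoding applied to a list column. It makes no claim about how arrow2 or polars implement it, and the helper names `encode_lists` and `decode_lists` are hypothetical.

```python
from typing import Dict, List, Tuple


def encode_lists(rows: List[List[str]]) -> Tuple[List[List[int]], List[str]]:
    """Replace each string with an integer key into one shared dictionary."""
    values: List[str] = []      # dictionary values, in first-seen order
    index: Dict[str, int] = {}  # value -> key lookup
    keys: List[List[int]] = []
    for row in rows:
        encoded = []
        for item in row:
            if item not in index:
                index[item] = len(values)
                values.append(item)
            encoded.append(index[item])
        keys.append(encoded)
    return keys, values


def decode_lists(keys: List[List[int]], values: List[str]) -> List[List[str]]:
    """Look every key up in the dictionary to recover the original strings."""
    return [[values[k] for k in row] for row in keys]


# The lists that grouping 'str' by 'group' yields for the data above
# (group 1 -> A, B, B; group 2 -> A, C).
rows = [["A", "B", "B"], ["A", "C"]]
keys, values = encode_lists(rows)
```

Reading such a column back means reconstructing both the nested keys and the shared values, which is the step the error reports as not yet implemented for this type.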

jorgecarleitao (Owner) commented

I am working on this :)

@jorgecarleitao jorgecarleitao added the no-changelog Issues whose changes are covered by a PR and thus should not be shown in the changelog label Jul 31, 2022