DictEncoded doesn't write as DictEncoded #116
Comments
quinnj added a commit that referenced this issue on Jan 30, 2021
…ng issues

Fixes #117, #116, and #113.

For #116, we just need to special case if user happens to pass in a DictEncoded themselves. We need to pass it through to the `toarrowvector` method that no-ops.

For #113, we require the new functionality in PooledArrays that allows passing the `signed` and `compress` keyword arguments to ensure we get signed refs for our dict encoding.

For #117, we add CategoricalArrays as a test dependency and ensure that if it contains any `missing` value, we *don't* recode the indices values down by 1, since the `missing` ref is 0, so other refs can already be considered "offsets". If there are no `missing`, then we still need to recode down since refs should always start from 0 in arrow format.
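The pass-through described for #116 can be illustrated with plain multiple dispatch. This is only a minimal sketch under stated assumptions: `SketchDictEncoded` and `to_sketch_vector` are hypothetical stand-ins invented for illustration, not Arrow.jl's real `DictEncoded` type or `toarrowvector` method, and the sketch type does not subtype `AbstractVector` the way the real one may.

```julia
# Hypothetical stand-ins; these are NOT Arrow.jl's real types or methods.
struct SketchDictEncoded{T}
    indices::Vector{Int32}   # 0-based offsets into `pool`
    pool::Vector{T}          # unique values
end

# General path: build a dictionary encoding from any plain vector.
function to_sketch_vector(x::AbstractVector{T}) where {T}
    pool = unique(x)
    lookup = Dict(v => Int32(i - 1) for (i, v) in enumerate(pool))
    return SketchDictEncoded{T}(Int32[lookup[v] for v in x], pool)
end

# Special case: input that is already encoded is returned untouched (a no-op).
to_sketch_vector(x::SketchDictEncoded) = x

enc = to_sketch_vector(["a", "b", "a"])
same = to_sketch_vector(enc)   # dispatches to the no-op method
```

The point of the second method is that an already-encoded value never reaches the general encoding path, so it cannot be re-wrapped as some other array type.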
quinnj added a commit that referenced this issue on Jan 31, 2021
#119)

* Rework dict encoding of PooledArray/CategoricalArray to fix outstanding issues

  Fixes #117, #116, and #113.

  For #116, we just need to special case if user happens to pass in a DictEncoded themselves. We need to pass it through to the `toarrowvector` method that no-ops.

  For #113, we require the new functionality in PooledArrays that allows passing the `signed` and `compress` keyword arguments to ensure we get signed refs for our dict encoding.

  For #117, we add CategoricalArrays as a test dependency and ensure that if it contains any `missing` value, we *don't* recode the indices values down by 1, since the `missing` ref is 0, so other refs can already be considered "offsets". If there are no `missing`, then we still need to recode down since refs should always start from 0 in arrow format.

* PooledArrays 1.0 compat

* Update src/arraytypes/dictencoding.jl

  Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>

* Check refpool

* Fix test

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Fixed in #119
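The index recoding rule stated for #117 in the commit message can be sketched in a few lines of Base Julia. This assumes, as the message describes, that refs are 1-based with 0 reserved for `missing`. The helper name `sketch_arrow_indices` and the choice of `Int32` are illustrative assumptions, not the code in `src/arraytypes/dictencoding.jl`.

```julia
# Hypothetical helper illustrating the rule from the commit message;
# not the actual implementation in Arrow.jl.
function sketch_arrow_indices(refs::AbstractVector{<:Integer})
    # A ref of 0 marks `missing`, so the remaining refs already act as offsets.
    has_missing = any(==(0), refs)
    # Without a `missing` ref, shift the 1-based refs down so they start at 0.
    return has_missing ? Int32.(refs) : Int32.(refs .- 1)
end

with_missing = sketch_arrow_indices([1, 0, 2])     # left unchanged
without_missing = sketch_arrow_indices([1, 2, 1])  # shifted down by 1
```

Either way the resulting indices start from 0, which is what the message says the arrow format expects.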
tanmaykm pushed a commit to tanmaykm/arrow-julia that referenced this issue on Apr 7, 2021
`Arrow.toarrowvector` of an `Arrow.DictEncoded` type returns an `Arrow.List`. Perhaps it doesn't detect that `Arrow.DictEncoded` is a factor-like (in the R sense) object. I can look at this if you wish, but I'm not entirely sure where