LocalDate and LocalTime are not Serialized Properly over Arrow #5265

Closed
cpwright opened this issue Mar 19, 2024 · 2 comments · Fixed by #5446
Labels: arrow, barrage, barrage-wrkr2wrkr, bug (Something isn't working), core (Core development tasks)


@cpwright
Contributor

This was run in an Enterprise environment, but the reproducer uses only Core components.

Deephaven Version: 0.30.4

x = """
import jpy
from deephaven import dtypes
from deephaven import new_table
from deephaven.column import InputColumn
from deephaven.column import string_col

DateTimeFormatter = jpy.get_type("java.time.format.DateTimeFormatter")
LocalDate = jpy.get_type("java.time.LocalDate")

cols = []
ecd = LocalDate.parse("2023-03-19", DateTimeFormatter.ofPattern("yyyy-MM-dd"))
cols.append(InputColumn("localDateCol", dtypes.LocalDate, input_data=[ecd]))
cols.append(string_col('stringCol', ['someString']))

tbl = new_table(cols)
"""

worker.run_script(x)
tbl = worker.open_table('tbl')

print(tbl)

# The line below should not segfault ...

z = tbl.to_arrow().to_pandas()

@cpwright added the bug (Something isn't working) and triage labels on Mar 19, 2024
@rcaudy added this to the "2. April 2024" milestone on Mar 19, 2024

@jmao-denver
Contributor

Reproduced in DHC:

import jpy
from deephaven import dtypes
from deephaven import new_table
from deephaven.column import InputColumn
from deephaven.column import string_col
from deephaven.arrow import to_arrow

DateTimeFormatter = jpy.get_type("java.time.format.DateTimeFormatter")
LocalDate = jpy.get_type("java.time.LocalDate")

cols = []
ecd = LocalDate.parse("2023-03-19", DateTimeFormatter.ofPattern("yyyy-MM-dd"))
cols.append(InputColumn("localDateCol", dtypes.LocalDate, input_data=[ecd]))
cols.append(string_col('stringCol', ['someString']))

tbl = new_table(cols)
pa_table = to_arrow(tbl)
print(pa_table)

produces

pyarrow.Table
localDateCol: fixed_size_binary[6]
stringCol: string
----
localDateCol: [[000000000A00]]
stringCol: [<Invalid array: Buffer #1 too small in array of type string and length 1: expected at least 4 byte(s), got 0>]

@nbauernfeind Any idea?

@nbauernfeind
Member

nbauernfeind commented Mar 19, 2024

The fastest workaround: drop the LocalDate column (or use .view to replace it with a stringified version).
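A minimal sketch of that workaround, in case it helps. The helper name `make_arrow_safe` is made up for illustration, and the null-guarded `toString()` formula is an assumption about what the query language will accept for a LocalDate column; only the table's `update_view` and `drop_columns` methods are used.

```python
def make_arrow_safe(tbl, local_cols, stringify=True):
    """Return a table whose LocalDate/LocalTime columns are stringified or dropped.

    `tbl` is expected to be a Deephaven table. `make_arrow_safe` is a
    hypothetical helper, not part of the Deephaven API.
    """
    cols = list(local_cols)
    if not stringify:
        # Simplest option: do not send the problematic columns at all.
        return tbl.drop_columns(cols)
    # Assumption: a null-guarded toString() is a valid formula for these columns.
    formulas = [f"{c} = {c} == null ? null : {c}.toString()" for c in cols]
    return tbl.update_view(formulas)
```

With the reproducer above, this would be used as `to_arrow(make_arrow_safe(tbl, ["localDateCol"]))`, so that only string columns cross the wire.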

Once upon a time we decided to create our own LocalDate and LocalTime types because we felt Arrow's options were not well documented and it was not obvious which encoding should actually be used. Last I looked into this with Colin (Nov. '23), we discovered that the UI is previewing LocalDate/LocalTime columns anyway, so we can probably swap to the proper encoding whenever we're ready.
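For reference, here is a rough sketch of what a standard encoding could look like at the byte level, assuming Arrow's built-in date32 (days since the UNIX epoch, int32) and time64[ns] (nanoseconds since midnight, int64) types. Which Arrow types end up being used is not stated in this thread, so treat the choice as an assumption; the function names are illustrative and only the Python standard library is used.

```python
import struct
from datetime import date, time

EPOCH = date(1970, 1, 1)

def encode_date32(d):
    """Pack a date as days since the UNIX epoch in a little-endian int32."""
    return struct.pack("<i", (d - EPOCH).days)

def decode_date32(raw):
    """Inverse of encode_date32."""
    (days,) = struct.unpack("<i", raw)
    return date.fromordinal(EPOCH.toordinal() + days)

def encode_time64_ns(t):
    """Pack a time of day as nanoseconds since midnight in a little-endian int64.

    Python's datetime.time only carries microseconds, so sub-microsecond
    precision is not representable in this sketch.
    """
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    nanos = seconds * 1_000_000_000 + t.microsecond * 1_000
    return struct.pack("<q", nanos)

raw = encode_date32(date(2023, 3, 19))
print(raw.hex(), decode_date32(raw))
```

Both values are fixed width (4 and 8 bytes), so a reader never needs an offsets buffer to find row boundaries.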

That said, the custom type for LocalDate is a fixed_size_binary blob 6 bytes long. The pyarrow client should still be able to parse this, even if the 6-byte blob might be meaningless in pandas-land.

I'm going to see if I can replicate the issue round-tripped with the Java Flight client; the error message suggests that the padding at the end of the LocalDate Arrow buffer is not correct. (Note that swapping localDateCol and stringCol works.)

Edit: Oh, it's worse than I thought. The generated schema declares the column as a fixed-length byte array, but the data is then sent as a variable-length string anyway! I will implement the non-custom types to fix LocalDate and LocalTime.
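A small standard-library sketch of how that mismatch could explain the bytes pyarrow printed for localDateCol. It assumes the date went out as the 10-character string "2023-03-19" in Arrow's variable-length string layout, whose offsets buffer holds little-endian int32 values, and that the reader then consumed that offsets buffer as fixed_size_binary[6] data. This is only a plausible reading of the output, not something confirmed in the thread.

```python
import struct

value = "2023-03-19"

# Variable-length string layout for a single row: int32 offsets [0, len(value)].
offsets = struct.pack("<ii", 0, len(value.encode("utf-8")))

# A reader that trusts the fixed_size_binary[6] schema takes 6 bytes per row
# from the first non-validity buffer, which here would be the offsets buffer.
misread = offsets[:6]

print(misread.hex().upper())  # 000000000A00
```

If that is what happens, the reader is also off by one buffer for every column that follows, which would be consistent with stringCol being reported as an invalid array.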

@nbauernfeind changed the title from "LocalDate .to_arrow() conversion in python client" to "LocalDate and LocalTime are not Serialized Properly over Arrow" on Mar 19, 2024