Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JS] Regression in roundtripping JS Date objects from apache-arrow@15 to apache-arrow@16 #43193

Open
marvold-mw opened this issue Jul 8, 2024 · 1 comment

Comments

@marvold-mw
Copy link

Describe the bug, including details regarding any error messages, version, and platform.

The Create a Table from JavaScript arrays example from the documentation is sufficient:

import { tableFromArrays } from 'apache-arrow';

const LENGTH = 2000;

const rainAmounts = Float32Array.from(
    { length: LENGTH },
    () => Number((Math.random() * 20).toFixed(1)));

const rainDates = Array.from(
    { length: LENGTH },
    (_, i) => new Date(Date.now() - 1000 * 60 * 60 * 24 * i));

const rainfall = tableFromArrays({
    precipitation: rainAmounts,
    date: rainDates
});

console.table([...rainfall]);

When running this code with apache-arrow@15.0.2 installed, the date column in the resulting table is a column of JavaScript Date objects. With apache-arrow@16.0.0 installed, it is a column of JavaScript numbers representing epoch timestamps. This is a surprising result, I think; while there's no reason why Arrow would be required to map Date objects to something which maps back to Date objects, it is an unusual choice which is not documented and which has clearly been changed recently.

Tested under Node 20.15.0 on Apple Silicon, but I have no reason to believe this is platform-specific.

PR #40892 appears to have gone through some iteration after I tested it in duckdb/duckdb-wasm#1231, and these follow-up changes caused this regression. Perhaps it's desirable—but I'm not quite convinced this was the correct outcome? Feel free to close as by design (but I think it would be worth documenting all of the. type mappings going from JS to Arrow and back, at the very least).

Component(s)

JavaScript

@trxcllnt
Copy link
Contributor

trxcllnt commented Jul 29, 2024

Dates are relatively expensive to create in node, so the decision was made in #40892 to always return the numeric representation, and allow users to construct Dates/handle timezones themselves, i.e. new Date(ts.get(0)).

This strikes a balance between API consistency and performance, as without it, users who didn't wish to incur the cost of constructing Date instances would need to manually enumerate the raw buffers in each underlying data instance (and handle null checks etc. manually):

ts.data.forEach(chunk => chunk.values.forEach(num => /* compute w/ timestamp */))

while there's no reason why Arrow would be required to map Date objects to something which maps back to Date objects

The Vector creation convenience functions still accept Arrays of Date instances because they infer the dtype if none is explicitly passed.

@amoeba amoeba changed the title Regression in roundtripping JS Date objects from apache-arrow@15 to apache-arrow@16 [JS] Regression in roundtripping JS Date objects from apache-arrow@15 to apache-arrow@16 Aug 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants