Add Apache Arrow as a recommended library and supported file attachment #223

Merged
mbostock merged 1 commit into main from mbostock/arrow
Jul 29, 2021

Member

@mbostock mbostock commented Jun 2, 2021

Apache Arrow is exposed as Arrow in the standard library, and fileAttachment.arrow() returns a Promise to an Arrow.Table.
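
As a rough illustration of the behavior described above, the following sketch shows one way a method like `fileAttachment.arrow()` could produce a Promise to a table. It is not the standard library's source: `loadArrowTable`, the `attachment` object with an `arrayBuffer()` method, and the `requireArrow` loader are hypothetical names, and the sketch assumes the Arrow 4.x API, where `Arrow.Table.from` accepts the bytes of an Arrow file.

```javascript
// Hypothetical sketch, not the standard library's actual implementation.
// Assumptions: `attachment` exposes arrayBuffer(), and `requireArrow` is a
// loader that resolves to the Apache Arrow module (4.x API).
async function loadArrowTable(attachment, requireArrow) {
  // Load the library and the file contents in parallel.
  const [Arrow, buffer] = await Promise.all([
    requireArrow(),
    attachment.arrayBuffer()
  ]);
  // In Arrow 4.x, Table.from parses Arrow IPC bytes into a Table.
  return Arrow.Table.from(buffer);
}
```

In a notebook, the equivalent call would look something like `FileAttachment("flights.arrow").arrow()`, where the file name is only an example.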

[Screenshot: Screen Shot 2021-06-02 at 3.56.19 PM]

@mbostock mbostock requested a review from visnup June 2, 2021 22:56
@visnup
Member

visnup commented Jun 2, 2021

We're effectively pinned to these library versions, at least major if not minor versions, until we can version the standard library, correct? It feels like they are making steady, significant progress on this library and it's gone from v1 to v4 in the past year. All of the other libraries we've recently added I assumed were pretty stable, but I'm half worried Arrow could go from 4 to 7 before we can follow suit.

@mbostock
Member Author

mbostock commented Jun 2, 2021

We’re committed to backwards compatibility until we ship version pinning, yes. I don’t think we should block adding useful functionality on us shipping version pinning: I’d rather include a slightly out-of-date version of Apache Arrow in the box than nothing.
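
To make the pinning concern in this exchange concrete, here is a minimal sketch of what staying on one major version means in practice. The helpers `pinnedSpecifier` and `sameMajor` are hypothetical and exist only for illustration; they are not part of the standard library or of any version-pinning feature.

```javascript
// Hypothetical helpers, for illustration only.

// Build a pinned package specifier such as "apache-arrow@4.0.1".
function pinnedSpecifier(name, version) {
  return `${name}@${version}`;
}

// Return true when `candidate` has the same major version as `pinned`,
// i.e. the upgrade would not cross a major (potentially breaking) release.
function sameMajor(pinned, candidate) {
  const major = (version) => Number.parseInt(version.split(".")[0], 10);
  return major(pinned) === major(candidate);
}
```

Under this sketch, moving from 4.0.0 to 4.0.1 stays within the pinned major version, while moving from 4.0.1 to 5.0.0 does not.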

@mbostock
Member Author

mbostock commented Jun 3, 2021

Side note: it does look like 5.0.0 is already planned per the package.json, but I can’t find any release notes, so I’m not really sure what’s different. In any case, I think we should still go ahead, but also redouble our efforts to ship version pinning.

@domoritz
Contributor

domoritz commented Jun 3, 2021

All Arrow packages are released every three months, and every release is a new major version. Note that the binary format is not changing. See https://arrow.apache.org/docs/format/Versioning.html for details.

In the past few versions, the JS library hasn't changed much, but for v5 we started some significant improvements to make the library leaner and more tree-shakeable. One breaking change we have already made is apache/arrow#10277 (which you can work around easily by returning a DataFrame), and another significant change will be apache/arrow#10371.

Member

@visnup visnup left a comment

Can be updated to 4.0.1. But yeah ok, let's recommend Arrow!

@mbostock
Member Author

Arrow 5.0.0 is out already, but since we plan on adding Arquero imminently, I figure we should stick to 4.0.1.

@mbostock mbostock merged commit 273f027 into main Jul 29, 2021
@mbostock mbostock deleted the mbostock/arrow branch July 29, 2021 22:11
@domoritz
Contributor

The API hasn't changed much between 4 and 5. The biggest change is that tables no longer extend DataFrame (but DataFrames still extend tables).

@visnup
Member

visnup commented Jul 29, 2021

Is there a reason we can't do 5 then? Would it be incompatible with the current version of Arquero?

@domoritz
Contributor

I don't think so, but it would be good to confirm by updating Arquero to v5.

@mbostock
Member Author

I will investigate upgrading to Arrow 5 at the time we add Arquero.
