-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12549: [JS] Table and RecordBatch should not extend Vector, make JS lib smaller #10371
Conversation
[Symbol.iterator]() { | ||
return new MapRowIterator(this[kKeys], this[kVals]); | ||
} | ||
public toArray() { return Object.values(this.toJSON()); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be more efficient not to go through toJSON
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely would, yeah 😄
js/src/table.ts
Outdated
return this.data.reduce((ary, data) => | ||
ary.concat(toArrayVisitor.visit(data)), | ||
new Array<Struct<T>['TValue']>() | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this perform compared to return [...this]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dunno, can't run benchmarks yet 😛
js/src/table.ts
Outdated
|
||
/** @ignore */ | ||
export interface Table<T extends { [key: string]: DataType } = any> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to support table.map(...)
?
This fixes https://issues.apache.org/jira/browse/ARROW-10794 as well, right? |
@domoritz yep, looks like it |
js/package.json
Outdated
@@ -96,10 +96,9 @@ | |||
"ts-jest": "27.0.0", | |||
"ts-node": "10.0.0", | |||
"typedoc": "0.20.36", | |||
"typescript": "4.0.2", | |||
"typescript": "4.3.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to figure out a solution to this before proceeding: microsoft/TypeScript-DOM-lib-generator#890 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…erable into factories.ts
…creation as for perf
Use data in struct row so we don't have to create a new vector. In map row, use data for values but not key since the key needs to be a vector for the iterator. Also, we want to cache it.
@ursabot please benchmark lang=JavaScript |
Benchmark runs are scheduled for baseline = 7029f90 and contender = 6619579. Results will be available as each benchmark for each run completes. |
Benchmark runs are scheduled for baseline = 7029f90 and contender = 20b66c2. 20b66c2 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
This pull request addresses a number of issues that requires a more substantial refactor.
The main goals are:
Vector
,Chunked
, andColumn
classes.In this pull request, we have eliminated type specific Vector classes. There is now only one vector that has a data instance and we use type-specific visitors. Record batches don't inherit from vectors anymore. Neither do Tables. Columns are gone. To create vectors and tables, we now have separate methods that can be easily tree shaken.
We also added tests for the bundles, fixed some issues with bundling in webpack, updated dependencies (including typescript and flatbuffers). We also added memoization to dictionary vectors to reduce the overhead of decoding UTF-8 to strings.
A quick overview of Arrow with the new API: https://observablehq.com/d/9480eccb30a21010.
Also addresses:
Performance comparison:
Master:
This branch: