-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for Arrow IPC stream format #4252
Conversation
Features: * Auto-detection via zio/anyio.NewReader* * Format specification via "-f arrows" and "-i arrows" (".arrows" being the file extension recommend by the Arrow project for this format) * HTTP API support in requests and responses via the application/vnd.apache.arrow.stream MIME type * Preservation on read of most Arrow types (excluding metadata) via Zed named types (with an "arrow_" prefix) for Arrow types without an equivalent Zed type * Ability to write most Arrow types given properly named Zed types (with an "arrow_" prefix) Limitations: * No support for the Arrow IPC file format * Reader is slow due to repeated array.New*Data calls * Writer does not support big endianness, compression, control of record batch length, or dictionary or sparce union types * Tests aren't thorough
This increases the zq executable's size by 17 MB for darwin/amd64 and 12 MB for linux/amd64. Ouch. $ git checkout main
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
$ git rev-parse @
a31161a3404c7cb45ba2dd405b195c01e0123cf9
$ GOOS=darwin GOARCH=amd64 make build
$ ls -lh dist/zq
-rwxr-xr-x 1 noah staff 18M Dec 8 07:34 dist/zq*
$ GOOS=linux GOARCH=amd64 make build
$ ls -lh dist/zq
-rwxr-xr-x 1 noah staff 13M Dec 8 07:34 dist/zq*
$ git checkout arrow
Switched to branch 'arrow'
$ git rev-parse @
2f046e6473c9a85ffa5b4ce19daf8b11a634c364
$ GOOS=darwin GOARCH=amd64 make build
$ ls -lh dist/zq
-rwxr-xr-x 1 noah staff 35M Dec 8 07:35 dist/zq*
$ GOOS=linux GOARCH=amd64 make build
$ ls -lh dist/zq
-rwxr-xr-x 1 noah staff 25M Dec 8 07:35 dist/zq* |
This reverts commit a7a90e7.
{Name: "days", Type: zed.TypeInt32}, | ||
{Name: "milliseconds", Type: zed.TypeUint32}, | ||
} | ||
var decimal128Fields = []zed.Column{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make an issue for adding decimal128 support. The type code exists in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to separate them.
case arrow.INT64: | ||
return zed.TypeInt64, nil | ||
case arrow.FLOAT16: | ||
return r.zctx.LookupTypeNamed("arrow_float16", zed.TypeFloat32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create an issue for float16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already open as #1312.
return nil, err | ||
} | ||
return r.zctx.LookupTypeNamed("arrow_decimal128", typ) | ||
case arrow.DECIMAL256: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add issue for decimal256
zio/arrowio/writer.go
Outdated
|
||
// Writer is a zio.Writer for the Arrow IPC stream format. Given Zed values | ||
// with appropriately named types (see the newArrowDataType implementation), it | ||
// can write all Arrow types except dictionaries and sparse unions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarify this comment with the concept that dictionaries don't exist as a Zed type but we could add support to boomerang dictionary types with a named arrow type. Maybe create an issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 868965f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Features:
Limitations:
For #4226.