-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simplify interactions with arrow flight APIs #377
Conversation
Initial work to implement some basic traits
Some more polishing of the basic code I provided last week.
Add support for representing tickets as base64 encoded strings. Also: more polishing of Display, etc...
When writing BOOLEAN data, writing more than 2048 rows of data will overflow the hard-coded 256 buffer set for the bit-writer in the PlainEncoder. Once this occurs, further attempts to write to the encoder fail, becuase capacity is exceeded, but the errors are silently ignored. This fix improves the error detection and reporting at the point of encoding and modifies the logic for bit_writing (BOOLEANS). The bit_writer is initially allocated 256 bytes (as at present), then each time the capacity is exceeded the capacity is incremented by another 256 bytes. This certainly resolves the current problem, but it's not exactly a great fix because the capacity of the bit_writer could now grow substantially. Other data types seem to have a more sophisticated mechanism for writing data which doesn't involve growing or having a fixed size buffer. It would be desirable to make the BOOLEAN type use this same mechanism if possible, but that level of change is more intrusive and probably requires greater knowledge of the implementation than I possess. resolves: apache#349
Tacky, but I can't think of better way to do this without specialization.
Bring in parquet fix.
Remove the byte tracking from the PlainEncoder and use the existing bytes_written() method in BitWriter. This is neater.
The test ensures that we can write > 2048 rows to a parquet file and that when we read the data back, it finishes without hanging (defined as taking < 5 seconds). If we don't want that extra complexity, we could remove the thread/channel stuff and just try to read the file and let the test runner terminate hanging tests.
The values.len() reports the number of values to be encoded and so must be divided by 8 (bits in a bytes) to determine the effect on the byte capacity of the bit_writer.
Following merge with master, make sure this is exposed so that integration tests work. also: there has been a release since I last looked at this so update the deprecation warnings.
TryFrom, not From
clippy complains about using deprecated functions, so replace them with the new trait support. also: fix the trait documentation
Codecov Report
@@ Coverage Diff @@
## master #377 +/- ##
==========================================
- Coverage 82.65% 82.47% -0.19%
==========================================
Files 165 166 +1
Lines 45556 45691 +135
==========================================
+ Hits 37655 37683 +28
- Misses 7901 8008 +107
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments @garyanaplan
arrow-flight/src/utils.rs
Outdated
// TODO: add more explicit conversion that exposes flight descriptor and metadata options | ||
/// Convert a `Schema` to `SchemaResult` by converting to an IPC message | ||
#[deprecated( | ||
since = "4.3.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be 4.4.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
arrow-flight/src/lib.rs
Outdated
|
||
fn try_from(value: i32) -> ArrowResult<Self> { | ||
match value { | ||
0 => Ok(DescriptorType::Unknown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DescriptorType
should have a method for this IIRC. Prost allows creating enums from numbers, so we might not need to implement this ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- update deprecated warnings - improve TryFrom for DescriptorType
@alamb I've marked this as an API change because the import paths for the proto module have changed. |
Initial work to implement some basic traits and illustrate the direction of travel
Which issue does this PR close?
closes #376
Rationale for this change
Illustrating direction of travel. Open Questions:
Should I put this into a separate module or is it ok in utils.rs?gen
module.Should I work on hiding some of the raw generated code?What changes are included in this PR?
Illustrates what kind of work I'm doing before I go too far in the wrong direction
Are there any user-facing changes?
Deprecation warnings...