How to iterate over a record batch? #468
Comments
The patterns I've seen/used revolved around down-casting/type matching on the FieldVector subclasses:
VectorSchemaRoot stores a schema and a corresponding bag of FieldVectors returned by

For (1), you do need to know the mapping between your schema's Arrow field types and the ValueVector subclasses, and have logic to cast accordingly based on the field type. This is reflected a bit in this example: the vectors have to be downcast in order to write the values properly, and the same applies to reading their values in a typed manner. I agree that it's a bit tricky at first to map the simple types to their ValueVector subclasses (you basically need to look here); this would be a nice documentation addition. It looks like there's already a stub for a table with this mapping here ("Table with non-intuitive names"). As an aside, it seems safer to cast the vector to its typed vector first rather than casting the type directly from the

For (2), this is letting Java multiple dispatch route the FieldVector from your VectorSchemaRoot to a function that accepts its concrete subclass.

Curious what other folks' approaches are, and whether there are conventions or patterns I might be missing here.
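A minimal sketch of the downcasting pattern, assuming `arrow-vector` and one memory implementation (for example `arrow-memory-netty` or `arrow-memory-unsafe`) are on the classpath. The class `TypedBatchReader` and its methods `readCell` and `readRows` are hypothetical names chosen for illustration, and only four vector types are handled; anything else falls back to `getObject`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.BigIntVector;
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.Float8Vector;
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.VarCharVector;
import org.apache.arrow.vector.VectorSchemaRoot;

// Hypothetical helper, not part of Arrow.
public class TypedBatchReader {

    /** Reads one cell as a plain Java value by downcasting the vector first. */
    static Object readCell(FieldVector vector, int row) {
        if (vector.isNull(row)) {
            return null;
        }
        if (vector instanceof IntVector) {
            return ((IntVector) vector).get(row);      // int, boxed to Integer
        }
        if (vector instanceof BigIntVector) {
            return ((BigIntVector) vector).get(row);   // long, boxed to Long
        }
        if (vector instanceof Float8Vector) {
            return ((Float8Vector) vector).get(row);   // double, boxed to Double
        }
        if (vector instanceof VarCharVector) {
            byte[] utf8 = ((VarCharVector) vector).get(row);
            return new String(utf8, StandardCharsets.UTF_8);
        }
        // Types this sketch does not cover: use the generic accessor.
        return vector.getObject(row);
    }

    /** Converts a batch into row-major arrays of plain Java values. */
    static List<Object[]> readRows(VectorSchemaRoot root) {
        List<FieldVector> columns = root.getFieldVectors();
        List<Object[]> rows = new ArrayList<>();
        for (int r = 0; r < root.getRowCount(); r++) {
            Object[] row = new Object[columns.size()];
            for (int c = 0; c < columns.size(); c++) {
                row[c] = readCell(columns.get(c), r);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        try (BufferAllocator allocator = new RootAllocator();
             IntVector ids = new IntVector("id", allocator);
             VarCharVector names = new VarCharVector("name", allocator)) {
            ids.allocateNew(2);
            ids.set(0, 7);
            ids.setNull(1);
            ids.setValueCount(2);
            names.allocateNew(2);
            names.setSafe(0, "a".getBytes(StandardCharsets.UTF_8));
            names.setSafe(1, "bc".getBytes(StandardCharsets.UTF_8));
            names.setValueCount(2);

            VectorSchemaRoot root = VectorSchemaRoot.of(ids, names);
            root.setRowCount(2);
            for (Object[] row : readRows(root)) {
                System.out.println(Arrays.toString(row));
            }
        }
    }
}
```

The `instanceof` chain checks the vector class rather than the value returned by `getObject`, in line with the aside above about casting the vector first. A real reader would need one branch per vector type it expects to see in its schemas.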
The mapping should stay fixed. Unfortunately I don't think there's a way in Java to do some type-level metaprogramming like we can in the C++ library (in C++ the vector type is effectively an associated type of the...type type, so you can write
In general I would like to see several documentation improvements:
Describe the usage question you have. Please include as many useful details as possible.
Now I have a `VectorSchemaRoot`. I can see how to iterate over the batch with `getVector` and then `getObject`, but the return value is of type `Object`, and I wonder how I can downcast it to some useful class from which I can retrieve the real value (string, int, float, etc.).

I know we have the field info of each vector, but I don't know the mapping from field type to the actual Java class. It seems too hard to remember all the mappings by reverse engineering the code, and they may change as versions evolve.
I checked https://arrow.apache.org/docs/java/index.html, but all the pages describe constructing a batch and moving it from one place to another, rather than how to read a batch and dump it into a typed two-dimensional matrix.
The most trivial usage, `contentToTSVString`, calls `Object::toString` on each cell. But I don't think we should convert all the values to String and then reparse them into concrete types.