Flight in C++ bypasses the Protobuf serializer by specializing a gRPC template and performing what is almost certainly an illegal cast to trick gRPC into using our specialization. However, gRPC supports a "generic" API that lets you call methods by name and get back the gRPC byte buffers, which should be a safe, officially sanctioned way of doing what we want. That generic API is only applicable to async gRPC, so it would only help our new async implementation.
This has a few other benefits:
The gRPC template we specialize technically lets us return (de)serialization errors, but in practice gRPC crashes if you return one. This new API would let us handle the error gracefully in Flight code.
We could more easily pass in other arguments, like a memory allocator, to the (de)serialization code. (gRPC still controls memory allocation, though.)
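To make the error-handling point concrete, here is a minimal sketch of decoding a serialized FlightData message directly from raw bytes, returning a failure instead of aborting. It is not Arrow's implementation: `FlightDataView`, `ReadVarint`, and `DecodeFlightData` are hypothetical names, and the input is assumed to be one contiguous buffer (a real gRPC byte buffer may consist of several slices). The field numbers (2, 3, 1000) come from `Flight.proto`; everything else is illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: views into the caller's buffer, no copies.
struct FlightDataView {
  std::string_view data_header;   // FlightData field 2
  std::string_view app_metadata;  // FlightData field 3
  std::string_view data_body;     // FlightData field 1000
};

// Reads a base-128 varint and advances `in`.
// Returns nullopt if the input is truncated or the varint is too long.
inline std::optional<uint64_t> ReadVarint(std::string_view& in) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64 && !in.empty(); shift += 7) {
    const auto byte = static_cast<uint8_t>(in.front());
    in.remove_prefix(1);
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return value;
  }
  return std::nullopt;
}

// Decodes the top-level fields of a serialized FlightData message.
// On malformed input, returns false and fills `error` rather than crashing.
// Simplification: every FlightData field is length-delimited (wire type 2),
// so any other wire type is treated as an error here.
inline bool DecodeFlightData(std::string_view in, FlightDataView* out,
                             std::string* error) {
  while (!in.empty()) {
    const auto tag = ReadVarint(in);
    if (!tag) {
      *error = "truncated field tag";
      return false;
    }
    const uint64_t field = *tag >> 3;
    const uint64_t wire_type = *tag & 0x7;
    if (wire_type != 2) {
      *error = "unexpected wire type";
      return false;
    }
    const auto len = ReadVarint(in);
    if (!len || *len > in.size()) {
      *error = "truncated field payload";
      return false;
    }
    const std::string_view payload = in.substr(0, static_cast<size_t>(*len));
    in.remove_prefix(static_cast<size_t>(*len));
    switch (field) {
      case 2: out->data_header = payload; break;
      case 3: out->app_metadata = payload; break;
      case 1000: out->data_body = payload; break;
      default: break;  // e.g. field 1 (flight_descriptor) is skipped
    }
  }
  return true;
}
```

Because the decoder is plain Flight-side code, extra arguments (an allocator, for example) could be added to its signature without going through gRPC's serialization hooks.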
Right now, we do a cast that is definitely UB to try to hook into the gRPC serialization machinery.
gRPC offers a "generic" stub that gives us the raw byte buffers directly without that hack, but it is only available for the async API.
The gRPC serialization machinery is known to be broken (e.g. if you raise an error, gRPC will just abort instead), so being able to bypass it entirely is a benefit.
So when we implement the async version of DoGet etc., we should just try to use the generic stub and avoid all this in the first place (requires some refactoring, though).
Hello! I'm working on implementing a few of the Flight RPC endpoints in our existing gRPC server (where we use the new async/callback API exclusively) and came across this ticket. I have everything working, but would love to be able to use the gRPC zero-copy serialization in my server (right now I'm returning populated FlightData objects, which requires a copy of the data in the IpcPayload objects).
I could reuse the existing functions if, instead of hiding all of the gRPC server internals, Arrow exposed the functions in flight/transport/grpc/serialization_internal.{h,cc}, specifically the FlightDataSerialize function. My plan was to build a ServerWriteReactor implementation that my application could add RecordBatches to. It would translate each RecordBatch into a series of FlightPayload objects (similar to what is done in RecordBatchStream) and then into a series of ByteBuffers (using FlightDataSerialize). The reactor would send each ByteBuffer as soon as it was available and the previous send was complete.
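The "send as soon as available, once the previous send is complete" rule can be sketched without gRPC as follows. This is only a model of the queuing logic, not a reactor: `WriteQueue` is a hypothetical helper, `Buffer` stands in for a gRPC ByteBuffer, and the `start_write` callback stands in for the reactor's StartWrite. It assumes all calls happen on one thread; a real reactor receives OnWriteDone on gRPC's threads and would need locking.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Stand-in for a gRPC ByteBuffer so the sketch compiles without gRPC.
using Buffer = std::string;

// Hypothetical helper that keeps at most one write in flight.
// Not thread-safe; a real reactor would guard this state with a mutex.
class WriteQueue {
 public:
  explicit WriteQueue(std::function<void(const Buffer&)> start_write)
      : start_write_(std::move(start_write)) {}

  // Called by the application whenever a serialized payload is ready.
  void Enqueue(Buffer buf) {
    if (failed_) return;  // the stream is already broken; drop the payload
    pending_.push_back(std::move(buf));
    MaybeStartNext();
  }

  // Called from the write-completion callback (OnWriteDone in a reactor).
  void OnWriteDone(bool ok) {
    assert(in_flight_);
    in_flight_ = false;
    pending_.pop_front();  // the buffer had to stay alive until this point
    if (!ok) {
      failed_ = true;
      pending_.clear();
      return;
    }
    MaybeStartNext();
  }

  bool failed() const { return failed_; }
  bool idle() const { return !in_flight_ && pending_.empty(); }

 private:
  // Starts the next write only if none is outstanding.
  void MaybeStartNext() {
    if (in_flight_ || failed_ || pending_.empty()) return;
    in_flight_ = true;
    start_write_(pending_.front());
  }

  std::function<void(const Buffer&)> start_write_;
  std::deque<Buffer> pending_;
  bool in_flight_ = false;
  bool failed_ = false;
};
```

The front of the queue is kept until the completion callback fires because the buffer passed to a write must remain valid until that write finishes. In an actual reactor, the point where `idle()` becomes true after the last batch would be where the stream is finished.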
If there isn't already work in progress to implement this ticket, I'd be happy to take a stab at exposing the serialization internals and then exposing a reactor implementation as described above. With those in place, it would be much easier for projects to implement Flight RPC interfaces in existing gRPC servers while still taking advantage of the zero-copy serialization.
Component(s)
C++, FlightRPC