Add output support for Arrow Flight #8
Comments
I have recently looked into this topic a little. These are my findings so far, and I will continue to research.

According to the Apache Arrow project's Implementation Status page (https://arrow.apache.org/docs/status.html), the Go implementation is a separate, natively written one. It was donated fairly recently (2 or 3 years ago) by the folks from InfluxData. So currently we have no native implementation in Go for Parquet File and Flight RPC.

There is an ongoing effort on the Parquet File Go implementation according to this Jira issue: https://issues.apache.org/jira/browse/ARROW-7905

There is also some effort on Flight RPC. It started in April 2020 and has one older work-in-progress pull request and one ready pull request awaiting merge, which was last updated this week: https://issues.apache.org/jira/browse/ARROW-8601

There is also the option of binding to the C++ implementation from Go, and there is a ready-to-use project called CArrow for that. So the choice is between waiting for the native Go implementations to be ready and calling C++ code from Go today with the CArrow project.

Additionally, I have to mention this for the future: after a quick look at our current Obslytics code, we roughly have Input (Store API), Dataframe (Memory Object) and Output (Writer) object layers, in that order.

I am posting this comment to share the current picture from my point of view. I will keep watching the updates to the Arrow project in the Apache community and experiment with using and integrating Arrow and Arrow Flight.
Perfect, thanks for this. It's currently hard or impossible to move an in-memory Arrow frame between processes natively. That's why Flight is helpful here.
True, but think about different use cases. Someone can run a Python app with the Arrow library and point a gRPC client at the Flight endpoint. That call goes to Obslytics, which converts the data from Prometheus/Thanos efficiently. This would do the job for a first iteration and allow other integrations such as pandas, Spark, etc. to work from the Apache Arrow memory model, with the data constructed directly from gRPC Flight, no? (:
Yes, I am aware of Arrow Flight's purpose and use case, but in my previous comment I focused on Parquet File creation (and therefore on the Memory Format). So I think we can say that, for now, we will skip searching for a more efficient way of creating Parquet Files with standard Apache Arrow and instead focus on the Apache Arrow Flight server/client use case of sharing data remotely over the network, right?
Good news :) The Arrow Flight Go implementation was merged just a couple of days ago. I will experiment with it soon. https://issues.apache.org/jira/browse/ARROW-8601