Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Migrate to arrow_format crate #517

Merged
merged 3 commits into from
Oct 18, 2021
Merged

Migrate to arrow_format crate #517

merged 3 commits into from
Oct 18, 2021

Conversation

jorgecarleitao
Copy link
Owner

@jorgecarleitao jorgecarleitao commented Oct 10, 2021

I published a crate arrow_format containing the generated code for the arrow-format (IPC and Flight atm, c data interface is still missing) that we have here.

The idea is that arrow_format tracks any change to the arrow specification, and/or changes to flatbuffers' code generation.

This PR migrates this code-base to it.

This is part of #257 and related to #463.

@jorgecarleitao jorgecarleitao added the investigation Issues or PRs that are investigations. Prs may or may not be merged. label Oct 10, 2021
@codecov
Copy link

codecov bot commented Oct 10, 2021

Codecov Report

Merging #517 (3ed6701) into main (47a632c) will increase coverage by 1.03%.
The diff coverage is 57.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #517      +/-   ##
==========================================
+ Coverage   80.20%   81.24%   +1.03%     
==========================================
  Files         381      380       -1     
  Lines       23744    23472     -272     
==========================================
+ Hits        19044    19069      +25     
+ Misses       4700     4403     -297     
Impacted Files Coverage Δ
...ng/src/flight_server_scenarios/auth_basic_proto.rs 0.00% <0.00%> (ø)
...ng/src/flight_server_scenarios/integration_test.rs 0.00% <0.00%> (ø)
...-testing/src/flight_server_scenarios/middleware.rs 0.00% <ø> (ø)
src/io/flight/mod.rs 0.00% <0.00%> (ø)
src/io/ipc/read/array/boolean.rs 100.00% <ø> (ø)
src/io/ipc/read/array/dictionary.rs 100.00% <ø> (ø)
src/io/ipc/read/array/fixed_size_binary.rs 100.00% <ø> (ø)
src/io/ipc/read/array/fixed_size_list.rs 58.33% <ø> (ø)
src/io/ipc/read/array/list.rs 100.00% <ø> (ø)
src/io/ipc/read/array/map.rs 62.50% <ø> (ø)
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 47a632c...3ed6701. Read the comment docs.

@jorgecarleitao
Copy link
Owner Author

jorgecarleitao commented Oct 10, 2021

Migrated flight. With the last commit, all the utilities to serialize and deserialize data from and to the flight data structs are part of this crate and feature-gated by io_flight.

I really dislike that io_flight drags tonic, because the generated code in the format done by protobuf requires both prost (data) and tonic (server and client).

Specifically, users that wish to use the arrow flight protocol in non-tonic implementations have to drag the tonic dependency just to serialize our schema into prost in this PR.

A potentially solution here is to split the generated flight code in arrow-format crate in two features:

  • arrow-format/flight-data (e.g. FlightDescriptor, FlightData)
  • arrow-format/flight-protocol (e.g. flight_service_client and flight_service_server)

This would allow arrow2 to only depend on prost with feature io_flight, and only have the integration tests (and any user of the flight service) to depend on arrow-format/flight-protocol (and thus tonic).

This way, people that wish to use the tonic implementation of the protocol would:

  • activate arrow2/io_flight (which would only drag prost)
  • activate arrow-format/flight-protocol

People that wish to use other implementation of the protocol would

  • activate arrow2/io_flight (which would only drag prost)
  • implement the protocol in their favorite library (dragging whatever dependencies they wish)
  • bind the serialized structs from arrow_format/flight-data to their implementation

Any thoughts here, @Dandandan , @houqp , @nevi-me ?

@houqp
Copy link
Collaborator

houqp commented Oct 10, 2021

I think this is a good idea. I would go one step further and add a feature flag in arrow2 that maps to arrow-format/flight-protocol so users won't need to pull in arrow-format explicitly in their dependency graph if not needed. This should reduce chances for version conflicts on future arrow2 updates. So maybe something along the lines of arrow2/io_flight_tonic, arrow2/io_flight_protocol, etc.

@jorgecarleitao
Copy link
Owner Author

@houqp , could you elaborate?

arrow-format is a slowly changing crate; arrow2 is not. The dependency tree is tonic <- arrow-format <- arrow2 <- [application]. If a user adds arrow-format and arrow2, arrow2 will pick the latest version of arrow-format declared in the application's Cargo.toml, which should not be a concern?

@houqp
Copy link
Collaborator

houqp commented Oct 11, 2021

oh, never mind, I just saw you pinned arrow-format to version "*", which should not cause downstream version conflicts. I thought you pinned arrow-format to a major version.

@jorgecarleitao jorgecarleitao marked this pull request as ready for review October 14, 2021 20:52
@jorgecarleitao jorgecarleitao force-pushed the arrow_format branch 4 times, most recently from c6df8c6 to f5de334 Compare October 17, 2021 05:04
@jorgecarleitao
Copy link
Owner Author

If no one opposes, I plan to merge this tomorrow.

@jorgecarleitao jorgecarleitao added backwards-incompatible enhancement An improvement to an existing feature and removed investigation Issues or PRs that are investigations. Prs may or may not be merged. enhancement An improvement to an existing feature labels Oct 17, 2021
@jorgecarleitao jorgecarleitao changed the title Experiment: migrate to arrow_format crate. Migrate to arrow_format crate Oct 17, 2021
@jorgecarleitao jorgecarleitao merged commit 13f8d09 into main Oct 18, 2021
@jorgecarleitao jorgecarleitao deleted the arrow_format branch October 18, 2021 05:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants