-
Notifications
You must be signed in to change notification settings - Fork 224
Conversation
Codecov ReportPatch coverage has no change and project coverage change:
Additional details and impacted files@@ Coverage Diff @@
## main #1464 +/- ##
==========================================
- Coverage 83.93% 83.92% -0.02%
==========================================
Files 387 387
Lines 41596 41596
==========================================
- Hits 34913 34908 -5
- Misses 6683 6688 +5 see 5 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
I am not entirely sure about this. I totally agree on working together towards a similar memory/crate interface. But as it stands this deprecates the whole arrow2 API, which is used by polars and is deeply entrenched in the project. |
We tried to migrate to arrow-rs last week. But we found the untyped |
I'm actively working on porting the remainder of this across in apache/arrow-rs#3879 perhaps you might be able to lend a hand identifying any missing functionality? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some wording suggestions to soften the tone and yet still communicate clearly to potential users that this crate isn't likely to maintained going forward
It might also be good to remove / update the https://github.com/jorgecarleitao/arrow2#faq section
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@ritchie46 can you help with the alternate wording? I didn't realize you still planned to keep maintaining the arrow2 crate. Maybe we have different ideas of what "maintaining" means 🤔 I think the best thing for users of arrow2 would be to be as explicit as possible about the state of the crate. For example:
If you can provide some guidance on these issues I can take a shot at writing a section to summarize |
@alamb polars is heavily depended on arrow2's typed arrays/offsets/buffer interfaces. If we are able have this API on top of arrow-rs memory, I can progressively refactor to arrow-rs, but this should be a slow progress as this is not top priority for polars. My ideal situation is a way to improve the memory interop first (this is what @tustvold has been adding, so that is a great start). As I am aware, there is currently not away to go forward whilst using arrow2's array API. Before this should be deprecated, there should be such an interface as a zero-cost abstraction on top of arrow-rs's buffers IMO. If that is not something that is wanted by arrow-rs, that might also be fine. Maybe polars should fork arrow2 and continue maintaining it less publicly. I am just scared of the core dependency being pulled from underneath of polars. Releases are still needed as I want bugs to be fixed and be able to publish on crates.io. |
How zero-cost does this need to be? I guess my thinking was that #1446 was sufficiently low overhead as it doesn't copy the underlying buffers, and so would be sufficient to bridge between migrated and unmigrated parts of a codebase? What does a couple of additional reference counts matter when you have a kernel that is processing an array of thousands of rows? Perhaps I am missing something here? The largest overhead at the moment is likely schema mapping, switching arrow2 to using the same schema types is definitely tractable should this prove a bottleneck.
My intention with this PR was not to say people should stop using these APIs today, I'm not suggesting we archive this repo 😅, but rather to communicate a direction of travel to the community. I'm not overly concerned about timescales, if it takes 6 months it takes 6 months, I just would like to be able to set both projects on a path towards unification and clearly communicate this, along with any implications. |
I certainly don't mean in any way to stop or prevent maintenance of arrow2 for people who want to do so. As he says, @tustvold and I and the rest of the Arrow community plan to keep pushing arrow-rs forward and are trying to assist users of arrow2 who want to join us.
The purpose of this particular PR to change the documentation is as @tustvold says, to communicate that there seems to be no maintainers or plan for maintenance of this crate. I certainly don't plan to "pull" anything from arrow2 or polars.
I would personally love to get the zero copy approach in arrow-rs. In terms of maintaining a polars specific fork of arrow2 that certainly is a reasonable approach (or since you are a maintainer of this repo, you could maybe just use this one). Also perhaps you or other interested parties could work out an alternate maintenance plan for this crate with its other users. I think it is up to you where you want to invest your time and effort. |
@@ -15,6 +15,27 @@ for a general introduction on how to use this crate, and | |||
[API docs](https://jorgecarleitao.github.io/arrow2/main/docs/arrow2) | |||
for a detailed documentation of each of its APIs. | |||
|
|||
# 2023-04-12: Deprecation Notice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# 2023-04-12: Deprecation Notice | |
# 2023-04-12: Maintenance Notice |
Maybe if we change the title so it didn't say "deprecated" this would sound less scary 🤔
Here is a PR that clarifies that the maintenance bandwidth on this crate is low: #1476 It side steps the issues of what "deprecated" might mean / not mean for the future and instead attempts to describe the current state of affairs. I also hope it might help any current users of arrow2 to offer to help maintain the crate if that is valuable to them. |
See #1476 |
Closes #1429
Following on from #1446 I think all that remains is to provide some form of communication to users regarding the relationship of the projects going forward. I believe what I've written covers the consensus from the various issues and discussions on the topic, but my intent here is purely to kickstart the conversation, not to dictate the message 😅