discuss: Move into the Apache ORC PMC and develop as `apache/orc-rust` #120

Xuanwo · 2024-08-14T13:28:48Z

Hello, everyone. I am initiating this discussion to explore the possibility of moving into the Apache ORC PMC and developing apache/orc-rust.

By developing apache/orc-rust, we will establish this implementation as the official Rust version of ORC, thereby creating a larger and more cohesive community for those interested in a Rust ORC implementation. This will make it much easier for us to build a community around this project.

What are your thoughts? I plan to discuss this with the orc community if contributors are satisfied with it.

cc @Jefffrey @WenyXu @progval @waynexia @klangner @alamb @v0y4g3r @youngsofun @harveyyue


waynexia · 2024-08-14T13:39:20Z

As for where to move, we could also consider Arrow or DataFusion, given that this repo is deeply involved with Apache Arrow and DataFusion at the API level. There is precedent: parquet-rs has been maintained in arrow-rs for a long time.

progval · 2024-08-14T13:51:31Z

given this repo is deeply involved with Apache Arrow and Datafusion at the API level

To be honest I find it a bit surprising to have integration with a query engine in an ORC library. Would it make sense to split the Datafusion-related bits into either their own crate (with all the fun of keeping versions in sync), or move them to Datafusion (like it already does with Parquet)?

waynexia · 2024-08-14T13:52:06Z

Cross reference: https://lists.apache.org/thread/zrwnhwojf9v5c58hov8hcnpt03ftf3ql

Xuanwo · 2024-08-14T14:26:37Z

Hi, I wasn't involved in the datafusion side of development, so I'm not familiar with the ORC and Datafusion integration. From my perspective as a passerby, datafusion-orc seems more like Datafusion integration rather than focusing solely on the ORC format. I'm a bit worried that this might reduce the likelihood of potential users or contributors finding us.

I agree with @progval that it would be better to separate the ORC support component directly into DataFusion, similar to how parquet is handled.

Xuanwo · 2024-08-14T14:30:23Z

We have three possible options:

apache/orc-rust
apache/arrow-rs/orc
apache/datafusion-orc

wgtmac · 2024-08-14T15:17:12Z

Chiming in from the Apache ORC community. I'm very excited about this discussion! Sorry that I'm not familiar with Rust. For the apache/orc-rust approach, I'd like to know what the current dependencies are and what the estimated amount of work is to split the repository and remove the DataFusion integration part.

Xuanwo · 2024-08-14T15:24:11Z

I'd like to know what's the current dependency

I created a dependency list, and I believe it meets the requirements of the ASF.

Please check out the details here:

Details

0BSD (1): adler@1.0.2
Apache-2.0 (114): adler@1.0.2, ahash@0.8.11, android-tzdata@0.1.1, android_system_properties@0.1.5, anyhow@1.0.86, arrow@52.0.0, arrow-arith@52.0.0, arrow-array@52.0.0, arrow-buffer@52.0.0, arrow-cast@52.0.0, arrow-csv@52.0.0, arrow-data@52.0.0, arrow-ipc@52.0.0, arrow-json@52.0.0, arrow-ord@52.0.0, arrow-row@52.0.0, arrow-schema@52.0.0, arrow-select@52.0.0, arrow-string@52.0.0, autocfg@1.3.0, base64@0.22.1, bitflags@1.3.2, bumpalo@3.16.0, cc@1.0.104, cfg-if@1.0.0, chrono@0.4.38, chrono-tz@0.9.0, chrono-tz-build@0.3.0, const-random@0.1.18, const-random-macro@0.1.16, core-foundation-sys@0.8.6, crc32fast@1.4.2, either@1.13.0, equivalent@1.0.1, fallible-streaming-iterator@0.1.9, flatbuffers@24.3.25, flate2@1.0.30, getrandom@0.2.15, half@2.4.1, hashbrown@0.14.5, heck@0.5.0, iana-time-zone@0.1.60, iana-time-zone-haiku@0.1.2, indexmap@2.2.6, itertools@0.12.1, itoa@1.0.11, jobserver@0.1.31, js-sys@0.3.69, lazy_static@1.5.0, lexical-core@0.8.5, lexical-parse-float@0.8.5, lexical-parse-integer@0.8.6, lexical-util@0.8.5, lexical-write-float@0.8.5, lexical-write-integer@0.8.5, libc@0.2.155, libm@0.2.8, log@0.4.22, miniz_oxide@0.7.4, num@0.4.3, num-bigint@0.4.6, num-complex@0.4.6, num-integer@0.1.46, num-iter@0.1.45, num-rational@0.4.2, num-traits@0.2.19, once_cell@1.19.0, orc-rust@0.3.1, pkg-config@0.3.30, proc-macro2@1.0.86, prost@0.12.6, prost-derive@0.12.6, quote@1.0.36, rand@0.8.5, rand_core@0.6.4, regex@1.10.5, regex-automata@0.4.7, regex-syntax@0.8.4, rustc_version@0.4.0, rustversion@1.0.17, ryu@1.0.18, semver@1.0.23, serde@1.0.203, serde_derive@1.0.203, serde_json@1.0.119, siphasher@0.3.11, snafu@0.8.3, snafu-derive@0.8.3, static_assertions@1.1.0, syn@2.0.68, thiserror@1.0.61, thiserror-impl@1.0.61, unicode-ident@1.0.12, unicode-width@0.1.13, version_check@0.9.4, wasi@0.11.0+wasi-snapshot-preview1, wasm-bindgen@0.2.92, wasm-bindgen-backend@0.2.92, wasm-bindgen-macro@0.2.92, wasm-bindgen-macro-support@0.2.92, wasm-bindgen-shared@0.2.92, windows-core@0.52.0, 
windows-targets@0.52.5, windows_aarch64_gnullvm@0.52.5, windows_aarch64_msvc@0.52.5, windows_i686_gnu@0.52.5, windows_i686_gnullvm@0.52.5, windows_i686_msvc@0.52.5, windows_x86_64_gnu@0.52.5, windows_x86_64_gnullvm@0.52.5, windows_x86_64_msvc@0.52.5, zerocopy@0.7.34, zstd-safe@6.0.6, zstd-sys@2.0.9+zstd.1.5.5
Apache-2.0 WITH LLVM-exception (1): wasi@0.11.0+wasi-snapshot-preview1
BSD-2-Clause (1): zerocopy@0.7.34
BSD-3-Clause (1): snap@1.1.1
BSL-1.0 (1): ryu@1.0.18
CC0-1.0 (1): tiny-keccak@2.0.2
MIT (115): adler@1.0.2, ahash@0.8.11, aho-corasick@1.1.3, android-tzdata@0.1.1, android_system_properties@0.1.5, anyhow@1.0.86, atoi@2.0.0, autocfg@1.3.0, base64@0.22.1, bitflags@1.3.2, bumpalo@3.16.0, byteorder@1.5.0, bytes@1.6.0, cc@1.0.104, cfg-if@1.0.0, chrono@0.4.38, chrono-tz@0.9.0, chrono-tz-build@0.3.0, comfy-table@7.1.1, const-random@0.1.18, const-random-macro@0.1.16, core-foundation-sys@0.8.6, crc32fast@1.4.2, crunchy@0.2.2, csv@1.3.0, csv-core@0.1.11, either@1.13.0, equivalent@1.0.1, fallible-streaming-iterator@0.1.9, flate2@1.0.30, getrandom@0.2.15, half@2.4.1, hashbrown@0.14.5, heck@0.5.0, iana-time-zone@0.1.60, iana-time-zone-haiku@0.1.2, indexmap@2.2.6, itertools@0.12.1, itoa@1.0.11, jobserver@0.1.31, js-sys@0.3.69, lazy_static@1.5.0, lexical-core@0.8.5, lexical-parse-float@0.8.5, lexical-parse-integer@0.8.6, lexical-util@0.8.5, lexical-write-float@0.8.5, lexical-write-integer@0.8.5, libc@0.2.155, libm@0.2.8, log@0.4.22, lz4_flex@0.11.3, lzokay-native@0.1.0, memchr@2.7.4, miniz_oxide@0.7.4, num@0.4.3, num-bigint@0.4.6, num-complex@0.4.6, num-integer@0.1.46, num-iter@0.1.45, num-rational@0.4.2, num-traits@0.2.19, once_cell@1.19.0, parse-zoneinfo@0.3.1, phf@0.11.2, phf_codegen@0.11.2, phf_generator@0.11.2, phf_shared@0.11.2, pkg-config@0.3.30, proc-macro2@1.0.86, quote@1.0.36, rand@0.8.5, rand_core@0.6.4, regex@1.10.5, regex-automata@0.4.7, regex-syntax@0.8.4, rustc_version@0.4.0, rustversion@1.0.17, semver@1.0.23, serde@1.0.203, serde_derive@1.0.203, serde_json@1.0.119, siphasher@0.3.11, snafu@0.8.3, snafu-derive@0.8.3, static_assertions@1.1.0, strum@0.26.3, strum_macros@0.26.4, syn@2.0.68, thiserror@1.0.61, thiserror-impl@1.0.61, twox-hash@1.6.3, unicode-ident@1.0.12, unicode-width@0.1.13, version_check@0.9.4, wasi@0.11.0+wasi-snapshot-preview1, wasm-bindgen@0.2.92, wasm-bindgen-backend@0.2.92, wasm-bindgen-macro@0.2.92, wasm-bindgen-macro-support@0.2.92, wasm-bindgen-shared@0.2.92, windows-core@0.52.0, windows-targets@0.52.5, 
windows_aarch64_gnullvm@0.52.5, windows_aarch64_msvc@0.52.5, windows_i686_gnu@0.52.5, windows_i686_gnullvm@0.52.5, windows_i686_msvc@0.52.5, windows_x86_64_gnu@0.52.5, windows_x86_64_gnullvm@0.52.5, windows_x86_64_msvc@0.52.5, zerocopy@0.7.34, zstd@0.12.4, zstd-safe@6.0.6, zstd-sys@2.0.9+zstd.1.5.5
Unicode-DFS-2016 (1): unicode-ident@1.0.12
Unlicense (5): aho-corasick@1.1.3, byteorder@1.5.0, csv@1.3.0, csv-core@0.1.11, memchr@2.7.4
Zlib (1): miniz_oxide@0.7.4
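
As a rough illustration of how a listing like the one above can be checked, here is a minimal Rust sketch. It groups the licenses by crate, since a dual-licensed crate such as adler appears under several headings and only needs one acceptable license. The function names and the short allowlist are assumptions made for this example; the allowlist is a placeholder, not the ASF's actual policy, which is defined by the ASF third-party license policy.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Parse lines of the form `LICENSE (count): crate@version, crate@version`
/// into a map from crate to the set of licenses it is offered under.
/// Assumes each license's crates sit on one line; lines without a
/// `LICENSE (count): ` header are skipped.
fn licenses_by_crate(listing: &str) -> BTreeMap<String, BTreeSet<String>> {
    let mut map: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for line in listing.lines() {
        // Split "MIT (2): a@1, b@2" into the header and the crate list.
        let Some((header, crates)) = line.split_once("): ") else { continue };
        // Drop the trailing " (count" to recover the license name.
        let Some((license, _count)) = header.rsplit_once(" (") else { continue };
        for krate in crates.split(',').map(str::trim).filter(|c| !c.is_empty()) {
            map.entry(krate.to_string())
                .or_default()
                .insert(license.to_string());
        }
    }
    map
}

/// Crates for which none of the offered licenses is on the allowlist.
fn crates_needing_review(
    map: &BTreeMap<String, BTreeSet<String>>,
    allowed: &[&str],
) -> Vec<String> {
    map.iter()
        .filter(|(_, licenses)| !licenses.iter().any(|l| allowed.contains(&l.as_str())))
        .map(|(krate, _)| krate.clone())
        .collect()
}

fn main() {
    // Abbreviated excerpt of the listing above (counts adjusted to the excerpt).
    let excerpt = "\
0BSD (1): adler@1.0.2
Apache-2.0 (1): adler@1.0.2
BSD-3-Clause (1): snap@1.1.1
CC0-1.0 (1): tiny-keccak@2.0.2
MIT (2): adler@1.0.2, lz4_flex@0.11.3";

    // Placeholder allowlist, intentionally incomplete. A crate printed below
    // is only "not covered by this placeholder", not necessarily a problem.
    let allowed = ["Apache-2.0", "MIT"];

    let map = licenses_by_crate(excerpt);
    for krate in crates_needing_review(&map, &allowed) {
        println!("needs a manual look: {krate} ({:?})", map[&krate]);
    }
}
```

The per-crate grouping is the point of the sketch: checking each license heading in isolation would wrongly flag crates that are also available under another, acceptable license.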

what's the estimated amount of work to split the repository to remove the datafusion integration part?

I believe it should be simple since it's just a mod of orc-rust. I'm willing to take on this part of the work.

waynexia · 2024-08-14T15:37:25Z

I'm not familiar with the ORC and Datafusion integration. From my perspective as a passerby, datafusion-orc seems more like Datafusion integration rather than focusing solely on the ORC format. I'm a bit worried that this might reduce the likelihood

I find a similar question about the relationship between arrow-rs and parquet-rs apache/arrow-rs#1715. I believe this repo was developed and maintained for the same purpose.

However, if we are going to implement features that are not in strong demand from the DataFusion side (like the ORC writer, apache/orc#1507) or integrate with other consumers (like Databend, databendlabs/databend#8016), having a dedicated repo would both reduce the maintenance burden on DataFusion and make the library itself easier to use.

I agree with the opinion of separating this into two parts. The ORC format resides in a dedicated repo like apache/orc-rust with maintenance from both current contributors and the ORC community. And Datafusion uses it as a downstream user to implement ORC data source. I would like to help with both code work like splitting this code base and non-code work like IP clearance.

alamb · 2024-08-14T19:28:44Z

I agree with the opinion of separating this into two parts. The ORC format resides in a dedicated repo like apache/orc-rust with maintenance from both current contributors and the ORC community. And Datafusion uses it as a downstream user to implement ORC data source. I would like to help with both code work like splitting this code base and non-code work like IP clearance.

I agree with @waynexia and @progval. The following split makes a lot of sense to me:

something like apache/orc-rs (similar to parquet in parquet-rs) that has no datafusion dependencies
this crate, datafusion-contrib/datafusion-orc, which has the DataFusion table provider, depends on apache/orc-rs as well as DataFusion, and does the datafusion integration

alamb · 2024-08-14T19:31:56Z

Like the fact that parquet-rs has been maintained in arrow-rs for a long time.

FWIW I think this was partly an artifact of history:

there was a time when the parquet PMC was largely focused on java
arrow needed a persistence format and parquet was an obvious choice
so parquet-cpp got made in the arrow repo
we basically followed the same pattern with arrow-rs / parquet-rs (in the arrow repo)

Given the current state of the code, I think it would be plausible to split parquet out of arrow-rs, but I also think unless there is some substantially larger group of maintainers that aren't also maintainers of arrow-rs it is likely easier to leave it there

wgtmac · 2024-08-15T02:09:35Z

cc @dongjoon-hyun @guiyanakuang @williamhyun @omalley from Apache ORC PMC

Xuanwo · 2024-08-15T02:49:33Z

Given the current state of the code, I think it would be plausible to split parquet out of arrow-rs, but I also think unless there is some substantially larger group of maintainers that aren't also maintainers of arrow-rs it is likely easier to leave it there

Agreed. I have thought about this before but haven't taken any action yet. I mean, it looks appealing to have apache/parquet-rs, but we need to consider the current project status.

I'm starting this thread because I believe it's beneficial for orc-rs to build a community by developing at upstream, but it doesn't seem applicable to parquet-rs at the moment.

mapleFU · 2024-08-15T02:54:22Z

Previously we discussed splitting parquet-cpp out of arrow-c++. However, the dependency chain would be awkward, since it looks like this:

arrow-dataset -> parquet-arrow -> parquet-core -> some arrow core libs

Xuanwo · 2024-08-15T03:01:24Z

arrow-dataset -> parquet-arrow -> parquet-core -> some arrow core libs

I believe the situation is different for parquet-rs, since it depends on arrow-rs but not the reverse. However, this is not the focus of our discussion. We can start another thread for this if there is interest.

Xuanwo · 2024-08-15T07:03:58Z

Thank you for the discussion. It looks like we can move forward! I think we can:

Set up datafusion-contrib/orc-rs first and split the ORC-related code into it (it's better to retain all the history).
Move all issues to datafusion-contrib/orc-rs.
Send the IP clearance to the ORC PMC.
Transfer datafusion-contrib/orc-rs to apache/orc-rs.

cc @alamb @waynexia @wgtmac for comments.

waynexia · 2024-08-15T07:09:19Z

I'll start preparing a PR to split the current repo.

Do you have something like guidance for IP clearance? I have attended it before but have not prepared one.

Xuanwo · 2024-08-15T07:15:29Z

I'll start preparing a PR to split the current repo.

Thanks!

Do you have something like guidance for IP clearance? I have attended it before but have not prepared one.

I think we can follow https://incubator.apache.org/ip-clearance/

Here's an example from apache/arrow-rs#2096. We can reach out to @alamb if we encounter any problems.

waynexia · 2024-08-15T10:05:35Z

Hi @progval @klangner, as part of the IP clearance process, could you please submit an ICLA (Individual Contributor License Agreement) following the instructions at https://www.apache.org/licenses/contributor-agreements.html if you do not already have one on file? Thanks in advance for helping with this! If you have already filed one, please let me know the email address associated with your account.

Jefffrey · 2024-08-15T10:23:03Z

I would like to chime in with my thoughts. I do apologize for being inactive, and have been meaning to pick up the work I left off on this repository (specifically the basic write functionality).

The way I see it, the primary focus of this repository is to serve as an integration with DataFusion to allow querying ORC files. Naturally this required first implementing a layer to read ORC files to Arrow, before then being able to integrate into DataFusion itself (similar to how there is parquet-rs, then the actual parquet integration code in DataFusion).

I can see the merit in splitting up this repository, but perhaps it is still too early to do so? One benefit of having both the Arrow integration and the DataFusion integration in a single repository is that it makes development easier, as these interfaces interact with each other. Splitting them across different repositories might make it harder to experiment with the interface for each integration, which could slow down development.

Furthermore, I don't think there were any immediate plans to develop a native ORC interface; that is, being able to read ORC in Rust without reading it to Arrow (similar to how parquet-rs has a low level column reader/writer API). From my point of view then, it might seem odd to donate a primarily Arrow <-> ORC interface library to ORC.

klangner · 2024-08-15T10:24:46Z

I think I have already signed it some time ago while doing some other work.

Xuanwo · 2024-08-15T10:42:32Z

I would like to chime in with my thoughts. I do apologize for being inactive, and have been meaning to pick up the work I left off on this repository (specifically the basic write functionality).

Thank you very much for your contribution!

I can see the merit in splitting up this repository, but perhaps it is still too early to do so?

From my perspective (as a committer on some Apache projects), it's already late for us to do so.

Developing at upstream can create a solid foundation for our entire community to build upon, making it easier for those interested in using ORC in Rust to find this project. Additionally, we can garner more support from the ORC community. Building a strong community is the key to our success. For example, we started iceberg-rust as a very basic project that could only read tables, but it has now grown to 53 contributors with full catalog support. By donating this to ORC, I expect to build a community around it, similar to what we've done with iceberg-rust.

Therefore, instead of waiting for our project to mature and gain full support, I prefer to start and attract more people to join now. I believe it's fine for us to use the existing orc -> arrow code base as a starting point.

Furthermore, I don't think there were any immediate plans to develop a native ORC interface;

I agree, but it depends on the community's feature requests. I would be happy to work with the community if someone wants to collaborate on this.

alamb · 2024-08-15T10:49:20Z

Yes -- I think one potential benefit to splitting out orc-rs would be that others who are not using it in the context of DataFusion might be more willing to help with the development.

I do not know how likely that is at this point, though

Xuanwo · 2024-08-15T11:02:41Z

Yes -- I think one potential benefit to splitting out orc-rs would be that others who are not using it in the context of DataFusion might be more willing to help with the development.

I have three such cases on my table:

I (of course!) want to build both orc-rs and datafusion-orc separately.
@youngsofun from databend wants native ORC support but not datafusion.
@wgtmac from the Apache ORC PMC is interested in a Rust ORC implementation.

Aitozi · 2024-08-15T11:09:15Z

Yes -- I think one potential benefit to splitting out orc-rs would be that others who are not using it in the context of DataFusion might be more willing to help with the development.

I have three such cases on my table:

I (of course!) want to build both orc-rs and datafusion-orc separately.

@youngsofun from databend wants native ORC support but not datafusion.

@wgtmac from the Apache ORC PMC is interested in a Rust ORC implementation.

paimon-rust may also need native ORC support, but not datafusion.

Xuanwo · 2024-08-15T11:23:47Z

paimon-rust may also need native ORC support, but not datafusion.

This case is interesting since paimon-rust will need datafusion but not require orc with datafusion. Paimon requires orc to read data but provides datafusion integration on its own.

XuQianJin-Stars · 2024-08-15T12:17:48Z

Is there still the parquet-rust project?

alamb · 2024-08-15T13:33:49Z

Is there still the parquet-rust project?

I do not know what parquet-rust refers to

https://parquet.apache.org/docs/contribution-guidelines/sub-projects/ has a list of open source rust implementations

parquet-rs refers to https://github.com/apache/arrow-rs/tree/master/parquet

Xuanwo · 2024-08-16T11:55:17Z

Hi, @alamb. This reminds me that we should establish the CLA for all projects in the datafusion-contrib organization. All contributors should agree that contributions to projects under datafusion-contrib will grant the license to the ASF. Please correct me if this isn't meant for datafusion-contrib.

alamb · 2024-08-16T15:21:10Z

Hi, @alamb. This reminds me that we should establish the CLA for all projects in the datafusion-contrib organization. All contributors should agree that contributions to projects under datafusion-contrib will grant the license to the ASF. Please correct me if this isn't meant for datafusion-contrib.

I think the idea with datafusion-contrib is to minimize process overhead (such as apache CLAs) and mostly serve as a very disparate set of crates. As they mature, we can then apply more process (as we are doing in this case)

The rationale is that many of the crates in datafusion-contrib will likely never get to the stage where they would be donated to the Apache foundation and thus any up-front cost to prepare for that is wasted effort (and thus reduces contributions)

Xuanwo · 2024-08-16T15:30:53Z

The rationale is that many of the crates in datafusion-contrib will likely never get to the stage where they would be donated to the Apache foundation and thus any up-front cost to prepare for that is wasted effort (and thus reduces contributions)

Understood, thank you. This design makes sense to me.

Jefffrey · 2024-08-19T10:58:11Z

I see mention of not needing the DataFusion integration code as motivation, but could this be addressed by splitting the current project to have two subcrates, one for pure Arrow-ORC and the other for DataFusion integration?

I wanted to do this initially but kept DataFusion as a feature to make it easier to develop with, especially since the DataFusion integration code is currently quite small (though I guess the dependency footprint isn't 😅 )
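
For readers unfamiliar with the "DataFusion as a feature" approach mentioned above, here is a minimal sketch of feature-gating in a single crate. The feature name `datafusion`, the module names, and the functions are all hypothetical and chosen for illustration; they are not taken from this repository's code.

```rust
// Core layer: always compiled, no query-engine dependency.
// `decode_row_count` is a hypothetical stand-in for real ORC decoding.
pub mod core_reader {
    /// Total number of rows across stripes, given per-stripe row counts.
    pub fn decode_row_count(stripe_row_counts: &[u64]) -> u64 {
        stripe_row_counts.iter().sum()
    }
}

// Integration layer: compiled only when the (assumed) `datafusion`
// Cargo feature is enabled, so users who only need the core reader
// do not pay for the heavier dependency footprint.
#[cfg(feature = "datafusion")]
pub mod datafusion_integration {
    use super::core_reader;

    /// Hypothetical helper standing in for a table provider.
    pub fn describe_table(stripe_row_counts: &[u64]) -> String {
        format!(
            "ORC table with {} rows",
            core_reader::decode_row_count(stripe_row_counts)
        )
    }
}

/// Reports whether the integration layer was compiled in.
pub fn integration_enabled() -> bool {
    cfg!(feature = "datafusion")
}

fn main() {
    let rows = core_reader::decode_row_count(&[1000, 1000, 500]);
    println!(
        "rows = {rows}, datafusion integration compiled in: {}",
        integration_enabled()
    );
}
```

With two subcrates instead, the gated module would move into its own crate that depends on the core crate. That removes the need for the `cfg` attribute, at the cost of keeping two crate versions in sync.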

Xuanwo · 2024-09-11T11:54:59Z

I believe I also need a CCLA from my previous employer, as well as the current one (since 2024-08-01). This may take a few weeks. I'll tell you when it's done.

Hi, @progval. Sorry for the interruption. I wanted to check if it works well.

progval · 2024-09-11T12:04:18Z

Hi, this project does not belong to your employer (please correct me if I'm wrong). This donation will be sent from datafusion-contrib to orc. I believe an ICLA is sufficient.

https://www.apache.org/licenses/contributor-agreements.html says CCLAs are "For a corporation that assigns employees to work on an Apache project", and I was an employee assigned to work on the project.

Either way, I need my ex-employer's permission for the ICLA

Hi, @progval. Sorry for the interruption. I wanted to check if it works well.

My ex-employer's staff came back from summer vacation this week, and they are about to start processing my request. Current employer won't be an issue.

Sorry for the delay.

progval · 2024-10-22T13:06:52Z

I just submitted my ICLA, and a CCLA from my ex-employer who owns all my past contributions.

~~I also started the process of getting a CCLA from my current employer, and will abstain from contributing until I can get it.~~ this is now resolved

Xuanwo · 2024-10-22T14:38:24Z

I just submitted my ICLA, and a CCLA from my ex-employer who owns all my past contributions.

Wow, really great!

cc @waynexia, are you still interested in working on this? The action items are:

Set up datafusion-contrib/orc-rs first and split the ORC-related code into it (it's better to retain all the history).
Move all issues to datafusion-contrib/orc-rs.
Send the IP clearance to the ORC PMC.
Transfer datafusion-contrib/orc-rs to apache/orc-rs.

Please let me know if you need a hand.

Xuanwo · 2024-10-22T14:39:04Z

cc @alamb, would you like to help create datafusion-contrib/orc-rs first? Or we can just rename this repo to orc-rs?

waynexia · 2024-10-22T14:59:31Z

That's great news!

cc @waynexia, are you still interested in working on this?

I'm resuming the IP clearance procedure, and will post any further problems in this thread.

Edit: as well as the code split things

alamb · 2024-10-22T19:10:14Z

https://github.com/datafusion-contrib/orc-rs is setup with @Xuanwo and @waynexia as admins

waynexia · 2024-10-23T13:18:24Z

IP Clearance file is updated to https://svn.apache.org/repos/asf/incubator/public/trunk/content/ip-clearance/orc-rs.xml

waynexia · 2024-10-23T14:22:21Z

Cross-referencing the thread from dev@orc.a.o https://lists.apache.org/thread/l6b0hsq29rr6to96tqmjpxt2mwz4nzbc

Xuanwo · 2024-10-23T14:41:03Z

Hi @alamb, apologies for my mistake. The repository should be named orc-rust instead. orc-rs is a different rust crate that has been abandoned.

I'm going to rename it now, just FYI.

waynexia · 2024-10-24T11:56:00Z

Donation PR: datafusion-contrib/orc-rust#1

Xuanwo · 2024-10-24T13:39:14Z

Donation PR: datafusion-contrib/orc-rust#1

Merged! We can remove duplicate code in this repo and transfer issues to new repositories now.

waynexia · 2024-10-25T07:54:13Z

Great 🎉

There is a temporary branch, datafusion-integration, made by splitting the current repo. But I'm not sure which approach is preferred:

Send a PR to this repo's main branch to remove donated code, or
Make that branch the default new main of this repo

Context: from the above discussion this repo will only focus on DataFusion-ORC data source integration in the future.

Xuanwo · 2024-10-25T08:28:17Z

I personally feel that it's better to send a PR to remove donated code. Having a repo where the main branch is not main could be confusing.

Xuanwo · 2024-10-29T13:46:19Z

Hi, @waynexia, I believe we are ok to implement this change.

waynexia · 2024-10-30T06:53:42Z

Hi, @waynexia, I believe we are ok to implement this change.

Thanks for the reminder 🙈 Tonight I'll file a PR against the current main to remove the ORC implementation and use the released upstream instead.

waynexia · 2024-10-30T14:51:47Z

Progress update: the entire process is almost done, unless I've missed anything (code split, IP clearance, transferring issues and tags, etc.). The one remaining thing is waiting for the ORC PMC to accept https://github.com/datafusion-contrib/orc-rust

Xuanwo · 2024-10-30T15:08:27Z

Progress update: the entire process is almost done if I don't miss anything (code split, ip clearance, transferring issue & tag etc). One last remaining thing is waiting for the ORC PMC to accept datafusion-contrib/orc-rust

Thank you! cc @wgtmac, would you like to start a VOTE for this?

wgtmac · 2024-10-30T15:16:33Z

Thanks for the heads up! @Xuanwo

Could you please provide the list of committers that will join the Apache ORC PMC? I will include this in the vote as well.

Xuanwo · 2024-10-30T15:43:48Z

Thanks for the heads up! @Xuanwo

Could you please provide the list of committers that will join the Apache ORC PMC? I will include this in the vote as well.

I propose to have the top 5 contributors of this project.

Xuanwo mentioned this issue Aug 14, 2024

ORC-1180: Implement an ORC Rust writer apache/orc#1507

Open

Xuanwo mentioned this issue Aug 14, 2024

Xuanwo Weekly Plan since 2024-08-12 Xuanwo/Xuanwo#5

Closed

waynexia mentioned this issue Aug 15, 2024

chore: add license header #121

Merged

waynexia mentioned this issue Oct 24, 2024

Move ORC-rust into this repo datafusion-contrib/orc-rust#1

Merged

This was referenced Oct 24, 2024

Oct 21, 2024: This week in DataFusion apache/datafusion#13035

Closed

[RFC] Use libcramjam for compression codecs #136

Closed

alamb mentioned this issue Oct 29, 2024

Oct 28, 2024: This week in DataFusion apache/datafusion#13167

Closed

waynexia mentioned this issue Oct 30, 2024

chore: remove donated ORC format related parts #138

Merged

Jefffrey mentioned this issue Dec 31, 2024

State of the repository datafusion-contrib/orc-rust#38

Open
