Commit
ARROW-10109: [Rust] Add support to the C data interface for primitive types and utf8

This PR is a proposal to add support to the [C data interface](https://arrow.apache.org/docs/format/CDataInterface.html) by implementing the necessary functionality to both consume and produce structs with its ABI and lifetime rules.

This is for now limited to primitive types and strings (utf8), but it is easily generalized to all types whose data is encapsulated in `ArrayData` (things with buffers and child data).

Some design choices:

* import and export do not care about the type of the data that is in memory (previously `BufferData`, now `Bytes`); they only care about how it should be converted between `ArrayData` and the C data interface.
* import wraps incoming pointers in a struct behind an `Arc`, so that they are refcounted thread-safely and can be shared between buffers, arrays, etc.
* `export` places `Buffer`s in `private_data` for bookkeeping and releases them when the consumer releases the struct via `release`.

I do not expect this PR to be easy to review, as it touches sensitive (aka `unsafe`) code. However, based on the tests I did so far, I am sufficiently happy to PR it.

This PR has three main parts:

1. Addition of an `ffi` module that contains the import and export functionality
2. Add some helpers to import and export an Array from C Data Interface
3. A crate to test this against Python/C++'s API

It also does a small refactor of `BufferData`, renaming it to `Bytes` (motivated by the popular `bytes` crate), and moving it to a separate file.

What is tested:

* round-trip `Python -> Rust -> Python` (new separate crate, `arrow-c-integration`)
* round-trip `Rust -> Python -> Rust` (new separate crate, `arrow-c-integration`)
* round-trip `Rust -> Rust -> Rust`
* memory allocation counts

Finally, this PR has a large contribution from @pitrou, who took _a lot_ of his time to explain to me how the C++ was doing it and the main things that I had to worry about here.

Closes #8401 from jorgecarleitao/arrow-c-inte

Authored-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
Signed-off-by: Jorge C. Leitao <jorgecarleitao@gmail.com>
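To make the export design choice concrete, here is a minimal sketch of the "bookkeeping in `private_data`, freed by `release`" pattern. It is not the code from this PR: the `ArrowArray` layout follows the C data interface specification, but `PrivateData`, `export`, `consume_first_buffer` and the `FREED` counter are hypothetical names invented for illustration, plain `Vec<u8>` stands in for Arrow's `Buffer`, and buffer 0 is treated as raw bytes (a real primitive array would carry a validity bitmap first).

```rust
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Mirrors `struct ArrowArray` from the C data interface specification.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Hypothetical bookkeeping struct: owns everything the exported pointers refer to.
struct PrivateData {
    _buffers: Vec<Vec<u8>>,
    buffer_ptrs: Box<[*const c_void]>,
}

/// Counts frees; exists only so the sketch can be checked.
pub static FREED: AtomicUsize = AtomicUsize::new(0);

impl Drop for PrivateData {
    fn drop(&mut self) {
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

/// Release callback: frees the bookkeeping and marks the struct as released.
unsafe extern "C" fn release_array(array: *mut ArrowArray) {
    if array.is_null() {
        return;
    }
    let array = unsafe { &mut *array };
    if array.release.is_none() {
        return; // already released
    }
    drop(unsafe { Box::from_raw(array.private_data as *mut PrivateData) });
    array.private_data = ptr::null_mut();
    array.release = None;
}

/// Producer side: moves the owned buffers into `private_data`.
pub fn export(buffers: Vec<Vec<u8>>, length: i64) -> ArrowArray {
    let buffer_ptrs: Box<[*const c_void]> =
        buffers.iter().map(|b| b.as_ptr() as *const c_void).collect();
    let mut private = Box::new(PrivateData { _buffers: buffers, buffer_ptrs });
    ArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: private.buffer_ptrs.len() as i64,
        n_children: 0,
        buffers: private.buffer_ptrs.as_mut_ptr(),
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_array),
        private_data: Box::into_raw(private) as *mut c_void,
    }
}

/// Consumer side: copies `length` bytes out of buffer 0, then calls `release`.
/// Assumes the array has at least one buffer holding `length` bytes.
pub fn consume_first_buffer(array: &mut ArrowArray) -> Vec<u8> {
    let first = unsafe { *array.buffers } as *const u8;
    let bytes = unsafe { std::slice::from_raw_parts(first, array.length as usize) }.to_vec();
    if let Some(release) = array.release {
        unsafe { release(array as *mut ArrowArray) };
    }
    bytes
}
```

Calling `export(vec![vec![1, 2, 3]], 3)` and passing the result to `consume_first_buffer` should hand back the three bytes and leave `release` set to `None`, which is how the specification signals a released struct.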
1 parent 63144ad, commit 1d2b4a5

Showing 24 changed files with 1,531 additions and 130 deletions.
@@ -0,0 +1,42 @@
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -ex

arrow_dir=${1}
source_dir=${1}/rust
build_dir=${2}/rust
rust=${3}

export ARROW_TEST_DATA=${arrow_dir}/testing/data
export PARQUET_TEST_DATA=${arrow_dir}/cpp/submodules/parquet-testing/data
export CARGO_TARGET_DIR=${build_dir}

pushd ${source_dir}/arrow-pyarrow-integration-testing

#rustup default ${rust}
#rustup component add rustfmt --toolchain ${rust}-x86_64-unknown-linux-gnu
python3 -m venv venv
venv/bin/pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0

source venv/bin/activate
maturin develop
python -m unittest discover tests

popd
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[target.x86_64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
@@ -0,0 +1,2 @@
__pycache__
venv
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[package]
name = "arrow-pyarrow-integration-testing"
description = ""
version = "3.0.0-SNAPSHOT"
homepage = "https://github.com/apache/arrow"
repository = "https://github.com/apache/arrow"
authors = ["Apache Arrow <dev@arrow.apache.org>"]
license = "Apache-2.0"
keywords = [ "arrow" ]
edition = "2018"

[lib]
name = "arrow_pyarrow_integration_testing"
crate-type = ["cdylib"]

[dependencies]
arrow = { path = "../arrow", version = "3.0.0-SNAPSHOT" }
pyo3 = { version = "0.12.1", features = ["extension-module"] }

[package.metadata.maturin]
requires-dist = ["pyarrow>=1"]
@@ -0,0 +1,57 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Arrow c integration

This is a Rust crate that tests compatibility between Rust's Arrow implementation and PyArrow.

Note that this crate uses two languages and an external ABI:
* `Rust`
* `Python`
* C ABI privately exposed by `Pyarrow`.

## Basic idea

Pyarrow exposes a C ABI to convert arrow arrays from and to its C implementation, see [here](https://arrow.apache.org/docs/format/CDataInterface.html).

This package uses the equivalent struct in Rust (`arrow::array::ArrowArray`), and verifies that
we can use pyarrow's interface to move pointers from and to Rust.

## Relevant literature

* [Arrow's CDataInterface](https://arrow.apache.org/docs/format/CDataInterface.html)
* [Rust's FFI](https://doc.rust-lang.org/nomicon/ffi.html)
* [Pyarrow private binds](https://github.com/apache/arrow/blob/ae1d24efcc3f1ac2a876d8d9f544a34eb04ae874/python/pyarrow/array.pxi#L1226)
* [PyO3](https://docs.rs/pyo3/0.12.1/pyo3/index.html)

## How to develop

```bash
# prepare development environment (used to build wheel / install in development)
python -m venv venv
venv/bin/pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0
```

Whenever rust code changes (your changes or via git pull):

```bash
source venv/bin/activate
maturin develop
python -m unittest discover tests
```
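The README's "move pointers from and to Rust" idea, seen from the importing side, might look roughly like the sketch below: the incoming struct is wrapped in an owner behind an `Arc`, every buffer view holds a clone of that `Arc`, and the producer's `release` runs once when the last view goes away. This is an illustration under stated assumptions, not the crate's code: `ImportedArray`, `BufferView`, `import`, `view`, `fake_producer` and the `RELEASED` counter are hypothetical names, and the producer here is a stand-in written in Rust rather than PyArrow.

```rust
use std::os::raw::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Mirrors `struct ArrowArray` from the C data interface specification.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Hypothetical owner of an imported struct: calls the producer's `release` on drop.
pub struct ImportedArray {
    array: ArrowArray,
}

impl Drop for ImportedArray {
    fn drop(&mut self) {
        if let Some(release) = self.array.release {
            unsafe { release(&mut self.array as *mut ArrowArray) };
        }
    }
}

// Assumption of this sketch: the producer's memory is immutable and may be read
// from any thread, so sharing the owner across threads is sound.
unsafe impl Send for ImportedArray {}
unsafe impl Sync for ImportedArray {}

/// Hypothetical view over one buffer; holding the `Arc` keeps the memory alive.
pub struct BufferView {
    owner: Arc<ImportedArray>,
    index: usize,
    len: usize,
}

impl BufferView {
    pub fn as_slice(&self) -> &[u8] {
        let data = unsafe { *self.owner.array.buffers.add(self.index) } as *const u8;
        unsafe { std::slice::from_raw_parts(data, self.len) }
    }
}

pub fn import(array: ArrowArray) -> Arc<ImportedArray> {
    Arc::new(ImportedArray { array })
}

/// Caller must ensure `index < n_buffers` and that the buffer holds `len` bytes.
pub fn view(owner: &Arc<ImportedArray>, index: usize, len: usize) -> BufferView {
    BufferView { owner: Arc::clone(owner), index, len }
}

// ---- stand-in producer, only so the sketch can be exercised ----

pub static RELEASED: AtomicUsize = AtomicUsize::new(0);
static DATA: [u8; 4] = [10, 20, 30, 40];

unsafe extern "C" fn fake_release(array: *mut ArrowArray) {
    let array = unsafe { &mut *array };
    // Free the one-entry pointer table allocated in `fake_producer`.
    drop(unsafe { Box::from_raw(array.private_data as *mut [*const c_void; 1]) });
    array.release = None;
    RELEASED.fetch_add(1, Ordering::SeqCst);
}

pub fn fake_producer() -> ArrowArray {
    let table = Box::into_raw(Box::new([DATA.as_ptr() as *const c_void; 1]));
    ArrowArray {
        length: 4,
        null_count: 0,
        offset: 0,
        n_buffers: 1,
        n_children: 0,
        buffers: table as *mut *const c_void,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(fake_release),
        private_data: table as *mut c_void,
    }
}
```

With this shape, dropping the original `Arc` while views are still alive does not release the producer's memory; the callback fires only after the last `BufferView` is dropped.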
@@ -0,0 +1,20 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

[build-system]
requires = ["maturin"]
build-backend = "maturin"