Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix lib.name in Cargo.toml #694

Closed
wants to merge 1 commit into from

Conversation

samuelcolvin
Copy link

With the current settings maturin build and pip install . gave me an error when I tried to import datafusion:

Traceback (most recent call last):
  File "/Users/samuel/code/datafusion-python/demo.py", line 1, in <module>
    from datafusion import SessionContext
  File "/Users/samuel/code/datafusion-python/datafusion/__init__.py", line 28, in <module>
    from ._internal import (
ModuleNotFoundError: No module named 'datafusion._internal'

@Michael-J-Ward
Copy link
Contributor

Note for reproducing: make sure to remove the .*so file that maturin develop places next to ./datafusion/__init__.py


@samuelcolvin, I assume this is an instance of this issue that the maturin docs warns about:

This structure is recommended to avoid a common ImportError pitfall

The docs recommend nesting the python source as python/datafusion/, and the only references I find about lib.name recommend keeping it the same.

To create a mixed Rust/Python project, add a directory with your package name (i.e. matching lib.name in your Cargo.toml) to contain the Python source:

Most importantly, my first attempt at using your fix did not appear to fix it (see below)

Do you have any thoughts on this solution vs the recommended one? (I'm still learning about maturin).


maturin build
⚠  Warning: You specified maturin >=0.15, <0.16 in pyproject.toml under `build-system.requires`, but the current maturin version is 1.4.0
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
🐍 Not using a specific python interpreter
📡 Using build options features, locked from pyproject.toml
   Compiling pyo3-build-config v0.20.2
   Compiling pyo3-ffi v0.20.2
   Compiling pyo3 v0.20.2
   Compiling datafusion-python v37.1.0 (/home/mike/workspace/datafusion-python/flake)
   Compiling arrow v51.0.0
   Compiling datafusion-common v37.1.0
   Compiling datafusion-expr v37.1.0
   Compiling datafusion-execution v37.1.0
   Compiling datafusion-sql v37.1.0
   Compiling datafusion-physical-expr v37.1.0
   Compiling datafusion-functions v37.1.0
   Compiling datafusion-optimizer v37.1.0
   Compiling datafusion-physical-plan v37.1.0
   Compiling datafusion-functions-array v37.1.0
   Compiling datafusion v37.1.0
   Compiling datafusion-substrait v37.1.0
    Finished dev [unoptimized + debuginfo] target(s) in 1m 02s
⚠  Warning: No compatible platform tag found, using the linux tag instead. You won't be able to upload those wheels to PyPI.
📦 Built wheel for abi3 Python ≥ 3.8 to /home/mike/workspace/datafusion-python/flake/target/wheels/datafusion-37.1.0-cp38-abi3-linux_x86_64.whl

flake on  flake [$!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) took 1m27s pip install .
Processing /home/mike/workspace/datafusion-python/flake
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from datafusion==37.1.0) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from pyarrow>=11.0.0->datafusion==37.1.0) (1.26.4)
Building wheels for collected packages: datafusion
  Building wheel for datafusion (pyproject.toml) ... done
  Created wheel for datafusion: filename=datafusion-37.1.0-cp38-abi3-linux_x86_64.whl size=17875320 sha256=b417c0d9b4662a0f687359a2e751580d4a2fc0714a02b447cfe6deff11138413
  Stored in directory: /tmp/nix-shell.EXRUlg/pip-ephem-wheel-cache-tbyz75qv/wheels/8e/1b/87/2a5f750e961aa47403189d62f5e1236e0819f25e0a9068babf
Successfully built datafusion
Installing collected packages: datafusion
Successfully installed datafusion-37.1.0

flake on  flake [$!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) took 9m50s python
Python 3.11.9 (main, Apr  2 2024, 08:25:04) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mike/workspace/datafusion-python/flake/datafusion/__init__.py", line 28, in <module>
    from ._internal import (
ModuleNotFoundError: No module named 'datafusion._internal'

w/ a git diff of

git diff
diff --git a/Cargo.toml b/Cargo.toml
index 9da36d7..5bb1f83 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -60,7 +60,7 @@ url = "2.2"
 pyo3-build-config = "0.20.0"
 
 [lib]
-name = "datafusion_python"
+name = "_internal"
 crate-type = ["cdylib", "rlib"]
 
 [profile.release]
diff --git a/src/lib.rs b/src/lib.rs
index a696ebf..8ce1b16 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -72,6 +72,7 @@ pub(crate) struct TokioRuntime(tokio::runtime::Runtime);
 /// The higher-level public API is defined in pure python files under the
 /// datafusion directory.
 #[pymodule]
+#[pyo3(name="_internal")]
 fn _internal(py: Python, m: &PyModule) -> PyResult<()> {
     // Register the Tokio Runtime as a module attribute so we can reuse it
     m.add(

@@ -60,7 +60,7 @@ url = "2.2"
pyo3-build-config = "0.20.0"

[lib]
name = "datafusion_python"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have any impact on downstream maturin projects that use this crate as a dependency? If they also use the name _internal, will there be a conflict?

Should we replace all references to _internal with datafusion_internal instead?

@Michael-J-Ward
Copy link
Contributor

Michael-J-Ward commented May 14, 2024

Following the recommended solution of nesting python/datafusion and adding tool.maturin.python-source = "python" does seem to work for me.


git diff pyproject.toml 
diff --git a/pyproject.toml b/pyproject.toml
index d353605..24bd296 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -55,6 +55,7 @@ repository = "https://github.com/apache/arrow-datafusion-python"
 profile = "black"
 
 [tool.maturin]
+python-source = "python"
 module-name = "datafusion._internal"
 include = [
     { path = "Cargo.lock", format = "sdist" }

flake on  flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) ls -l python
total 4
drwxr-xr-x 5 mike users 4096 May 14 08:07 datafusion

flake on  flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) maturin build
⚠  Warning: You specified maturin >=0.15, <0.16 in pyproject.toml under `build-system.requires`, but the current maturin version is 1.4.0
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.8
🐍 Not using a specific python interpreter
📡 Using build options features, locked from pyproject.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.17s
⚠  Warning: No compatible platform tag found, using the linux tag instead. You won't be able to upload those wheels to PyPI.
📦 Built wheel for abi3 Python ≥ 3.8 to /home/mike/workspace/datafusion-python/flake/target/wheels/datafusion-37.1.0-cp38-abi3-linux_x86_64.whl

flake on  flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) took 20s pip install .
Processing /home/mike/workspace/datafusion-python/flake
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=11.0.0 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from datafusion==37.1.0) (15.0.0)
Requirement already satisfied: numpy<2,>=1.16.6 in /nix/store/llp7z339ix0lm7nlrndn1yvdmhskcsyk-rust-toolchain/lib/python3.11/site-packages (from pyarrow>=11.0.0->datafusion==37.1.0) (1.26.4)
Building wheels for collected packages: datafusion
  Building wheel for datafusion (pyproject.toml) ... done
  Created wheel for datafusion: filename=datafusion-37.1.0-cp38-abi3-linux_x86_64.whl size=17875125 sha256=921aca268be398c639695a5b851e14b3a19185ce39f2336d736a5f5705841b01
  Stored in directory: /tmp/nix-shell.e0IZLi/pip-ephem-wheel-cache-lzlmoc66/wheels/8e/1b/87/2a5f750e961aa47403189d62f5e1236e0819f25e0a9068babf
Successfully built datafusion
Installing collected packages: datafusion
  Attempting uninstall: datafusion
    Found existing installation: datafusion 37.1.0
    Uninstalling datafusion-37.1.0:
      Successfully uninstalled datafusion-37.1.0
Successfully installed datafusion-37.1.0

flake on  flake [$✘!?⇕] is 📦 v37.1.0 via 🐍 v3.11.9 (venv) via 🦀 v1.77.1 via ❄  impure (nix-shell-env) on ☁  (us-east-1) took 4s python 
Python 3.11.9 (main, Apr  2 2024, 08:25:04) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datafusion
>>> datafusion.__version__
'37.1.0'

@samuelcolvin
Copy link
Author

@davidhewitt, you're the maturin expert - can you tell us what to do.

@davidhewitt
Copy link

Yes I suggest moving python sources into a subdirectory as @Michael-J-Ward suggests - the ImportError pitfall described in the maturin docs is a common one and the subdirectory is my preferred workaround (we did the same in pydantic-core).

@samuelcolvin
Copy link
Author

makes sense. I think this started working because I had run maturin develop before making this change.

@davidhewitt
Copy link

In particular the problem is that maturin develop (or pip install -e .) will use the checkout's Python sources, but pip install . will put a copy into site-packages, which then conflicts with the checkout sources if you're running Python from the repository root. (You'll end up importing from the local sources rather than the installed contents in site-packages.)

Michael-J-Ward added a commit to Michael-J-Ward/datafusion-python that referenced this pull request May 14, 2024
… projects

The previous layout leads to an import error when installing with `maturin build` and `pip install .`.

This error was common enough that `maturin` changed the recommended project layout to what this commit does.

A prior PR attempted to solve this by altering `lib.name` in Cargo.toml, but that did not work for me.

- [Prior PR](apache#694)
- [maturin ImportError issue](PyO3/maturin#490)
- [maturin changes recommended project structure](PyO3/maturin#855)
@Michael-J-Ward
Copy link
Contributor

Gracias for the report @samuelcolvin and the knowledge @davidhewitt.

Superseding PR in #695

andygrove pushed a commit that referenced this pull request May 14, 2024
#695)

* chore: update to maturin's recommended project layout for rust/python projects

The previous layout leads to an import error when installing with `maturin build` and `pip install .`.

This error was common enough that `maturin` changed the recommended project layout to what this commit does.

A prior PR attempted to solve this by altering `lib.name` in Cargo.toml, but that did not work for me.

- [Prior PR](#694)
- [maturin ImportError issue](PyO3/maturin#490)
- [maturin changes recommended project structure](PyO3/maturin#855)

* ci: update `ruff check` for nested python directory
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants