This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Unable to build #18

Closed
mmuru opened this issue Jan 15, 2022 · 8 comments

Comments

@mmuru
Contributor

mmuru commented Jan 15, 2022

On macOS with Python 3.7.

After cloning the project, I tried to build it, but it failed with the following error:

maturin develop
🍹 Building a mixed python/rust project
💥 maturin failed
  Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
  Caused by: `cargo metadata` exited with an error:     Updating crates.io index
error: failed to select a version for the requirement `datafusion = "=6.0.0"`
candidate versions found which didn't match: 5.0.0, 4.0.0, 3.0.0, ...
location searched: crates.io index

Please, let me know how to fix the build issue.

@houqp:
I noticed that README.md points to the previous repo; it should be updated.

@mmuru
Contributor Author

mmuru commented Jan 20, 2022

@houqp & @jimexist: Can you help me to unblock this build issue? Thanks.

@houqp
Member

houqp commented Jan 20, 2022

Did you have a local path override for the datafusion dependency? Version 6.x is available on crates.io: https://crates.io/crates/datafusion/versions.

Good catch on the README, and yes, we should get that fixed. PRs welcome :)

@mmuru
Contributor Author

mmuru commented Jan 20, 2022

@houqp: Actually, the issue was that the rustc version must be > 1.56.1. I had to upgrade rustc to the latest version (1.58.1), and afterwards I was able to build the datafusion-python package.
Sure, I will create a PR for the documentation fix.
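
For anyone hitting the same error, a quick way to confirm the toolchain version before running maturin is sketched below. It is a minimal, std-only sketch: the helper names (`parse_rustc_version`, `is_new_enough`) and the `LAST_FAILING` constant are made up for illustration, and the threshold simply mirrors the "> 1.56.1" observation above rather than any documented minimum supported Rust version.

```rust
use std::process::Command;

/// Parse a `(major, minor, patch)` triple out of `rustc --version` output,
/// which has the shape "rustc 1.58.1 (<hash> <date>)".
fn parse_rustc_version(output: &str) -> Option<(u32, u32, u32)> {
    // The second whitespace-separated token is the version string.
    let version = output.split_whitespace().nth(1)?;
    // Drop any pre-release suffix such as "-nightly" or "-beta.1".
    let core = version.split('-').next()?;
    let mut parts = core.split('.').map(|p| p.parse::<u32>().ok());
    Some((parts.next()??, parts.next()??, parts.next()??))
}

/// Assumption taken from this thread: the build only worked with a
/// toolchain strictly newer than 1.56.1.
const LAST_FAILING: (u32, u32, u32) = (1, 56, 1);

/// Tuples compare lexicographically, so this is a plain version comparison.
fn is_new_enough(version: (u32, u32, u32)) -> bool {
    version > LAST_FAILING
}

fn main() {
    match Command::new("rustc").arg("--version").output() {
        Ok(out) => {
            let text = String::from_utf8_lossy(&out.stdout);
            match parse_rustc_version(&text) {
                Some(v) if is_new_enough(v) => {
                    println!("rustc {}.{}.{} looks new enough", v.0, v.1, v.2)
                }
                Some(v) => {
                    println!("rustc {}.{}.{} may be too old; try `rustup update`", v.0, v.1, v.2)
                }
                None => println!("could not parse version from: {}", text.trim()),
            }
        }
        Err(e) => println!("could not run rustc: {}", e),
    }
}
```

Running `rustc --version` by hand gives the same information; the sketch is only useful if you want the check in a script.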

We noticed that the df.collect method is slow, and I would like to discuss it with you. Do you have an email address where I could reach you?

@houqp
Member

houqp commented Jan 21, 2022

@mmuru for performance-related issues, it's best to post reproducible sample code to the apache/arrow-datafusion repo and tag me so that other people from the community can jump in to help as well.

@mmuru
Contributor Author

mmuru commented Jan 21, 2022

@houqp: Thanks. It is related to the Python binding's collect() in dataframe.rs, so I thought I would ask here, but I will post the issue in apache/arrow-datafusion. Ideally, it should return PyResult<Vec>.

fn collect(&self, py: Python) -> PyResult<Vec<PyObject>> {
    let batches = wait_for_future(py, self.df.collect())?;
    // cannot use PyResult<Vec<RecordBatch>> return type due to
    // https://github.com/PyO3/pyo3/issues/1813
    batches.into_iter().map(|rb| rb.to_pyarrow(py)).collect()
}
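
The last line of that snippet relies on a standard-library pattern: collecting an iterator of `Result` values into a single `Result<Vec<_>, _>` that stops at the first error (`PyResult<T>` is an alias for `Result<T, PyErr>`). A std-only sketch of the same pattern follows; `convert` and `convert_all` are hypothetical stand-ins, with integer parsing playing the role of the per-batch `to_pyarrow` conversion.

```rust
use std::num::ParseIntError;

/// Hypothetical stand-in for a fallible per-item conversion such as
/// `rb.to_pyarrow(py)`; here it just parses a string into an integer.
fn convert(raw: &str) -> Result<i64, ParseIntError> {
    raw.trim().parse::<i64>()
}

/// Convert every item, returning the first error if any conversion fails.
fn convert_all(raws: Vec<&str>) -> Result<Vec<i64>, ParseIntError> {
    // Collecting into `Result<Vec<_>, _>` short-circuits on the first `Err`.
    raws.into_iter().map(convert).collect()
}

fn main() {
    println!("{:?}", convert_all(vec!["1", "2", "3"]));
    println!("{:?}", convert_all(vec!["1", "oops", "3"]).is_err());
}
```

Note that every item is converted eagerly into a new `Vec`, so the cost grows with the number of items; whether that matters for the slowdown discussed here is not established in this thread.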

@houqp
Member

houqp commented Jan 22, 2022

@mmuru are you able to reproduce the performance issue using just Rust code?
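
One way to start on a Rust-only reproduction is a small timing harness around the suspect call. The sketch below uses only the standard library; `time_it` and `workload` are hypothetical names, and `workload` is a placeholder to be swapped for the real DataFusion query (for example, the future behind `df.collect()` driven to completion), which is not shown here.

```rust
use std::time::{Duration, Instant};

/// Run a closure once and return its result together with the elapsed time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = f();
    (value, start.elapsed())
}

/// Placeholder workload (sum of squares below `n`); replace this with the
/// code path under investigation.
fn workload(n: u64) -> u64 {
    (0..n).map(|x| x.wrapping_mul(x)).fold(0, u64::wrapping_add)
}

fn main() {
    let (sum, elapsed) = time_it(|| workload(1_000_000));
    println!("sum = {}, elapsed = {:?}", sum, elapsed);
}
```

Timing the same query once from Rust and once through the Python binding would show whether the overhead comes from the query itself or from the conversion layer.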

@messense

Actually, the issue was rustc version must be > 1.56.1. I had to upgrade rustc version to latest (1.58.1) and afterwards I was able to build datafusion-python package. Sure, I will create a PR for documentation fix.

The error message from maturin is quite confusing and needs improvement. I've opened PyO3/maturin#787 to track this.

@jimexist
Contributor

Since this isn't an issue with this crate, I'm closing this.
