Feature: PySpark integration #3774

BjarkeTornager · 2024-07-07T09:42:24Z

API

Python

Description

Have you considered making an integration between Kùzu and PySpark?

Neo4j, for example, has a Neo4j Connector for Apache Spark.

Spark also has a community project called GraphFrames that can be used for basic graph algorithms.

Since Spark is widely used for analytical queries, machine learning, and streaming, it could be useful to move data between the two.

prrao87 · 2024-07-08T18:05:18Z

Hi @BjarkeTornager, this is something that could be on the roadmap, but it hasn't been prioritized yet: we typically wait for several upvotes from the community before deciding how much to prioritize a new integration. There are numerous other integrations already underway for our 0.5.0 release and beyond, so I hope you understand. In the meantime, we are also releasing a basic graph algorithms package soon that can provide some of the functionality that GraphFrames does, so stay tuned!

BjarkeTornager · 2024-07-08T20:24:18Z

Thanks @prrao87, looking forward to the Kùzu basic graph algorithm package!

abhiwattpad · 2024-07-24T22:55:26Z

It would be great to have Spark integration with Kùzu, especially for large-scale data ingestion!

prrao87 · 2024-08-12T17:46:09Z

Just adding some scope for the initial functionality here: the proposed integration would behave just like the existing Pandas/Polars DataFrame integration does:

Scan data from a PySpark DataFrame into a Kùzu node/rel table
Export the results of a Cypher query to a Spark DataFrame

Unlike Pandas/Polars, the I/O and related tasks may not be fully in-memory, so we'd need to look at how Spark's underlying persistent formats work, and also at how to design the API that exposes the connector to Kùzu's Python client.
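Until a native connector exists, both directions listed above can be roughly approximated with existing public APIs. The sketch below assumes `pyspark` and the `kuzu` Python package are installed, that the target Kùzu table already exists, that the installed Kùzu version accepts glob patterns in `COPY ... FROM`, and that the staging directory is on a local filesystem the Kùzu process can read. Spark writes the DataFrame as Parquet, Kùzu bulk-loads those part files, and query results come back through `QueryResult.get_as_df()` (pandas) and `SparkSession.createDataFrame`. The helper names `copy_statement`, `spark_to_kuzu`, and `kuzu_to_spark` are hypothetical and not part of either library.

```python
def copy_statement(table: str, parquet_dir: str) -> str:
    """Build a Kùzu COPY statement that bulk-loads every Parquet file in a directory."""
    # Spark writes one part-*.parquet file per partition plus marker files such as
    # _SUCCESS, so the glob is restricted to the Parquet data files.
    pattern = f"{parquet_dir.rstrip('/')}/*.parquet"
    return f"COPY {table} FROM '{pattern}'"


def spark_to_kuzu(spark_df, conn, table: str, staging_dir: str) -> None:
    """Hypothetical helper: stage a Spark DataFrame as Parquet, then COPY it into Kùzu.

    Assumption: the DataFrame's columns are already ordered to match the Kùzu
    table's schema (for a rel table: FROM key, TO key, then properties).
    """
    # Distributed write: each Spark partition becomes one Parquet part file.
    spark_df.write.mode("overwrite").parquet(staging_dir)
    # Single bulk load of all part files into the existing node/rel table.
    conn.execute(copy_statement(table, staging_dir))


def kuzu_to_spark(spark, conn, query: str):
    """Hypothetical helper: run a Cypher query in Kùzu and return a Spark DataFrame."""
    # get_as_df() materializes the whole result in memory as a pandas DataFrame,
    # so this path only suits results that fit on the Spark driver.
    pdf = conn.execute(query).get_as_df()
    return spark.createDataFrame(pdf)
```

Staging through Parquet keeps the load side file-based, whereas the export side collects the entire query result into driver memory via pandas. That is the in-memory limitation mentioned above, which a native connector would need to avoid.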

lucifermorningstar1305 · 2024-10-15T22:33:50Z

When dealing with large-scale data, it would be best to have a way to integrate Kùzu with Spark DataFrames, similar to what Neo4j offers. This way, anyone could upload batches of data to Kùzu without writing extensive code.

BjarkeTornager added the "feature" label (New features or missing components of existing features) on Jul 7, 2024

prrao87 pinned this issue Oct 15, 2024
