Update documentation
simicd committed Feb 21, 2023
1 parent 22e77a6 commit 8276725
Showing 3 changed files with 7 additions and 19 deletions.
README.md: 12 changes (3 additions, 9 deletions)
~~~diff
@@ -36,7 +36,7 @@ from having to lock the GIL when running those operations.
 Its query engine, DataFusion, is written in [Rust](https://www.rust-lang.org/), which makes strong assumptions
 about thread safety and lack of memory leaks.
 
-There is also experimental support for executing SQL against other DataFrame libraries, such as Polars, Pandas, and any
+There is also experimental support for executing SQL against other DataFrame libraries, such as Polars, Pandas, and any
 drop-in replacements for Pandas.
 
 Technically, zero-copy is achieved via the [c data interface](https://arrow.apache.org/docs/format/CDataInterface.html).
~~~
~~~diff
@@ -70,17 +70,11 @@ df = ctx.sql("select passenger_count, count(*) "
              "group by passenger_count "
              "order by passenger_count")
 
-# collect as list of pyarrow.RecordBatch
-results = df.collect()
-
-# get first batch
-batch = results[0]
-
 # convert to Pandas
-df = batch.to_pandas()
+pandas_df = df.to_pandas()
 
 # create a chart
-fig = df.plot(kind="bar", title="Trip Count by Number of Passengers").get_figure()
+fig = pandas_df.plot(kind="bar", title="Trip Count by Number of Passengers").get_figure()
 fig.savefig('chart.png')
 ```
 
~~~
examples/sql-to-pandas.py: 10 changes (2 additions, 8 deletions)
~~~diff
@@ -33,17 +33,11 @@
     "order by passenger_count"
 )
 
-# collect as list of pyarrow.RecordBatch
-results = df.collect()
-
-# get first batch
-batch = results[0]
-
 # convert to Pandas
-df = batch.to_pandas()
+pandas_df = df.to_pandas()
 
 # create a chart
-fig = df.plot(
+fig = pandas_df.plot(
     kind="bar", title="Trip Count by Number of Passengers"
 ).get_figure()
 fig.savefig("chart.png")
~~~
src/dataframe.rs: 4 changes (2 additions, 2 deletions)
~~~diff
@@ -313,8 +313,8 @@ impl PyDataFrame {
         Ok(())
     }
 
-    // Convert to pandas dataframe with pyarrow
-    // Collect the batches, pass to Arrow Table & then convert to Pandas DataFrame
+    /// Convert to pandas dataframe with pyarrow
+    /// Collect the batches, pass to Arrow Table & then convert to Pandas DataFrame
     fn to_pandas(&self, py: Python) -> PyResult<PyObject> {
         let batches = self.collect(py);
 
~~~