feat: Improved array arithmetic support #19837

Merged
ritchie46 merged 8 commits into pola-rs:main from array-arith
Nov 18, 2024

Conversation


@nameexhaustion nameexhaustion commented Nov 18, 2024

  • Adds support for arithmetic between numeric and array columns. The numeric value is broadcast horizontally to every array element within each row, similar to list arithmetic.
  • Broadcasting for arithmetic between two array columns is now done without materializing the input columns (although the input validity is still materialized).
  • Adds support for floor division on array columns.
  • Fixes an issue with incorrect outer validities when adding arrays (Incorrect outer validity for array arithmetic result #19838)
shape: (6, 8)
┌────────────────────┬──────┬────────────────────┬────────────────────┬────────────────────┬────────────────────┬──────┬────────────────────┐
│ a                  ┆ p    ┆ a * p              ┆ a * first(p)       ┆ first(a) * p       ┆ a // p             ┆ q    ┆ a + q              │
│ ---                ┆ ---  ┆ ---                ┆ ---                ┆ ---                ┆ ---                ┆ ---  ┆ ---                │
│ array[i64, (1, 2)] ┆ i64  ┆ array[i64, (1, 2)] ┆ array[i64, (1, 2)] ┆ array[i64, (1, 2)] ┆ array[i64, (1, 2)] ┆ i64  ┆ array[i64, (1, 2)] │
╞════════════════════╪══════╪════════════════════╪════════════════════╪════════════════════╪════════════════════╪══════╪════════════════════╡
│ [[1, 1]]           ┆ 1    ┆ [[1, 1]]           ┆ [[1, 1]]           ┆ [[1, 1]]           ┆ [[1, 1]]           ┆ null ┆ [[null, null]]     │
│ [[2, 2]]           ┆ null ┆ [[null, null]]     ┆ [[2, 2]]           ┆ [[null, null]]     ┆ [[null, null]]     ┆ null ┆ [[null, null]]     │
│ [[3, 3]]           ┆ 3    ┆ [[9, 9]]           ┆ [[3, 3]]           ┆ [[3, 3]]           ┆ [[1, 1]]           ┆ null ┆ [[null, null]]     │
│ [[null, null]]     ┆ 4    ┆ [[null, null]]     ┆ [[null, null]]     ┆ [[4, 4]]           ┆ [[null, null]]     ┆ null ┆ [[null, null]]     │
│ [null]             ┆ 5    ┆ [null]             ┆ [null]             ┆ [[5, 5]]           ┆ [null]             ┆ null ┆ [null]             │
│ null               ┆ 6    ┆ null               ┆ null               ┆ [[6, 6]]           ┆ null               ┆ null ┆ null               │
└────────────────────┴──────┴────────────────────┴────────────────────┴────────────────────┴────────────────────┴──────┴────────────────────┘
Table code
import polars as pl

df = (
    pl.select(
        pl.lit(
            pl.Series(
                [[[1, 1]], [[2, 2]], [[3, 3]], [[None, None]], [None], None],
                dtype=pl.Array(pl.Array(pl.Int64, 2), 1),
            )
        ).alias("a"),
    )
    .with_columns(
        pl.int_range(1, 1 + pl.len())
        .pipe(lambda e: pl.when(e != 2).then(e))
        .alias("p"),
    )
    .with_columns(
        (pl.col("a") * pl.col("p")).alias("a * p"),
        (pl.col("a") * pl.first("p")).alias("a * first(p)"),
        (pl.first("a") * pl.col("p")).alias("first(a) * p"),
        (pl.col("a") // pl.col("p")).alias("a // p"),
    )
    .with_columns(
        pl.lit(None, dtype=pl.Int64).alias("q"),
    )
    .with_columns(
        (pl.col("a") + pl.col("q")).alias("a + q"),
    )
)

print(df)
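
The null handling visible in the table can be modelled in a few lines of plain Python. This is only a sketch of the observable semantics, not the Rust implementation from this PR: the function names `broadcast_scalar` and `array_op_numeric` are hypothetical, nested Python lists stand in for array rows, and edge cases such as division by zero are not modelled.

```python
import operator


def broadcast_scalar(arr, scalar, op):
    """Apply op(element, scalar) to every leaf of one (possibly nested) array row."""
    if arr is None:
        # A null row, null inner array, or null element stays null.
        return None
    if isinstance(arr, list):
        # Recurse through each nesting level; the outer structure is preserved.
        return [broadcast_scalar(x, scalar, op) for x in arr]
    if scalar is None:
        # A null numeric nulls the elements but not the enclosing arrays.
        return None
    return op(arr, scalar)


def array_op_numeric(column, scalars, op):
    """Row-wise arithmetic between an array column and a numeric column."""
    return [broadcast_scalar(row, s, op) for row, s in zip(column, scalars)]


a = [[[1, 1]], [[2, 2]], [[3, 3]], [[None, None]], [None], None]
p = [1, None, 3, 4, 5, 6]

a_mul_p = array_op_numeric(a, p, operator.mul)
a_floordiv_p = array_op_numeric(a, p, operator.floordiv)
print(a_mul_p)
print(a_floordiv_p)
```

Under this model, a null numeric produces an array whose elements are null while the row itself stays valid, whereas a null row stays null regardless of the numeric, which matches the `a * p`, `a // p` and `a + q` columns above.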

Fixes #19356
Fixes #19838

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Nov 18, 2024
@@ -0,0 +1,820 @@
use polars_error::{feature_gated, PolarsResult};
Collaborator Author

@nameexhaustion nameexhaustion Nov 18, 2024

Note: This file has mostly the same structure as the one for list arithmetic, but with some array specific adjustments.


match (&self.op_apply_type, &self.broadcast) {
// Mostly the same as ListNumericOp, however with fixed size list we also have
// (BinaryOpApplyType::ListToPrimitive, Broadcast::Left) as a physical impl.
Collaborator Author

Arrays have a simpler layout than lists, so we can also implement array<->primitive arithmetic with the array side being broadcast, without materializing it.
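
To illustrate why the fixed-size layout makes this easy, here is a rough Python sketch. It assumes the array column is stored as one flat values buffer plus a constant `width`, so element `j` of row `i` sits at `i * width + j` and no offsets buffer is needed. The function `broadcast_array_side` is hypothetical, ignores validity entirely, and is not the code added in this PR.

```python
import operator


def broadcast_array_side(flat, width, scalars, op):
    """Combine a single array row with every scalar, without repeating the row.

    `flat` holds exactly one row of `width` values (the unit-length array side);
    `scalars` is the primitive column it is broadcast against.
    """
    if len(flat) != width:
        raise ValueError("array side must contain exactly one row")
    out = [None] * (len(scalars) * width)
    for i, s in enumerate(scalars):
        for j in range(width):
            # Read the same source row for every output row instead of
            # first materializing len(scalars) copies of it.
            out[i * width + j] = op(flat[j], s)
    return out


result = broadcast_array_side([1, 2], 2, [10, 20, 30], operator.mul)
print(result)
```

With variable-length lists the output position of each element would depend on an offsets buffer, which is why the equivalent list case is harder to do without materializing.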

}
}

#[derive(Debug, Clone)]
pub enum NumericListOp {
Collaborator Author

Factored out this and some other parts of list arithmetic for re-use

// We materialize the list columns with `new_from_index`, as otherwise we'd have to
// implement logic that broadcasts the offsets and validities across multiple levels
// of nesting. But we will re-use the materialized memory to store the result.
(BinaryOpApplyType::ListToList, Broadcast::Left) => {
Collaborator Author

Drive-by: reordered the match arms of this enum to follow the order used in the comment (expand the diff above to see the comment).

@@ -383,16 +383,3 @@ def test_zero_width_array(fn: str) -> None:

df = pl.concat([a.to_frame(), b.to_frame()], how="horizontal")
df.select(c=expr_f(pl.col.a, pl.col.b))


def test_elementwise_arithmetic_19682() -> None:
Collaborator Author

@nameexhaustion nameexhaustion Nov 18, 2024

moved to arithmetic/test_array.py below

)
@pytest.mark.parametrize("exec_op", EXEC_OP_COMBINATIONS)
@pytest.mark.slow
def test_array_arithmetic_values(
Collaborator Author

@nameexhaustion nameexhaustion Nov 18, 2024

This is the same parametrized test copied from list arithmetic. It also has extra parametrization to test operations between nested array types (array_side="both3").

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 93.07590% with 52 lines in your changes missing coverage. Please review.

Project coverage is 79.37%. Comparing base (6ccb187) to head (6814121).
Report is 3 commits behind head on main.

Files with missing lines                                 Patch %   Lines
...lars-core/src/series/arithmetic/fixed_size_list.rs    92.49%    41 Missing ⚠️
...polars-core/src/series/arithmetic/list_borrowed.rs    92.00%    6 Missing ⚠️
...es/polars-core/src/series/arithmetic/list_utils.rs    95.29%    4 Missing ⚠️
crates/polars-plan/src/plans/aexpr/schema.rs             96.15%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #19837      +/-   ##
==========================================
+ Coverage   79.33%   79.37%   +0.04%     
==========================================
  Files        1548     1550       +2     
  Lines      214245   214751     +506     
  Branches     2460     2460              
==========================================
+ Hits       169968   170464     +496     
- Misses      43719    43728       +9     
- Partials      558      559       +1     


@ritchie46 ritchie46 merged commit 402c15e into pola-rs:main Nov 18, 2024
28 checks passed
@nameexhaustion nameexhaustion deleted the array-arith branch November 18, 2024 08:20
Successfully merging this pull request may close these issues.

  • Incorrect outer validity for array arithmetic result
  • Array broadcasting support
2 participants