High memory usage after collect() despite using limit(1) #17714

Open · 2 tasks done
MichalLebeda opened this issue Jul 18, 2024 · 1 comment
Labels
bug · needs triage · rust

Comments

@MichalLebeda

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

[package]
name = "polars_high_mem_usage"
version = "0.1.0"
edition = "2021"

[dependencies]
memory-stats = "1.2.0"
polars = { version = "0.41.3", features = ["lazy"] }

Warning: the following code will allocate nearly 5 GB.

use polars::df;
use polars::frame::DataFrame;
use polars::prelude::{col, IntoLazy};

fn main() {
    const SIZE_PRECISION: usize = 2;
    const LEN: usize = 100_000_000;
    const ITERS: usize = 10;

    let pre_df_alloc_mem = memory_stats::memory_stats().unwrap().physical_mem;

    // Build a 100-million-row i64 column: 50M zeros followed by 50M twos (~800 MB).
    let mut foo_vec: Vec<i64> = vec![0; LEN / 2];
    foo_vec.extend(vec![2; LEN / 2]);

    let df = df!(
        "foo" => foo_vec
    ).unwrap();

    let post_df_alloc_mem = memory_stats::memory_stats().unwrap().physical_mem;

    println!(
        "Base DataFrame allocated pre: {} post: {} delta: {}",
        format_size(pre_df_alloc_mem, SIZE_PRECISION),
        format_size(post_df_alloc_mem, SIZE_PRECISION),
        format_size(post_df_alloc_mem - pre_df_alloc_mem, SIZE_PRECISION)
    );

    let mut result_df = DataFrame::empty();
    let mut prev_iter_post_vstack_mem = 0;
    // Repeatedly run the same filter + limit(1) query and append the single
    // resulting row, measuring physical memory before and after each vstack.
    for i in 0..ITERS {
        let sub_result_df = df
            .clone()
            .lazy()
            .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
            .limit(1)
            .collect()
            .unwrap();

        let pre_vstack_phys_mem = memory_stats::memory_stats().unwrap().physical_mem;

        result_df.vstack_mut(&sub_result_df).unwrap();

        let post_vstack_phys_mem = memory_stats::memory_stats().unwrap().physical_mem;

        println!(
            "{:>3}: allocated pre_vstack: {:>9}    post_vstack: {:>9}    vstack_delta: {:>9}    iter_delta: {:>9}",
            i,
            format_size(pre_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem - pre_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem - prev_iter_post_vstack_mem, SIZE_PRECISION)
        );

        prev_iter_post_vstack_mem = post_vstack_phys_mem;
    }
}

// Formats a byte count as a human-readable string with SI units (B, kB, MB, GB, TB).
pub fn format_size(size_in_bytes: usize, precision: usize) -> String {
    let value;
    let unit_str;
    if size_in_bytes == 0 {
        value = 0.;
        unit_str = "B";
    } else {
        let mut exponent = size_in_bytes.ilog(1000);
        unit_str = match exponent {
            0 => "B",
            1 => "kB",
            2 => "MB",
            3 => "GB",
            4.. => {
                exponent = 4;
                "TB"
            }
        };
        value = size_in_bytes as f64 / 1000.0_f64.powi(exponent as i32);
    }

    format!("{:.*} {}", precision, value, unit_str)
}

Log output

Base DataFrame allocated pre: 4.76 MB post: 805.25 MB delta: 800.49 MB
  0: allocated pre_vstack:   1.21 GB    post_vstack:   1.21 GB    vstack_delta: 200.70 kB    iter_delta:   1.21 GB
  1: allocated pre_vstack:   1.61 GB    post_vstack:   1.61 GB    vstack_delta:  81.92 kB    iter_delta: 400.08 MB
  2: allocated pre_vstack:   2.01 GB    post_vstack:   2.01 GB    vstack_delta:  16.38 kB    iter_delta: 400.00 MB
  3: allocated pre_vstack:   2.41 GB    post_vstack:   2.41 GB    vstack_delta:  16.38 kB    iter_delta: 400.01 MB
  4: allocated pre_vstack:   2.81 GB    post_vstack:   2.81 GB    vstack_delta:  90.11 kB    iter_delta: 400.14 MB
  5: allocated pre_vstack:   3.21 GB    post_vstack:   3.21 GB    vstack_delta:  20.48 kB    iter_delta: 400.02 MB
  6: allocated pre_vstack:   3.61 GB    post_vstack:   3.61 GB    vstack_delta:  20.48 kB    iter_delta: 400.01 MB
  7: allocated pre_vstack:   4.01 GB    post_vstack:   4.01 GB    vstack_delta:  16.38 kB    iter_delta: 400.00 MB
  8: allocated pre_vstack:   4.41 GB    post_vstack:   4.41 GB    vstack_delta:  20.48 kB    iter_delta: 400.01 MB
  9: allocated pre_vstack:   4.81 GB    post_vstack:   4.81 GB    vstack_delta:  20.48 kB    iter_delta: 400.00 MB
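
For scale: the filter matches all 50 million zero values, and 50,000,000 × 8 bytes ≈ 400 MB, which lines up with the ~400 MB iter_delta retained on every iteration. This suggests that the full filter result is materialized (and the memory stays resident) even though limit(1) keeps only a single row.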

Issue description

The following code seems to allocate a lot of memory for a result DataFrame of shape (1, 1).

let mut foo_vec: Vec<i64> = vec![0; LEN / 2];
foo_vec.extend(vec![2; LEN / 2]);

let df = df!(
    "foo" => foo_vec
).unwrap();

df.clone()
   .lazy()
   .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
   .limit(1)
   .collect()
   .unwrap();

The longer code snippet above can be used for testing and makes the memory measurements more reliable (to my knowledge, there is no way to measure the exact amount of memory used).
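
A possible mitigation sketch (untested, and assuming the streaming Cargo feature and LazyFrame::with_streaming are available in this Polars version) is to hint at the streaming engine, which processes the data in batches and may avoid materializing the full filter result:

// Sketch only: same query as above, but hinting at the streaming engine.
// Assumes polars = { version = "0.41.3", features = ["lazy", "streaming"] }.
let sub_result_df = df
    .clone()
    .lazy()
    .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
    .limit(1)
    .with_streaming(true)
    .collect()
    .unwrap();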

Expected behavior

Only a small amount of memory should be allocated for a result DataFrame of shape (1, 1).

Installed versions

lazy

MichalLebeda added the bug, needs triage, and rust labels on Jul 18, 2024
@ggggggggg

I think I'm seeing the same issue. Here is my reproducible example in Python, showing that limit(0) does not reduce the runtime of a call that includes .lazy().filter(expr).limit(0).

In this example I compute an empty DataFrame in three different ways:

  1. .lazy().filter() with an expression that always evaluates to False but takes some time to compute, then collect().
  2. Same as 1, except I add .limit(0) before collecting. Here I expect it to take no time, because limit(0) means the expression never needs to be evaluated.
  3. Only .limit(0), with no filter.

I expect 2 and 3 to take about the same amount of time (basically 0 s), and 1 to take a significant amount of time. Instead, 1 and 2 take roughly the same amount of time (about 25 ms for me).

import polars as pl
import numpy as np
import time

a = np.arange(10000000).reshape((-1, 5000))
df = pl.from_numpy(a, schema={"a":pl.Array(pl.Int64, 5000)})

# 1) filter with an always-false predicate, no limit
tstart1 = time.time()
df1_lazy = df.lazy().filter(pl.col("a").arr.median()<0)
df1 = df1_lazy.collect()
elapsed1 = time.time()-tstart1

# 2) same filter, but with limit(0) before collecting
tstart2 = time.time()
df2_lazy = df.lazy().filter(pl.col("a").arr.median()<0).limit(0)
df2 = df2_lazy.collect()
elapsed2 = time.time()-tstart2

# 3) limit(0) only, no filter
tstart3 = time.time()
df3_lazy = df.lazy().limit(0)
df3 = df3_lazy.collect()
elapsed3 = time.time()-tstart3

print(elapsed1, elapsed2, elapsed3)
