High memory usage after collect() despite using limit(1) #17714

Open · 2 tasks done
MichalLebeda opened this issue Jul 18, 2024 · 1 comment
Labels
bug · needs triage · rust

Comments

@MichalLebeda

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

[package]
name = "polars_high_mem_usage"
version = "0.1.0"
edition = "2021"

[dependencies]
memory-stats = "1.2.0"
polars = { version = "0.41.3", features = ["lazy"] }

Warning: the following code will allocate nearly 5 GB.

use polars::df;
use polars::frame::DataFrame;
use polars::prelude::{col, IntoLazy};

fn main() {
    const SIZE_PRECISION: usize = 2;
    const LEN: usize = 100_000_000;
    const ITERS: usize = 10;

    let pre_df_alloc_mem = memory_stats::memory_stats().unwrap().physical_mem;

    // Build a 100-million-row i64 column: 50M zeros followed by 50M twos (~800 MB).
    let mut foo_vec: Vec<i64> = vec![0; LEN / 2];
    foo_vec.extend(vec![2; LEN / 2]);

    let df = df!(
        "foo" => foo_vec
    ).unwrap();

    let post_df_alloc_mem = memory_stats::memory_stats().unwrap().physical_mem;

    println!(
        "Base DataFrame allocated pre: {} post: {} delta: {}",
        format_size(pre_df_alloc_mem, SIZE_PRECISION),
        format_size(post_df_alloc_mem, SIZE_PRECISION),
        format_size(post_df_alloc_mem - pre_df_alloc_mem, SIZE_PRECISION)
    );

    let mut result_df = DataFrame::empty();
    let mut prev_iter_post_vstack_mem = 0;
    // Repeatedly run the same filter + limit(1) query and append the single
    // resulting row, measuring physical memory before and after each vstack.
    for i in 0..ITERS {
        let sub_result_df = df
            .clone()
            .lazy()
            .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
            .limit(1)
            .collect()
            .unwrap();

        let pre_vstack_phys_mem = memory_stats::memory_stats().unwrap().physical_mem;

        result_df.vstack_mut(&sub_result_df).unwrap();

        let post_vstack_phys_mem = memory_stats::memory_stats().unwrap().physical_mem;

        println!(
            "{:>3}: allocated pre_vstack: {:>9}    post_vstack: {:>9}    vstack_delta: {:>9}    iter_delta: {:>9}",
            i,
            format_size(pre_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem - pre_vstack_phys_mem, SIZE_PRECISION),
            format_size(post_vstack_phys_mem - prev_iter_post_vstack_mem, SIZE_PRECISION)
        );

        prev_iter_post_vstack_mem = post_vstack_phys_mem;
    }
}

// Formats a byte count as a human-readable string with SI units (B, kB, MB, GB, TB).
pub fn format_size(size_in_bytes: usize, precision: usize) -> String {
    let value;
    let unit_str;
    if size_in_bytes == 0 {
        value = 0.;
        unit_str = "B";
    } else {
        let mut exponent = size_in_bytes.ilog(1000);
        unit_str = match exponent {
            0 => "B",
            1 => "kB",
            2 => "MB",
            3 => "GB",
            4.. => {
                exponent = 4;
                "TB"
            }
        };
        value = size_in_bytes as f64 / 1000.0_f64.powi(exponent as i32);
    }

    format!("{:.*} {}", precision, value, unit_str)
}

Log output

Base DataFrame allocated pre: 4.76 MB post: 805.25 MB delta: 800.49 MB
  0: allocated pre_vstack:   1.21 GB    post_vstack:   1.21 GB    vstack_delta: 200.70 kB    iter_delta:   1.21 GB
  1: allocated pre_vstack:   1.61 GB    post_vstack:   1.61 GB    vstack_delta:  81.92 kB    iter_delta: 400.08 MB
  2: allocated pre_vstack:   2.01 GB    post_vstack:   2.01 GB    vstack_delta:  16.38 kB    iter_delta: 400.00 MB
  3: allocated pre_vstack:   2.41 GB    post_vstack:   2.41 GB    vstack_delta:  16.38 kB    iter_delta: 400.01 MB
  4: allocated pre_vstack:   2.81 GB    post_vstack:   2.81 GB    vstack_delta:  90.11 kB    iter_delta: 400.14 MB
  5: allocated pre_vstack:   3.21 GB    post_vstack:   3.21 GB    vstack_delta:  20.48 kB    iter_delta: 400.02 MB
  6: allocated pre_vstack:   3.61 GB    post_vstack:   3.61 GB    vstack_delta:  20.48 kB    iter_delta: 400.01 MB
  7: allocated pre_vstack:   4.01 GB    post_vstack:   4.01 GB    vstack_delta:  16.38 kB    iter_delta: 400.00 MB
  8: allocated pre_vstack:   4.41 GB    post_vstack:   4.41 GB    vstack_delta:  20.48 kB    iter_delta: 400.01 MB
  9: allocated pre_vstack:   4.81 GB    post_vstack:   4.81 GB    vstack_delta:  20.48 kB    iter_delta: 400.00 MB
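
For scale: the filter matches all 50 million zero values, and 50,000,000 × 8 bytes ≈ 400 MB, which lines up with the ~400 MB iter_delta retained on every iteration. This suggests that the full filter result is materialized (and the memory stays resident) even though limit(1) keeps only a single row.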

Issue description

The following code seems to allocate a lot of memory for a result DataFrame of shape (1, 1).

let mut foo_vec: Vec<i64> = vec![0; LEN / 2];
foo_vec.extend(vec![2; LEN / 2]);

let df = df!(
    "foo" => foo_vec
).unwrap();

df.clone()
   .lazy()
   .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
   .limit(1)
   .collect()
   .unwrap();

The longer code snippet above can be used for testing and makes the memory measurements more reliable (to my knowledge, there is no way to measure the exact amount of memory used).
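
A possible mitigation sketch (untested, and assuming the streaming Cargo feature and LazyFrame::with_streaming are available in this Polars version) is to hint at the streaming engine, which processes the data in batches and may avoid materializing the full filter result:

// Sketch only: same query as above, but hinting at the streaming engine.
// Assumes polars = { version = "0.41.3", features = ["lazy", "streaming"] }.
let sub_result_df = df
    .clone()
    .lazy()
    .filter(col("foo").lt_eq(1).or(col("foo").gt(3)))
    .limit(1)
    .with_streaming(true)
    .collect()
    .unwrap();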

Expected behavior

Only a small amount of memory should be allocated for a result DataFrame of shape (1, 1).

Installed versions

lazy

MichalLebeda added the bug, needs triage, and rust labels on Jul 18, 2024
@ggggggggg

I think I'm seeing the same issue. Here is my reproducible example in Python, showing that limit(0) does not reduce the runtime of a call that includes .lazy().filter(expr).limit(0).

In this example I compute an empty DataFrame in three different ways:

  1. .lazy().filter() with an expression that always evaluates to False but takes some time to compute, then collect().
  2. Same as 1, except I add .limit(0) before collecting. Here I expect it to take no time, because limit(0) means the expression never needs to be evaluated.
  3. Only .limit(0), with no filter.

I expect 2 and 3 to take about the same amount of time (basically 0 s), and 1 to take a significant amount of time. Instead, 1 and 2 take roughly the same amount of time (about 25 ms for me).

import polars as pl
import numpy as np
import time

a = np.arange(10000000).reshape((-1, 5000))
df = pl.from_numpy(a, schema={"a":pl.Array(pl.Int64, 5000)})

# 1) filter with an always-false predicate, no limit
tstart1 = time.time()
df1_lazy = df.lazy().filter(pl.col("a").arr.median()<0)
df1 = df1_lazy.collect()
elapsed1 = time.time()-tstart1

# 2) same filter, but with limit(0) before collecting
tstart2 = time.time()
df2_lazy = df.lazy().filter(pl.col("a").arr.median()<0).limit(0)
df2 = df2_lazy.collect()
elapsed2 = time.time()-tstart2

# 3) limit(0) only, no filter
tstart3 = time.time()
df3_lazy = df.lazy().limit(0)
df3 = df3_lazy.collect()
elapsed3 = time.time()-tstart3

print(elapsed1, elapsed2, elapsed3)
