batch task memory leak #7696

Closed
Tracked by #6640
xxchan opened this issue Feb 3, 2023 · 12 comments
Labels: priority/high, type/bug (Something isn't working)

Comments

xxchan (Member) commented Feb 3, 2023

Describe the bug

I ran the following script:

import psycopg2

conn = psycopg2.connect("dbname=dev user=root port=4566 host=localhost")
cur = conn.cursor()
cur.execute(
    "drop table if exists t ;create table t (x int);"
)

for i in range(1000000):
    cur.execute(f"INSERT INTO t values ({i});")

I observed that memory usage kept increasing. It seems the memory used by batch tasks is never freed. Is this normal?

[screenshot: memory usage steadily increasing while the inserts run]

To Reproduce

No response

Expected behavior

No response

Additional context

Found together with #7694, but they seem to be different issues?

xxchan added the type/bug (Something isn't working) label on Feb 3, 2023
github-actions bot added this to the release-0.1.17 milestone on Feb 3, 2023
BugenZhao (Member) commented Feb 6, 2023

If it's caused by the spawned task not being freed, then it's reasonable for this to happen along with #7694: since the task is never freed, the memory it holds is never released either.

xxchan (Member, Author) commented Feb 6, 2023

Here's another test, and I'm not sure how to interpret it 🥵 😇

The first hill is from inserting 5,000,000 rows. After a while the memory usage dropped to 480 MB. Then I inserted another 5,000,000 rows; this time the memory usage didn't drop. Then I inserted again and hit OOM.

[screenshot: memory usage graph showing the two insert rounds and the final OOM]

I tested using the following script:

create table t (id int primary key,uid int,v1 int,v2 float,s1 varchar,s2 varchar,update_time timestamp);

(c1, c2) = (0, 5000000) for the first run and (5000000, 10000000) for the second run.

import psycopg2
import random
import datetime

class InsertValue(object):
    def __init__(self):
        self.conn = psycopg2.connect(
            database="dev", user="root", host="127.0.0.1", port=4566
        )

    def parse(self, c1, c2):
        # Insert rows with ids in [c1, c2), batching 10 rows per INSERT statement.
        cursor = self.conn.cursor()
        for step in range(c1, c2, 10):
            li = []
            for j in range(step, step + 10):
                v2 = random.uniform(1, 1000)
                update_time = datetime.datetime.now()
                vv = """({},{},{},{},'test1','test2','{}')""".format(
                    j, j, j, v2, update_time
                )
                li.append(vv)
            v = ",".join(li)
            sql = """insert into t values {};""".format(v)
            cursor.execute(sql)
        self.conn.commit()
        cursor.close()

    def run(self, c1, c2):
        self.parse(c1, c2)
        self.conn.close()


if __name__ == "__main__":
    iv = InsertValue()
    iv.run(0, 5000000)

fuyufjh (Member) commented Feb 7, 2023

xx01cyx (Contributor) commented Feb 7, 2023

This is caused by BatchTaskExecution not being freed after query execution. Each batch task is stored in a hashmap in BatchManager and never gets removed.
https://github.com/risingwavelabs/risingwave/blob/main/src/batch/src/task/task_manager.rs#L96-L97
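
For illustration, here is a minimal Rust sketch of that pattern (simplified, hypothetical names; not the actual code behind the link above): the task is inserted into the manager's map when it is fired, and nothing ever removes it, so every finished query keeps its task pinned in memory.

use std::collections::HashMap;
use std::sync::{Arc, Mutex};

struct BatchTaskExecution {
    // operator state, output buffers, ... (elided)
}

struct BatchManager {
    tasks: Mutex<HashMap<String, Arc<BatchTaskExecution>>>,
}

impl BatchManager {
    fn fire_task(&self, task_id: String) {
        let task = Arc::new(BatchTaskExecution {});
        // The entry is inserted here, but there is no matching `remove`
        // after the query finishes, so the map only grows.
        self.tasks.lock().unwrap().insert(task_id, task);
    }
}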

fuyufjh (Member) commented Feb 7, 2023

We seem to have fixed a similar problem 🤔 cc. @BowenXiao1999 @liurenjie1024

xx01cyx (Contributor) commented Feb 7, 2023

We seem to have fixed a similar problem 🤔 cc. @BowenXiao1999 @liurenjie1024

IIRC that was about query execution, which increases memory usage in the frontend: #5827

BugenZhao (Member) commented:

Seems like another WeakHashMap-like issue... 🤔
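
A "WeakHashMap-like" fix here would mean the manager's map holds only weak references, so the map alone can't keep a task alive and stale entries can be pruned. A minimal Rust sketch of the idea (hypothetical names, not RisingWave's actual BatchManager):

use std::collections::HashMap;
use std::sync::{Arc, Weak};

struct BatchTaskExecution;

struct BatchManager {
    // Weak references: the map does not own the tasks, so dropping the last
    // Arc elsewhere frees the task even if its entry is still in the map.
    tasks: HashMap<String, Weak<BatchTaskExecution>>,
}

impl BatchManager {
    fn get_task(&self, task_id: &str) -> Option<Arc<BatchTaskExecution>> {
        self.tasks.get(task_id).and_then(|w| w.upgrade())
    }

    fn prune(&mut self) {
        // Drop entries whose task has already been deallocated.
        self.tasks.retain(|_, w| w.strong_count() > 0);
    }
}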

BowenXiao1999 (Contributor) commented Feb 7, 2023

Yes, there will also be a problem with the hashmap's reserved memory.

We might need a background coroutine to do the clean-up. A WeakHashMap or other naive approach may not help: the task's upstream may keep reading from it (specifically, it gets the output receiver from the task), and we don't know when it is safe to delete the task. We can't delete it immediately once the task finishes, because the upstream may not have started yet, and the task's output channel would then be dropped too early. So we probably need some kind of status flag.
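
A rough sketch of that idea (hypothetical names, assuming a tokio runtime; not RisingWave's actual API): each entry carries status flags, and a background coroutine periodically removes entries only once the task has finished and its output has been taken by the upstream.

use std::collections::HashMap;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;
use tokio::sync::Mutex;

struct TaskEntry {
    finished: AtomicBool,     // set when the task's execution completes
    output_taken: AtomicBool, // set once the upstream has taken the output receiver
    // ... output channel, join handle, etc. (elided)
}

type TaskMap = Arc<Mutex<HashMap<String, Arc<TaskEntry>>>>;

fn spawn_cleanup(tasks: TaskMap) {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(10));
        loop {
            ticker.tick().await;
            let mut map = tasks.lock().await;
            // Only drop entries that are both finished and already consumed,
            // so a late-starting upstream never finds its channel gone.
            map.retain(|_, t| {
                !(t.finished.load(Ordering::Acquire) && t.output_taken.load(Ordering::Acquire))
            });
            // Also release the map's reserved capacity.
            map.shrink_to_fit();
        }
    });
}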

liurenjie1024 (Contributor) commented:

Yes, this is a known issue in distributed mode, and we need a fix later.

st1page assigned liurenjie1024 and unassigned st1page on Feb 7, 2023
xxchan (Member, Author) commented Feb 8, 2023

BTW, should we have a test for such problems (memory leaks after a lot of batch tasks), e.g. in the longevity test?

liurenjie1024 (Contributor) commented:

BTW, should we have a test for such problems (memory leaks after a lot of batch tasks), e.g. in the longevity test?

Good idea, but currently the test team has no resources for longevity testing of distributed queries, so we can postpone it.

liurenjie1024 (Contributor) commented:

Fixed.
