static_partitioner + global_control triggers an unbounded memory leak #1403

Open
lennoxho opened this issue Jun 13, 2024 · 2 comments · May be fixed by #1404

lennoxho commented Jun 13, 2024

Description

We found that using tbb::static_partitioner while a tbb::global_control is active causes a steady and seemingly unbounded memory leak as tasks are executed.

A minimal repro is attached below.

The issue goes away if we do any of the following

  • Replace tbb::static_partitioner with tbb::auto_partitioner
  • Remove the call to tbb::global_control
  • Wrap the tbb::parallel_for call with tbb::task_arena::execute

Reproduced with the following setup

  • GCC 13.1.0 & GCC 13.2.0
  • Ubuntu 20.04 LTS native & Ubuntu 22.04 LTS on WSL 2
  • A range of x86-64 Intel/AMD workstation & server-grade hardware
  • oneTBB 2021.12.0 & the latest commit (fdf1fdb)

Minimal Repro

Makefile

TBB_DIR ?= /mnt/d/Users/LennoxHo/source/onetbb/oneTBB-2021.12.0-install

COMPILE_FLAGS := -g -O2 -Wall -Wextra -fPIC -isystem $(TBB_DIR)/include
LINK_FLAGS := -Wl,-rpath,$(TBB_DIR)/lib -L$(TBB_DIR)/lib -ltbb

TEST_NAME := tbb-leak-test

all : test
.PHONY : all test clean

$(TEST_NAME) : tbb-leak-test.o
	g++ $(COMPILE_FLAGS) $^ -o $@ $(LINK_FLAGS)

%.o : %.cpp
	g++ -c $(COMPILE_FLAGS) $< -o $@

clean:
	rm -f *.o
	rm -f tbb-leak-test

test: $(TEST_NAME)
	./$(TEST_NAME)

tbb-leak-test.cpp

#include <cassert>
#include <cstdio>
#include <sys/resource.h>

#include <oneapi/tbb/global_control.h>
#include <oneapi/tbb/parallel_for.h>
#include <oneapi/tbb/partitioner.h>

void print_rss() {
    struct rusage ru;
    int result = getrusage(RUSAGE_SELF, &ru);
    assert(result == 0);

    printf("Max RSS = %ld kB\n", ru.ru_maxrss);
}

void busy_work(int x) {
    volatile int result = x * x;
    (void)result;
}

int main() {
    using tbb_partitioner = tbb::static_partitioner;

    constexpr int num_threads = 8;
    constexpr auto num_tasks_per_iter = 1'000ull;

    constexpr auto num_iterations = 1'000'000ull;
    constexpr auto num_rss_reporting_interval = 1'000ull;

    tbb::global_control gbl_ctrl{ tbb::global_control::max_allowed_parallelism, num_threads };

    fputs("Starting ", stdout);
    print_rss();

    for (auto i = 0ull; i < num_iterations; ++i) {
        tbb::parallel_for(0ull, num_tasks_per_iter, busy_work, tbb_partitioner{});

        if (i % num_rss_reporting_interval == 0) {
            print_rss();
        }
    }

    fputs("Ending ", stdout);
    print_rss();
}

Steps to reproduce:

  • Copy the attached Makefile & tbb-leak-test.cpp
  • make TBB_DIR=<path to oneTBB installation>
  • Observe the steady increase in memory usage (my runs end with Ending Max RSS = 3079648 kB)

@lennoxho (Author)

Here are my run logs
leak.txt

@pavelkumbrasev (Contributor)

Hi @lennoxho, I was able to reproduce the described behavior.
TL;DR
Using task_arena seems to be the correct way to work around this problem. Meanwhile, we will think about what to do with the leak when task_arena is not used.

Basically, it happens because global_control limits the concurrency that all the internal arenas share between each other. As a result, an individual arena does not know its actual concurrency limit. It only knows the one you explicitly set during construction or, for the default arena (the one used for a plain parallel_for call, for example), the concurrency of the whole machine.
When you start parallel_for with static_partitioner, it creates as many internal proxy tasks as normal tasks (proxy tasks are used to assign tasks to specific threads). Proxy tasks have the property that they must be executed twice. The first time execute is called on a proxy task, it returns the actual task it carries. The second time execute is called, the proxy task can be deleted.
In your case, each call to parallel_for with static_partitioner creates hardware_concurrency proxy tasks, but because global_control is present, the concurrency of the default arena is never fully satisfied, so some of the proxy tasks are executed only once and are never destroyed.
Perhaps the solution is to check min(arena::concurrency, global_control::limitation) during execution of static_partitioner.
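
To make the counting argument concrete, here is a deliberately simplified toy model. It is not oneTBB's implementation: `ToyProxy`, `leaked_after_one_call` and `capped_slots` are hypothetical names, and the model assumes that a proxy gets its second execution only if a thread actually occupies the slot it was assigned to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a proxy task. This is NOT oneTBB's real class.
struct ToyProxy {
    int executions = 0;

    // Returns true once the proxy has run twice and could be freed.
    bool execute() {
        assert(executions < 2 && "a proxy is never executed a third time");
        return ++executions == 2;
    }
};

// Simulates one parallel_for call: one proxy is created per arena slot, but
// only slots occupied by a thread perform the second execution.
// Returns the number of proxies that were executed only once.
std::size_t leaked_after_one_call(std::size_t arena_slots,
                                  std::size_t active_threads) {
    std::vector<ToyProxy> proxies(arena_slots);
    std::size_t leaked = 0;
    for (std::size_t slot = 0; slot < arena_slots; ++slot) {
        bool freed = proxies[slot].execute();  // first run: hands over the real task
        if (slot < active_threads) {
            freed = proxies[slot].execute();   // second run: proxy can be deleted
        }
        if (!freed) {
            ++leaked;
        }
    }
    return leaked;
}

// Hypothetical form of the proposed check: never create more proxies than
// the global limit allows threads to pick up.
std::size_t capped_slots(std::size_t arena_slots, std::size_t global_limit) {
    return std::min(arena_slots, global_limit);
}
```

In this model, a machine with 16 hardware threads and a limit of 8 leaves 8 proxies behind per call, while `leaked_after_one_call(capped_slots(16, 8), 8)` leaves none. The real scheduler is more involved, so treat the numbers as an illustration of the mechanism only.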

@pavelkumbrasev linked a pull request Jun 13, 2024 that will close this issue