
HAN leaks memory #13171

Closed
hppritcha opened this issue Mar 31, 2025 · 5 comments
@hppritcha
Member

There are various cases where HAN leaks memory. At my site, users are complaining about memory leaks with MPI window creation, but the underlying problem is that resources retained from other components are not being released correctly.

Patch coming imminently.

@hppritcha hppritcha self-assigned this Mar 31, 2025
hppritcha added a commit to hppritcha/ompi that referenced this issue Apr 1, 2025
Related to open-mpi#13171

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha
Member Author

valgrind also found memory leaks associated with other components, including btl and ucx osc.
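For reference, a leak check of this kind can be run with something like the following (the binary name and process count are just placeholders):

mpirun -np 2 valgrind --leak-check=full --show-leak-kinds=all ./win_create_free_test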

hppritcha added a commit to hppritcha/ompi that referenced this issue Apr 1, 2025
related to open-mpi#13171

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha
Member Author

I noticed many more memory leaks in HAN on the 5.0.x branch because it retains the previous coll module(s) and does not release them in all cases. The result was a memory leak on every iteration of the MPI_Win_create/MPI_Win_free loop in the test case.

There was a big refactoring of HAN and other collective components on main that made these per-iteration memory leaks go away. However, the refactoring was pretty significant, so my recommendation for those using 5.0.x releases is to disable the han collective component if they are observing significant leaks with MPI_Win_create/MPI_Win_free operations (see the example below).
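One way to do that, assuming mpirun is the launcher (the application name here is just a placeholder), is to exclude han from the coll framework:

mpirun --mca coll ^han -np 2 ./win_create_free_test

or equivalently via the environment:

export OMPI_MCA_coll=^han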

@bosilca
Member

bosilca commented Apr 1, 2025

If the communicators are correctly cleaned up, there should be no memory leaks due to the collectives' internal module use. However, if the communicators are not correctly freed by the user and we rely on the MPI_Finalize cleanup, bad things can happen.
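As an illustration of that distinction (a minimal sketch, not code from this issue), explicitly freeing derived communicators lets the coll modules they selected be torn down at MPI_Comm_free time instead of deferring everything to MPI_Finalize:

#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm dup;
    MPI_Comm_dup(MPI_COMM_WORLD, &dup);

    // ... use dup for communication or window creation ...

    // Freeing the communicator here releases the coll modules it selected;
    // skipping this forces MPI_Finalize to do the cleanup, which is where
    // leaks like the one discussed in this issue tend to surface.
    MPI_Comm_free(&dup);

    MPI_Finalize();
    return 0;
}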

@hppritcha
Member Author

I think the "should" in the previous comment is the most important word. The user test is shown below; a simplified sketch of the reference-counting imbalance follows it. With 1000 iterations of win_create/free, valgrind reported that about 2 MB of memory was being leaked when using the 5.0.7 release with HAN enabled. The leak was due to HAN adding multiple references to TUNED modules, so they were not freed properly in MPI_Win_free's communicator destructor step. Disabling HAN makes the memory leak vanish.

As I noted above, there was a lot of restructuring of HAN on main, and the create/free cycle no longer shows a memory leak when HAN is used.

#include <cstdlib>
#include <iostream>
#include <mpi.h>
#include <stdio.h>
#include <vector>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    int niters = 1000;
    MPI_Init(NULL, NULL);

    if (argc > 1) {
        niters = atoi(argv[1]);
    }

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    // Junk data for the MPI window
    std::vector<double> data(world_size);
    size_t N_create_free = (size_t)niters;
    for (size_t i = 0; i < N_create_free; i++) {
        if (i % 100 == 0 && world_rank == 0) {
            std::cout << "completed... " << i << std::endl;
        }
        MPI_Win win;
        std::cerr << "CALLING WINDOW CREATE" << std::endl;
        MPI_Win_create(data.data(), data.size() * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
        std::cerr << "CALLED WINDOW CREATE" << std::endl;

        auto errorcode = MPI_Win_fence((MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED), win);
        if (errorcode != MPI_SUCCESS)
            std::cout << "ERROR: MPI ERROR WAS DETECTED" << std::endl;
        MPI_Win_free(&win);
    }

    // Print off a hello world message
    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment.
    MPI_Finalize();
}
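For readers unfamiliar with the coll module bookkeeping, here is a minimal, self-contained sketch of the retain/release imbalance described above. All of the names (module_t, module_retain, and so on) are hypothetical stand-ins, not the actual Open MPI HAN code.

#include <cstdio>

struct module_t {
    int refcount = 1;
};

static int live_modules = 0;

static module_t* module_create() {
    ++live_modules;
    return new module_t();
}
static void module_retain(module_t* m) { ++m->refcount; }
static void module_release(module_t* m) {
    if (--m->refcount == 0) {
        delete m;
        --live_modules;
    }
}

int main() {
    const int niters = 1000;
    for (int i = 0; i < niters; ++i) {
        // Stand-in for the previously selected (e.g. tuned) coll module.
        module_t* tuned = module_create();
        // A HAN-like wrapper retains the module it may fall back to.
        module_retain(tuned);

        // Communicator/window teardown drops the original reference...
        module_release(tuned);
        // ...but if the wrapper never issues its matching release, the module
        // leaks once per iteration -- analogous to the ~2 MB valgrind reported
        // over 1000 MPI_Win_create/MPI_Win_free cycles.
#ifdef BALANCED
        module_release(tuned);
#endif
    }
    std::printf("modules still live after %d iterations: %d\n", niters, live_modules);
    return 0;
}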

hppritcha added a commit to hppritcha/ompi that referenced this issue Apr 1, 2025
related to open-mpi#13171

Signed-off-by: Howard Pritchard <howardp@lanl.gov>
@hppritcha
Member Author

Resolved via #13172.
