UniformMemoryPool does not re-initialize chunks that are freed #530

brian-kelley · 2019-12-13T20:02:42Z

When a UniformMemoryPool is constructed, the entire arena (all chunks) are (optionally) initialized with a value "initialized_value". But this value is not saved in the class, and if a chunk is used and then freed, it is never re-initialized. Effectively, there is no guarantee and no O(1) way to check if a chunk returned by allocate_chunk() is actually initialized or not, which seems to defeat the purpose of an initialized memory pool and seems very likely to cause nondeterministic bugs due to sometimes-initialized memory.

For an example of where this is used, the SPGEMM non-compressed symbolic case (StructureC_NC GPUTag functor, line 609 of src/sparse/impl/KokkosSparse_spgemm_impl_symbolic.hpp) is aware of this problem - it re-initializes hash_begins (the only part of the chunk that needs to be initialized) before freeing it:

    if (is_global_alloced){
      num_elements += used_hash_sizes[1];

      //now thread leaves the memory as it finds. so there is no need to initialize the hash begins
      nnz_lno_t dirty_hashes = globally_used_hash_count[0];
      Kokkos::parallel_for(
          Kokkos::ThreadVectorRange(teamMember, dirty_hashes),
          [&] (nnz_lno_t i) {
        nnz_lno_t dirty_hash = globally_used_hash_indices[i];
        hm2.hash_begins[dirty_hash] = -1;
      });


      Kokkos::single(Kokkos::PerThread(teamMember),[&] () {
        m_space.release_chunk(globally_used_hash_indices);
      });

But this whole memory pool is eagerly initialized to -1 when it is constructed, even though only part of each chunk needs to be initialized.

It would be slow to lazily initialize chunks when allocated, or eagerly when freed because allocate_chunk is normally called inside a Kokkos::single(). So the initialization couldn't be done in parallel because not all threads/vector lanes are running it. allocate_chunk/free_chunk would need to be aware of the team_member and who is sharing the chunk (a team or a thread).

The best way to solve this (which would also slightly improve SPGEMM performance) would be to never initialize the chunks, and let the caller do that however it needs. If we go this route, we should just use the Kokkos memory pool instead.

The text was updated successfully, but these errors were encountered:

srajama1 · 2019-12-13T20:40:48Z

@brian-kelley : Lot of subtle issues here.

First the pattern we want:
Initialize memory pool
Someone gets a chunk and uses it
They release it with values reset to original

This is a common pattern in several algorithms. If someone is releasing memory with arbitrary values, then it could lead it problems. I agree users don't know what it is initialized to as it is not in the class. So far we have been the only user, so we know what it is.

Second, Kokkos memory pool was device level. At least it was.

If you think unitialized is the best we can consider that.

brian-kelley · 2019-12-18T18:27:47Z

@srajama1 I'm just going to close this, since I know now that:

chunk initialization isn't causing the spgemm bug
algorithms using it do correctly initialize what needs to be initialized before releasing chunks

brian-kelley closed this as completed Dec 18, 2019

ndellingwood mentioned this issue Jan 28, 2020

Kokkos + KokkosKernels Promotion To 2.9.99 trilinos/Trilinos#6671

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UniformMemoryPool does not re-initialize chunks that are freed #530

UniformMemoryPool does not re-initialize chunks that are freed #530

brian-kelley commented Dec 13, 2019 •

edited

Loading

srajama1 commented Dec 13, 2019

brian-kelley commented Dec 18, 2019

UniformMemoryPool does not re-initialize chunks that are freed #530

UniformMemoryPool does not re-initialize chunks that are freed #530

Comments

brian-kelley commented Dec 13, 2019 • edited Loading

srajama1 commented Dec 13, 2019

brian-kelley commented Dec 18, 2019

brian-kelley commented Dec 13, 2019 •

edited

Loading