Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UniformMemoryPool does not re-initialize chunks that are freed #530

Closed
brian-kelley opened this issue Dec 13, 2019 · 2 comments
Closed

UniformMemoryPool does not re-initialize chunks that are freed #530

brian-kelley opened this issue Dec 13, 2019 · 2 comments

Comments

@brian-kelley
Copy link
Contributor

brian-kelley commented Dec 13, 2019

When a UniformMemoryPool is constructed, the entire arena (all chunks) are (optionally) initialized with a value "initialized_value". But this value is not saved in the class, and if a chunk is used and then freed, it is never re-initialized. Effectively, there is no guarantee and no O(1) way to check if a chunk returned by allocate_chunk() is actually initialized or not, which seems to defeat the purpose of an initialized memory pool and seems very likely to cause nondeterministic bugs due to sometimes-initialized memory.

For an example of where this is used, the SPGEMM non-compressed symbolic case (StructureC_NC GPUTag functor, line 609 of src/sparse/impl/KokkosSparse_spgemm_impl_symbolic.hpp) is aware of this problem - it re-initializes hash_begins (the only part of the chunk that needs to be initialized) before freeing it:

    if (is_global_alloced){
      num_elements += used_hash_sizes[1];

      //now thread leaves the memory as it finds. so there is no need to initialize the hash begins
      nnz_lno_t dirty_hashes = globally_used_hash_count[0];
      Kokkos::parallel_for(
          Kokkos::ThreadVectorRange(teamMember, dirty_hashes),
          [&] (nnz_lno_t i) {
        nnz_lno_t dirty_hash = globally_used_hash_indices[i];
        hm2.hash_begins[dirty_hash] = -1;
      });


      Kokkos::single(Kokkos::PerThread(teamMember),[&] () {
        m_space.release_chunk(globally_used_hash_indices);
      });

But this whole memory pool is eagerly initialized to -1 when it is constructed, even though only part of each chunk needs to be initialized.

It would be slow to lazily initialize chunks when allocated, or eagerly when freed because allocate_chunk is normally called inside a Kokkos::single(). So the initialization couldn't be done in parallel because not all threads/vector lanes are running it. allocate_chunk/free_chunk would need to be aware of the team_member and who is sharing the chunk (a team or a thread).

The best way to solve this (which would also slightly improve SPGEMM performance) would be to never initialize the chunks, and let the caller do that however it needs. If we go this route, we should just use the Kokkos memory pool instead.

@srajama1
Copy link
Contributor

@brian-kelley : Lot of subtle issues here.

First the pattern we want:
Initialize memory pool
Someone gets a chunk and uses it
They release it with values reset to original

This is a common pattern in several algorithms. If someone is releasing memory with arbitrary values, then it could lead it problems. I agree users don't know what it is initialized to as it is not in the class. So far we have been the only user, so we know what it is.

Second, Kokkos memory pool was device level. At least it was.

If you think unitialized is the best we can consider that.

@brian-kelley
Copy link
Contributor Author

@srajama1 I'm just going to close this, since I know now that:

  • chunk initialization isn't causing the spgemm bug
  • algorithms using it do correctly initialize what needs to be initialized before releasing chunks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants