Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release logio storage for extents on file unlink #656

Merged
merged 1 commit into from
Jul 30, 2021

Conversation

MichaelBrim
Copy link
Collaborator

Description

Each server is responsible for releasing the local extents for the target file. For extents that have not yet been synced to the local server, the client must release the storage.

Also includes a bug fix for extents that happened to span the shmem and spill portions of the log. The bug was revealed by the new library API test that makes sure we can properly reclaim storage for deleted files (t/api/storage-reuse.c).

Motivation and Context

See issue #197

How Has This Been Tested?

Tested in Ubuntu Docker

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Testing (addition of new tests or update to current tests)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the UnifyFS code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted.

Each server is responsible for releasing the local extents for the
target file. For extents that have not yet been synced to the local
server, the client must release the storage.

Also includes a bug fix for extents that happened to span the
shmem and spill portions of the log. The bug was revealed by the
new library API test that makes sure we can properly reclaim
storage for deleted files (t/api/storage-reuse.c).
@MichaelBrim MichaelBrim marked this pull request as ready for review July 29, 2021 20:33
@adammoody
Copy link
Collaborator

adammoody commented Jul 29, 2021

Nice work, @MichaelBrim ! That was fast!

There are are few more aspects that I think we'll need to handle. But this is already a big improvement, so we could do these other items as future work.

Since the slot map is shared for read/write between the server and client, we'll want to lock it to control concurrent access. The client might write to a file and allocate new slots at the same time the server is releasing slots due to an unlink. Actually, based on your code here, I'm guessing adding locking might be pretty straight-forward.

I think overwriting a file region will leak slots. The server will only hold the most recent extents for all regions. I think we currently forget about (drop) extents for the old data. One fix would be to free old slots as they are overwritten, though that sounds like it would slow down our writes, especially since those old extents could be remote. Alternatively, we could accumulate a list of old extents on the server and delete those on unlink.

Case 1:

rank 0:
write(start=0, len=10)
fsync()
MPI_Barrier()

rank 1:
MPI_Barrier()
write(start=0, len=20)
fsync() <-- server will drop extent from rank 0 write, so rank 0 slot is leaked

I think there is a similar issue on the client if it happens to overwrite a region before it has synced its extents with the server.

Case 2:

write(start=0, len=10)
write(start=0, len=20)
fsync() <-- client only sends second extent to server on fsync, slot from first write is leaked

There will be metadata on the clients about the unlinked file that we probably need to clean up or invalidate. In particular, the metadata cache and the client extent / segment trees. That's less to do with the data storage, but it's one more thing that comes to mind.

@adammoody adammoody merged commit fc7cb5b into LLNL:dev Jul 30, 2021
@adammoody
Copy link
Collaborator

Thanks again, @MichaelBrim !

@MichaelBrim MichaelBrim deleted the unlink-data branch July 30, 2021 14:11
@MichaelBrim MichaelBrim mentioned this pull request Aug 4, 2021
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants