Vectorized Defragmentation #439
base: main
Conversation
I think the main issue is the masking of slots, which should not be done. Comments provided below.
src/Paprika/Data/SlottedArray.cs
// Ignore all the even entries from the vector as the preamble data is contained within
// the first 2 bytes (ushort) out of the 4 bytes slot.
slotData = Avx2.BlendVariable(slotData, Vector256<ushort>.Zero, Vector256.Create((ushort)0b_1010_1010_1010_1010));
Why is this needed? What does it do? Isn't it a bug? There's a BitwiseAnd underneath, so it should already remove what is not needed. The comment states that the slot is 4 bytes long, but after #376 this is no longer the case, in the sense that the preambles and hashes are grouped into vector-sized chunks; see: https://github.com/NethermindEth/Paprika/blob/main/docs/design.md#slottedarray-layout
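To make the point concrete, here is a minimal sketch, not the actual Paprika code (ExtractPreambles is an illustrative name): once BitwiseAnd is applied with a preamble mask, everything outside the mask is already zeroed, so an extra BlendVariable pass against zero adds nothing.

using System.Runtime.Intrinsics;

static Vector256<ushort> ExtractPreambles(Vector256<ushort> lanes, ushort preambleBits)
{
    // BitwiseAnd keeps only the preamble bits in every lane; bits outside the mask are
    // already cleared, so a preceding BlendVariable against zero would be redundant.
    return Vector256.BitwiseAnd(lanes, Vector256.Create(preambleBits));
}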
Ah, I had earlier assumed it to be 2 bytes and only recently added this mask. I think I was misled by the total-size constant within Slot:
public const int TotalSize = Size + sizeof(ushort);
I just realized the comment says it is the total size of the hash and the slot together; I missed that completely. Maybe I should rename the constant to TotalSizeWithHash for better clarity? With the current names, TotalSize and Size look almost identical 😅
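A sketch of what the rename could look like; the value of Size here is an assumption for illustration only, not taken from the actual Slot struct.

// Shown standalone here for brevity; in the real code this lives inside SlottedArray.
struct Slot
{
    // Assumed for illustration: the slot itself is a single ushort.
    public const int Size = sizeof(ushort);

    // Renamed from TotalSize to make it clear this counts the slot plus its ushort hash.
    public const int TotalSizeWithHash = Size + sizeof(ushort);
}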
src/Paprika/Data/SlottedArray.cs
var preambleData = Vector256.BitwiseAnd(slotData, preambleMask);

if (Vector256.EqualsAny(preambleData, deletedMask))
I'd benchmark it. There was a case where XORing and checking whether the result equals zero was a bit faster.
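A sketch of variants worth comparing in such a benchmark (the class and method names are illustrative, not part of Paprika); note that the XOR-plus-zero-check idiom answers whole-vector equality, which is a different question from EqualsAny.

using System.Runtime.Intrinsics;

static class DeletedCheckVariants
{
    // Variant 1: the form used in the PR, "does any lane equal the deleted marker?".
    public static bool AnyDeleted(Vector256<ushort> preambles, Vector256<ushort> deletedMask)
        => Vector256.EqualsAny(preambles, deletedMask);

    // Variant 2: per-lane compare, then a single vector-vs-zero test; the comparison
    // vector is non-zero iff at least one lane matched.
    public static bool AnyDeletedAlt(Vector256<ushort> preambles, Vector256<ushort> deletedMask)
        => Vector256.Equals(preambles, deletedMask) != Vector256<ushort>.Zero;

    // XOR idiom: only answers "do all lanes match?", included here for contrast.
    public static bool AllDeleted(Vector256<ushort> preambles, Vector256<ushort> deletedMask)
        => (preambles ^ deletedMask) == Vector256<ushort>.Zero;
}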
src/Paprika/Data/SlottedArray.cs
if (Vector256.EqualsAny(preambleData, deletedMask))
{
    // Some slots are deleted in this batch, process individually
    for (var j = 0; j < SlotsPerVector; j++)
This loop could be replaced. What we could do is extract the most significant bits after the equality comparison and move in larger chunks, like we do in TryFind. Also, it copies slot by slot, with no consideration for contiguous deleted chunks?
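A rough sketch of that direction (ProcessBatch and the commented-out HandleLiveRun are hypothetical names, not Paprika APIs): compare all lanes at once, pull the per-lane result into a scalar bitmask, and then walk runs of live slots between deleted ones instead of handling one slot at a time.

using System.Numerics;
using System.Runtime.Intrinsics;

static void ProcessBatch(Vector256<ushort> preambles, Vector256<ushort> deletedMask, int slotsPerVector)
{
    var eq = Vector256.Equals(preambles, deletedMask);           // all-ones lane where the slot is deleted
    uint deletedBits = Vector256.ExtractMostSignificantBits(eq); // one bit per ushort lane

    var start = 0;
    while (deletedBits != 0)
    {
        var next = BitOperations.TrailingZeroCount(deletedBits); // index of the next deleted slot
        // HandleLiveRun(start, next - start);                   // hypothetical: block-copy the live run [start, next)
        start = next + 1;
        deletedBits &= deletedBits - 1;                          // clear the lowest set bit
    }
    // HandleLiveRun(start, slotsPerVector - start);             // hypothetical: trailing run of live slots
}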
src/Paprika/Data/SlottedArray.cs
// No deleted slots in this batch, copy the whole batch
if (i != writeTo)
{
    CopyBatch(i, writeTo);
Ah, so we do the batch copy, but only if we find the whole batch valid? If there's a single delete, we then go one by one?
src/Paprika/Data/SlottedArray.cs
}

// Helper function to copy a batch of slots
private void CopyBatch(int readFrom, int writeTo)
This looks like a long method. I'd benchmark it heavily.
One more general comment: @dipkakwani, have you considered splitting the search for deletions from the actual compaction of the array? Something that would mimic
@Scooletz Yes, that is an interesting point, and I did think in this direction, but I wasn't sure if there is any advantage, since eventually the elements have to be moved during defragmentation. Moving them once the deleted mask is formed would require moving all the elements between two deleted items in one shot; the number of these elements could be smaller than, equal to, or larger than the vector size. I am not sure if there is any advantage with this move operation versus directly reading and moving vectors whenever possible.
Eventually you need to move elements, but if you had a set that marks alive entries with 1s and dead entries with 0s, you could potentially move in as big chunks as possible for both parts: moving
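A sketch of the split being suggested here, under assumed names (the alive map and MoveRun are hypothetical, not existing Paprika members): first record which slots are alive, then compact by moving each contiguous run of alive slots with a single block move, however long that run happens to be.

// Phase 2 of a mark-then-compact approach: given a per-slot "alive" map produced by a
// separate scan, move whole live runs at once instead of slot by slot.
static void Compact(bool[] alive)
{
    var writeTo = 0;
    var i = 0;
    while (i < alive.Length)
    {
        if (!alive[i]) { i++; continue; }     // skip over dead slots

        var runStart = i;
        while (i < alive.Length && alive[i])
            i++;                              // extend the run of consecutive live slots

        // MoveRun(runStart, writeTo, i - runStart); // hypothetical block copy of the whole run
        writeTo += i - runStart;
    }
}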
The existing defragmentation algorithm in SlottedArray scans all the slots sequentially and defragments the data structure by moving non-deleted slots together. This PR rewrites the algorithm with a new vectorized implementation.
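For context, a minimal sketch of the sequential idea being replaced (isDeleted and the int-typed slots are stand-ins, not the real SlottedArray layout): walk the slots in order and copy each live one down to the next free position.

using System;

// Scalar baseline: compact live slots toward the front, returning the new slot count.
static int CompactSequential(Span<int> slots, Func<int, bool> isDeleted)
{
    var writeTo = 0;
    for (var i = 0; i < slots.Length; i++)
    {
        if (isDeleted(slots[i]))
            continue;                  // skip deleted slots

        if (i != writeTo)
            slots[writeTo] = slots[i]; // move the live slot into the next free position
        writeTo++;
    }
    return writeTo;
}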