Implemented iterator for variable length structs. #164
The current IFasterScanIterator interface is not suited for variable length structs, as GetNext would only return a fragment of the key/value (up to sizeof(Key)/sizeof(Value)). To solve this, two new methods were added, ref GetKey() and ref GetValue(), which return references to the actual key/value in hlog.

There is a disconnect in GenericScanIterator, as it returns a reference to a copy of the key/value rather than a direct reference into hlog. If that is needed, we could add a flag indicating whether to read from the frame or from hlog directly, and store the current page/offset.
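A minimal sketch of how the new accessors might be consumed; MyKey, MyValue and Process are placeholders, and the iterator is assumed to come from the store's log scan API - this is not code from the PR itself.

```csharp
// Sketch only: types and the Process callback are hypothetical.
void Consume(IFasterScanIterator<MyKey, MyValue> iter)
{
    while (iter.GetNext(out RecordInfo recordInfo))
    {
        if (recordInfo.Tombstone)
            continue; // skip deleted records

        // New accessors: references point at the full variable-length key/value in hlog,
        // rather than a fragment limited to sizeof(Key)/sizeof(Value).
        ref MyKey key = ref iter.GetKey();
        ref MyValue value = ref iter.GetValue();
        Process(ref key, ref value);
    }
}
```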
Just found that this implementation (based on the blittable allocator) does not correctly handle deleted records that are still in memory - when a record is deleted in place, its value is overwritten by:

```csharp
if (entry.word == Interlocked.CompareExchange(ref bucket->bucket_entries[slot], updatedEntry.word, entry.word))
{
    // Apply tombstone bit to the record
    hlog.GetInfo(physicalAddress).Tombstone = true;

    // Write default value
    Value v = default(Value);
    functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));

    status = OperationStatus.SUCCESS;
    goto LatchRelease; // Release shared latch (if acquired)
}
```

We effectively lose the value and its length, so there is no way to find out the true length of the record.
It seems that there are even more issues with variable length structs (when the records are still in memory)... There is an optimization to do an in-place update of an existing value; however, the new value could be longer than the previous one, resulting in an overflow.
Ideas I have right now:
Added a check for in-place modifications: the allocator can override the method to check for available space; if there is not enough space, a new record is created instead.
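An illustrative sketch of what such a hook could look like; CanWriteInPlace and GetValueLength are assumed names here, not necessarily the exact members added in this PR.

```csharp
// Illustrative sketch only; member names are assumptions.
public class MyVarLenAllocator
{
    // Allow the in-place write only if the new value fits into the space occupied by the
    // existing one; otherwise the caller creates a new record at the log tail.
    public virtual bool CanWriteInPlace(long physicalAddress, int newValueLength)
        => newValueLength <= GetValueLength(physicalAddress);

    // Hypothetical helper that reads the stored value's length at physicalAddress.
    protected virtual int GetValueLength(long physicalAddress) => 0;
}
```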
Updated PR with a fix for enumerating deleted records (tombstones that overwrite the value) and a fix for #167. Extended the allocator to expose … Added … Note that I haven't checked whether RMW is broken (it should not be for fixed-length allocators; I am not using it in my case).
Thanks for the PR. We will take a look at this in the coming week. :)
Looking into the comment in #167 - I think I will add an option to either always create copies, or add capacity just after …
I was continuing to play around with this (my final goal is to have compaction) and found another issue (not sure if it is with the iterator or checkpoints).
There will be holes in the log for reasons other than delete as well. For example, if two threads allocate space for the same key, one of them might have to leave the hole and retry. The way we recommend skipping holes during a scan in this case is to move forward one byte at a time, interpret the bytes as a RecordInfo, and check whether it is valid. That's what the C++ version, which supports varlen inline, does.
If we avoid adding the record size to the header, we can also avoid WriteInfo becoming virtual, which is more expensive.
Regarding this code in Delete:

```csharp
// Write default value
Value v = default(Value);
functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));
```

The reason we do the above is so that class objects are deallocated and GCed when a delete occurs. An alternative approach would be to only do this default write for the Generic allocator, since it does not hurt to keep the struct contents there in other cases. Thus we can avoid an additional header field, which is not desirable. Note that the above-mentioned logic of skipping empty bytes during a scan is still needed, because there may be holes in the log due to failed allocations, as described above.
This sounds really unsafe, or at least unreliable - what if you hit …

Does the allocator fail to write … As for the virtual call - there is already one as part of it (…)
Well, there can be empty spaces in the log at the end of a page, where the next record does not fit. These are zeroed out, and the scanner is expected to skip them by looking for a valid record header. For normal allocations, we write the header, key, and value. Assume we fix deletes so that only object pointers (generic allocator) are deleted, but inline keys and values are left on the log. If we let the allocator retain the key and value during deletes, we don't need to waste space on a total record size field. We can simply query the key and value for their lengths, to determine the course of action.
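A rough sketch of that skipping logic, assuming the hlog accessors used elsewhere in this thread; IsValidHeader, KeyLength, ValueLength, Visit and RecordInfoSize are hypothetical stand-ins for the allocator's actual members.

```csharp
// Sketch of skipping zeroed holes during a scan; helper names are assumptions.
void ScanRegion(long currentAddress, long endAddress)
{
    long address = currentAddress;
    while (address < endAddress)
    {
        long physical = hlog.GetPhysicalAddress(address);
        ref RecordInfo info = ref hlog.GetInfo(physical);

        if (!IsValidHeader(info))
        {
            // Zeroed or partially written bytes: advance one byte at a time and
            // re-interpret as a RecordInfo until a valid header is found.
            address++;
            continue;
        }

        // Valid record: derive its total size from the key and value themselves,
        // so no per-record size field is needed in the header.
        int recordSize = RecordInfoSize
                       + KeyLength(ref hlog.GetKey(physical))
                       + ValueLength(ref hlog.GetValue(physical));
        Visit(address, physical); // per-record callback
        address += recordSize;
    }
}
```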
Yes, that was one of the options I was considering. If the records are never overwritten (we might allow overwriting if the lengths are exactly the same), then there is no need to have …
We deal with holes in the C++ version as follows: Line 1834 in 667434e
For Upserts, handling a size increase using CanWriteInPlace is somewhat similar to what we do in C++, where PutAtomic() is supposed to return false if the upsert cannot be done in place. Note that FASTER is multi-threaded with user-level record locking in the mutable region. CanWriteInPlace is opaque to the user. However, the user may want to atomically mark the older record as read-only, which is why we made the user return a bool from PutAtomic() in C++: they can take a record lock, check the size, mark the record as read-only, release the lock, then return false. Unfortunately, this technique in C# would be a breaking change for the signature of functions.ConcurrentWriter.

For Upserts, the atomic marking of read-only is not very important, since these are blind upserts and so ordering across threads does not matter. However, handling size increase with RMW is much trickier, since we want to make sure no update is lost (e.g., one thread does read-copy-update, whereas another thread does an in-place update of the old value). This is why we let RmwAtomic() return a bool in C++, so the user can appropriately mark the record as read-only in an atomic manner. For symmetry, PutAtomic() also returns a bool.
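For illustration, here is a minimal C# sketch of that C++-style protocol carried over to a bool-returning writer; Lock/Unlock/MarkReadOnly/Length/CopyFrom are members of a hypothetical user-defined payload, not FASTER API.

```csharp
// Sketch of a bool-returning in-place writer following the PutAtomic() protocol above.
// MyKey/MyValue and their members are placeholders supplied by the user.
public bool ConcurrentWriter(ref MyKey key, ref MyValue src, ref MyValue dst)
{
    dst.Lock();                     // record-level lock on the destination value
    try
    {
        if (src.Length > dst.Length)
        {
            // Does not fit: mark the old record read-only so no other thread updates it
            // in place, then return false so the caller creates a new record instead.
            dst.MarkReadOnly();
            return false;
        }
        dst.CopyFrom(ref src);      // fits: perform the in-place update
        return true;
    }
    finally
    {
        dst.Unlock();
    }
}
```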
Fix recovery logic to correctly read back page and include begin addr…
I will check out the C++ version at some point in the near future, but at a glance it seems that it expects that there is always at least …

Edit: Checked C++, and there will be no issue when it comes to unaligned or less-than-8-byte holes, as …
I kind of have a possible breaking implementation (…):

```csharp
if (variableFunctions != null)
{
    if (variableFunctions.ConcurrentWriter(...))
    {
        ...
    }
}
else
{
    functions.ConcurrentWriter(...);
    ...
}
```

Another idea - have delegates instead of the condition (one issue with this is that if …):

```csharp
delegate bool ConcurrentWriterDelegate(ref Key key, ref Value src, ref Value dst);
delegate bool InPlaceUpdaterDelegate(ref Key key, ref Input input, ref Value value);
...
private readonly ConcurrentWriterDelegate concurrentWriter;
private readonly InPlaceUpdaterDelegate inPlaceUpdater;
...
// in constructor
if (this.functions is IVariableLengthFunctions<Key, Value, Input, Output, Context> variableLengthFunctions)
{
    concurrentWriter = variableLengthFunctions.ConcurrentWriter;
    inPlaceUpdater = variableLengthFunctions.InPlaceUpdater;
}
else
{
    var wrapper = new FunctionsWrapper(functions);
    concurrentWriter = wrapper.ConcurrentWriter;
    inPlaceUpdater = wrapper.InPlaceUpdater;
}
```

I could make breaking changes. Or maybe I can come up with a way to provide a simple wrapper (struct) - the biggest issue with this one is the fact that constructors do not have type inference.
Edit: thinking about it, this approach below is bad because the atomic work (check whether I can update + do the update) has to be split into two places. For example, with RMW, the first function has to take a record-level lock and then check if the operation can be done in place. If no, it sets the value to be read-only, releases the lock, and returns false. If yes, it returns without releasing the lock, and then the second function has to perform the operation and release the lock. See the next comment for the alternative.

(1) Create a type …

(2) Have …

Unrelated, note that with this overall design, the notion of capacity, with an additional capacity value in the value header, can be implemented by the user if that makes sense for their payload. It is not hard coded at the FASTER layer.
Exactly.
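For what it's worth, a minimal sketch of how a user payload could carry its own length and capacity in the value header, as suggested above; the struct and its field names are illustrative, not part of FASTER.

```csharp
using System.Runtime.InteropServices;

// Illustrative user-defined variable-length value with its own capacity header.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct VarLenValue
{
    public int Length;     // bytes currently used by the payload
    public int Capacity;   // bytes reserved when the record was first written
    // payload bytes follow inline in the log

    // An in-place update is possible as long as the new payload fits the reserved capacity.
    public bool CanUpdateInPlace(int newLength) => newLength <= Capacity;
}
```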
Second attempt at a non-breaking solution. Almost the same as your first proposal, except the new type only implements the new overriding methods.

(1) Create a new type …

(2) Have …

(3) Let the main Upsert/RMW code call into the interface if it is non-null. If null, fall back to the regular …

The user can create a single class that provides both interfaces, if they want.
That said, I am actually leaning towards the breaking solution (change the return type to bool), as it is cleaner in the longer term. It is actually interesting even for the non-varlen case. For example, if a user wants to disable in-place updates for some keys (for instance, they want to log every change to the value), they can simply define ConcurrentWriter and InPlaceUpdater to return false for those keys. @gunaprsd, any comments on the design, or issues with recovery?
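As an example of that use case with bool-returning callbacks, selected keys could simply opt out of in-place updates; MyKey/MyValue/MyInput, ShouldAudit and Apply are placeholders, not code from this PR.

```csharp
// Placeholder functions showing how returning false forces read-copy-update for
// keys whose changes must be logged.
public bool ConcurrentWriter(ref MyKey key, ref MyValue src, ref MyValue dst)
{
    if (ShouldAudit(ref key))
        return false;          // refuse in-place update; a new record is created instead
    dst = src;
    return true;
}

public bool InPlaceUpdater(ref MyKey key, ref MyInput input, ref MyValue value)
{
    if (ShouldAudit(ref key))
        return false;          // fall back to copy-update for audited keys
    value.Apply(ref input);    // hypothetical in-place merge
    return true;
}
```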
…s possible to update record in place. Initial implementation of variable length compaction.
Updated PR with partially breaking changes - …
…antavicius/FASTER into varlen_scan_iterator
… return bool instead of having separate IVariableLengthFunctions.
Just pushed a commit with breaking changes to …
It's looking good, thanks! Are you done with the PR, or will there be more checkins/testcases?
Ideally I would like to add a few unit tests:
However, I won't have time for the next few days; after that I could either modify this PR or just create a new one. Otherwise, I am more or less satisfied with the PR.
Added unit tests for varlen iterator, copy-on-write functions and recovery.
Added …
Just found another issue with the iterator - will fix soon.

```csharp
var recordSize = hlog.GetRecordSize(hlog.GetPhysicalAddress(currentAddress));

// Check if record fits on page, if not skip to next page
if ((currentAddress & hlog.PageSizeMask) + recordSize > hlog.PageSize)
{
    currentAddress = (1 + (currentAddress >> hlog.LogPageSizeBits)) << hlog.LogPageSizeBits;
    continue;
}
```

We are trying to read the record from …
…log memory depending on whether the current address is in hlog memory or on disk (loaded to frame).
It was ok for blittable because GetRecordSize is essentially constant, independent of address. But yeah, now that we support varlen, it makes sense to keep them consistent.
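If I read the fix commit correctly, the idea is to take the record size from wherever the current address actually resides; a rough sketch of that choice, where GetPhysicalAddressInFrame is a hypothetical helper for on-disk pages loaded into the iterator's frame.

```csharp
// Sketch only: choose the source of the record bytes based on where currentAddress lives.
long physicalAddress = currentAddress >= hlog.HeadAddress
    ? hlog.GetPhysicalAddress(currentAddress)      // record is still in hlog memory
    : GetPhysicalAddressInFrame(currentAddress);   // record was read from disk into the frame

var recordSize = hlog.GetRecordSize(physicalAddress);

// The page-boundary check then uses the size of the record we are actually looking at.
if ((currentAddress & hlog.PageSizeMask) + recordSize > hlog.PageSize)
    currentAddress = (1 + (currentAddress >> hlog.LogPageSizeBits)) << hlog.LogPageSizeBits;
```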
This implementation does not support compaction for variable length structs. To support it, the compaction worker needs access to KeyLength and ValueLength. Maybe the allocator could be used as a factory for log compaction functions.