
Implemented iterator for variable length structs. #164

Merged
merged 16 commits into microsoft:master
Aug 27, 2019

Conversation

@marius-klimantavicius
Contributor

marius-klimantavicius commented Aug 9, 2019

The current IFasterScanIterator interface is not suited for variable length structs, as GetNext would only return a fragment of the key/value (up to sizeof(Key)/sizeof(Value)). To solve this, two new methods were added, ref GetKey() and ref GetValue(), which return references to the actual key/value in hlog.

There is a disconnect in GenericScanIterator, as it returns a reference to a copy of the key/value rather than a direct reference into hlog. If a direct reference is needed, we could add a flag indicating whether to read from the frame or directly from hlog, and store the current page/offset.

This implementation does not support compaction for variable length structs. To support it, the compaction worker needs access to KeyLength and ValueLength. Maybe the allocator could be used as a factory for log compaction functions.
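For illustration, a minimal usage sketch of the new accessors during a scan (the Scan/GetNext shape and the MyKey/MyValue types here are assumptions, not part of this PR):

// Hypothetical usage: scan a variable-length log and read full keys/values
// through the new by-ref accessors instead of GetNext's fixed-size copies.
using (var iter = fht.Log.Scan(fht.Log.BeginAddress, fht.Log.TailAddress))
{
    while (iter.GetNext(out RecordInfo recordInfo))
    {
        if (recordInfo.Tombstone) continue;        // skip deleted records
        ref MyKey key = ref iter.GetKey();         // reference into hlog (or frame)
        ref MyValue value = ref iter.GetValue();   // full variable-length value
        // process key/value here
    }
}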

@msftclas

msftclas commented Aug 9, 2019

CLA assistant check
All CLA requirements met.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 14, 2019

Just found that this implementation (based on the blittable allocator) does not correctly handle deleted records that are still in memory - when a record is deleted in place, its value is overwritten with the default value:

if (entry.word == Interlocked.CompareExchange(ref bucket->bucket_entries[slot], updatedEntry.word, entry.word))
{
  // Apply tombstone bit to the record
  hlog.GetInfo(physicalAddress).Tombstone = true;

  // Write default value
  Value v = default(Value);
  functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));

  status = OperationStatus.SUCCESS;
  goto LatchRelease; // Release shared latch (if acquired)
}

We effectively lose the value and its length, so there is no way to determine the true length of the record.
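For example, with a typical length-prefixed varlen value layout (hypothetical, but representative of how varlen blittable values are usually declared), the default write destroys exactly the information the iterator needs:

// Hypothetical length-prefixed variable-length value: the first int stores the
// total length in bytes; the payload follows inline in the log.
public struct MyVarLenValue
{
    public int length;   // total length, including this field
    // payload bytes follow inline in hlog

    public int GetLength() => length;
}

// After the in-place delete above, ConcurrentWriter copies default(Value) over
// the record, so 'length' becomes 0 and the record's true extent is unrecoverable.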

@marius-klimantavicius
Contributor Author

It seems that there are even more issues with variable length structs (when the records are still in memory)... There is an optimization to do an in-place update of an existing value; however, the new value could be longer than the previous one, resulting in an overflow.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 15, 2019

Ideas I have right now:

  1. Add an actual capacity field to the record - not needed for fixed-size keys/values, so somehow limit it to varlen only.
  2. Never overwrite existing records - higher memory usage, and there might be more issues with the way the memory area is implemented.
  3. Append the record with encoded padding indicating how much padding there is. This could also be used to align records to 4/8-byte boundaries. Not sure how to deal with concurrent writers in this case.

Added a check for in-place modifications: the allocator can override the method to check for available space; if there is not enough space, a new record will be created.
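Roughly, the allocator-level check could look like the following sketch (the method signature and helper names are assumptions, not the exact code in this PR):

// Hypothetical sketch: before an in-place upsert, ask the allocator whether the
// new value fits in the space occupied by the existing record.
public override bool CanWriteInPlace(long physicalAddress, ref Value newValue)
{
    int available = GetRecordCapacity(physicalAddress);  // hypothetical helper reading the stored capacity
    int required = valueLength.GetLength(ref newValue);  // varlen length hook for Value
    return required <= available;                        // if false, Upsert appends a new record instead
}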
@marius-klimantavicius
Contributor Author

Updated the PR with a fix for enumerating deleted records (tombstones that overwrite the value) and a fix for #167.

Extended the allocator to expose CanWriteInPlace, which checks whether the existing record has enough space.

Added capacity:int32 (just after RecordInfo) to indicate the actual record size. Added 8-byte padding to records when using the varlen allocator (this might be needed, as the CLR requires [at least according to the spec] natural alignment when accessing built-in types, and RecordInfo is essentially a long and thus should be 8-byte aligned).

Note that I haven't checked whether RMW is broken (it should not be for fixed-length allocators; I am not using it in my case).
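As a rough sketch, the resulting on-log layout and size computation might look like this (the helper below is illustrative; only the field order follows the description above):

// Hypothetical size computation for the varlen record layout described above:
// [ RecordInfo (8 bytes) | capacity:int32 | key bytes | value bytes | padding to 8 ]
private static int AlignedRecordSize(int keyLength, int valueLength)
{
    const int kRecordAlignment = 8;   // keep RecordInfo naturally aligned
    int size = sizeof(long)           // RecordInfo header
             + sizeof(int)            // capacity field added in this PR
             + keyLength + valueLength;
    return (size + kRecordAlignment - 1) & ~(kRecordAlignment - 1);
}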

@badrishc
Collaborator

Thanks for the PR. We will take a look at this in the coming week. :)

@marius-klimantavicius
Contributor Author

Looking into the comment in #167 - I think I will add an option to either always create copies or to add capacity just after RecordInfo (as I do in this PR).

@marius-klimantavicius
Contributor Author

Should AllocatorBase.BeginAddress be part of checkpoint/recovery?

I was continuing to play around with this (my final goal is to have compaction) and found another issue (not sure whether it is with the iterator or with checkpoints).
What I see is that after compaction the BeginAddress is set to some value, but after restart (+recovery) it is set back to the initial value of kFirstValidAddress (64).

@badrishc
Collaborator

There will be holes in the log for reasons other than delete as well. For example, if two threads allocate space for the same key, one of them might have to leave the hole and retry.

The way we recommend skipping holes during a scan in this case is to move forward one byte at a time, interpret the bytes as a RecordInfo, and check whether it's valid. That's what the C++ version, which supports varlen inline, does.
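In C#, that skip-forward scan might look roughly like this (a sketch; LooksLikeValidHeader is a hypothetical check, e.g. that the header bytes are not all zero):

// Sketch of the recommended hole handling: if the current position does not hold
// a valid record header, advance one byte and reinterpret as RecordInfo.
while (currentAddress < endAddress)
{
    long physicalAddress = hlog.GetPhysicalAddress(currentAddress);
    if (!LooksLikeValidHeader(physicalAddress))   // hypothetical validity check
    {
        currentAddress++;                         // move forward one byte and retry
        continue;
    }
    // found a plausible record: process it, then advance by its actual size
    currentAddress += hlog.GetRecordSize(physicalAddress);
}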

@badrishc
Collaborator

If we avoid adding the record size to the header, we can also avoid WriteInfo becoming virtual, which is more expensive.

@badrishc
Collaborator

Regarding this code in Delete:

// Write default value
Value v = default(Value);
functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));

The reason we do the above is so that class objects are deallocated and GCed when a delete occurs. An alternative approach would be to only do this default write for the Generic allocator, since it does not hurt to keep the struct contents there in other cases. Thus we can avoid an additional header field, which is not desirable.

Note that the above-mentioned logic of skipping empty bytes during a scan is still needed, because there may be holes in the log due to failed allocations, as described above.

@marius-klimantavicius
Contributor Author

The way we recommend to skip holes during scan in this case is to move forward one byte at a time and interpret as RecordInfo and check if it's valid. That's what the C++ version, which supports varlen inline, does

This sounds really unsafe, or at least unreliable - what if you hit RecordInfo-like random bytes (is it guaranteed that hole bytes are 0)? That would result in a random memory read, with possibly garbage data being returned from the scan and possibly valid records then being skipped; that's what I was seeing when trying to iterate over deleted records.

There will be holes in the log for reasons other than delete as well. For example, if two threads allocate space for the same key, one of them might have to leave the hole and retry.

Does the allocator fail to write RecordInfo in such cases? If it does not write out RecordInfo, then of course an additional field would not help (as it would be missing too). But if RecordInfo is written out, then I'd rather have a setting that specifies whether the additional header field is written or not (still working on that).

As for the virtual call - there is already one as part of it (ref GetInfo(physicalAddress), which was being called to get the reference to the RecordInfo being written), and I'd expect the JIT to devirtualize it (as it is a virtual call on a sealed class), so in the end we should still end up with only one virtual call.

@badrishc
Collaborator

Well, there can be empty spaces in the log at the end of a page, where the next record does not fit. These are zeroed out, and the scanner is expected to skip them by looking for a valid record header. For normal allocations, we write the header, key, and value.

Assume we fix deletes so that only object pointers (generic allocator) are deleted, but inline keys and values are left on the log.

If we let the allocator retain the key and value during deletes, we don't need to waste space on a total record size field. We can simply query the key and value for their lengths, to determine the course of action.
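A sketch of that approach, recomputing a record's footprint from the inline key and value (the keyLength/valueLength fields stand for varlen length providers and are assumptions):

// Sketch: if deletes leave the inline key/value intact, the record size can be
// recomputed from the payloads themselves, so no capacity field is required.
private int GetInlineRecordSize(long physicalAddress)
{
    ref Key key = ref GetKey(physicalAddress);
    ref Value value = ref GetValue(physicalAddress);
    return sizeof(long)                      // RecordInfo header
         + keyLength.GetLength(ref key)      // varlen key length hook
         + valueLength.GetLength(ref value); // varlen value length hook
}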

@marius-klimantavicius
Contributor Author

Yes, that was one of the options I was considering. If records are never overwritten (we might allow overwriting if the lengths are exactly the same), then there is no need for a capacity field (right now this is implemented as a [virtual] call to CanWriteInPlace). I have implemented settings to enable/disable capacity and alignment in marius-klimantavicius/mister@19d6d5f. I am not adding them to this PR yet, as I might still remove capacity altogether (not sure regarding alignment, though), and even though using marker generic arguments is really efficient (at least in .NET Core), it is somewhat painful to map a bool to the correct type argument (that's why I have VariableLengthBlittableAllocator<Key, Value>.Create).
And I am still unable to find a satisfactory way to deal with holes at the end of a page (one that does not require guessing where a record starts). Maybe the iterator could know that it's at the end of a page and skip to the next page iff all remaining bytes in the page are 0.
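A sketch of that end-of-page idea (names are assumptions): skip to the next page only if every remaining byte on the page is zero.

// Sketch: treat the tail of a page as a hole only if it is entirely zeroed;
// otherwise keep interpreting the bytes as a potential record header.
private static unsafe bool RemainingPageBytesAreZero(byte* current, byte* pageEnd)
{
    for (byte* p = current; p < pageEnd; p++)
    {
        if (*p != 0)
            return false;
    }
    return true;
}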

@badrishc
Collaborator

We deal with holes in the C++ version as follows:

if (record->header.IsNull()) { ... }

For Upserts, handling a size increase using CanWriteInPlace is somewhat similar to what we do in C++, where PutAtomic() is supposed to return false if the upsert cannot be done in place.

Note that FASTER is multi-threaded with user-level record locking in the mutable region. CanWriteInPlace is opaque to the user. However, the user may want to atomically mark the older record as read-only, which is why we made the user return a bool from PutAtomic() in C++: they can take a record lock, check the size, mark the record as read-only, release the lock, then return false. Unfortunately, this technique in C# would be a breaking change for the signature of functions.ConcurrentWriter.

For Upserts, the atomic marking of read-only is not very important, since these are blind upserts and so ordering across threads does not matter. However, handling a size increase with RMW is much trickier, since we want to make sure no update is lost (e.g., one thread does a read-copy-update, whereas another thread does an in-place update of the old value). This is why we let RmwAtomic() return a bool in C++, so the user can appropriately mark the record as read-only in an atomic manner. For symmetry, PutAtomic() also returns a bool.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 19, 2019

I will check out the C++ version at some point in the near future, but at a glance it seems to expect that there are always at least sizeof(header) (8?) bytes in the hole. What if we end up 4 bytes short of the end of the page (or, in the unaligned case, 3 bytes short)? I will look into the C++ version; I'll probably find an answer there.
As for CanWriteInPlace, I am still trying to understand it. It seems that we cannot rely on KeyLength and ValueLength (unless the user makes sure to correctly handle multiple writers); CanWriteInPlace can break RMW if it ever returns true, and ideally it would be a decision for the user to make (PutAtomic/RmwAtomic).

Edit: Checked the C++ version, and there will be no issue with unaligned or smaller-than-8-byte holes, as template <class key_t, class value_t> struct Record::size() returns a value aligned to alignof(RecordInfo).
We should probably modify VariableLengthAllocator to align to 8 as well.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 19, 2019

I have a possible breaking implementation (ConcurrentWriter and InPlaceUpdater returning bool). I am trying to figure out a way to maintain backwards compatibility (we basically only need this for varlen structs). One way I could do that is to check whether Functions is, let's say, IVariableLengthFunctions<...> and have a condition (not really pleasant to look at):

if (variableFunctions != null)
{
  if (variableFunctions.ConcurrentWriter(...))
  {
    ...
  }
}
else
{
  functions.ConcurrentWriter(...);
  ...
}

Another idea - use delegates instead of the condition (one issue with this is that if Functions is a struct, we create a copy/box it when casting to IVariableLengthFunctions):

delegate bool ConcurrentWriterDelegate(ref Key key, ref Value src, ref Value dst);
delegate bool InPlaceUpdaterDelegate(ref Key key, ref Input input, ref Value value);
...
private readonly ConcurrentWriterDelegate concurrentWriter;
private readonly InPlaceUpdaterDelegate inPlaceUpdater;
...
// in constructor
if (this.functions is IVariableLengthFunctions<Key, Value, Input, Output, Context> variableLengthFunctions)
{
    concurrentWriter = variableLengthFunctions.ConcurrentWriter;
    inPlaceUpdater = variableLengthFunctions.InPlaceUpdater;
}
else
{
    var wrapper = new FunctionsWrapper(functions);
    concurrentWriter = wrapper.ConcurrentWriter;
    inPlaceUpdater = wrapper.InPlaceUpdater;
}

I could make breaking changes. Or maybe I can come up with a way to provide a simple wrapper (struct) - the biggest issue with that is the fact that constructors do not support type inference.

@badrishc
Collaborator

badrishc commented Aug 19, 2019

Edit: thinking about it, this approach below is bad because the atomic work (check if I can update + do the update) has to be split into two places. For example, with RMW, the first function has to take a record-level lock and then check if the operation can be done in place. If no, it sets the value to be read-only, releases the lock, and returns false. If yes, it returns without releasing the lock, and then the second function has to perform the operation and release the lock. See next comment for the alternative.

How about this non-breaking approach (similar to what you mention):

(1) Create a type IVariableLengthFunctions<Key, Value> with two methods:

  • bool CanUpsertInPlace(ref Key key, ref Value src, ref Value dst)
  • bool CanRMWInPlace(ref Key key, ref Input input, ref Value value)

(2) Have IVariableLengthFunctions be a field in VariableLengthStructSettings -- when using varlen, the user has to provide an implementation of this interface via settings. Thus, the actual IFunctions is left untouched. Store it as a readonly variable in FASTER.

(3) Let the main Upsert/RMW code call into the interface if it is non-null. If null, assume it is true.

Unrelated, note that with this overall design, the notion of capacity, with an additional capacity value in the value header, can be implemented by the user if that makes sense for their payload. It's not hard coded at the FASTER layer.

Probably should modify VariableLengthAllocator to align to 8 as well.

Exactly.

@badrishc
Collaborator

Second attempt at a non-breaking solution. Almost the same as your first proposal, except the new type only implements the overriding methods.

(1) Create a new type IVariableLengthFunctions<Key, Value, Input> that will be used to override the default functions provided in IFunctions:

  • bool ConcurrentWriter(ref Key key, ref Value src, ref Value dst)
  • bool InPlaceUpdater(ref Key key, ref Input input, ref Value value)

(2) Have IVariableLengthFunctions be a field in VariableLengthStructSettings -- when using varlen, the user has to provide an implementation of this interface via settings. Thus, the actual IFunctions is left untouched. Store it as a readonly variable in FASTER.

(3) Let the main Upsert/RMW code call into the interface if it is non-null. If null, fall back to the regular IFunctions.

The user can create a single class that implements both interfaces, if they want.
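A sketch of that shape (the signatures follow the proposal above; the Upsert-side wiring shown in the comment is illustrative):

// Sketch of the proposed override interface. When supplied via
// VariableLengthStructSettings it takes precedence over IFunctions; returning
// false tells FASTER to append a new record instead of updating in place.
public interface IVariableLengthFunctions<Key, Value, Input>
{
    bool ConcurrentWriter(ref Key key, ref Value src, ref Value dst);
    bool InPlaceUpdater(ref Key key, ref Input input, ref Value value);
}

// Inside Upsert (illustrative wiring):
//   if (varlenFunctions != null)
//   {
//       if (!varlenFunctions.ConcurrentWriter(ref key, ref src, ref dst))
//           goto CreateNewRecord;   // could not update in place
//   }
//   else
//   {
//       functions.ConcurrentWriter(ref key, ref src, ref dst);
//   }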

@badrishc
Collaborator

That said, I am actually leaning towards the breaking solution (changing the return type to bool), as it is cleaner in the longer term.

It is actually interesting even for the non-varlen case. For example, if a user wants to disable in-place updates for some keys (for instance, they want to log every change to the value), they can simply define ConcurrentWriter and InPlaceUpdater to return false for those keys.

@gunaprsd, any comments on the design, or issues with recovery?

…s possible to update record in place.

Initial implementation of variable length compaction.
@marius-klimantavicius
Contributor Author

Updated the PR with partially breaking changes - IFunctions remains the same, but VariableLengthStructSettings takes an additional type parameter. I can break IFunctions as well.

Marius Klimantavičius added 2 commits August 20, 2019 05:59
@marius-klimantavicius
Contributor Author

Just pushed a commit with breaking changes to IFunctions - I can revert it if you decide to go with IVariableLengthFunctions.

@badrishc
Collaborator

It's looking good, thanks! Are you done with the PR or will there be more checkins/testcases?

@marius-klimantavicius
Contributor Author

Ideally I would like to add a few unit tests:

  1. Skipping null records at the end of a page
  2. Checking that new records are created if ConcurrentWriter/InPlaceUpdater returns false

However, I won't have time for the next few days; after that I could either modify this PR or create a new one.

Otherwise I am more or less satisfied with the PR.

Added unit tests for varlen iterator, copy-on-write functions, and recovery.
@marius-klimantavicius
Contributor Author

Added a Delete method to IFasterKV.
Added a few unit tests.
This is now finished from my point of view (unless you need me to change anything). If I find new issues, I will create a new PR/issue.

@marius-klimantavicius
Contributor Author

Just found another issue with the iterator - will fix soon.
I think these lines are not correct (in both the blittable and varlen iterators [the latter is based on the blittable one]):

                var recordSize = hlog.GetRecordSize(hlog.GetPhysicalAddress(currentAddress));
                // Check if record fits on page, if not skip to next page
                if ((currentAddress & hlog.PageSizeMask) + recordSize > hlog.PageSize)
                {
                    currentAddress = (1 + (currentAddress >> hlog.LogPageSizeBits)) << hlog.LogPageSizeBits;
                    continue;
                }

We are trying to read the record from hlog memory regardless of whether the address is in memory or on disk.
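A sketch of the corrected logic, choosing between the in-memory log and the iterator's read frame based on the head address (the frame accessor shown is an assumption):

// Sketch: only dereference hlog memory for addresses at or above HeadAddress;
// older addresses have been flushed and must be read from the iterator's frame.
long physicalAddress;
if (currentAddress >= hlog.HeadAddress)
    physicalAddress = hlog.GetPhysicalAddress(currentAddress);
else
    physicalAddress = frame.GetPhysicalAddress(currentFrame, currentAddress & hlog.PageSizeMask); // hypothetical frame accessor

var recordSize = hlog.GetRecordSize(physicalAddress);
// then apply the same end-of-page check as above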

…log memory depending on whether the current address is in hlog memory or on disk (loaded to frame).
@badrishc
Collaborator

It was OK for blittable because GetRecordSize is essentially constant, independent of the address. But yeah, now that we support varlen, it makes sense to keep them consistent.

@badrishc merged commit 4d94966 into microsoft:master on Aug 27, 2019