
Implemented iterator for variable length structs. #164

Merged
merged 16 commits into microsoft:master
Aug 27, 2019

Conversation

@marius-klimantavicius
Contributor

marius-klimantavicius commented Aug 9, 2019

The current IFasterScanIterator interface is not suited for variable length structs, as GetNext would only return a fragment of the key/value (up to sizeof(Key)/sizeof(Value)). To solve this, two new methods were added, ref GetKey() and ref GetValue(), which return references to the actual key/value in hlog.

There is a disconnect in GenericScanIterator, as it returns a reference to a copy of the key/value rather than a direct reference into hlog. If a direct reference is needed, we could add a flag indicating whether to read from the frame or directly from hlog, and store the current page/offset.

This implementation does not support compaction for variable length structs. To support it, the compaction worker needs access to KeyLength and ValueLength. Maybe the allocator could be used as a factory for log compaction functions.
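For illustration, a minimal usage sketch of the new accessors during a scan (the Scan/GetNext shape and the MyKey/MyValue types here are assumptions, not part of this PR):

// Hypothetical usage: scan a variable-length log and read full keys/values
// through the new by-ref accessors instead of GetNext's fixed-size copies.
using (var iter = fht.Log.Scan(fht.Log.BeginAddress, fht.Log.TailAddress))
{
    while (iter.GetNext(out RecordInfo recordInfo))
    {
        if (recordInfo.Tombstone) continue;        // skip deleted records
        ref MyKey key = ref iter.GetKey();         // reference into hlog (or frame)
        ref MyValue value = ref iter.GetValue();   // full variable-length value
        // process key/value here
    }
}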

@msftclas

msftclas commented Aug 9, 2019

CLA assistant check
All CLA requirements met.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 14, 2019

Just found that this implementation (based on the blittable allocator) does not correctly handle deleted records that are still in memory - when a record is deleted in place, its value is overwritten with the default value:

if (entry.word == Interlocked.CompareExchange(ref bucket->bucket_entries[slot], updatedEntry.word, entry.word))
{
  // Apply tombstone bit to the record
  hlog.GetInfo(physicalAddress).Tombstone = true;

  // Write default value
  Value v = default(Value);
  functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));

  status = OperationStatus.SUCCESS;
  goto LatchRelease; // Release shared latch (if acquired)
}

We effectively lose the value and its length, so there is no way to determine the true length of the record.
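For example, with a typical length-prefixed varlen value layout (hypothetical, but representative of how varlen blittable values are usually declared), the default write destroys exactly the information the iterator needs:

// Hypothetical length-prefixed variable-length value: the first int stores the
// total length in bytes; the payload follows inline in the log.
public struct MyVarLenValue
{
    public int length;   // total length, including this field
    // payload bytes follow inline in hlog

    public int GetLength() => length;
}

// After the in-place delete above, ConcurrentWriter copies default(Value) over
// the record, so 'length' becomes 0 and the record's true extent is unrecoverable.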

@marius-klimantavicius
Contributor Author

It seems that there are even more issues with variable length structs (when the records are still in memory)... There is an optimization to do an in-place update of an existing value; however, the new value could be longer than the previous one, resulting in an overflow.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 15, 2019

Ideas I have right now:

  1. Add an actual capacity field to the record - not needed for fixed-size keys/values, so somehow limit it to varlen only.
  2. Never overwrite existing records - higher memory usage, and there might be more issues with the way the memory area is implemented.
  3. Append the record with encoded padding indicating how much padding there is. This could also be used to align records to 4/8-byte boundaries. Not sure how to deal with concurrent writers in this case.

Added a check for in-place modifications: the allocator can override the method to check for available space; if there is not enough space, a new record will be created.
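Roughly, the allocator-level check could look like the following sketch (the method signature and helper names are assumptions, not the exact code in this PR):

// Hypothetical sketch: before an in-place upsert, ask the allocator whether the
// new value fits in the space occupied by the existing record.
public override bool CanWriteInPlace(long physicalAddress, ref Value newValue)
{
    int available = GetRecordCapacity(physicalAddress);  // hypothetical helper reading the stored capacity
    int required = valueLength.GetLength(ref newValue);  // varlen length hook for Value
    return required <= available;                        // if false, Upsert appends a new record instead
}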
@marius-klimantavicius
Contributor Author

Updated the PR with a fix for enumerating deleted records (tombstones that overwrite the value) and a fix for #167.

Extended the allocator to expose CanWriteInPlace, which checks whether the existing record has enough space.

Added capacity:int32 (just after RecordInfo) to indicate the actual record size. Added 8-byte padding to records when using the varlen allocator (this might be needed, as the CLR requires [at least according to the spec] natural alignment when accessing built-in types, and RecordInfo is essentially a long and thus should be 8-byte aligned).

Note that I haven't checked whether RMW is broken (it should not be for fixed-length allocators; I am not using it in my case).
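As a rough sketch, the resulting on-log layout and size computation might look like this (the helper below is illustrative; only the field order follows the description above):

// Hypothetical size computation for the varlen record layout described above:
// [ RecordInfo (8 bytes) | capacity:int32 | key bytes | value bytes | padding to 8 ]
private static int AlignedRecordSize(int keyLength, int valueLength)
{
    const int kRecordAlignment = 8;   // keep RecordInfo naturally aligned
    int size = sizeof(long)           // RecordInfo header
             + sizeof(int)            // capacity field added in this PR
             + keyLength + valueLength;
    return (size + kRecordAlignment - 1) & ~(kRecordAlignment - 1);
}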

@badrishc
Collaborator

Thanks for the PR. We will take a look at this in the coming week. :)

@marius-klimantavicius
Contributor Author

Looking into the comment in #167 - I think I will add an option to either always create copies or to add capacity just after RecordInfo (as I do in this PR).

@marius-klimantavicius
Contributor Author

Should AllocatorBase.BeginAddress be part of checkpoint/recovery?

I was continuing to play around with this (my final goal is to have compaction) and found another issue (not sure whether it is with the iterator or with checkpoints).
What I see is that after compaction the BeginAddress is set to some value, but after restart (+recovery) it is set back to the initial value of kFirstValidAddress (64).

@badrishc
Collaborator

There will be holes in the log for reasons other than delete as well. For example, if two threads allocate space for the same key, one of them might have to leave the hole and retry.

The way we recommend skipping holes during a scan in this case is to move forward one byte at a time, interpret the bytes as a RecordInfo, and check whether it's valid. That's what the C++ version, which supports varlen inline, does.
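In C#, that skip-forward scan might look roughly like this (a sketch; LooksLikeValidHeader is a hypothetical check, e.g. that the header bytes are not all zero):

// Sketch of the recommended hole handling: if the current position does not hold
// a valid record header, advance one byte and reinterpret as RecordInfo.
while (currentAddress < endAddress)
{
    long physicalAddress = hlog.GetPhysicalAddress(currentAddress);
    if (!LooksLikeValidHeader(physicalAddress))   // hypothetical validity check
    {
        currentAddress++;                         // move forward one byte and retry
        continue;
    }
    // found a plausible record: process it, then advance by its actual size
    currentAddress += hlog.GetRecordSize(physicalAddress);
}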

@badrishc
Collaborator

If we avoid adding the record size to the header, we can also avoid WriteInfo becoming virtual, which is more expensive.

@badrishc
Collaborator

Regarding this code in Delete:

// Write default value
Value v = default(Value);
functions.ConcurrentWriter(ref hlog.GetKey(physicalAddress), ref v, ref hlog.GetValue(physicalAddress));

The reason we do the above is so that class objects are deallocated and GCed when a delete occurs. An alternative approach would be to only do this default write for the Generic allocator, since it does not hurt to keep the struct contents there in other cases. Thus we can avoid an additional header field, which is not desirable.

Note that the above-mentioned logic of skipping empty bytes during a scan is still needed, because there may be holes in the log due to failed allocations, as described above.

@marius-klimantavicius
Contributor Author

The way we recommend to skip holes during scan in this case is to move forward one byte at a time and interpret as RecordInfo and check if it's valid. That's what the C++ version, which supports varlen inline, does

This sounds really unsafe, or at least unreliable - what if you hit RecordInfo-like random bytes (is it guaranteed that hole bytes are 0)? That would result in a random memory read, with possibly garbage data being returned from the scan and possibly valid records then being skipped; that's what I was seeing when trying to iterate over deleted records.

There will be holes in the log for reasons other than delete as well. For example, if two threads allocate space for the same key, one of them might have to leave the hole and retry.

Does the allocator fail to write RecordInfo in such cases? If it does not write out RecordInfo, then of course an additional field would not help (as it would be missing too). But if RecordInfo is written out, then I'd rather have a setting that specifies whether the additional header field is written or not (still working on that).

As for the virtual call - there is already one as part of it (ref GetInfo(physicalAddress), which was being called to get the reference to the RecordInfo being written), and I'd expect the JIT to devirtualize it (as it is a virtual call on a sealed class), so in the end we should still end up with only one virtual call.

@badrishc
Collaborator

Well, there can be empty spaces in the log at the end of a page, where the next record does not fit. These are zeroed out, and the scanner is expected to skip them by looking for a valid record header. For normal allocations, we write the header, key, and value.

Assume we fix deletes so that only object pointers (generic allocator) are deleted, but inline keys and values are left on the log.

If we let the allocator retain the key and value during deletes, we don't need to waste space on a total record size field. We can simply query the key and value for their lengths, to determine the course of action.
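A sketch of that approach, recomputing a record's footprint from the inline key and value (the keyLength/valueLength fields stand for varlen length providers and are assumptions):

// Sketch: if deletes leave the inline key/value intact, the record size can be
// recomputed from the payloads themselves, so no capacity field is required.
private int GetInlineRecordSize(long physicalAddress)
{
    ref Key key = ref GetKey(physicalAddress);
    ref Value value = ref GetValue(physicalAddress);
    return sizeof(long)                      // RecordInfo header
         + keyLength.GetLength(ref key)      // varlen key length hook
         + valueLength.GetLength(ref value); // varlen value length hook
}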

@marius-klimantavicius
Contributor Author

Yes, that was one of the options I was considering. If records are never overwritten (we might allow overwriting if the lengths are exactly the same), then there is no need for a capacity field (right now this is implemented as a [virtual] call to CanWriteInPlace). I have implemented settings to enable/disable capacity and alignment in marius-klimantavicius/mister@19d6d5f. I am not adding them to this PR yet, as I might still remove capacity altogether (not sure regarding alignment, though), and even though using marker generic arguments is really efficient (at least in .NET Core), it is somewhat painful to map a bool to the correct type argument (that's why I have VariableLengthBlittableAllocator<Key, Value>.Create).
And I am still unable to find a satisfactory way to deal with holes at the end of a page (one that does not require guessing where a record starts). Maybe the iterator could know that it's at the end of a page and skip to the next page iff all remaining bytes in the page are 0.
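A sketch of that end-of-page idea (names are assumptions): skip to the next page only if every remaining byte on the page is zero.

// Sketch: treat the tail of a page as a hole only if it is entirely zeroed;
// otherwise keep interpreting the bytes as a potential record header.
private static unsafe bool RemainingPageBytesAreZero(byte* current, byte* pageEnd)
{
    for (byte* p = current; p < pageEnd; p++)
    {
        if (*p != 0)
            return false;
    }
    return true;
}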

@badrishc
Collaborator

We deal with holes in the C++ version as follows:

if (record->header.IsNull()) { ... }

For Upserts, handling a size increase using CanWriteInPlace is somewhat similar to what we do in C++, where PutAtomic() is supposed to return false if the upsert cannot be done in place.

Note that FASTER is multi-threaded with user-level record locking in the mutable region. CanWriteInPlace is opaque to the user. However, the user may want to atomically mark the older record as read-only, which is why we made the user return a bool from PutAtomic() in C++: they can take a record lock, check the size, mark the record as read-only, release the lock, then return false. Unfortunately, this technique in C# would be a breaking change for the signature of functions.ConcurrentWriter.

For Upserts, the atomic marking of read-only is not very important, since these are blind upserts and so ordering across threads does not matter. However, handling a size increase with RMW is much trickier, since we want to make sure no update is lost (e.g., one thread does a read-copy-update, whereas another thread does an in-place update of the old value). This is why we let RmwAtomic() return a bool in C++, so the user can appropriately mark the record as read-only in an atomic manner. For symmetry, PutAtomic() also returns a bool.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 19, 2019

I will check out the C++ version at some point in the near future, but at a glance it seems to expect that there are always at least sizeof(header) (8?) bytes in the hole. What if we end up 4 bytes short of the end of the page (or, in the unaligned case, 3 bytes short)? I will look into the C++ version; I'll probably find an answer there.
As for CanWriteInPlace, I am still trying to understand it. It seems that we cannot rely on KeyLength and ValueLength (unless the user makes sure to correctly handle multiple writers); CanWriteInPlace can break RMW if it ever returns true, and ideally it would be a decision for the user to make (PutAtomic/RmwAtomic).

Edit: Checked the C++ version, and there will be no issue with unaligned or smaller-than-8-byte holes, as template <class key_t, class value_t> struct Record::size() returns a value aligned to alignof(RecordInfo).
We should probably modify VariableLengthAllocator to align to 8 as well.

@marius-klimantavicius
Contributor Author

marius-klimantavicius commented Aug 19, 2019

I have a possible breaking implementation (ConcurrentWriter and InPlaceUpdater returning bool). I am trying to figure out a way to maintain backwards compatibility (we basically only need this for varlen structs). One way I could do that is to check whether Functions is, let's say, IVariableLengthFunctions<...> and have a condition (not really pleasant to look at):

if (variableFunctions != null)
{
  if (variableFunctions.ConcurrentWriter(...))
  {
    ...
  }
}
else
{
  functions.ConcurrentWriter(...);
  ...
}

Another idea - use delegates instead of the condition (one issue with this is that if Functions is a struct, we create a copy/box it when casting to IVariableLengthFunctions):

delegate bool ConcurrentWriterDelegate(ref Key key, ref Value src, ref Value dst);
delegate bool InPlaceUpdaterDelegate(ref Key key, ref Input input, ref Value value);
...
private readonly ConcurrentWriterDelegate concurrentWriter;
private readonly InPlaceUpdaterDelegate inPlaceUpdater;
...
// in constructor
if (this.functions is IVariableLengthFunctions<Key, Value, Input, Output, Context> variableLengthFunctions)
{
    concurrentWriter = variableLengthFunctions.ConcurrentWriter;
    inPlaceUpdater = variableLengthFunctions.InPlaceUpdater;
}
else
{
    var wrapper = new FunctionsWrapper(functions);
    concurrentWriter = wrapper.ConcurrentWriter;
    inPlaceUpdater = wrapper.InPlaceUpdater;
}

I could make breaking changes. Or maybe I can come up with a way to provide a simple wrapper (struct) - the biggest issue with that is the fact that constructors do not support type inference.

@badrishc
Collaborator

badrishc commented Aug 19, 2019

Edit: thinking about it, this approach below is bad because the atomic work (check if I can update + do the update) has to be split into two places. For example, with RMW, the first function has to take a record-level lock and then check if the operation can be done in place. If no, it sets the value to be read-only, releases the lock, and returns false. If yes, it returns without releasing the lock, and then the second function has to perform the operation and release the lock. See next comment for the alternative.

How about this non-breaking approach (similar to what you mention):

(1) Create a type IVariableLengthFunctions<Key, Value> with two methods:

  • bool CanUpsertInPlace(ref Key key, ref Value src, ref Value dst)
  • bool CanRMWInPlace(ref Key key, ref Input input, ref Value value)

(2) Have IVariableLengthFunctions be a field in VariableLengthStructSettings -- when using varlen, the user has to provide an implementation of this interface via settings. Thus, the actual IFunctions is left untouched. Store it as a readonly variable in FASTER.

(3) Let the main Upsert/RMW code call into the interface if it is non-null. If null, assume it is true.

Unrelated, note that with this overall design, the notion of capacity, with an additional capacity value in the value header, can be implemented by the user if that makes sense for their payload. It's not hard coded at the FASTER layer.

Probably should modify VariableLengthAllocator to align to 8 as well.

Exactly.

@badrishc
Collaborator

Second attempt at a non-breaking solution. Almost the same as your first proposal, except the new type only implements the overriding methods.

(1) Create a new type IVariableLengthFunctions<Key, Value, Input> that will be used to override the default functions provided in IFunctions:

  • bool ConcurrentWriter(ref Key key, ref Value src, ref Value dst)
  • bool InPlaceUpdater(ref Key key, ref Input input, ref Value value)

(2) Have IVariableLengthFunctions be a field in VariableLengthStructSettings -- when using varlen, the user has to provide an implementation of this interface via settings. Thus, the actual IFunctions is left untouched. Store it as a readonly variable in FASTER.

(3) Let the main Upsert/RMW code call into the interface if it is non-null. If null, fall back to the regular IFunctions.

The user can create a single class that implements both interfaces, if they want.
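A sketch of that shape (the signatures follow the proposal above; the Upsert-side wiring shown in the comment is illustrative):

// Sketch of the proposed override interface. When supplied via
// VariableLengthStructSettings it takes precedence over IFunctions; returning
// false tells FASTER to append a new record instead of updating in place.
public interface IVariableLengthFunctions<Key, Value, Input>
{
    bool ConcurrentWriter(ref Key key, ref Value src, ref Value dst);
    bool InPlaceUpdater(ref Key key, ref Input input, ref Value value);
}

// Inside Upsert (illustrative wiring):
//   if (varlenFunctions != null)
//   {
//       if (!varlenFunctions.ConcurrentWriter(ref key, ref src, ref dst))
//           goto CreateNewRecord;   // could not update in place
//   }
//   else
//   {
//       functions.ConcurrentWriter(ref key, ref src, ref dst);
//   }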

@badrishc
Collaborator

That said, I am actually leaning towards the breaking solution (changing the return type to bool), as it is cleaner in the longer term.

It is actually interesting even for the non-varlen case. For example, if a user wants to disable in-place updates for some keys (for instance, they want to log every change to the value), they can simply define ConcurrentWriter and InPlaceUpdater to return false for those keys.

@gunaprsd, any comments on the design, or issues with recovery?

…s possible to update record in place.

Initial implementation of variable length compaction.
@marius-klimantavicius
Contributor Author

Updated the PR with partially breaking changes - IFunctions remains the same, but VariableLengthStructSettings takes an additional type parameter. I can break IFunctions as well.

Marius Klimantavičius added 2 commits August 20, 2019 05:59
@marius-klimantavicius
Contributor Author

Just pushed a commit with breaking changes to IFunctions - I can revert it if you decide to go with IVariableLengthFunctions.

@badrishc
Collaborator

It's looking good, thanks! Are you done with the PR or will there be more checkins/testcases?

@marius-klimantavicius
Contributor Author

Ideally I would like to add a few unit tests:

  1. Skipping null records at the end of a page
  2. Checking that new records are created if ConcurrentWriter/InPlaceUpdater returns false

However, I won't have time for the next few days; after that I could either modify this PR or create a new one.

Otherwise I am more or less satisfied with the PR.

Added unit tests for varlen iterator, copy-on-write functions, and recovery.
@marius-klimantavicius
Contributor Author

Added a Delete method to IFasterKV.
Added a few unit tests.
This is now finished from my point of view (unless you need me to change anything). If I find new issues, I will create a new PR/issue.

@marius-klimantavicius
Contributor Author

Just found another issue with the iterator - will fix soon.
I think these lines are not correct (in both the blittable and varlen iterators [the latter is based on the blittable one]):

                var recordSize = hlog.GetRecordSize(hlog.GetPhysicalAddress(currentAddress));
                // Check if record fits on page, if not skip to next page
                if ((currentAddress & hlog.PageSizeMask) + recordSize > hlog.PageSize)
                {
                    currentAddress = (1 + (currentAddress >> hlog.LogPageSizeBits)) << hlog.LogPageSizeBits;
                    continue;
                }

We are trying to read the record from hlog memory regardless of whether the address is in memory or on disk.
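A sketch of the corrected logic, choosing between the in-memory log and the iterator's read frame based on the head address (the frame accessor shown is an assumption):

// Sketch: only dereference hlog memory for addresses at or above HeadAddress;
// older addresses have been flushed and must be read from the iterator's frame.
long physicalAddress;
if (currentAddress >= hlog.HeadAddress)
    physicalAddress = hlog.GetPhysicalAddress(currentAddress);
else
    physicalAddress = frame.GetPhysicalAddress(currentFrame, currentAddress & hlog.PageSizeMask); // hypothetical frame accessor

var recordSize = hlog.GetRecordSize(physicalAddress);
// then apply the same end-of-page check as above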

…log memory depending on whether the current address is in hlog memory or on disk (loaded to frame).
@badrishc
Collaborator

It was OK for blittable because GetRecordSize is essentially constant, independent of the address. But yeah, now that we support varlen, it makes sense to keep them consistent.

@badrishc merged commit 4d94966 into microsoft:master on Aug 27, 2019