ObjectRecoveryTest failing when more data is used #226

dangershony · 2020-01-03T16:31:01Z

This PR is not intended to be merged; it demonstrates a failing durability test scenario.

I added more data to the ObjectRecoveryTest1 test and increased the number of iterations (to trigger more checkpoints). I would expect the test to pass even with these changes, but the stream appears to run out of data to read, which results in an OverflowException when deserializing a key.

I'm not sure (yet) why exactly. To reproduce, just run the ObjectRecoveryTest1 test.
I came across this issue when writing a POC to simulate the requirements of a project I am currently working on (I can provide a link to the POC if needed, but it hits the same issue).

Any ideas?

System.OverflowException
  HResult=0x80131516
  Message=Arithmetic operation resulted in an overflow.
  Source=FASTER.test
  StackTrace:
   at FASTER.test.recovery.objects.MyKeySerializer.Deserialize(MyKey& key) in C:\Users\dan\Documents\GitHub\FASTER\cs\test\ObjectRecoveryTest2.cs:line 180
   at FASTER.core.GenericAllocator`2.Deserialize(Byte* raw, Int64 ptr, Int64 untilptr, Record`2[] src, Stream stream) in C:\Users\dan\Documents\GitHub\FASTER\cs\src\core\Allocator\GenericAllocator.cs:line 727
   at FASTER.core.GenericAllocator`2.AsyncReadPageWithObjectsCallback[TContext](UInt32 errorCode, UInt32 numBytes, NativeOverlapped* overlap) in C:\Users\dan\Documents\GitHub\FASTER\cs\src\core\Allocator\GenericAllocator.cs:line 534
   at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)

msftclas · 2020-01-03T16:31:16Z

All CLA requirements met.

badrishc · 2020-01-06T17:00:03Z

The design of FASTER object log is as follows: for each segment in the main log, there is a corresponding segment of the object log that stores all the corresponding objects from the main log, laid out end to end.

This implies a requirement for the object log: an object log segment needs to grow large enough to accommodate all objects within the corresponding main log segment. The resolution we adopt is for the object log segment to be of variable size, i.e., we should not size the object log segment to a specific size a priori. To accomplish this, we need to make sure preAllocate is set to false for the object log during its CreateLogDevice call, because otherwise we would wrongly truncate the segment to the pre-allocated size. As an action item, this point needs to be documented and the samples/tests updated.

Making this change fixes the error you encounter in this PR.
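
For readers following along, here is a minimal sketch of the setup described above. It assumes the Microsoft.FASTER NuGet package is referenced and that the second argument of `Devices.CreateLogDevice` is the preallocation flag. The directory and file names are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using FASTER.core;

public static class ObjectLogSetupSketch
{
    public static void Main()
    {
        // Hypothetical location for the log files (illustration only).
        string dir = Path.Combine(Path.GetTempPath(), "faster-objlog-sketch");
        Directory.CreateDirectory(dir);

        // Main log device.
        // Second argument: preallocation flag (assumed position), disabled here.
        IDevice log = Devices.CreateLogDevice(Path.Combine(dir, "hlog.log"), false);

        // Object log device. Preallocation is disabled so that a segment can
        // grow to hold every object of the matching main log segment instead
        // of being truncated to a fixed size.
        IDevice objlog = Devices.CreateLogDevice(Path.Combine(dir, "hlog.obj.log"), false);

        // Both devices are handed to the store through LogSettings.
        var logSettings = new LogSettings
        {
            LogDevice = log,
            ObjectLogDevice = objlog
        };
    }
}
```

The sketch stops at `LogSettings` on purpose: how the store itself is constructed depends on the FASTER version in use.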

badrishc · 2020-01-06T17:33:48Z

FYI, the VSTS tests in this PR fail for a different reason: the modified test case is now too large to run in 32-bit, so those versions of the tests throw an overflow exception.

dangershony · 2020-01-06T21:18:31Z

OK, thanks. I will close the PR, try the changes you proposed on a POC project I created, and report back.

badrishc · 2020-01-06T21:55:47Z

It looks like there is a slightly different issue affecting only Snapshot CheckpointType in your repro, where the object log for snapshot is not being read correctly - the fix will be checked in today.

badrishc · 2020-01-06T22:28:52Z

See PR #228.

We have now made it so that even if you create an object log with preallocate set to true, it will not cause an error. To achieve this, we will no longer perform preallocation/truncation for an object log device (recognized internally because the segment size is set to -1 during initialization of an object log device).
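
The rule described in this comment can be modelled with a small standalone sketch. It is not the FASTER source: the class name, method name, and the 1 GB example size are hypothetical, and only the -1 sentinel comes from the comment above.

```csharp
public static class ObjectLogPreallocationSketch
{
    // Sentinel from the comment above: an object log device is
    // initialized with a segment size of -1.
    public const long ObjectLogSegmentSize = -1;

    // Hypothetical helper: decides whether a device should preallocate
    // (and later truncate) its segment files.
    public static bool ShouldPreallocate(bool preallocateRequested, long segmentSize)
    {
        // Object log segments are variable-sized, so the request is
        // ignored for them regardless of what the caller asked for.
        if (segmentSize == ObjectLogSegmentSize)
        {
            return false;
        }

        // Any other device honours the caller's request.
        return preallocateRequested;
    }
}
```

Under this model, `ShouldPreallocate(true, -1)` is false, while `ShouldPreallocate(true, 1L << 30)` is true, which matches the behaviour described: requesting preallocation on an object log no longer causes an error.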

badrishc · 2020-01-07T01:21:49Z

Please use the latest from PR #228 for your POC, as it fixes a related issue with object logs.

badrishc · 2020-01-07T21:47:54Z

NuGet with fixes related to this PR has been pushed out: https://www.nuget.org/packages/Microsoft.FASTER

ObjectRecoveryTest failing when more data is used

a4f4fb1

dangershony mentioned this pull request Jan 3, 2020

[C#] Entries Cant Be Recovered After Process Restart #225

Closed

Update ObjectRecoveryTest2.cs

962542d

dangershony closed this Jan 6, 2020

dangershony deleted the faster-poc branch January 6, 2020 21:18
