
ObjectRecoveryTest failing when more data is used #226

Closed (2 commits)

dangershony (Contributor) commented Jan 3, 2020

This PR is not intended to be merged; it demonstrates a scenario in which the durability tests fail.

I added more data to the ObjectRecoveryTest1 test and increased the number of iterations (to trigger more checkpoints). I would expect the test to pass even with these changes, but the stream appears to run out of data to read, which results in an OverflowException when deserializing a key.

I'm not sure (yet) exactly why. To reproduce, just run ObjectRecoveryTest1.
I came across this issue while writing a POC that simulates the requirements of a project I am currently working on (I can provide a link to the POC if needed; I get the same issues there).

Any ideas?

System.OverflowException
  HResult=0x80131516
  Message=Arithmetic operation resulted in an overflow.
  Source=FASTER.test
  StackTrace:
   at FASTER.test.recovery.objects.MyKeySerializer.Deserialize(MyKey& key) in C:\Users\dan\Documents\GitHub\FASTER\cs\test\ObjectRecoveryTest2.cs:line 180
   at FASTER.core.GenericAllocator`2.Deserialize(Byte* raw, Int64 ptr, Int64 untilptr, Record`2[] src, Stream stream) in C:\Users\dan\Documents\GitHub\FASTER\cs\src\core\Allocator\GenericAllocator.cs:line 727
   at FASTER.core.GenericAllocator`2.AsyncReadPageWithObjectsCallback[TContext](UInt32 errorCode, UInt32 numBytes, NativeOverlapped* overlap) in C:\Users\dan\Documents\GitHub\FASTER\cs\src\core\Allocator\GenericAllocator.cs:line 534
   at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pOVERLAP)

msftclas commented Jan 3, 2020

CLA assistant check
All CLA requirements met.

badrishc (Contributor) commented Jan 6, 2020

The design of the FASTER object log is as follows: for each segment in the main log, there is a corresponding segment of the object log that stores all the objects referenced from that main log segment, laid out end to end.

This implies a requirement for the object log: an object log segment must be able to grow large enough to hold all objects referenced from the corresponding main log segment. Our resolution is to make object log segments variable-sized, i.e., we should not size an object log segment to a specific size a priori. To accomplish this, preAllocate must be set to false for the object log in its CreateLogDevice call; otherwise the segment is wrongly truncated to the preallocated size. As an action item, this point needs to be documented and the samples/tests updated.

Making this change fixes the error you encounter in this PR.
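To make the failure mode concrete, here is a minimal Python model (not FASTER's C# API) of a single object log segment holding length-prefixed objects laid end to end. The function names, the 4-byte length prefix, and the `preallocate_size` parameter are hypothetical stand-ins: cutting the segment to a fixed size plays the role of preallocation/truncation, and the reader then runs out of data, loosely analogous to the deserialization failure reported in this PR.

```python
import io
import struct


def write_object_segment(objects):
    """Hypothetical format: serialize objects end to end, each with a 4-byte length prefix."""
    buf = io.BytesIO()
    for obj in objects:
        payload = obj.encode("utf-8")
        buf.write(struct.pack("<i", len(payload)))
        buf.write(payload)
    return buf.getvalue()


def store(segment, preallocate_size=None):
    """Simulate writing the segment to a device.

    With preallocate_size=None the segment is variable-sized and kept whole.
    Otherwise it is forced to exactly preallocate_size bytes: truncated if
    larger, zero-padded if smaller (a stand-in for preallocation).
    """
    if preallocate_size is None:
        return segment
    return segment[:preallocate_size].ljust(preallocate_size, b"\0")


def read_object_segment(data, count):
    """Read back `count` objects; fail if the stream runs out of data."""
    stream = io.BytesIO(data)
    out = []
    for _ in range(count):
        header = stream.read(4)
        if len(header) < 4:
            raise EOFError("object log segment ran out of data")
        (length,) = struct.unpack("<i", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("object log segment ran out of data")
        out.append(payload.decode("utf-8"))
    return out


if __name__ == "__main__":
    objs = ["key-%d" % i for i in range(100)]
    seg = write_object_segment(objs)
    # Variable-sized segment: every object survives.
    print(len(read_object_segment(store(seg), len(objs))))
    # Fixed-size segment that is too small: reading fails partway through.
    try:
        read_object_segment(store(seg, preallocate_size=64), len(objs))
    except EOFError as exc:
        print("truncated segment:", exc)
```

The fixed-size path only breaks once the objects for one main log segment outgrow the preallocated size, which matches the observation that the test fails only after more data is added.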

badrishc (Contributor) commented Jan 6, 2020

FYI, the VSTS tests in this PR fail for a different reason: the modified test case is now too large to run in 32-bit, so the 32-bit versions of the tests throw an overflow exception.

dangershony (Contributor, Author)

OK, thanks. I will close the PR, try the changes you proposed on a POC project I created, and report back.

dangershony closed this on Jan 6, 2020
dangershony deleted the faster-poc branch on Jan 6, 2020 21:18

badrishc (Contributor) commented Jan 6, 2020

It looks like your repro also hits a slightly different issue, affecting only the Snapshot CheckpointType, where the object log for a snapshot is not being read correctly; the fix will be checked in today.

badrishc (Contributor) commented Jan 6, 2020

See PR #228.

With this change, creating an object log with preallocate set to true no longer causes an error. To achieve this, we no longer perform preallocation/truncation for an object log device (recognized internally because its segment size is set to -1 during initialization of an object log device).
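The sentinel rule described for PR #228 can be sketched as a small model. This is a hypothetical Python illustration, not FASTER code: `SimulatedDevice`, `write_segment`, and `read_segment` are invented names, and only the rule itself (a segment size of -1 marks an object log device, which skips preallocation/truncation even when preallocate is true) comes from the comment above.

```python
class SimulatedDevice:
    """Hypothetical storage device model; not part of FASTER."""

    # Sentinel segment size that marks an object log device (per the PR #228 description).
    OBJECT_LOG_SEGMENT_SIZE = -1

    def __init__(self, segment_size, preallocate):
        self.segment_size = segment_size
        self.preallocate = preallocate
        self.segments = {}  # segment index -> stored bytes

    def _should_size_segment(self):
        # Preallocation/truncation applies only to fixed-size devices;
        # an object log device ignores the preallocate flag.
        return self.preallocate and self.segment_size != self.OBJECT_LOG_SEGMENT_SIZE

    def write_segment(self, index, data):
        if self._should_size_segment():
            # Force the segment to exactly segment_size bytes.
            data = data[: self.segment_size].ljust(self.segment_size, b"\0")
        self.segments[index] = bytes(data)

    def read_segment(self, index):
        return self.segments[index]


if __name__ == "__main__":
    data = b"x" * 1000
    obj_log = SimulatedDevice(SimulatedDevice.OBJECT_LOG_SEGMENT_SIZE, preallocate=True)
    obj_log.write_segment(0, data)
    print(len(obj_log.read_segment(0)))  # whole segment kept despite preallocate=True
```

Keying the behavior off the segment size, rather than trusting the caller's preallocate flag, means a misconfigured object log device degrades gracefully instead of silently truncating data.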

badrishc (Contributor) commented Jan 7, 2020

Please use the latest from PR #228 for your POC, as it fixes a related issue with object logs.

badrishc (Contributor) commented Jan 7, 2020

NuGet with fixes related to this PR has been pushed out: https://www.nuget.org/packages/Microsoft.FASTER
