Functions in session #270

marius-klimantavicius · 2020-05-27T17:29:12Z

For #230.
Not sure about naming (ISynchronizationListener).
There is at least one breaking change (ReadAsyncResult is in different class [different number of type arguments]).
For now I duplicated state machine code for sync and async versions, if this is not ideal I can unify into one method again.
Needed to introduce IClientSession, but this seems to be one-time call so no harm there.

…ne calls.

…bines both the listener and functions. Both legacy kv and client session provide struct based implementation to minimize impact of having interface calls.

marius-klimantavicius · 2020-05-28T17:06:03Z

Fixed naming (at least to my liking).
Still missing tests and example.

badrishc · 2020-05-29T03:48:11Z

Thanks for the contribution. As this is a breaking change, we need to better understand what benefits it brings users, in order to justify the change.

@tli2 what do you think of this: "For now I duplicated state machine code for sync and async versions, if this is not ideal I can unify into one method again."

tli2 · 2020-06-02T01:02:34Z

@badrishc

I only took a cursory look. In principle I think that's the right thing to do. For this particular case there also seems to be not too much actual code duplication.

If I am to be nit-picky I'd say we need a better model for when it is appropriate to use which, and reflect that in the API naming and documentation. We might also consider enforcing that separation in the code (e.g. if we decide that OnThreadAsync can only be called from sessions that support async, we can check that in FasterStateMachine.cs)

Not necessarily within the scope of this PR, of course.

marius-klimantavicius · 2020-06-02T05:24:48Z

Should I experiment with moving Input, Output and Context out of FasterKV (to be part of method signature not of class itself, part of IFunctions and ClientSession)? Rationale - different callbacks might work with different Intput, Output, Context.

marius-klimantavicius · 2020-06-02T15:45:49Z

One (relatively annoying) issue with moving Input, Output and Context is that type inference for NewSession and ResumeSession do not work for primary FasterKV<Key, Value> (there is also legacy FasterKV for backwards compat).

Can slightly reduce the noise by adding helper "builder" (still need to specify Input, Output and Context but functions can be inferred):

Builder<Input, Output, Context> For<Input, Output, Context>() => new Builder<...>(this);

using (var session = fasterKV.For<MyInput, MyOutput, MyContext>().NewSession(new MyFunctions()...)

cs/src/core/Index/FasterLog/FasterLog.cs

cs/src/core/Index/Common/AddressInfo.cs

badrishc · 2020-06-05T04:10:45Z

I think it would be great for us to support net 462 as well, so those changes are welcome, I would think, as long as perf is unaffected.

Fixed a bug where compaction would use stale value if multiple records with different lengths (for varlen structs) where encountered during compaction.

…antavicius/FASTER into functions_in_session

marius-klimantavicius · 2020-06-06T17:57:37Z

Latest commit is for #272 - Added custom compaction functions. Note that in addition to having custom IsDeleted I added Copy and CopyInPlace to be able to deal with concurrency (now that I am writing this - do we really need these? Is it possible to compact inside mutable region? Probably yes as untilAddress can be tail).

Net462 is supported when using FASTER as binary dependency (e.g. via nuget) however I am embedding sources directly (in addition I sometimes do have more changes) thus locally I've removed readonly members, nullables, local static functions, etc. For PRs back to FASTER I am trying to keep it close to original but still some changes might come through (like extracting static local functions to full functions [this is more of a personal preference], I can move them back).

marius-klimantavicius · 2020-06-09T14:36:01Z

I think I will do one more change - I want to add an overload to IVariableLengthStruct.GetLength to take in TInput. It is possible that due to RMW I might want to change the size of the record based on the incoming input.

…ocating new record during RMW.

badrishc · 2020-06-16T06:18:32Z

For now I duplicated state machine code for sync and async versions, if this is not ideal I can unify into one method again.

I think this PR should not duplicate the async and sync versions, and instead should keep them unified as before. It seems orthogonal to the intent of this PR, correct me if that's not the case. We can consider the issue of better handling the async vs sync state machine in a separate PR.

badrishc · 2020-06-05T01:16:00Z

cs/src/core/Index/Synchronization/IndexSnapshotStateMachine.cs

@@ -95,15 +95,36 @@ internal class IndexSnapshotTask : ISynchronizationTask

                    if (async && !faster.IsIndexFuzzyCheckpointCompleted())
                    {
-                        clientSession?.UnsafeSuspendThread();
+                        fasterSession?.UnsafeSuspendThread();


will this get inlined?

badrishc · 2020-06-18T16:22:06Z

Hi @marius-klimantavicius, this is a really nicely done PR, thanks for your contribution. I am done with the review and have no other comments than this one. Let me know what you think.

For now I duplicated state machine code for sync and async versions, if this is not ideal I can unify into one method again.

I think this PR should not duplicate the async and sync versions, and instead should keep them unified as before. It seems orthogonal to the intent of this PR, correct me if that's not the case. We can consider the issue of better handling the async vs sync state machine in a separate PR.

badrishc · 2020-06-18T16:24:45Z

One thing left is to verify that there is no performance regression - have you done some testing in this regard?

@TedHartMS - can we use your PerfTest benchmark PR to see the impact of this PR?

marius-klimantavicius · 2020-06-22T16:16:28Z

Removed sync methods from state machine (it seems that during my experiments I stopped using sync version after all).

As for performance regression - I haven't performed any testing regarding this. Later will try to see if I can use PerfTest for that.

marius-klimantavicius · 2020-07-02T07:36:57Z

I tried running PerfTest a few times (I can't find YCSB files so I am using synthetic test only) however I am unable to get stable results on my machine (even with fixed seed for random number generator). I've tried running only a few times (~10-15) as each run (Release build, Ctrl-F5) takes at least 5 minutes.

TedHartMS · 2020-07-02T16:29:41Z

What's your machine configuration? The Benchmark test's Release build has a working set around 24GB. The PerfTest app @badrishc referred to is a separate branch and will be available soon.

marius-klimantavicius · 2020-07-02T17:33:29Z

I see, I have only 16GB of RAM on this machine maybe it was due swapping (I might try the PerfTest branch a bit later).

badrishc · 2020-07-09T19:10:01Z

We are noticing a 10% drop in performance on a single thread.

Before

FASTER.benchmark.exe -b 0 -t 1 -n 1 -r 50 -d zipf

loading keys from D:\ycsb_files\load_zipf_250M_raw.dat into memory...
loaded 250000000 keys.
loading txns from D:\ycsb_files\run_zipf_250M_1000M_raw.dat into memory...
loaded 1000000000 txns.
Executing setup.
Loading time: 79089ms
Executing experiment.
Thread 0 done; 50645771 reads, 50645996 writes, in 30105 ms.
End tail address = 6000001496
Total 101291767 ops done in 30.107 secs.
##, zipf, 1, 50, 1, 3364392.5665127714, 0

After

FASTER.benchmark.exe -b 0 -t 1 -n 1 -r 50 -d zipf
loading keys from D:\ycsb_files\load_zipf_250M_raw.dat into memory...
loaded 250000000 keys.
loading txns from D:\ycsb_files\run_zipf_250M_1000M_raw.dat into memory...
loaded 1000000000 txns.
Executing setup.
Loading time: 88491ms
Executing experiment.
Thread 0 done; 45542661 reads, 45542155 writes, in 30139 ms.
End tail address = 6000001496
Total 91084816 ops done in 30.141 secs.
##, zipf, 1, 50, 1, 3021957.333864172, 0

badrishc · 2020-07-09T19:37:22Z

With a minor update to ClientSession.cs, I was able to restore performance to the same level as before. Please review if the change is ok with you.

marius-klimantavicius · 2020-07-10T15:54:08Z

Yep these are absolutely OK (I was hoping new AsyncFasterSession(this) to be optimized away but I was wrong). Having readonly functions is the same as before this PR (though once I figure out how to properly benchmark [provided my pc is powerful enough] I will check if perf is not affected when functions is struct [due to defensive copy though only matter for non-empty structs]).

badrishc · 2020-07-10T16:00:24Z

Yeah, 'new AsyncFasterSession(this)' is the main change that got back the 10%. The other change was just a move to be in the safe side. Thanks again!

badrishc · 2020-07-10T17:36:24Z

I tried removing and re-adding the readonly marker for functions, but the results are inconclusive. I'm seeing +/- 10% across runs. It does not block this PR from getting merged, but it would be nice to get to the bottom of whats going on in the critical path, and why certain benchmark runs are regularly (but not always) 10% slower.

badrishc · 2020-09-03T05:35:38Z

One (relatively annoying) issue with moving Input, Output and Context is that type inference for NewSession and ResumeSession do not work for primary FasterKV<Key, Value> (there is also legacy FasterKV for backwards compat).

Can slightly reduce the noise by adding helper "builder" (still need to specify Input, Output and Context but functions can be inferred):
Builder<Input, Output, Context> For<Input, Output, Context>() => new Builder<...>(this);
using (var session = fasterKV.For<MyInput, MyOutput, MyContext>().NewSession(new MyFunctions()...)

@marius-klimantavicius: In PR #310, I have extended the builder slightly, to support writing of slightly cleaner code that produces a proper client session with the right generic params (i.e., specific Function instead of IFunctions<>):

var session = store.For(new MyFunctions()).NewSession<MyFunctions>(...);

It is still ugly (MyFunctions has to be uttered twice), but at least user does not need to utter the types of Input, Output, Context.

marius-klimantavicius and others added 4 commits May 11, 2020 18:03

Moving function callbacks to sessions (microsoft#230)

ea01b56

Use special interface to interact with state machine(s).

8bf13d5

Mark synchronization listener to be struct for jit to be able to inli…

ef51d17

…ne calls.

Introduced IFasterSession instead of ISynchronizationListener. It com…

99e0b15

…bines both the listener and functions. Both legacy kv and client session provide struct based implementation to minimize impact of having interface calls.

Merged from master

a0c11c3

Move out Input, Output, Context from FasterKV into methods.

786677d

Added simple test for multiple sessions

08124d4

marius-klimantavicius marked this pull request as ready for review June 3, 2020 15:27

marius-klimantavicius marked this pull request as draft June 4, 2020 03:53

marius-klimantavicius mentioned this pull request Jun 4, 2020

[C#] Proposal: Allow the user to provide function(s) to determine whether entry is alive #272

Closed

Merge branch 'master' into functions_in_session

bb0d99c

badrishc reviewed Jun 5, 2020

View reviewed changes

cs/src/core/Index/FasterLog/FasterLog.cs Show resolved Hide resolved

badrishc reviewed Jun 5, 2020

View reviewed changes

cs/src/core/Index/Common/AddressInfo.cs Outdated Show resolved Hide resolved

marius-klimantavicius added 2 commits June 6, 2020 20:42

Added `custom compaction functions (microsoft#272).

bbdd47f

Fixed a bug where compaction would use stale value if multiple records with different lengths (for varlen structs) where encountered during compaction.

Merge branch 'functions_in_session' of https://github.com/marius-klim…

21298c7

…antavicius/FASTER into functions_in_session

marius-klimantavicius marked this pull request as ready for review June 6, 2020 17:45

marius-klimantavicius added 2 commits June 9, 2020 17:57

Provide ref Input input for calculating size of new record when all…

d4479fc

…ocating new record during RMW.

Merge branch 'master' into functions_in_session

33a6760

Merge branch 'master' into functions_in_session

79f7212

badrishc reviewed Jun 18, 2020

View reviewed changes

Removed sync methods of state machine.

d65a825

marius-klimantavicius and others added 2 commits July 2, 2020 20:33

Merge branch 'master' into functions_in_session

034bbb8

Merge branch 'master' into functions_in_session

8427ae0

Update ClientSession.cs

39506fb

badrishc merged commit 9b96efc into microsoft:master Jul 10, 2020

badrishc mentioned this pull request Jul 23, 2020

[C#] Input-specific functions in IVariableLengthStruct #289

Closed

marius-klimantavicius deleted the functions_in_session branch September 3, 2020 06:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Functions in session #270

Functions in session #270

marius-klimantavicius commented May 27, 2020 •

edited

Loading

marius-klimantavicius commented May 28, 2020

badrishc commented May 29, 2020

tli2 commented Jun 2, 2020 •

edited

Loading

marius-klimantavicius commented Jun 2, 2020

marius-klimantavicius commented Jun 2, 2020 •

edited

Loading

badrishc commented Jun 5, 2020

marius-klimantavicius commented Jun 6, 2020

marius-klimantavicius commented Jun 9, 2020

badrishc commented Jun 16, 2020

badrishc Jun 5, 2020

badrishc commented Jun 18, 2020

badrishc commented Jun 18, 2020

marius-klimantavicius commented Jun 22, 2020

marius-klimantavicius commented Jul 2, 2020

TedHartMS commented Jul 2, 2020

marius-klimantavicius commented Jul 2, 2020

badrishc commented Jul 9, 2020

badrishc commented Jul 9, 2020

marius-klimantavicius commented Jul 10, 2020

badrishc commented Jul 10, 2020

badrishc commented Jul 10, 2020

badrishc commented Sep 3, 2020 •

edited

Loading

Functions in session #270

Functions in session #270

Conversation

marius-klimantavicius commented May 27, 2020 • edited Loading

marius-klimantavicius commented May 28, 2020

badrishc commented May 29, 2020

tli2 commented Jun 2, 2020 • edited Loading

marius-klimantavicius commented Jun 2, 2020

marius-klimantavicius commented Jun 2, 2020 • edited Loading

badrishc commented Jun 5, 2020

marius-klimantavicius commented Jun 6, 2020

marius-klimantavicius commented Jun 9, 2020

badrishc commented Jun 16, 2020

badrishc Jun 5, 2020

Choose a reason for hiding this comment

badrishc commented Jun 18, 2020

badrishc commented Jun 18, 2020

marius-klimantavicius commented Jun 22, 2020

marius-klimantavicius commented Jul 2, 2020

TedHartMS commented Jul 2, 2020

marius-klimantavicius commented Jul 2, 2020

badrishc commented Jul 9, 2020

Before

After

badrishc commented Jul 9, 2020

marius-klimantavicius commented Jul 10, 2020

badrishc commented Jul 10, 2020

badrishc commented Jul 10, 2020

badrishc commented Sep 3, 2020 • edited Loading

marius-klimantavicius commented May 27, 2020 •

edited

Loading

tli2 commented Jun 2, 2020 •

edited

Loading

marius-klimantavicius commented Jun 2, 2020 •

edited

Loading

badrishc commented Sep 3, 2020 •

edited

Loading