[WIP, RFC FS-1072] task and state machine support #6634
Note the library implementation is not big - just 300 lines. And it's pretty transparent what is happening apart from the advanced use of SRTP resolution.
The main problem here is adding the signature file, so large chunks are commented out.
Link suggestion since the RFC isn't out yet: fsharp/fslang-suggestions#581
src/fsharp/FSharp.Core/tasks.fs
let __newEntryPoint() : int = failwith "__newEntryPoint should always be removed from compiled code"

[<MethodImpl(MethodImplOptions.NoInlining)>]
let __machine<'T> : 'T = failwith "__newEntryPoint should always be removed from compiled code"
Should the error message mention `__machine` instead of `__newEntryPoint`?
I have updated this PR, removing one source of allocation, and updated the performance results. The allocation profile of the F# code is now identical to that of the C# baselines, except for
The main thing now is to consider how to make the state machine feature sufficiently complete to allow its incorporation. From the examples I've worked through I'm convinced of the general utility of the mechanism; however, there are some kinds of code that can't currently be compiled to state machines.
Identified some last easy wins. If that doesn't pull the async profile in line with C# I'd have to pull out the decompiler/profiler to see what the statemachine codegen actually produces ;)
@dsyme end result looks great, all things considered. Happy you were able to maintain quite some library-level code. Thanks a lot for putting in the significant effort!
// A using statement is just a try/finally with the finally block disposing if non-null.
builder.TryFinally(
    (fun () -> __expand_body disp),
    (fun () -> if not (isNull (box disp)) then disp.Dispose()))
Remove box by comparing via Object.ReferenceEquals
How is this done? My understanding is that `Object.ReferenceEquals(disp, null)` will also box, unless the JIT eliminates it?
The JIT eliminates it due to generic specialization via the flexible type signature.
I think we should double check either way
Definitely. Though it's quite a simple path: boxes are semantics-changing so cannot be eliminated easily, while ReferenceEquals gets aggressively inlined to what amounts to `cmp disp, null; jne`, which for 'T = struct is always `false = true`. Branch elimination does the rest.
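The two null checks under discussion can be written side by side. This is a minimal sketch, not code from the PR: `CountingDisposable`, `disposeBoxed` and `disposeRefEquals` are hypothetical names, and whether the JIT really removes the box in the second form is exactly what the thread says should be double-checked.

```fsharp
open System

// Hypothetical disposable that counts how often Dispose is called.
type CountingDisposable() =
    member val Count = 0 with get, set
    interface IDisposable with
        member this.Dispose() = this.Count <- this.Count + 1

// The form in the diff: an explicit box followed by isNull.
let disposeBoxed (disp: #IDisposable) =
    if not (isNull (box disp)) then disp.Dispose()

// The suggested form: Object.ReferenceEquals against null.
let disposeRefEquals (disp: #IDisposable) =
    if not (Object.ReferenceEquals(disp, null)) then disp.Dispose()
```

Both functions behave the same for reference types; the discussion is only about what code the JIT generates when `'T` is specialized.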
@NinoFloris Thanks for the review. I've applied the changes and fixed a race condition. Latest perf results are shown below. F# is 26% slower for SyncBinds, and 23% faster than C# for AsyncBinds, at least on this test run. Allocations are zero for repeated SyncBindSingleTask, which is as hoped.
The AsyncBinds result makes me suspicious though. Why is it deviating so much from both C# and TaskBuilder (which are quite close to each other)? Could it point to a behavioral difference? Good to see a full MB of allocations, about 30%, shaved off by those changes, leaving just a tiny difference. Really have to see the IL for that last bit though, could be an FSharpRef or something else hiding in there.
I'll do a few more runs tomorrow, the results are a bit variable. I suspect the C# one just came in slow on that particular test run.
I think the difference is just that the state machines are bigger, and when their state gets boxed this results in more allocated heap size. For example the F# state machine for
where the C# one has this:
Looking at that I'm quite surprised the difference isn't actually greater, but I'm not immediately inclined to implement the state machine field sharing that C# has, partly because it can be a nightmare for debugging and hell if it goes wrong. Also we don't do that optimization for sequence expressions.
The build is failing on Linux and Mac because the bootstrap compiler is not being used, see #6380
Good to hear that, local machine runs are hard to get stable, thermal throttling and overactive background services cause night and day differences.
Agreed, it's a bit more work during GC to track the extra references inside most of the awaiters ( That does make me remember something else C# does, which is valuable to add. Also, and this applies more to C# - we don't have
@dsyme I often have to use this:
would love to see some nice syntactic sugar and/or optimization that doesn't even create a task
Couldn't you just use Task.FromResult?
right! Does that create an actual task as overhead?
As far as I can see here (https://referencesource.microsoft.com/mscorlib/R/11a386e7d7cae64a.html) and here (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.fromresult?view=netframework-4.8#remarks) it creates a Task with the result immediately set and the status of RanToCompletion
ok. question is: can we do similar in
I'm not sure I understand the question. Wouldn't
yeah maybe I'm stupid here. need to think about my question ...
If it could be either, but usually is complete, then you can use ValueTask<'T> instead of Task<'T>.
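The difference between the two options mentioned in this exchange can be sketched as follows. The value names are illustrative only, and the sketch assumes a runtime where `ValueTask<'T>` is available (for example .NET Core 2.1 or later, or the System.Threading.Tasks.Extensions package).

```fsharp
open System.Threading.Tasks

// Task.FromResult returns a Task<int> object that is already complete.
let completed : Task<int> = Task.FromResult(42)

// ValueTask<'T> is a struct, so wrapping a ready value needs no Task object.
let ready : ValueTask<int> = ValueTask<int>(42)

// A ValueTask can also wrap a real Task when the work is genuinely asynchronous.
let wrapped : ValueTask<int> = ValueTask<int>(completed)
```

In both cases the consumer sees an already-completed result; the `ValueTask` form simply avoids allocating a `Task` for the synchronous path.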
Closing in favour of #6811 from a feature branch
[ Closed in favour of #6811 from a feature branch ]
This inserts a heavily modified (but semantically almost completely compatible) version of TaskBuilder.fs into FSharp.Core and adds a general state machine compilation mechanism for F# computation expressions.
Overview
The primary intention is to add quality support for `task { ... }` to F#. This means
`task { ... }`
We won't add the support until all of the above are done. We are starting with TaskBuilder.fs as a reference library implementation to help define semantics.
The mechanism we use to do this is to first add general support for "generated state machines" in F#.
Technical Note: Specifying generated state machines
From a high level, state machines are lovely to implement inefficiently in a functional language, e.g. using continuations or F# `async { ... }` or TaskBuilder.fs `task { ... }`. These tend to allocate continuations and other values like crazy. Thus their efficient implementation (with low or zero allocation rates) means generating C-like constructs: labels, gotos, jump tables and other such things.
It is hard to recover a low-allocation implementation from a functional/continuation/... implementation directly, but we can if we add enough compiler support for generating exactly the right code. Generating efficient state machines needs compiler support. For example, C# supports state machine compilation of iterator methods and F# supports it for `seq { ... }` expressions.
The magic heart of a typical generated state machine is a `MoveNext` or `Step` function that takes an integer program counter (`pc`) and jumps to a target. This is roughly what compiled `seq { ... }` code looks like in F# today and what compiled async/await code looks like in C#, at a very high level.
Note you can't write this kind of code directly in F#: there is no `goto`, and the `goto`s often jump directly into code resuming from the last step of the state machine.
In this mechanism we allow the specification of new varieties of generated state machines in library code, normally as part of the implementation of a computation-expression builder. (Note this is an extremely subtle mechanism and its validity is not yet checked by the compiler - "caveat emptor, here be dragons").
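As a rough illustration of program-counter dispatch, here is a hand-written machine for the equivalent of `seq { yield 1; yield 2 }`. `TwoStepMachine` is a hypothetical name and the code is only a sketch: it uses `match` on `pc` where generated code would use labels and a jump table, and it is not what the compiler emits.

```fsharp
// Hand-written sketch: each call to MoveNext resumes at the step stored in pc.
type TwoStepMachine() =
    let mutable pc = 0        // which step to resume at
    let mutable current = 0   // last value produced
    member this.Current = current
    member this.MoveNext() : bool =
        match pc with
        | 0 ->
            current <- 1
            pc <- 1           // next call resumes at step 1
            true
        | 1 ->
            current <- 2
            pc <- 2           // next call falls through to "finished"
            true
        | _ -> false
```

The only state kept between calls is `pc` and `current`, with no continuation closures allocated per step.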
To help define generated state machines we use some primitives, currently
plus some special magic value names such as `__expand_XYZ` and the very obscure `__machine_step$cont`.
A generated state machine expression is any expression of the form
Here `SomeOverride` will be compiled as a multi-entry method where the entry point is determined by `pc` using a jumptable at the start of the method.
Important: The content of a generated state machine is specified by `__expand_code`, which must be fully inlined code - that is, fully inlined to reveal the full implementation of the state machine. You can think of everything beginning with `__expand_ABC` as a macro where macro expansion is implemented by F# inlining.
For example, `__expand_code` could be a call to:
where `bindTask` is inlined:
and `returnTask` is inlined:
This shows the use of some of the other constructs in state machine specification - you can see `__newEntryPoint()`, `__entryPoint`, `__expand_continuation`, access to the `this` pointer via `__machine`, and `__return`.
The constructs that can be used in the (inlined, expanded) state machine code are limited and in some cases (e.g. try/with blocks and `while` loops) extremely subtle. It is easy to create incorrect and invalid code using this mechanism.
For full details of the current status see the implementation in the PR; details may have changed from the above.
When specifying state machines, it is common to return a typed "dummy" struct such as `TaskStep<'T>` from each call to the `Step` function, where `T` represents the result of the task. This is sort of a phantom type.
Example: `sync { ... }`
As a micro "no-op" example of defining a builder which gets compiled using state machines, we can define `sync { ... }`, which is for entirely synchronous computation with no special semantics.
Implementation: https://github.com/dsyme/visualfsharp/blob/tasks/tests/fsharp/core/state-machines/sync.fs
Examples of use:
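A minimal sketch of what such a builder and its use might look like, written here as an ordinary computation expression builder without any state machine compilation. `SyncBuilder`, `addOne` and `addTwo` are hypothetical stand-ins, not the implementation from sync.fs.

```fsharp
// Hypothetical stand-in builder: plain methods, no state machine compilation.
type SyncBuilder() =
    member this.Return(x: 'T) : 'T = x
    member this.Bind(x: 'T, f: 'T -> 'U) : 'U = f x

let sync = SyncBuilder()

// A synchronous computation that returns its input plus one.
let addOne (x: int) : int = sync { return x + 1 }

// Binding inside sync { ... } is just ordinary sequential evaluation.
let addTwo (x: int) : int =
    sync {
        let! a = addOne x
        let! b = addOne a
        return b
    }
```

With these definitions, `sync { ... }` behaves like the plain F# code it wraps, which is the "no special semantics" property described above.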
Code performance will be approximately the same as normal F# code except for one allocation for each execution of each `sync { .. }`, as we allocate the "SyncMachine". In later work we may be able to remove this.
Example: `task { ... }`
See the implementation in tasks.fs. There is complication due to the need to bind to task-pattern tasks and asyncs.
Example: `taskSeq { ... }`
This is for state machine compilation of computation expressions that generate `IAsyncEnumerable<'T>` values. This is a headline C# 8.0 feature and a very large feature for C#. It appears to mostly drop out as library code once general-purpose state machine support is available.
See the example in taskSeq.fs. Not everything is implemented yet but the basics work.
Example: `seq2 { ... }`
See https://github.com/dsyme/visualfsharp/blob/tasks/tests/fsharp/core/state-machines/seq2.fs
This is an example showing how to do state machine compilation for `seq2 { ... }` expressions, akin to `seq { ... }` expressions, for which we bake in state machine compilation into the F# compiler today. Caveats:
https://github.com/microsoft/visualfsharp/pull/6634/files#diff-4837d60671e85e130108370a6a8c0597R169
I think it's possible this version actually gives better stack traces than the current sequence expression support in the F# compiler.
This is essentially trimming the task support out of `taskSeq { ... }`.
Examples: `list { ... }`, `array { ... }`
See https://github.com/dsyme/visualfsharp/blob/tasks/tests/fsharp/core/state-machines/list.fs
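The idea behind these builders, generating straight into a `ResizeArray` instead of going through an intermediate sequence, can be sketched by hand. `viaSeq` and `direct` are hypothetical names, and this is plain F# rather than the builder code from list.fs.

```fsharp
// Route 1: build a lazy sequence, then iterate it to produce the list.
let viaSeq (n: int) : int list =
    seq { for i in 1 .. n do
            if i % 2 = 0 then yield i * i }
    |> List.ofSeq

// Route 2: push each element straight into a ResizeArray, then convert once.
let direct (n: int) : int list =
    let acc = ResizeArray<int>()
    for i in 1 .. n do
        if i % 2 = 0 then acc.Add(i * i)
    List.ofSeq acc
```

Both functions produce the same list; the second avoids the intermediate enumerator, which is the saving a directly-generating builder aims for.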
This example defines `list { .. }`, `array { .. }` and `rsarray { .. }` for collections, where the computations generate directly into a `ResizeArray` (`System.Collections.Generic.List<'T>`).
F#'s existing `[ .. ]` and `[| ... |]` and `seq { .. } |> Seq.toResizeArray` all use an intermediate `IEnumerable` which is then iterated to populate a `ResizeArray` and then converted to the final immutable collection. In contrast, generating directly into a `ResizeArray` is potentially more efficient (and for `list { ... }` further perf improvements are possible if we put this in `FSharp.Core` and use the mutate-tail-cons-cell trick to generate the list directly). This technique has been known for a while and can give faster collection generation but it has not been possible to get good code generation for the expressions in many cases. Note these aren't really "state machines" because there are no resumption points - there is just an implicit collection we are yielding into in otherwise synchronous code.
Using a directly-generating `list { ... }` seems to give a significant speedup over `[ ... ]` in the example I just tried, included in the code.
Technical Note: Expected allocation profile for `task { ... }`
The allocation performance of the current approach should be:
task { ... }
task { ... }
`let!` or `do!` bind in a `task { .. }` - I'm not quite sure how many (we may be able to remove these when binding to another task produced by `task { ... }`, though I'm not sure).
one allocation on each `let mutable` used inside the task - these currently get turned into ref cells through the autobox transformation when `let mutable` is used in a task
More improvements are needed - see discussion below. We should compare with TaskBuilder.fs, Ply and C#.
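To make the `let mutable` point concrete, here is a hand-written ref cell standing in for a mutable local that a closure needs to update. `countTwice` is a hypothetical example, and it only illustrates the kind of heap allocation the text attributes to the autobox transformation, not the compiler's actual output.

```fsharp
// A local that a closure must update cannot be a plain `let mutable`,
// so it lives in a heap-allocated ref cell instead.
let countTwice () : int =
    let counter = ref 0                               // allocates one ref cell
    let bump () = counter.Value <- counter.Value + 1  // closure captures the cell
    bump ()
    bump ()
    counter.Value
```

Each such cell is one extra object per execution, which is why the list above counts one allocation per `let mutable`.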
Performance Status
Systematic perf testing of `task { ... }` is required.
Some benchmarks are at `tests\fsharp\perf\tasks` in the PR. Please help improve this.
Currently compile and run with:
The build/run cycle is a bit irritating as BenchmarkDotNet seems to run the "FSharpAsync" slow benchmarks around 100 times. Please help fix that.
Here are results at last run: