Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm/ffi] Optimize ffi calls with multiple Dart_Handle arguments #42341

Closed
dcharkes opened this issue Jun 15, 2020 · 3 comments
Closed

[vm/ffi] Optimize ffi calls with multiple Dart_Handle arguments #42341

dcharkes opened this issue Jun 15, 2020 · 3 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size

Comments

@dcharkes
Copy link
Contributor

FfiCall.Handlex01(RunTime): 347.8671304347826 us.
FfiCall.Handlex02(RunTime): 384.86376755820663 us.
FfiCall.Handlex04(RunTime): 492.896007885658 us.
FfiCall.Handlex10(RunTime): 846.1214043993232 us.
FfiCall.Handlex20(RunTime): 1193.412291169451 us.

We currently call into C for every handle with

Fragment FlowGraphBuilder::AllocateHandle(LocalVariable* api_local_scope) {
Fragment code;
if (api_local_scope != nullptr) {
// Use the reference the scope we created in the trampoline.
code += LoadLocal(api_local_scope);
} else {
// Or get a reference to the top handle scope.
code += GetTopHandleScope();
}
Value* api_local_scope_value = Pop();
auto* instr = new (Z) AllocateHandleInstr(api_local_scope_value);
Push(instr);
code <<= instr;
return code;
}

extern "C" LocalHandle* DLRT_AllocateHandle(ApiLocalScope* scope) {
CHECK_STACK_ALIGNMENT;
TRACE_RUNTIME_CALL("AllocateHandle %p", scope);
LocalHandle* return_value = scope->local_handles()->AllocateHandle();
TRACE_RUNTIME_CALL("AllocateHandle returning %p", return_value);
return return_value;
}
DEFINE_RAW_LEAF_RUNTIME_ENTRY(
AllocateHandle,
1,
false /* is_float */,
reinterpret_cast<RuntimeFunction>(&DLRT_AllocateHandle));

Instead, we can extend the EnterHandleScope with an int how many handles we want, and do some integer arithmetic in IL to compute the addresses of those handles. (Up to the number of handles that fit into one single block.)

Fragment FlowGraphBuilder::EnterHandleScope() {
auto* instr = new (Z)
EnterHandleScopeInstr(EnterHandleScopeInstr::Kind::kEnterHandleScope);
Push(instr);
return Fragment(instr);
}

@dcharkes dcharkes added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi labels Jun 15, 2020
dart-bot pushed a commit that referenced this issue Jun 15, 2020
On x64 desktop in JIT the trampolines with Handles are a bit slower
than trampolines passing integers or Pointers.

FfiCall.Uint64x01(RunTime): 200.8482627033541 us.
FfiCall.PointerUint8x01(RunTime): 261.3910088865656 us.
FfiCall.Handlex01(RunTime): 355.4978670458585 us.
FfiCall.Handlex02(RunTime): 384.86376755820663 us.
FfiCall.Handlex04(RunTime): 492.896007885658 us.
FfiCall.Handlex10(RunTime): 846.1214043993232 us.
FfiCall.Handlex20(RunTime): 1193.412291169451 us.

Issue: #36858
Issue: #41319

New issue for optimizing multiple handles:
#42341

Also cleans up dart2/native which was erroneously left over from the
benchmark duplication for NNBD.

Change-Id: I81223fefc47129d00984492efb9502d5449f0d4a
Cq-Include-Trybots: luci.dart.try:benchmark-linux-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/145593
Commit-Queue: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Jonas Termansen <sortie@google.com>
@mkustermann mkustermann added type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size labels Dec 16, 2020
@dcharkes
Copy link
Contributor Author

dcharkes commented Nov 3, 2021

When using a lot of handles, the old native calls are faster (AOT).

NativeCall.Handlex01(RunTime): 735.7863184994483 us.
NativeCall.Handlex20(RunTime): 836.6783772480134 us.

FfiCall.Handlex01(RunTime): 357.4213724088635 us.
FfiCall.Handlex20(RunTime): 1152.427995391705 us.

source: https://dart-review.googlesource.com/c/sdk/+/218920

@cskau-g What is the distribution of the number of handle arguments in the dart:ui conversion from natives to dart:ffi?

If we have a lot of calls with many handles, we should address this issue.

Related PR (pre-work for dart:ui conversion):

@ghost
Copy link

ghost commented Nov 3, 2021

What is the distribution of the number of handle arguments in the dart:ui conversion from natives to dart:ffi?

Quickly ran the numbers for the 169 APIs I've had a look at and see:

  • 71 interfaces taking and returning no Handles at all (potential leaf calls).
  • 91 interfaces taking 0 Handles among their parameters.
  • 36 interfaces taking 1 Handle ...
  • 27 interfaces taking 2 Handles ...
  • 8 interfaces taking 3 Handles ...
  • a few rare interfaces taking more (up to a maximum of 15) Handles ...

So I imagine optimising for lots of handles won't have a measurable impact on dart:ui in practice.

@mkustermann
Copy link
Member

The handles for arguments are now allocated efficiently on the stack (see 4258a59). Further work about Enter/Exit scope is tracked at #48989

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-enhancement A request for a change that isn't a bug type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

2 participants