8254231: Implementation of Foreign Linker API (Incubator) #634

Closed
wants to merge 95 commits into from

Contributor

@mcimadamore mcimadamore commented Oct 13, 2020

This patch contains the changes associated with the first incubation round of the foreign linker API (see JEP 389 [1]). This work is meant to sit on top of the foreign memory access support (see JEP 393 [2] and associated pull request [3]).

The main goal of this API is to provide a way to call native functions from Java code without the need for intermediate JNI glue code. In order to do this, native calls are modeled through the MethodHandle API. I suggest reading the writeup [4] I put together a few weeks ago, which illustrates what the foreign linker support is, and how it should be used by clients.

Disclaimer: the pull request mechanism isn't great at managing dependent reviews. For this reason, I'm attaching a webrev which contains only the differences between this PR and the memory access PR. I will periodically upload new webrevs as new iterations come out, to make life as simple as possible for reviewers.

A big thank you to Jorn Vernee and Vladimir Ivanov - they are the main architects of all the hotspot changes you see here, and without their help, the foreign linker support wouldn't be what it is today. As usual, a big thank you to Paul Sandoz, who provided many insights (often by trying the bits first hand).

Thanks
Maurizio

Webrev:
http://cr.openjdk.java.net/~mcimadamore/8254231_v1/webrev

Javadoc:

http://cr.openjdk.java.net/~mcimadamore/8254231_v1/javadoc/jdk/incubator/foreign/package-summary.html

Specdiff (relative to [3]):

http://cr.openjdk.java.net/~mcimadamore/8254231_v1/specdiff_delta/overview-summary.html

CSR:

https://bugs.openjdk.java.net/browse/JDK-8254232

API Changes

The API changes are actually rather slim:

  • LibraryLookup
    • This class allows clients to look up symbols in native libraries; the interface is fairly simple: you can load a library by name or by absolute path, and then look up symbols in that library.
  • FunctionDescriptor
    • This is an abstraction that is very similar, in spirit, to MethodType; it is, at its core, an aggregate of memory layouts for the function arguments/return type. A function descriptor is used to describe the signature of a native function.
  • CLinker
    • This is the real star of the show. A CLinker has two main methods: downcallHandle and upcallStub; the first takes a native symbol (as obtained from LibraryLookup), a MethodType and a FunctionDescriptor and returns a MethodHandle instance which can be used to call the target native symbol. The second takes an existing method handle, and a FunctionDescriptor and returns a new MemorySegment corresponding to a code stub allocated by the VM which acts as a trampoline from native code to the user-provided method handle. This is very useful for implementing upcalls.
    • This class also contains the various layout constants that should be used by clients when describing native signatures (e.g. C_LONG and friends); these layouts contain additional ABI classification information (in the form of layout attributes) which is used by the runtime to infer how Java arguments should be shuffled for the native call to take place.
    • Finally, this class provides some helper functions e.g. so that clients can convert Java strings into C strings and back.
  • NativeScope
    • This is a helper class which allows clients to group together logically related allocations; that is, rather than allocating separate memory segments using separate try-with-resources constructs, a NativeScope allows clients to use a single block, and allocate all the required segments there. This is not only a usability boost, but also a performance boost, since not all allocation requests will be turned into malloc calls.
  • MemorySegment
    • Only one method added here - namely handoff(NativeScope) which allows a segment to be transferred onto an existing native scope.
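
To make the MethodHandle-based call model concrete, here is a minimal sketch that uses only `java.lang.invoke`, with a plain Java method standing in for the native function. It is not the CLinker API: `DowncallModelSketch`, `strlenStandIn` and `toCStringModel` are hypothetical names, and a real downcall handle would come from `downcallHandle` rather than `findStatic`. What should carry over is the shape of the client code: a MethodType describes the Java-side signature, and the resulting handle is invoked with `invokeExact`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.charset.StandardCharsets;

// Hypothetical model: a Java method plays the role of the native symbol.
final class DowncallModelSketch {

    // Stand-in for C's strlen: counts bytes up to the first NUL terminator.
    static long strlenStandIn(byte[] cString) {
        int n = 0;
        while (n < cString.length && cString[n] != 0) {
            n++;
        }
        return n;
    }

    // Models converting a Java string into a NUL-terminated C string.
    static byte[] toCStringModel(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[utf8.length + 1]; // the trailing 0 is the terminator
        System.arraycopy(utf8, 0, out, 0, utf8.length);
        return out;
    }

    // The Java-side signature of the call, expressed as a MethodType.
    static final MethodType STRLEN_TYPE =
            MethodType.methodType(long.class, byte[].class);

    // Kept in a static final field so the JIT can treat the handle as a constant.
    static final MethodHandle STRLEN = lookupStrlen();

    private static MethodHandle lookupStrlen() {
        try {
            return MethodHandles.lookup()
                    .findStatic(DowncallModelSketch.class, "strlenStandIn", STRLEN_TYPE);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long callStrlen(String s) {
        try {
            // invokeExact requires the call site to match STRLEN_TYPE exactly.
            return (long) STRLEN.invokeExact(toCStringModel(s));
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(callStrlen("hello")); // prints 5
    }
}
```

In the real API the FunctionDescriptor would additionally describe the native side of the signature; the model above has no equivalent of that, since there is no ABI to target.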

Safety

The foreign linker API is intrinsically unsafe; many things can go wrong when requesting a native method handle. For instance, the description of the native signature might be wrong (e.g. have too many arguments) - and the runtime has, in the general case, no way to detect such mismatches. For these reasons, obtaining a CLinker instance is a restricted operation, which can be enabled by specifying the usual JDK property -Dforeign.restricted=permit (as is the case for other restricted methods in the foreign memory API).

Implementation changes

The Java changes associated with LibraryLookup are relatively straightforward; the only interesting thing to note here is that library loading does not depend on class loaders, so LibraryLookup is not subject to the same restrictions which apply to JNI library loading (e.g. the same library cannot be loaded by different classloaders).

As for NativeScope, the changes are again relatively straightforward; it is an API which sits neatly on top of the foreign memory access API, providing an allocation service which shares the same underlying memory segment(s) and turns an allocation request into a segment slice, which is a much less expensive operation. NativeScope comes in two variants: there are native scopes for which the allocation size is known a priori, and native scopes which can grow - these two schemes are implemented by two separate subclasses of AbstractNativeScopeImpl.
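
The "allocation request becomes a slice" idea can be sketched with a bump-pointer arena over a single `ByteBuffer`. This is only a model of the bounded variant, under the assumption that one backing buffer is allocated up front; `BoundedArenaSketch` and its methods are hypothetical names and do not reflect the actual NativeScope implementation.

```java
import java.nio.ByteBuffer;

// Hypothetical model of a bounded allocation scope: one backing buffer,
// every allocation is a slice of it (a bump-pointer arena).
final class BoundedArenaSketch implements AutoCloseable {
    private final ByteBuffer backing;
    private int next = 0;        // offset of the first free byte
    private boolean open = true;

    BoundedArenaSketch(int capacityBytes) {
        // The only real allocation; later requests just carve out slices.
        this.backing = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Returns a slice of 'size' bytes whose offset is a multiple of 'alignment'
    // (which must be a power of two).
    ByteBuffer allocate(int size, int alignment) {
        if (!open) {
            throw new IllegalStateException("arena is closed");
        }
        if (size < 0 || alignment <= 0 || (alignment & (alignment - 1)) != 0) {
            throw new IllegalArgumentException("bad size or alignment");
        }
        long start = ((long) next + alignment - 1) & -(long) alignment; // round up
        long end = start + size;
        if (end > backing.capacity()) {
            throw new OutOfMemoryError("arena exhausted");
        }
        ByteBuffer view = backing.duplicate();
        view.position((int) start);
        view.limit((int) end);
        next = (int) end;
        return view.slice(); // independent position/limit, shared storage
    }

    int allocatedBytes() {
        return next;
    }

    @Override
    public void close() {
        open = false; // the model only forbids further allocation
    }
}
```

Used inside a single try-with-resources block, two calls such as `allocate(4, 4)` and `allocate(8, 8)` return non-overlapping slices at offsets 0 and 8 without any further allocation.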

Of course the bulk of the changes are to support the CLinker downcall/upcall routines. These changes cut pretty deep into the JVM; I'll briefly summarize the goal of some of these changes - for further details, Jorn has put together a detailed writeup which explains the rationale behind the VM support, with some references to the code [5].

The main idea behind foreign linker is to infer, given a Java method type (expressed as a MethodType instance) and the description of the signature of a native function (expressed as a FunctionDescriptor instance) a recipe that can be used to turn a Java call into the corresponding native call targeting the requested native function.

This inference scheme can be defined in a pretty straightforward fashion by looking at the various ABI specifications (for instance, see [6] for the SysV ABI, which is the one used on Linux/Mac). The various CallArranger classes, of which we have a flavor for each supported platform, do exactly that kind of inference.

For the inference process to work, we need to attach extra information to memory layouts; it is no longer sufficient to know e.g. that a layout is 32/64 bits - we need to know whether it is meant to represent a floating point value or an integral value; this knowledge is required because floating point values are passed in different registers by most ABIs. For this reason, CLinker offers a set of pre-baked, platform-dependent layout constants which contain the required classification attributes (e.g. a CLinker.TypeKind enum value). The runtime extracts this attribute, and performs classification accordingly.
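
As a rough illustration of why the integral/floating point distinction matters, the sketch below assigns scalar arguments to registers in the way the SysV x86-64 calling convention does: integral values go to rdi, rsi, rdx, rcx, r8, r9 and floating point values go to xmm0 through xmm7, with the rest spilling to the stack. `SysVClassifierSketch` and its `Kind` enum are hypothetical stand-ins, not the CallArranger code, and the model ignores structs, varargs and return values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scalar-only model of SysV x86-64 argument classification.
final class SysVClassifierSketch {
    // Stand-in for the classification attribute attached to a layout.
    enum Kind { INTEGER, FLOAT }

    private static final String[] INT_REGS =
            {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    private static final String[] FLOAT_REGS =
            {"xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6", "xmm7"};

    // Returns, for each argument, the register or stack offset it is passed in.
    static List<String> assign(Kind... args) {
        List<String> locations = new ArrayList<>();
        int ints = 0;
        int floats = 0;
        int stackOffset = 0;
        for (Kind kind : args) {
            if (kind == Kind.INTEGER && ints < INT_REGS.length) {
                locations.add(INT_REGS[ints++]);
            } else if (kind == Kind.FLOAT && floats < FLOAT_REGS.length) {
                locations.add(FLOAT_REGS[floats++]);
            } else {
                // Registers of this class are exhausted: spill to the stack.
                // Assumption of the model: every scalar takes one 8-byte slot.
                locations.add("stack+" + stackOffset);
                stackOffset += 8;
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        // Models a signature such as f(long, double, int).
        System.out.println(assign(Kind.INTEGER, Kind.FLOAT, Kind.INTEGER));
        // prints [rdi, xmm0, rsi]
    }
}
```

Two layouts of the same size (say, a 64-bit integer and a double) end up in different register files, which is exactly the information a size-only layout cannot convey.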

A native call is decomposed into a sequence of basic, primitive operations, called Binding (see the great javadoc on the Binding.java class for more info). There are many such bindings - for instance the Move binding is used to move a value into a specific machine register/stack slot. So, the main job of the various CallArranger classes is to determine, given a Java MethodType and FunctionDescriptor, the set of bindings associated with the downcall/upcall.
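
A toy version of this decomposition might look as follows. The three operations (`dup`, `convert`, `move`) are hypothetical and much simpler than the real bindings in Binding.java; the sketch only shows the general idea of a stack-based recipe whose `move` steps store lowered values into buffer slots that stand for registers.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical stack-based interpreter for a tiny, made-up set of bindings.
final class BindingInterpreterSketch {

    interface Binding {
        void interpret(Deque<Object> stack, long[] buffer);
    }

    // Duplicates the value on top of the operand stack.
    static Binding dup() {
        return (stack, buffer) -> stack.push(stack.peek());
    }

    // Pops a value, converts it, and pushes the result.
    static Binding convert(Function<Object, Object> f) {
        return (stack, buffer) -> stack.push(f.apply(stack.pop()));
    }

    // Pops a lowered value and stores it in the slot that stands for a register.
    static Binding move(int slot) {
        return (stack, buffer) -> buffer[slot] = ((Number) stack.pop()).longValue();
    }

    // Runs the recipe for a single Java argument.
    static void interpretArg(Object arg, List<Binding> bindings, long[] buffer) {
        Deque<Object> stack = new ArrayDeque<>();
        stack.push(arg);
        for (Binding binding : bindings) {
            binding.interpret(stack, buffer);
        }
        if (!stack.isEmpty()) {
            throw new IllegalStateException("recipe left operands on the stack");
        }
    }

    // Example recipe: a two-field struct (modeled as long[2]) passed in two slots.
    static long[] lowerPair(long[] pair) {
        List<Binding> recipe = Arrays.asList(
                dup(),
                convert(s -> ((long[]) s)[0]), move(0),
                convert(s -> ((long[]) s)[1]), move(1));
        long[] buffer = new long[2];
        interpretArg(pair, recipe, buffer);
        return buffer;
    }
}
```

Calling `lowerPair(new long[] {7, 9})` leaves 7 in slot 0 and 9 in slot 1; in this model, choosing the recipe is the call arranger's job and executing it is the interpreter's job.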

At the heart of the foreign linker support is the ProgrammableInvoker class. This class effectively generates a MethodHandle which follows the steps described by the various bindings obtained by CallArranger. There are actually various strategies to interpret these bindings - listed below:

  • basic interpreted mode; in this mode, all bindings are interpreted using a stack-based machine written in Java (see BindingInterpreter), except for the Move bindings. For these bindings, the move is implemented by allocating a buffer (whose size is ABI specific) and by moving all the lowered values into positions within this buffer. The buffer is then passed to a piece of assembly code inside the VM which takes values from the buffer and moves them into their expected registers/stack slots (note that each position in the buffer corresponds to a different register). This is the most general and most "customizable" invocation mode, but also the slowest, since every call incurs some extra allocation.

  • specialized interpreted mode; same as before, but instead of interpreting the bindings with a stack-based interpreter, we generate a method handle chain which effectively interprets all the bindings (again, except Move ones).

  • intrinsified mode; this is typically used in combination with the specialized interpreted mode described above (although it can also be used with the Java-based binding interpreter). The goal here is to remove the buffer allocation and copy by introducing an additional JVM intrinsic. If a native call recipe is constant (e.g. the set of bindings is constant, which is probably the case if the native method handle is stored in a static, final field), then the VM can generate specialized assembly code which interprets the Move binding without the need to go through an intermediate buffer. This gives us performance that is on par with JNI.

For upcalls, the support is not (yet) as advanced, and only the basic interpreted mode is available there. We plan to add support for intrinsified modes there as well, which should considerably boost performance (probably well beyond what JNI can offer at the moment, since the upcall support in JNI is not very well optimized).

Again, for further reading on the internals of the foreign linker support, please refer to [5].

Test changes

Many new tests have been added to validate the foreign linker support; we have high level tests (see StdLibTest) which aim at testing the linker from the perspective of code that clients could write. But we also have deeper combinatorial tests (see TestUpcall and TestDowncall) which are meant to stress every corner of the ABI implementation. There are also some great tests (see the callarranger folder) which test the various CallArrangers for all the possible platforms; these tests adopt more of a white-box approach - that is, instead of treating the linker machinery as a black box and verifying that the support works by checking that the native call returned the results we expected, these tests aim at checking that the set of bindings generated by the call arranger is correct. This also means that we can test the classification logic for Windows, Mac and Linux regardless of the platform we're executing on.

Some additional microbenchmarks have been added to compare the performance of downcalls/upcalls with JNI.

[1] - https://openjdk.java.net/jeps/389
[2] - https://openjdk.java.net/jeps/393
[3] - https://git.openjdk.java.net/jdk/pull/548
[4] - https://github.com/openjdk/panama-foreign/blob/foreign-jextract/doc/panama_ffi.md
[5] - http://cr.openjdk.java.net/~jvernee/docs/Foreign-abi%20downcall%20intrinsics%20technical%20description.html


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ❌ (1/6 failed) ❌ (2/2 failed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (9/9 passed)

Failed test tasks

Issue

  • JDK-8254231: Implementation of Foreign Linker API (Incubator)

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/634/head:pull/634
$ git checkout pull/634

…ator)

This patch contains the changes associated with the third incubation round of the foreign memory access API (see JEP 393 [1]). This iteration focuses on improving the usability of the API in 3 main ways:

* first, by providing a way to obtain truly *shared* segments, which can be accessed and closed concurrently from multiple threads
* second, by providing a way to register a memory segment against a `Cleaner`, so as to have some (optional) guarantee that the memory will be deallocated, eventually
* third, by not requiring users to dive deep into var handles when they first pick up the API; a new `MemoryAccess` class has been added, which defines several useful dereference routines; these are really just thin wrappers around memory access var handles, but they make the barrier of entry for using this API somewhat lower.

A big conceptual shift that comes with this API refresh is that the role of `MemorySegment` and `MemoryAddress` is not the same as it used to be; it used to be the case that a memory address could (sometimes, not always) have a back link to the memory segment which originated it; additionally, memory access var handles used `MemoryAddress` as a basic unit of dereference.

This has all changed as per this API refresh;  now a `MemoryAddress` is just a dumb carrier which wraps a pair of object/long addressing coordinates; `MemorySegment` has become the star of the show, as far as dereferencing memory is concerned. You cannot dereference memory if you don't have a segment. This improves usability in a number of ways - first, it is a lot easier to wrap native addresses (`long`, essentially) into a `MemoryAddress`; secondly, it is crystal clear what a client has to do in order to dereference memory: if a client has a segment, it can use that; otherwise, if the client only has an address, it will have to create a segment *unsafely* (this can be done by calling `MemoryAddress::asSegmentRestricted`).

A list of the API, implementation and test changes is provided below. If you have any questions, or need more detailed explanations, I (and the rest of the Panama team) will be happy to point at existing discussions, and/or to provide the feedback required.

A big thank you to Erik Osterlund, Vladimir Ivanov and David Holmes, without whom the work on shared memory segments would not have been possible.

Thanks
Maurizio

Javadoc:

http://cr.openjdk.java.net/~mcimadamore/8254162_v1/javadoc/jdk/incubator/foreign/package-summary.html

Specdiff:

http://cr.openjdk.java.net/~mcimadamore/8254162_v1/specdiff/jdk/incubator/foreign/package-summary.html

CSR:

https://bugs.openjdk.java.net/browse/JDK-8254163

* `MemorySegment`
  * drop factory for restricted segment (this has been moved to `MemoryAddress`, see below)
  * added a no-arg factory for a native restricted segment representing entire native heap
  * rename `withOwnerThread` to `handoff`
  * add new `share` method, to create shared segments
  * add new `registerCleaner` method, to register a segment against a cleaner
  * add more helpers to create arrays from a segment e.g. `toIntArray`
  * add some `asSlice` overloads (to make up for the fact that now segments are more frequently used as cursors)
  * rename `baseAddress` to `address` (so that `MemorySegment` can implement `Addressable`)
* `MemoryAddress`
  * drop `segment` accessor
  * drop `rebase` method and replace it with `segmentOffset` which returns the offset (a `long`) of this address relative to a given segment
* `MemoryAccess`
  * New class supporting several static dereference helpers; the helpers are organized by carrier and access mode, where a carrier is one of the usual suspects (a Java primitive, minus `boolean`); the access mode can be simple (e.g. access the base address of a given segment), or indexed, in which case the accessor takes a segment and either a low-level byte offset, or a high-level logical index. The classification is reflected in the naming scheme (e.g. `getByte` vs. `getByteAtOffset` vs. `getByteAtIndex`).
* `MemoryHandles`
  * drop `withOffset` combinator
  * drop `withStride` combinator
  * the basic memory access handle factory now returns a var handle which takes a `MemorySegment` and a `long` - from which it is easy to derive all the other handles using plain var handle combinators.
* `Addressable`
  * This is a new interface which is attached to entities which can be projected to a `MemoryAddress`. For now, both `MemoryAddress` and `MemorySegment` implement it; we have plans, with JEP 389 [2] to add more implementations. Clients can largely ignore this interface, which comes in really handy when defining native bindings with tools like `jextract`.
* `MemoryLayouts`
  * A new layout, for machine addresses, has been added to the mix.
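
The simple / offset / index naming scheme described for `MemoryAccess` can be modeled on top of `ByteBuffer`. This is a sketch of the naming convention only: `AccessHelpersSketch` is a hypothetical class, a `ByteBuffer` stands in for a segment, and offsets are `int` here because that is what `ByteBuffer` accepts.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the three access modes, using ByteBuffer as the "segment".
final class AccessHelpersSketch {

    // Simple mode: access at the base of the segment.
    static int getInt(ByteBuffer segment) {
        return getIntAtOffset(segment, 0);
    }

    // Offset mode: the second argument is a raw byte offset.
    static int getIntAtOffset(ByteBuffer segment, int byteOffset) {
        return segment.getInt(byteOffset);
    }

    // Index mode: the second argument is a logical index, scaled by the carrier size.
    static int getIntAtIndex(ByteBuffer segment, int index) {
        return getIntAtOffset(segment, index * Integer.BYTES);
    }

    static void setIntAtOffset(ByteBuffer segment, int byteOffset, int value) {
        segment.putInt(byteOffset, value);
    }

    static void setIntAtIndex(ByteBuffer segment, int index, int value) {
        setIntAtOffset(segment, index * Integer.BYTES, value);
    }
}
```

With a 16-byte buffer, `setIntAtIndex(buf, 2, 99)` and `getIntAtOffset(buf, 8)` address the same four bytes, which is the relationship the naming scheme is meant to make obvious.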

There are two main things to discuss here: support for shared segments, and the general simplification of the memory access var handle support.

The support for shared segments cuts in pretty deep in the VM. Support for shared segments is notoriously hard to achieve, at least in a way that guarantees optimal access performances. This is caused by the fact that, if a segment is shared, it would be possible for a thread to close it while another is accessing it.

After considering several options (see [3]), we zeroed in on an approach inspired by a happy idea that Andrew Haley had (and that he reminded me of at this year's OpenJDK committer workshop - thanks!). The idea is that if we could *freeze* the world (e.g. with a GC pause) while a segment is closed, we could then prevent segments from being accessed concurrently with a close operation. For this to work, it is crucial that no GC safepoints can occur between a segment liveness check and the access itself (otherwise it would be possible for the accessing thread to stop right before an unsafe call). It also relies on the fact that hotspot/C2 should not be able to propagate loads across safepoints.

Sadly, none of these conditions seems to hold in the current implementation, so we needed to resort to a bit of creativity. First, we noted that, if we could mark so-called *scoped* methods with an annotation, it would be very simple to check whether a thread was in the middle of a scoped method when we stopped the world for a close operation (btw, instead of stopping the world, we do a much more efficient, thread-local polling, thanks to JEP 312 [4]).

The question is, then, once we detect that a thread is accessing the very segment we're about to close, what should happen? We first experimented with a solution which would install an *asynchronous* exception on the accessing thread, thus making it fail. This solution has some desirable properties, in that a `close` operation always succeeds. Unfortunately the machinery for async exceptions is a bit fragile (e.g. not all the code in hotspot checks for async exceptions); to minimize risks, we decided to revert to a simpler strategy, where `close` might fail when it finds that another thread is accessing the segment being closed.

As written in the javadoc, this doesn't mean that clients should just catch and try again; an exception on `close` is a bug in the user code, likely arising from lack of synchronization, and should be treated as such.

In terms of gritty implementation, we needed to centralize memory access routines in a single place, so that we could have a set of routines closely mimicking the primitives exposed by `Unsafe` but which, in addition, also provided a liveness check. This way we could mark all these routines with the special `@Scoped` annotation, which tells the VM that something important is going on.

To achieve this, we created a new (autogenerated) class, called `ScopedMemoryAccess`. This class contains all the main memory access primitives (including bulk access, like `copyMemory`, or `setMemory`), and accepts, in addition to the access coordinates, also a scope object, which is tested before access. A reachability fence is also thrown in the mix to make sure that the scope is kept alive during access (which is important when registering segments against cleaners).

Of course, to make memory access safe, memory access var handles, byte buffer var handles, and byte buffer API should use the new `ScopedMemoryAccess` class instead of unsafe, so that a liveness check can be triggered (in case a scope is present).

`ScopedMemoryAccess` has a `closeScope` method, which initiates the thread-local handshakes, and returns `true` if the handshake completed successfully.

`MemoryScope` (now significantly simplified from what we had before) has two implementations, one for confined segments and one for shared segments; the main difference between the two is what happens when the scope is closed; a confined segment sets a boolean flag to false, and returns, whereas a shared segment goes into a `CLOSING` state, then starts the handshake, and then updates the state again, to either `CLOSED` or `ALIVE` depending on whether the handshake was successful or not. Note that when a shared segment is in the `CLOSING` state, `MemorySegment::isAlive` will still return `true`, while the liveness check upon memory access will fail.
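
The shared-scope state transitions can be sketched as below. Note the swapped-in technique: the real implementation relies on thread-local handshakes in the VM, whereas this model approximates "is any thread in the middle of an access?" with a plain counter of in-flight accesses. `SharedScopeSketch`, `beginAccess` and `endAccess` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the ALIVE -> CLOSING -> CLOSED/ALIVE transitions.
// A counter of in-flight accesses stands in for the VM handshake.
final class SharedScopeSketch {
    static final int ALIVE = 0;
    static final int CLOSING = 1;
    static final int CLOSED = 2;

    private final AtomicInteger state = new AtomicInteger(ALIVE);
    private final AtomicInteger activeAccesses = new AtomicInteger();

    // Still true while CLOSING, mirroring the behavior described above.
    boolean isAlive() {
        return state.get() != CLOSED;
    }

    // Liveness check performed before an access; fails in CLOSING and CLOSED.
    void beginAccess() {
        activeAccesses.incrementAndGet();
        if (state.get() != ALIVE) {
            activeAccesses.decrementAndGet();
            throw new IllegalStateException("scope is not alive");
        }
    }

    void endAccess() {
        activeAccesses.decrementAndGet();
    }

    // Fails, leaving the scope ALIVE, if another access is in flight.
    void close() {
        if (!state.compareAndSet(ALIVE, CLOSING)) {
            throw new IllegalStateException("already closed or closing");
        }
        boolean handshakeOk = activeAccesses.get() == 0; // stand-in for the handshake
        state.set(handshakeOk ? CLOSED : ALIVE);
        if (!handshakeOk) {
            throw new IllegalStateException("segment is being accessed by another thread");
        }
    }
}
```

An accessor would wrap each dereference in `beginAccess()` / `endAccess()` (ideally with try/finally); a `close()` that races with such an access throws and leaves the scope usable, which matches the "an exception on close is a bug in the user code" stance.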

The key realization here was that if all memory access var handles took a coordinate pair of `MemorySegment` and `long`, all other access types could be derived from this basic var handle form.

This allowed us to remove the on-the-fly var handle generation, and to simply derive structural access var handles (such as those obtained by calling `MemoryLayout::varHandle`) using *plain* var handle combinators, so that e.g. additional offset is injected into a base memory access var handle.
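
The derivation idea can be illustrated with ordinary method handle combinators, which have been in `java.lang.invoke` for a long time. This is an analogy rather than the actual code: the sketch uses `MethodHandles.insertArguments` and `MethodHandles.filterArguments` on a `(ByteBuffer, int)` getter instead of var handle combinators on a `(MemorySegment, long)` var handle, and `DerivedAccessorsSketch` is a hypothetical name.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;

// Hypothetical analogy: derive specialized accessors from one basic accessor.
final class DerivedAccessorsSketch {
    // Basic form: (ByteBuffer, int byteOffset) -> int
    static final MethodHandle BASE_GET_INT;
    // Helper used to turn a logical index into a byte offset: (int) -> int
    static final MethodHandle SCALE_BY_INT_SIZE;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            BASE_GET_INT = lookup.findVirtual(ByteBuffer.class, "getInt",
                    MethodType.methodType(int.class, int.class));
            SCALE_BY_INT_SIZE = lookup.findStatic(DerivedAccessorsSketch.class, "scale",
                    MethodType.methodType(int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int scale(int index) {
        return index * Integer.BYTES;
    }

    // Struct-field style accessor: the offset is injected, leaving (ByteBuffer) -> int.
    static MethodHandle fieldAt(int byteOffset) {
        return MethodHandles.insertArguments(BASE_GET_INT, 1, byteOffset);
    }

    // Array-element style accessor: (ByteBuffer, int index) -> int.
    static MethodHandle indexed() {
        return MethodHandles.filterArguments(BASE_GET_INT, 1, SCALE_BY_INT_SIZE);
    }

    static int readField(ByteBuffer buffer, int byteOffset) {
        try {
            return (int) fieldAt(byteOffset).invokeExact(buffer);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static int readIndexed(ByteBuffer buffer, int index) {
        try {
            return (int) indexed().invokeExact(buffer, index);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Both derived accessors are built purely by adapting the basic one, with no bespoke code generation, which is the property this part of the change is after.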

This also helped in simplifying the implementation by removing the special `withStride` and `withOffset` combinators, which previously needed low-level access on the innards of the memory access var handle. All that code is now gone.

Not much to see here - most of the tests needed to be updated because of the API changes. Some were beefed up (like the array test, since now segments can be projected into many different kinds of arrays). A test has been added to test the `Cleaner` functionality, and another stress test has been added for shared segments (`TestHandshake`). Some of the microbenchmarks also needed some tweaks - and some of them were updated to test performance in the shared segment case as well.

[1] - https://openjdk.java.net/jeps/393
[2] - https://openjdk.java.net/jeps/389
[3] - https://mail.openjdk.java.net/pipermail/panama-dev/2020-May/009004.html
[4] - https://openjdk.java.net/jeps/312
Added tests to make sure no spurious exception is thrown when:
* handing off a segment from A to A
* sharing an already shared segment
@bridgekeeper

bridgekeeper bot commented Oct 13, 2020

👋 Welcome back mcimadamore! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@mcimadamore mcimadamore changed the title from "8254231 linker" to "8254231: Implementation of Foreign Linker API (Incubator)" Oct 13, 2020
@openjdk

openjdk bot commented Oct 13, 2020

@mcimadamore The following labels will be automatically applied to this pull request:

  • build
  • core-libs
  • hotspot
  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added security security-dev@openjdk.org hotspot hotspot-dev@openjdk.org build build-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Oct 13, 2020
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 13, 2020
@mlbridge

mlbridge bot commented Oct 13, 2020

@openjdk openjdk bot removed merge-conflict Pull request has merge conflict with target branch rfr Pull request is ready for review labels Nov 12, 2020
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 12, 2020
@mcimadamore
Contributor Author

I've just merged against master - which now contains the foreign memory API changes that this JEP depends on. I believe reviewing the changes should now be easier, as only the relevant changes should be presented in the "File Changed" tab.

@@ -384,6 +384,20 @@ boolean open() {
}
}

public static NativeLibrary defaultLibrary = new NativeLibraryImpl(Object.class, "<default>", true, true) {
Member

This field can be final.

Contributor Author

Thanks - I already made this change in the latest revision.

Contributor

@iwanowww iwanowww left a comment

Compiler changes look good.

@mcimadamore
Contributor Author

/integrate

@openjdk openjdk bot closed this Nov 23, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 23, 2020
@openjdk

openjdk bot commented Nov 23, 2020

@mcimadamore Since your change was applied there have been 113 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

Pushed as commit 0fb31db.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this pull request Nov 23, 2020
Reviewed-by: coleenp, ihse, dholmes, vlivanov
Labels
core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated security security-dev@openjdk.org
10 participants