Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

plevart · 2021-03-28T20:31:11Z

Hi,

This is my 1st attempt at fixing #154 .

The problem when building/running with JDK9+ is not sun.misc.Unsafe. It is still there up to JDK15, you just have to --add-module=jdk.unsupported. In JDK16 it is encapsulated (you have to also add --add-exports=...) while in JDK17 it might get removed. The problem are: sun.misc.Cleaner which was removed in JDK9 (moved to jdk.internal.ref package) and sun.nio.ch.DirectBuffer interface which got encapsulated in JDK9.

So instead of building alternative versions of classes such as MemoryUtil compiled with and for JDK9+ and packaged into a multi-release JAR, I opted for simpler approach. I got rid of DirectByteBuffer usage (mostly they were used in TiBlockColumnVector) and replaced them with HeapByteBuffers that don't need sun.misc.Cleaner as they are allocated on heap (with byte[] array as their backing memory). As it turns out, code got much simpler using just public ByteBuffer API. And it should be at least as fast or even faster since various accesses to data types in HeapByteBuffer are performed with Unsafe internally, but in a safe and encapsulated way. No more manipulation with addresses and clumsy releasing of off-heap memory.

The only change to the build procedure is the addition of Maven profile jdk9plus which is activated when building with JDK9+ and just adds a dependency with scope provided (which means it is not packaged into the resulting artifact). This is needed because in JDK9+ javax.annotation.Generated annotation which is present on protobuf generated classes is missing (was renamed to javax.annotation.processing.Generated in JDK9) so this additional dependency which is added only when building with JDK9+ allows the project to be successfully built with JDK9+ (I tested with JDK11).

There is only one place left where sun.misc.Unsafe is still used: in org.tikv.common.util.FastByteComparisons.LexicographicalComparerHolder.UnsafeComparer. But this is not critical since as I said, sun.misc.Unsafe is still there for compilation at least up to JDK15 and in the code it has a fallback to somewhat slower variant that doesn't use unsafe. There is a fast java.util.Arrays#compare(byte[], byte[]) method in JDK9+ but that would need separate compilation with JDK9+ and packaging into multi-release JAR. So this is still open.

So, WDYT?

marsishandsome · 2021-03-29T02:09:34Z

/run-all-tests

marsishandsome · 2021-03-29T03:17:02Z

please add sign-off

marsishandsome · 2021-03-29T03:19:21Z

@birdstorm we should modify the ci to add java9 building step.

birdstorm · 2021-03-29T03:21:18Z

@birdstorm we should modify the ci to add java9 building step.

sure

Signed-off-by: Peter Levart <peter.levart@gmail.com>

plevart · 2021-03-29T06:32:20Z

please add sign-off

Done.

plevart · 2021-03-29T06:33:21Z

/run-all-tests

birdstorm · 2021-03-29T08:57:52Z

@plevart Nice work! I think we should hold a contributor meeting this week about this change.

There is some work to check before merge:

TiBlockColumnVector should behave the same way as it does, its related logic lies in TiSpark. Perhaps we can first move this logic out of the client and add it in the future?
The current CI script should be tested so that we can make sure it does not break the build.
Add test for JDK 9+ environment on CI. It can be manually tested, and integrated with CI later.
Discuss which release should this PR be included. For now, I suggest we include this in release-4.0. i.e., the next big release.

plevart · 2021-03-29T10:04:03Z

@plevart Nice work! I think we should hold a contributor meeting this week about this change.

There is some work to check before merge:
* [ ]  `TiBlockColumnVector` should behave the same way as it does, its related logic lies in TiSpark. Perhaps we can first move this logic out of the client and add it in the future?

Do you mean to remove TiBlockColumnVector and related logic out of java client? So currently client does not need it? I think the patch preserves the behavior of TiBlockColumnVector as far as it's usage inside the tikv client is concerned. It just replaces usage of DirectByteBuffers with HeapByteBuffer(s), but that, as far as client iself is concerned, is an internal contract. If it is more than that and TiBlockColumnVector is used by other software too (in the sense that other software constructs TiBlockColumnVector instances and passes DirectByteBuffers to its constructors), then perhaps it should be removed from the client and added to some other software. But if this other software just interacts with TiBlockColumnVector public API and always obtains TiBlockColumnVector instances from the tikv client, then the behavior is preserved as this public API does not expose internal ByteBuffer(s). They are just internal implementation detail.
Anyway, if the decision is to remove TiBlockColumnVector from the client, I can look at what needs to be "deleted"...

* [x]  The current CI script should be tested so that we can make sure it does not break the build.

* [ ]  Add test for JDK 9+ environment on CI. It can be manually tested, and integrated with CI later.

* [ ]  Discuss which release should this PR be included. For now, I suggest we include this in release-4.0. i.e., the next big release.

RIght. If the change removes TiBlockColumnVector then it should probably go to next major release.

birdstorm · 2021-03-29T14:01:51Z

Do you mean to remove TiBlockColumnVector and related logic out of java client?

AFAIK, the TiBlockColumnVector is only used for TiSpark for now, it implements a columnar vector so that TiSpark can use columnar APIs to read from TiKV. It may be also useful for future logic which needs to read columnar data. @marsishandsome PTAL, we have to confirm it.

IMHO, this PR is great except that it lacks unit tests, but I can add it later. I will review this PR again tomorrow.

How about we arrange a meeting via zoom for more details, say Wednesday? @plevart @marsishandsome @ilovesoup I can schedule it according to the calendar. @plevart Feel free to tell us if you are occupied.

plevart · 2021-03-29T19:11:40Z

How about we arrange a meeting via zoom for more details, say Wednesday? @plevart @marsishandsome @ilovesoup I can schedule it according to the calendar. @plevart Feel free to tell us if you are occupied.

Wednesday is ok with me. Just note that I live in the GMT+2 time zone and would therefore appreciate the hour to be between say 10am to 8pm local time which is between 08:00 - 18:00 UTC if possible. Thanks.

birdstorm · 2021-03-30T09:10:14Z

@plevart HI, after our discussion, we think this PR has resolved some concerns that:

it uses the profile that activates only when using JDK 9+, so the build process will be the same as it was.
the unsafe comparator for byte arrays is not changed.
its changes to TiBlockColumnVector will be discussed later because the author is busy now.

Overall, I believe there isn't much for us to discuss or share for now. If you have some more questions to ask, we can still schedule this meeting for tomorrow, since the zoom meeting is a bit heavy for just an elegant PR like this.

plevart · 2021-03-31T07:06:40Z

I'm ok with waiting for the author of TiBlockColumnVector to chime in first. No need to discuss anything until then.

birdstorm · 2021-04-02T04:30:36Z

Hi! @plevart Sorry to have kept you waiting.

We have decided to first check the correctness of replacing DirectByteBuffer with HeapByteBuffer in TiSpark. It would take a day or two, and estimated to have a desiring result by next Monday. After that, I will review this PR again, and it should be ready to merge by then.

birdstorm · 2021-04-07T01:14:09Z

Hi, @plevart FYI, the verify tests for getting rid of DirectByteBuffer are undergoing in pingcap/tispark#2005. You may track it for its progress.

birdstorm

LGTM

birdstorm · 2021-04-07T01:49:03Z

In theory, DirectByteBuffer should cost less time when allocating than HeapByteBuffer. I will also benchmark it again when the TiBlockColumnVector is used in a large table to see if there are performance drawbacks.

plevart · 2021-04-07T07:32:33Z

In theory, DirectByteBuffer should cost less time when allocating than HeapByteBuffer. I will also benchmark it again when the TiBlockColumnVector is used in a large table to see if there are performance drawbacks.

This might be a common misconception - that native code can do allocations of "native" memory faster than java code can do heap allocations. Depending on the size of allocated object, java heap allocation translates into JIT-ed code as a mere increment of a pointer - by utilizing TLAB (thread-local allocation buffers) so it can be as fast as possible:

https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/

But do benchmark the code. It's the only sure way to find out.

plevart · 2021-04-07T08:58:47Z

DirectByteBuffer's purpose is mainly to interface native code where large amounts of data have to be exchanged between native code and java code. If consumers and producers of data are both pure-java-based, then heap ByteBuffer(s) are always a better choice.

marsishandsome requested a review from birdstorm March 29, 2021 03:29

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency

c0f5bde

Signed-off-by: Peter Levart <peter.levart@gmail.com>

plevart force-pushed the jdk9plus branch from 8a19c97 to c0f5bde Compare March 29, 2021 06:26

birdstorm mentioned this pull request Apr 7, 2021

[DNM] Test Replace DirectByteBuffer with HeapByteBuffer pingcap/tispark#2005

Closed

birdstorm approved these changes Apr 7, 2021

View reviewed changes

birdstorm merged commit 1978727 into tikv:master Apr 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

plevart commented Mar 28, 2021 •

edited

Loading

marsishandsome commented Mar 29, 2021

marsishandsome commented Mar 29, 2021

marsishandsome commented Mar 29, 2021

birdstorm commented Mar 29, 2021

plevart commented Mar 29, 2021

plevart commented Mar 29, 2021

birdstorm commented Mar 29, 2021 •

edited

Loading

plevart commented Mar 29, 2021 •

edited

Loading

birdstorm commented Mar 29, 2021

plevart commented Mar 29, 2021

birdstorm commented Mar 30, 2021

plevart commented Mar 31, 2021

birdstorm commented Apr 2, 2021

birdstorm commented Apr 7, 2021 •

edited

Loading

birdstorm left a comment

birdstorm commented Apr 7, 2021

plevart commented Apr 7, 2021

plevart commented Apr 7, 2021

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

Conversation

plevart commented Mar 28, 2021 • edited Loading

marsishandsome commented Mar 29, 2021

marsishandsome commented Mar 29, 2021

marsishandsome commented Mar 29, 2021

birdstorm commented Mar 29, 2021

plevart commented Mar 29, 2021

plevart commented Mar 29, 2021

birdstorm commented Mar 29, 2021 • edited Loading

plevart commented Mar 29, 2021 • edited Loading

birdstorm commented Mar 29, 2021

plevart commented Mar 29, 2021

birdstorm commented Mar 30, 2021

plevart commented Mar 31, 2021

birdstorm commented Apr 2, 2021

birdstorm commented Apr 7, 2021 • edited Loading

birdstorm left a comment

Choose a reason for hiding this comment

birdstorm commented Apr 7, 2021

plevart commented Apr 7, 2021

plevart commented Apr 7, 2021

plevart commented Mar 28, 2021 •

edited

Loading

birdstorm commented Mar 29, 2021 •

edited

Loading

plevart commented Mar 29, 2021 •

edited

Loading

birdstorm commented Apr 7, 2021 •

edited

Loading