Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow building on JDK9+ by getting rid of DirectByteBuffer dependency #155

Merged
merged 1 commit into from
Apr 7, 2021

Conversation

plevart
Copy link
Contributor

@plevart plevart commented Mar 28, 2021

Hi,

This is my 1st attempt at fixing #154 .

The problem when building/running with JDK9+ is not sun.misc.Unsafe. It is still there up to JDK15, you just have to --add-module=jdk.unsupported. In JDK16 it is encapsulated (you have to also add --add-exports=...) while in JDK17 it might get removed. The problem are: sun.misc.Cleaner which was removed in JDK9 (moved to jdk.internal.ref package) and sun.nio.ch.DirectBuffer interface which got encapsulated in JDK9.

So instead of building alternative versions of classes such as MemoryUtil compiled with and for JDK9+ and packaged into a multi-release JAR, I opted for simpler approach. I got rid of DirectByteBuffer usage (mostly they were used in TiBlockColumnVector) and replaced them with HeapByteBuffers that don't need sun.misc.Cleaner as they are allocated on heap (with byte[] array as their backing memory). As it turns out, code got much simpler using just public ByteBuffer API. And it should be at least as fast or even faster since various accesses to data types in HeapByteBuffer are performed with Unsafe internally, but in a safe and encapsulated way. No more manipulation with addresses and clumsy releasing of off-heap memory.

The only change to the build procedure is the addition of Maven profile jdk9plus which is activated when building with JDK9+ and just adds a dependency with scope provided (which means it is not packaged into the resulting artifact). This is needed because in JDK9+ javax.annotation.Generated annotation which is present on protobuf generated classes is missing (was renamed to javax.annotation.processing.Generated in JDK9) so this additional dependency which is added only when building with JDK9+ allows the project to be successfully built with JDK9+ (I tested with JDK11).

There is only one place left where sun.misc.Unsafe is still used: in org.tikv.common.util.FastByteComparisons.LexicographicalComparerHolder.UnsafeComparer. But this is not critical since as I said, sun.misc.Unsafe is still there for compilation at least up to JDK15 and in the code it has a fallback to somewhat slower variant that doesn't use unsafe. There is a fast java.util.Arrays#compare(byte[], byte[]) method in JDK9+ but that would need separate compilation with JDK9+ and packaging into multi-release JAR. So this is still open.

So, WDYT?

@marsishandsome
Copy link
Collaborator

/run-all-tests

@marsishandsome
Copy link
Collaborator

image

please add sign-off

@marsishandsome
Copy link
Collaborator

@birdstorm we should modify the ci to add java9 building step.

@birdstorm
Copy link
Collaborator

@birdstorm we should modify the ci to add java9 building step.

sure

Signed-off-by: Peter Levart <peter.levart@gmail.com>
@plevart
Copy link
Contributor Author

plevart commented Mar 29, 2021

please add sign-off

Done.

@plevart
Copy link
Contributor Author

plevart commented Mar 29, 2021

/run-all-tests

@birdstorm
Copy link
Collaborator

birdstorm commented Mar 29, 2021

@plevart Nice work! I think we should hold a contributor meeting this week about this change.

There is some work to check before merge:

  • TiBlockColumnVector should behave the same way as it does, its related logic lies in TiSpark. Perhaps we can first move this logic out of the client and add it in the future?
  • The current CI script should be tested so that we can make sure it does not break the build.
  • Add test for JDK 9+ environment on CI. It can be manually tested, and integrated with CI later.
  • Discuss which release should this PR be included. For now, I suggest we include this in release-4.0. i.e., the next big release.

@plevart
Copy link
Contributor Author

plevart commented Mar 29, 2021

@plevart Nice work! I think we should hold a contributor meeting this week about this change.

There is some work to check before merge:

* [ ]  `TiBlockColumnVector` should behave the same way as it does, its related logic lies in TiSpark. Perhaps we can first move this logic out of the client and add it in the future?

Do you mean to remove TiBlockColumnVector and related logic out of java client? So currently client does not need it? I think the patch preserves the behavior of TiBlockColumnVector as far as it's usage inside the tikv client is concerned. It just replaces usage of DirectByteBuffers with HeapByteBuffer(s), but that, as far as client iself is concerned, is an internal contract. If it is more than that and TiBlockColumnVector is used by other software too (in the sense that other software constructs TiBlockColumnVector instances and passes DirectByteBuffers to its constructors), then perhaps it should be removed from the client and added to some other software. But if this other software just interacts with TiBlockColumnVector public API and always obtains TiBlockColumnVector instances from the tikv client, then the behavior is preserved as this public API does not expose internal ByteBuffer(s). They are just internal implementation detail.
Anyway, if the decision is to remove TiBlockColumnVector from the client, I can look at what needs to be "deleted"...

* [x]  The current CI script should be tested so that we can make sure it does not break the build.

* [ ]  Add test for JDK 9+ environment on CI. It can be manually tested, and integrated with CI later.

* [ ]  Discuss which release should this PR be included. For now, I suggest we include this in release-4.0. i.e., the next big release.

RIght. If the change removes TiBlockColumnVector then it should probably go to next major release.

@birdstorm
Copy link
Collaborator

Do you mean to remove TiBlockColumnVector and related logic out of java client?

AFAIK, the TiBlockColumnVector is only used for TiSpark for now, it implements a columnar vector so that TiSpark can use columnar APIs to read from TiKV. It may be also useful for future logic which needs to read columnar data. @marsishandsome PTAL, we have to confirm it.

IMHO, this PR is great except that it lacks unit tests, but I can add it later. I will review this PR again tomorrow.

How about we arrange a meeting via zoom for more details, say Wednesday? @plevart @marsishandsome @ilovesoup I can schedule it according to the calendar. @plevart Feel free to tell us if you are occupied.

@plevart
Copy link
Contributor Author

plevart commented Mar 29, 2021

How about we arrange a meeting via zoom for more details, say Wednesday? @plevart @marsishandsome @ilovesoup I can schedule it according to the calendar. @plevart Feel free to tell us if you are occupied.

Wednesday is ok with me. Just note that I live in the GMT+2 time zone and would therefore appreciate the hour to be between say 10am to 8pm local time which is between 08:00 - 18:00 UTC if possible. Thanks.

@birdstorm
Copy link
Collaborator

@plevart HI, after our discussion, we think this PR has resolved some concerns that:

  1. it uses the profile that activates only when using JDK 9+, so the build process will be the same as it was.
  2. the unsafe comparator for byte arrays is not changed.
  3. its changes to TiBlockColumnVector will be discussed later because the author is busy now.

Overall, I believe there isn't much for us to discuss or share for now. If you have some more questions to ask, we can still schedule this meeting for tomorrow, since the zoom meeting is a bit heavy for just an elegant PR like this.

@plevart
Copy link
Contributor Author

plevart commented Mar 31, 2021

I'm ok with waiting for the author of TiBlockColumnVector to chime in first. No need to discuss anything until then.

@birdstorm
Copy link
Collaborator

Hi! @plevart Sorry to have kept you waiting.

We have decided to first check the correctness of replacing DirectByteBuffer with HeapByteBuffer in TiSpark. It would take a day or two, and estimated to have a desiring result by next Monday. After that, I will review this PR again, and it should be ready to merge by then.

@birdstorm
Copy link
Collaborator

birdstorm commented Apr 7, 2021

Hi, @plevart FYI, the verify tests for getting rid of DirectByteBuffer are undergoing in pingcap/tispark#2005. You may track it for its progress.

Copy link
Collaborator

@birdstorm birdstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@birdstorm
Copy link
Collaborator

In theory, DirectByteBuffer should cost less time when allocating than HeapByteBuffer. I will also benchmark it again when the TiBlockColumnVector is used in a large table to see if there are performance drawbacks.

@birdstorm birdstorm merged commit 1978727 into tikv:master Apr 7, 2021
@plevart
Copy link
Contributor Author

plevart commented Apr 7, 2021

In theory, DirectByteBuffer should cost less time when allocating than HeapByteBuffer. I will also benchmark it again when the TiBlockColumnVector is used in a large table to see if there are performance drawbacks.

This might be a common misconception - that native code can do allocations of "native" memory faster than java code can do heap allocations. Depending on the size of allocated object, java heap allocation translates into JIT-ed code as a mere increment of a pointer - by utilizing TLAB (thread-local allocation buffers) so it can be as fast as possible:

https://shipilev.net/jvm/anatomy-quarks/4-tlab-allocation/

But do benchmark the code. It's the only sure way to find out.

@plevart
Copy link
Contributor Author

plevart commented Apr 7, 2021

DirectByteBuffer's purpose is mainly to interface native code where large amounts of data have to be exchanged between native code and java code. If consumers and producers of data are both pure-java-based, then heap ByteBuffer(s) are always a better choice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants