-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-7610: [Java] Finish support for 64 bit int allocations #6323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Some limitations of this PR:
|
|
@liyafan82 I removed myself as assignee for ARROW-6111 feel free to pick it up if you have time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be simpler to make TYPE_WIDTH long for here an other places? Are there downsides to that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making it a long is a reasonable choice.
I am not sure if is an overkill as TYPE_WIDTH is often a small number, and often used in scenarios where a small number is expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkornfield After more thoughts, I kind of prefer leaving the offset width int32. This makes sure we promote the related operations to int64 only when necessary.
That being said, I will follow your advice and make the offset width int 64, if you have a reason, or feel strong about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * the data buffer.. | |
| * the data buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jacques-n thoughts on making the default Unsafe to avoid any dependencies on Netty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emkornfield I have changed the code to make the default unsafe, so that we support large buffers by default.
|
@liyafan82 can you elaborate on how flight impacts the normal IPC format? (perhaps a follow-up JIRA explaining it?) |
Sure. I will try to proivide an implementation. |
Sure. I will open a JIRA tracking it. |
2866104 to
df08774
Compare
df08774 to
6ff821a
Compare
Did you get a chance, I'd like to understand what is left? |
Sure. I opened https://issues.apache.org/jira/browse/ARROW-7746 to track it. |
6ff821a to
d346518
Compare
|
@emkornfield Could you please triage this PR to a committer, as it is blocking another issue (#6782 (comment))? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warn might be too high, I think info is probably more appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised to info. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like these should match up with the enums exactly? or probably do an non case sensitive comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Revised the code to match up with the enum exactly. Thank you.
|
A couple of small comments. Sorry to flip-flop but lets keep the default allocator as Netty and then change it in the next release, when some of the other Memory PRs have been committed? Can you also add details on how to configure the new allocator to the PR description so it gets put into release notes (we should also make sure to make note of this). |
|
Thanks a lot for your effort.
Sounds good. The default allocator is reverted to Netty.
Good suggestion. I have added details to the description. Thank you. |
emkornfield
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments around logging. I would like to make sure we aren't going to cause problems for users of the library by ignoring log configuration since we are logging in a static initializer. The other is a small nit. I think this can be merged after we resolve those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens when logging during static initialization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your question correctly. You are concerned that when calling getDefaultAllocationManagerFactory, the logger may not have been initialized. If so, a NullPoinerException will be thrown.
There is no need to worry about that. According to the Java specification (https://docs.oracle.com/javase/specs/jls/se7/html/jls-12.html#jls-12.4.2), static members in a class are initialized in textual order, as though they were a single block.
In our code, initialization of the logger appears before the DEFAULT_ALLOCATION_MANAGER_FACTORY, so it always gets initialized first.
In fact, this code path has been tested by all our test cases, as by default, no environmental variable or system property are specified when running the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: LOGGER seems to be more common in the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Revised. Thank you.
Thanks a lot for your comments. Please see if I understand your question correctly. |
|
@liyafan82 I don't really see an answer to the question @jacques-n asked on the JIRA and mailing list:
I agree with this that it is overkill to introduce different types of allocators if it can be handled with minimal changes to the current Netty based allocator. What are yours and @emkornfield thoughts on taking this approach? |
|
I thought I replied on the mailing list to this but it would be nice to get rid of a hard dependency on netty for two reasons:
There are other jiras which are also trying to trim the dependency on netty |
|
Are there benefits to going through the netty facade/existing implementation? |
@BryanCutler Thanks for your attention. |
One that comes to my mind is backward compatiblity? |
|
I'm all for removing dependencies on Netty, but that doesn't have to be done here. This could be done a lot easier by using the existing NettyAllocationManager instead of creating a new one, plus that also gives the benefit of using the default allocator that can also make large allocations. Adding an alternative allocation manager to Netty should probably be done as a separate issue with it's own discussion. |
I'm OK with this. @liyafan82 I think it is probably easy to move the new allocator and selection code to a different PR and make the changes directly to the netty allocator in this PR? Does that sound reasonable to you? |
@emkornfield Sounds reasonable to me. Thanks for the good suggestion @BryanCutler |
e1d9a26 to
6c0edf6
Compare
|
@BryanCutler @emkornfield Please take a look. Thanks again for the good suggestion. |
|
|
||
| NettyAllocationManager(BaseAllocator accountingAllocator, int requestedSize) { | ||
| super(accountingAllocator); | ||
| this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should remove this which effectively replaces all allocations done in Arrow Java, which is a big change. INNER_ALLOCATOR also uses a pool which has some benefits. Instead, can you just change requestedSize to be a long, then check if it is over the max Int size and only then use PlatformDependent.allocateMemory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised. Thank you for the good suggestion.
| * To run this test, please | ||
| *<li>Make sure there are 4GB memory available in the system.</li> | ||
| * <li> | ||
| * Make sure the default allocation manager type is unsafe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Thank you.
| System.out.println("Successfully released the large buffer."); | ||
| } | ||
|
|
||
| public static void main(String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be converted to a standard unit test now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
The problem is that we set arrow.vector.max_allocation_bytes to 1048576 for every test case (to avoid OOM). Please see the pom.xml file.
So if we convert it to a test case, we cannot allocate too much memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding an additional constructor for NettyAllocationManager that also takes a cut-off value that will determine if PlatformDependent.allocateMemory(requestedSize) is used with a default of Int.MAX_VALUE? Then you could create a new NettyAllocationManager for testing with a small cut-off value that can be used to run normal tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
I have revised the code to make the cut-off value configurable, and added cases to test the scenarios when the request size is below/above the cut-off value. Please see if it looks good to you. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not run as part of the normal test suite correct? Does it need to run manually? if so could you put that in the javadoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it needs to run manually.
I have updated the javadoc accordingly.
| * To run this test, please | ||
| *<li>Make sure there are 4GB memory available in the system.</li> | ||
| * <li> | ||
| * Make sure the default allocation manager type is unsafe. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised. Thank you.
| System.out.println("Successfully released the large vector."); | ||
| } | ||
|
|
||
| public static void main(String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here about making these standard unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed above, if we make it a unit test, we cannot allocate too much memory.
| * Get the underlying memory chunk managed by this AllocationManager. | ||
| * @return buffer | ||
| */ | ||
| UnsafeDirectLittleEndian getMemoryChunk() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of removing this, could you deprecate and note it will be removed in future releases? also note that it will return null if allocated size is more than max int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Reverted this accordingly.
| allocatedAddress = PlatformDependent.allocateMemory(requestedSize); | ||
| } else { | ||
| this.memoryChunk = INNER_ALLOCATOR.allocate(requestedSize); | ||
| allocatedAddress = memoryChunk.memoryAddress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you keep allocatedSize = memoryChunk.capacity() for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Revised. Please check.
| UnsafeDirectLittleEndian getMemoryChunk() { | ||
| return memoryChunk; | ||
| if (requestedSize > Integer.MAX_VALUE) { | ||
| memoryChunk = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use this.memoryChunk to be consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised accordingly. Thank you.
| System.out.println("Successfully released the large buffer."); | ||
| } | ||
|
|
||
| public static void main(String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about adding an additional constructor for NettyAllocationManager that also takes a cut-off value that will determine if PlatformDependent.allocateMemory(requestedSize) is used with a default of Int.MAX_VALUE? Then you could create a new NettyAllocationManager for testing with a small cut-off value that can be used to run normal tests.
| /** | ||
| * The cut-off value for switching allocation strategies. | ||
| */ | ||
| private final long allocationCutOffValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be an int because it can't go above Int.MAX_VALUE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to int. Thanks for the good suggestion.
| @Override | ||
| protected void release0() { | ||
| memoryChunk.release(); | ||
| if (allocatedSize > allocationCutOffValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just check if memoryChunk is null here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revised accordingly. Thank you.
| System.out.println("Successfully released the large buffer."); | ||
| } | ||
|
|
||
| public static void main(String[] args) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not run as part of the normal test suite correct? Does it need to run manually? if so could you put that in the javadoc
| } | ||
| DEFAULT_ALLOCATION_CUTOFF_VALUE = cutOffValue; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of this is necessary, please remove. you can create a custom NettyAllocationManager for tests and specify a different value there. See https://github.com/apache/arrow/blob/master/java/memory/src/test/java/org/apache/arrow/memory/TestBaseAllocator.java#L393
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Thank you for the good suggestion.
| @Deprecated | ||
| UnsafeDirectLittleEndian getMemoryChunk() { | ||
| return memoryChunk; | ||
| return allocatedSize > allocationCutOffValue ? null : memoryChunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why add this? just return memoryChunk, it with either be null or a value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. This is not needed. Reverted.
| final long bufSize = 2048L; | ||
| try (RootAllocator allocator = new RootAllocator(bufSize); | ||
| ArrowBuf buffer = allocator.buffer(bufSize)) { | ||
| // make sure the buffer is large enough, so we wil use the allocation strategy for large buffers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misspelled wil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the typo. This comment is removed.
| */ | ||
| @Test | ||
| public void testLargeBufferAllocation() { | ||
| final long bufSize = 2048L; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about set this to NettyAllocationManager.DEFAULT_ALLOCATION_CUTOFF_VALUE + someval
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Revised accordingly.
BryanCutler
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
test failure is unrelated, merged to master. Thanks @liyafan82 ! |
|
@BryanCutler @emkornfield Thanks a lot for your effort. |
This is in response to the discussion in #6323 (comment) In this issue, we provide an allocation manager that is capable of allocation large (> 2GB) buffers. In addition, it does not depend on the netty library, which is aligning with the general trend of removing netty dependencies. In the future, we are going to make it the default allocation manager. Closes #6956 from liyafan82/fly_0416_unsf Lead-authored-by: liyafan82 <fan_li_ya@foxmail.com> Co-authored-by: emkornfield <emkornfield@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
This is in response to the discussion in apache/arrow#6323 (comment) In this issue, we provide an allocation manager that is capable of allocation large (> 2GB) buffers. In addition, it does not depend on the netty library, which is aligning with the general trend of removing netty dependencies. In the future, we are going to make it the default allocation manager. Closes #6956 from liyafan82/fly_0416_unsf Lead-authored-by: liyafan82 <fan_li_ya@foxmail.com> Co-authored-by: emkornfield <emkornfield@gmail.com> Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Modified the `NettyAllocationManager` to allow for 64 bit allocations. If the requested size is > `Int.MAX_VALUE`, memory will be allocated using `PlatformDependent.allocateMemory(long)` otherwise the existing allocation method remains unchanged. Unit tests were added that uses a custom AllocationManager with a smaller cutoff value to be able to test the new functionality within our current CI memory limits. Additional tests were added that allocate large buffers, but are required to run manually. Closes apache#6323 from liyafan82/fly_0123_alloc Authored-by: liyafan82 <fan_li_ya@foxmail.com> Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Modified the
NettyAllocationManagerto allow for 64 bit allocations. If the requested size is >Int.MAX_VALUE, memory will be allocated usingPlatformDependent.allocateMemory(long)otherwise the existing allocation method remains unchanged.Unit tests were added that uses a custom AllocationManager with a smaller cutoff value to be able to test the new functionality within our current CI memory limits. Additional tests were added that allocate large buffers, but are required to run manually.