Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed #33744

Closed
wants to merge 3 commits into from

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Jan 18, 2023

See apache/arrow-java#223

What changes are included in this PR?

  1. Close outstanding buffer(ledger)s during allocator-close;
  2. Report warning messages into log after the outstanding buffers were closed;
  3. To help implement the feature, construct a linked-list of buffer ledgers and hold a reference to tail element from buffer alloctor.

Are these changes tested?

  1. Current UTs and added UTs.
  2. Benchmarking by AllocatorBenchmarks

Are there any user-facing changes?

No

@github-actions
Copy link

@github-actions
Copy link

⚠️ GitHub issue apache/arrow-java#223 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose? I don't think we want this.

This silently suppresses an application logic error, and forces all allocations to go through a lock. The exception you get if there are unclosed buffers is very intentional.

The close() method's docstring is admittedly misleading, but I do not think the intent was to just close all buffers for you. (I think it's speaking more towards releasing any pooled or cached memory, which is definitely true for the Netty implementation.)

@zhztheplayer
Copy link
Member Author

What is the purpose? I don't think we want this.

Manual reference counting of Arrow Java library can become very tricky, even becomes the bottle neck of daily development's efficiency in codes of complicated Java systems that relies on Arrow. This is just one way to make Arrow just be usable somehow in that case.

Or a more complete solution I can come up with is to include:

  1. A clean-up feature for allocator like this
  2. A mechanism to let buffer instances be managed by GC by leveraging e.g. Java weak ref or cleaner toolkit.

I understand performance should be considered emphatically in Arrow, however it's true that we must rely on smarter
strategies of memory management, more or less, in software development including the development around Arrow. I would suggest we provide such a smarter way to Java developers to let them choose between using manual and using auto. All of the current restrictions including the strong leak-checking can be preserved when user chooses the legacy manual way.

Would you think this topic is worth discussion? I am indeed not hurry to get this patch merged until the best solution is shaped. Thanks.

@lidavidm
Copy link
Member

This patch forces everyone to take the opposite tradeoff, however.

Some sort of nursery that auto-closes buffers may be useful, but I think it should be separate from the base allocator. (The allocator also has a 'listener' to be notified of allocation/deallocation events; possibly you could build this mechanism on top of that.) And nondeterministic collection of buffers is liable to bite you; the GC does not see into native allocations and you can run out of memory despite having 'free' memory because the GC has not collected the small Java-side objects that hold the large native buffers.

@zhztheplayer
Copy link
Member Author

Thanks and would you suggest to add a new implementation of interface BufferAllocator (e.g. AutoCleanBufferAllocator)? If yes I can try start from there.

the GC does not see into native allocations and you can run out of memory despite having 'free' memory because the GC has not collected the small Java-side objects that hold the large native buffers.

Yes I can image something like that too. What I can try is to reuse the JVM's direct memory counter which can trigger GC when reaching its limit, or rebuild a similar counter like that. Not sure if there are some better solutions but this one is probably OK since it was at least from JVM itself.

@lidavidm
Copy link
Member

I don't think it needs a new interface.

@zhztheplayer
Copy link
Member Author

I don't think it needs a new interface.

I meant a new implementation of the interface, e.g.

class AutoCleanBufferAllocator implements BufferAllocator

@lidavidm
Copy link
Member

Ah, ok. That sounds fine. I would explore the allocation listener further, though...

@zhztheplayer
Copy link
Member Author

zhztheplayer commented Jan 18, 2023

I see. I'll find out to what extent can allocation listener be leveraged in this feature. Thanks for the suggestion.

@amol-
Copy link
Member

amol- commented Mar 30, 2023

Closing because it has been untouched for a while, in case it's still relevant feel free to reopen and move it forward 👍

@amol- amol- closed this Mar 30, 2023
Copy link

⚠️ GitHub issue #33743 has no components, please add labels for components.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Release outstanding buffers when BaseAllocator is being closed
3 participants