Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Searchable Snapshot] Use java.lang.ref.Cleaner to close cloned IndexInputs #6351

Merged
merged 1 commit into from
Feb 17, 2023

Conversation

andrross
Copy link
Member

As detailed in this issue, Lucene does not close the cloned IndexInput instances, so we are using the Cleaner mechanism from the JDK to close any unclosed clones.

A single static Cleaner instance to ensure any unclosed clone of an IndexInput is closed. This instance creates a single daemon thread on which it performs the cleaning actions. For an already-closed IndexInput, the cleaning action is a no-op. For an open IndexInput, the close action will decrement a reference count.

Issues Resolved

Closes #5243

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@andrross andrross changed the title Use java.lang.ref.Cleaner to close cloned IndexInputs [Searchable Snapshot] Use java.lang.ref.Cleaner to close cloned IndexInputs Feb 17, 2023
@@ -301,12 +304,10 @@ public static void testSlice(OnDemandBlockSnapshotIndexInput blockedSnapshotFile

newSlice.seek(0);
assertEquals(0, newSlice.getFilePointer());
assertEquals(0, newSlice.currentBlockId);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I did remove a couple assertions around currentBlockId because I wanted to make that field private instead of protected. We're not losing any coverage here because computing currentBlockPosition() requires the currentBlockId so we still (indirectly) asserting on the currentBlockId value.

* [1]: https://github.com/apache/lucene/blob/8340b01c3cc229f33584ce2178b07b8984daa6a9/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L32-L33
*/
private static class BlockHolder {
private IndexInput block;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't block be declared volatile? it is cleaned in cleaner's thread but will be queried in other threads ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cleaner won't touch it until the reference is phantom reachable. I guess I was assuming there would not be any concurrency issues, but maybe that's not a valid assumption. There obviously won't be anything else reading or writing when the cleaner action is running. But I don't actually know if a memory barrier is involved so would the cleaner thread be guaranteed to see the latest value of block? I can make it volatile to be safe.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to come up with any other possible scenario when close() is called but not by cleaner, probably should never happen

@andrross andrross force-pushed the index-input-cleaner branch from c55f61e to b8463e2 Compare February 17, 2023 01:22
}
}

private void closeQuietly() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOUtils::closeWhileHandlingException

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the logging statement I added here on failure useful?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to be fair, there is no way to propagate it ... however it seems like Cleaner keeps references on logger as well :) What we could do probably is to thrown UncheckedIOException instead

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cleaner will silently swallow any unhandled exceptions. I think the logger reference is fine because its a static field and not tied to an instance.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the JDK's cleaner implementations do throw UncheckedIOException in this case, no logging, just a note

this.block = Objects.requireNonNull(block);
}

private void close() throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like AutoCloseable, right?

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@@ -144,56 +146,56 @@ public long length() {

@Override
public byte readByte() throws IOException {
if (currentBlock == null) {
if (blockHolder.block == null) {
Copy link
Collaborator

@reta reta Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be a race (fe between readByte() and close()) when blockHolder.block becomes null after the check, the safer way would be:

       final IndexInput block = blockHolder.block;
       if (block == null) {
            // seek to the beginning
            seek(0);
        } else if (currentBlockPosition() >= blockSize) {
            int blockId = currentBlockId + 1;
            demandBlock(blockId);
        }
        return block.readByte();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class (like org.apache.lucene.store.IndexInput) is not thread safe (I'll add it to the class doc). The cleaning action is technically on a separate thread, but also guaranteed to happen when the instance can no longer be used so there's no chance of it racing with any of these methods. I agree your suggestion is the right way to handle a concurrency issue like this, but probably not necessary here. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class (like org.apache.lucene.store.IndexInput) is not thread safe (I'll add it to the class doc).

Yes, should be enough to clarify, thank you

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link

codecov-commenter commented Feb 17, 2023

Codecov Report

Merging #6351 (e9e3646) into main (7914c04) will increase coverage by 0.04%.
The diff coverage is 61.40%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff              @@
##               main    #6351      +/-   ##
============================================
+ Coverage     70.73%   70.78%   +0.04%     
- Complexity    59024    59061      +37     
============================================
  Files          4800     4799       -1     
  Lines        282453   282432      -21     
  Branches      40718    40716       -2     
============================================
+ Hits         199799   199924     +125     
+ Misses        66259    66140     -119     
+ Partials      16395    16368      -27     
Impacted Files Coverage Δ
...tore/remote/directory/RemoteSnapshotDirectory.java 3.12% <0.00%> (ø)
...arch/index/store/remote/utils/TransferManager.java 3.03% <ø> (ø)
...dex/store/remote/file/OnDemandBlockIndexInput.java 79.13% <61.81%> (+6.12%) ⬆️
...e/remote/file/OnDemandBlockSnapshotIndexInput.java 72.34% <100.00%> (+0.91%) ⬆️
...search/indices/recovery/RecoveryTargetHandler.java 0.00% <0.00%> (-100.00%) ⬇️
...h/action/ingest/SimulateDocumentVerboseResult.java 60.71% <0.00%> (-39.29%) ⬇️
...cluster/coordination/PublishClusterStateStats.java 33.33% <0.00%> (-37.51%) ⬇️
.../opensearch/client/indices/CloseIndexResponse.java 56.25% <0.00%> (-35.00%) ⬇️
...ion/admin/cluster/node/info/PluginsAndModules.java 53.12% <0.00%> (-34.38%) ⬇️
...arch/search/aggregations/pipeline/LinearModel.java 23.07% <0.00%> (-30.77%) ⬇️
... and 496 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

As detailed [in this issue][1], Lucene does not close the cloned
IndexInput instances, so we are using the Cleaner mechanism from the JDK
to close any unclosed clones.

A single static Cleaner instance to ensure any unclosed clone of an
IndexInput is closed. This instance creates a single daemon thread on
which it performs the cleaning actions. For an already-closed
IndexInput, the cleaning action is a no-op. For an open IndexInput, the
close action will decrement a reference count.

[1]: opensearch-project#5243 (comment)

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the index-input-cleaner branch from b8463e2 to e9e3646 Compare February 17, 2023 01:58
@andrross andrross added backport 2.x Backport to 2.x branch v2.6.0 'Issues and PRs related to version v2.6.0' labels Feb 17, 2023
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@andrross
Copy link
Member Author

WhiteSource is complaining about #5576 still, though no dependencies changed in this PR.

@andrross andrross merged commit dae1566 into opensearch-project:main Feb 17, 2023
@andrross andrross deleted the index-input-cleaner branch February 17, 2023 18:03
opensearch-trigger-bot bot pushed a commit that referenced this pull request Feb 17, 2023
As detailed [in this issue][1], Lucene does not close the cloned
IndexInput instances, so we are using the Cleaner mechanism from the JDK
to close any unclosed clones.

A single static Cleaner instance to ensure any unclosed clone of an
IndexInput is closed. This instance creates a single daemon thread on
which it performs the cleaning actions. For an already-closed
IndexInput, the cleaning action is a no-op. For an open IndexInput, the
close action will decrement a reference count.

[1]: #5243 (comment)

Signed-off-by: Andrew Ross <andrross@amazon.com>
(cherry picked from commit dae1566)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
andrross pushed a commit that referenced this pull request Feb 17, 2023
As detailed [in this issue][1], Lucene does not close the cloned
IndexInput instances, so we are using the Cleaner mechanism from the JDK
to close any unclosed clones.

A single static Cleaner instance to ensure any unclosed clone of an
IndexInput is closed. This instance creates a single daemon thread on
which it performs the cleaning actions. For an already-closed
IndexInput, the cleaning action is a no-op. For an open IndexInput, the
close action will decrement a reference count.

[1]: #5243 (comment)


(cherry picked from commit dae1566)

Signed-off-by: Andrew Ross <andrross@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch skip-changelog v2.6.0 'Issues and PRs related to version v2.6.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Searchable Snapshot] IndexInput clones close handling
3 participants