Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Branch 2.7][Fix][broker] Fix NPE when ledger id not found in OpReadEntry (#15837) #16966

Merged
merged 1 commit into from
Aug 7, 2022

Conversation

Jason918
Copy link
Contributor

@Jason918 Jason918 commented Aug 7, 2022

(cherry picked from commit 7a3ad61)

Motivation

In current implementation, We have more than one potential NPE relate ManagedLedgerImpl#startReadOperationOnLedger method.

1. Unbox NPE

PositionImpl startReadOperationOnLedger(PositionImpl position, OpReadEntry opReadEntry) {
Long ledgerId = ledgers.ceilingKey(position.getLedgerId());
if (null == ledgerId) {
opReadEntry.readEntriesFailed(new ManagedLedgerException.NoMoreEntriesToReadException("The ceilingKey(K key"
+ ") method is used to return the least key greater than or equal to the given key, "
+ "or null if there is no such key"), null);
}
if (ledgerId != position.getLedgerId()) {
// The ledger pointed by this position does not exist anymore. It was deleted because it was empty. We need
// to skip on the next available ledger
position = new PositionImpl(ledgerId, 0);
}
return position;

According to this code, we can know that when the ledger id is null, we will fail OpReadEntry. However, there is no direct return here after failure. NPE will be thrown on line 2238. Because we want to unbox null.

2. null OpReadEntry context

According to the code above, we can see when we fail OpReadEntry, we pass a null value as context(line 2235). there is an NPE when the dispatcher gets the callback.

private synchronized void internalReadEntriesFailed(ManagedLedgerException exception, Object ctx) {
havePendingRead = false;
ReadEntriesCtx readEntriesCtx = (ReadEntriesCtx) ctx;
Consumer c = readEntriesCtx.getConsumer();
readEntriesCtx.recycle();
long waitTimeMillis = readFailureBackoff.next();

3. OpReadEntry#create

public static OpReadEntry create(ManagedCursorImpl cursor, PositionImpl readPositionRef, int count,
ReadEntriesCallback callback, Object ctx, PositionImpl maxPosition) {
OpReadEntry op = RECYCLER.get();
op.readPosition = cursor.ledger.startReadOperationOnLedger(readPositionRef, op);
op.cursor = cursor;
op.count = count;
op.callback = callback;
op.entries = Lists.newArrayList();
if (maxPosition == null) {
maxPosition = PositionImpl.LATEST;
}
op.maxPosition = maxPosition;
op.ctx = ctx;
op.nextReadPosition = PositionImpl.get(op.readPosition);
return op;
}

When we create OpReadEntry and then call ManagedCursor#startReadOperationOnLedger , assuming the ledger id is equal to null. At this point, we will fail this OpReadEntry. But at the current time, other parameters are not initialized. When we call OpReadEntry#readEntriesFailed. The cursor will be null and the NPE will be thrown.

@Override
public void readEntriesFailed(ManagedLedgerException exception, Object ctx) {
cursor.readOperationCompleted();
if (!entries.isEmpty()) {

Modifications

  • Remove fail OpReadEntry logic in ManagedCursor#startReadOperationOnLedger, when ledger id equals null, we can return the original value. the ManagedLedger will validate if this operation is legal.

From the perspective of the overall design, the OpReadEntry is just a middle state object, that may have illegal value, we have to check this OpReadEntry is valid in ManagedLedger#asyncReadEntries(Current we already do that).

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc-not-needed
    (Please explain why)

@mattisonchao mattisonchao merged commit 9abab5b into apache:branch-2.7 Aug 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants