Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker cache eviction evicts entries that haven't been read by active consumers #16054

Closed
lhotari opened this issue Jun 14, 2022 · 4 comments · Fixed by #17273
Closed

Broker cache eviction evicts entries that haven't been read by active consumers #16054

lhotari opened this issue Jun 14, 2022 · 4 comments · Fixed by #17273
Labels
area/broker type/bug The PR fixed a bug or issue reported a bug
Milestone

Comments

@lhotari
Copy link
Member

lhotari commented Jun 14, 2022

Describe the bug

There's a performance regression with large fan-outs (large number of consumers).

#12045 changed cache eviction. The problem is that cache eviction is done to the position whatever happens to be the first cursor in the "heap" list in ManagedCursorContainer. There's a chance for a race condition: another cursor in the list might have an older position, but that hasn't been updated in the ordered data structure ("heap") when the method is called. The cache eviction will take the read position directly from the first cursor and it ignores and older positions.
When the consumer for the other cursor does a read, the entries are no more in the cache and there will be a cache miss.

To Reproduce

It should be possible to reproduce this in a multi-consumer scenario where the speed of the consumers varies.

Expected behavior

Cache eviction should properly consider all active consumers. This was the state before #12045 and #14985 changes.

Additional context
#12045, #14985, #16049

This impacts all maintenance branches since #12045 was cherry-picked to branch-2.7 (2.7.4), branch-2.8 (2.8.2) and branch-2.9 (2.9.0) at the time.

@lhotari
Copy link
Member Author

lhotari commented Jun 14, 2022

@315157973 @lordcheng10 @merlimat @codelipenghui What's your thoughts on this issue?

I wonder if the need for #14985 was caused by the regression described in this issue?

@lhotari
Copy link
Member Author

lhotari commented Jul 12, 2022

@315157973 @lordcheng10 @merlimat @codelipenghui please take a look at this issue since it seems that the broker cache is broken atm.

@eolivelli
Copy link
Contributor

this is partially related
#16421

@lordcheng10
Copy link
Contributor

I wonder if the need for #14985 was caused by the regression described in this issue?

In #14985, miss cache is caused by data re-pushing due to restart of some consumer clients @lhotari

lhotari added a commit to lhotari/pulsar that referenced this issue Aug 23, 2022
…on of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes apache#16054

- calculate the sorted list of when a read position gets updated
- this resolves apache#9958 in a proper way
  - apache#12045 broke the caching solution as explained in apache#16054
lhotari added a commit to lhotari/pulsar that referenced this issue Aug 25, 2022
… active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes apache#16054

- calculate the sorted list of when a read position gets updated
- this resolves apache#9958 in a proper way
  - apache#12045 broke the caching solution as explained in apache#16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases
lhotari added a commit to lhotari/pulsar that referenced this issue Aug 26, 2022
… active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes apache#16054

- calculate the sorted list of when a read position gets updated
- this resolves apache#9958 in a proper way
  - apache#12045 broke the caching solution as explained in apache#16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases
lhotari added a commit that referenced this issue Aug 31, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors
Technoboy- pushed a commit that referenced this issue Oct 31, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors
congbobo184 pushed a commit that referenced this issue Nov 17, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors

(cherry picked from commit 856ef15)
congbobo184 pushed a commit that referenced this issue Dec 7, 2022
…sors (#17273)

* [fix][broker] Fix broken build caused by conflict between #17195 and #16605

- #17195 changed the method signature that #16605 depended upon

* [fix][broker] Keep sorted list of cursors ordered by read position of active cursors when cacheEvictionByMarkDeletedPosition=false

Fixes #16054

- calculate the sorted list of when a read position gets updated
- this resolves #9958 in a proper way
  - #12045 broke the caching solution as explained in #16054
- remove invalid tests
- fix tests
- add more tests to handle corner cases

* Address review comment

* Handle durable & non-durable in the correct way

* Fix cache tests since now entries get evicted reactively

* Address review comment about method names

* Change signature for add method so that position must be passed

- this is more consistent with cursorUpdated method where the position is passed

* Update javadoc for ManagedCursorContainer

* Address review comment

* Simplify ManagedCursorContainer

* Clarify javadoc

* Ensure that cursors are tracked by making sure that initial position isn't null unintentionally

* Prevent race in updating activeCursors

(cherry picked from commit 856ef15)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/broker type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants