Fix the reader skips compacted data which original ledger been removed #12522

@codelipenghui codelipenghui commented Oct 28, 2021

The compactor updates the compaction cursor (mark-delete position) first and then updates the compactionHorizon of the compacted topic. When the compaction cursor moves forward, the original ledger is removed if no other durable cursor still needs it. If a reader is reading from the original ledger at that moment, it skips the data in the removed ledger (see #6787 for details). Because the compactionHorizon has not been updated yet, the reader can skip data that should have been served from the compacted ledger.

The approach is:

  1. Update the compactionHorizon before the compaction cursor moves forward,
    so that the reader does not skip the original data before the compactionHorizon is updated.
    If the broker crashes before the new compacted ledger ID is persisted,
    compaction can be triggered again after the topic is loaded and no data is lost,
    but an orphan ledger that cannot be deleted remains in the BookKeeper cluster.
  2. Remove the previous compacted ledger only after the compaction cursor has moved forward, so that the new compacted ledger ID is persisted first. Otherwise, the compacted ledger might be lost if the broker crashes.
  • doc-not-needed
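
The ordering described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `CompactionOrderSketch`, `TopicState`, `publishHorizon`, `advanceCursor`, `applyCompaction`, and `readerCanBeServed` are hypothetical names invented for illustration and are not Pulsar APIs. Positions are simplified to plain `long` values, and a position counts as "still in the original ledgers" whenever it lies beyond the mark-delete position.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the update ordering; every name here is hypothetical, not a Pulsar API. */
public class CompactionOrderSketch {

    /** Minimal stand-in for the state of a compacted topic. */
    static final class TopicState {
        long compactionHorizon = -1;   // last position covered by the compacted ledger
        long markDeletePosition = -1;  // position of the compaction cursor
        long compactedLedgerId = -1;   // ID of the current compacted ledger, -1 if none
        final List<String> log = new ArrayList<>();
    }

    /** Step 1: make the new compacted ledger and horizon visible to readers. */
    static void publishHorizon(TopicState t, long newLedgerId, long compactedUpTo) {
        t.compactionHorizon = compactedUpTo;
        t.compactedLedgerId = newLedgerId;
        t.log.add("horizon-updated");
    }

    /** Moves the compaction cursor; in this model the original data up to here is then gone. */
    static void advanceCursor(TopicState t, long compactedUpTo) {
        t.markDeletePosition = compactedUpTo;
        t.log.add("cursor-advanced");
    }

    /** Applies one finished compaction run in the order the fix proposes. */
    static void applyCompaction(TopicState t, long newLedgerId, long compactedUpTo) {
        long previousLedgerId = t.compactedLedgerId;
        publishHorizon(t, newLedgerId, compactedUpTo);
        advanceCursor(t, compactedUpTo);
        // Step 2: only now is it safe to drop the previous compacted ledger.
        if (previousLedgerId >= 0) {
            t.log.add("deleted-ledger-" + previousLedgerId);
        }
    }

    /** A read succeeds if the original data is still there or the horizon covers it. */
    static boolean readerCanBeServed(TopicState t, long position) {
        boolean originalStillPresent = position > t.markDeletePosition;
        boolean coveredByCompactedLedger = position <= t.compactionHorizon;
        return originalStillPresent || coveredByCompactedLedger;
    }

    public static void main(String[] args) {
        // Old order: cursor first, horizon not yet updated, so position 10 is lost.
        TopicState oldOrder = new TopicState();
        advanceCursor(oldOrder, 50L);
        System.out.println("old order, reader served: " + readerCanBeServed(oldOrder, 10L));

        // New order: horizon first, so position 10 stays readable at every step.
        TopicState newOrder = new TopicState();
        publishHorizon(newOrder, 100L, 50L);
        System.out.println("new order, reader served: " + readerCanBeServed(newOrder, 10L));
    }
}
```

In this model, the window in which a reader finds neither the original data nor a horizon that covers it exists only when the cursor is advanced first. The sketch does not model broker crashes or the orphan-ledger case mentioned in step 1.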

@codelipenghui codelipenghui added the type/bug, doc-not-needed, release/2.8.2, and release/2.9.1 labels Oct 28, 2021
@codelipenghui codelipenghui added this to the 2.10.0 milestone Oct 28, 2021
@codelipenghui codelipenghui self-assigned this Oct 28, 2021
@merlimat merlimat merged commit 74dd9b9 into apache:master Nov 2, 2021
@codelipenghui codelipenghui deleted the penghui/fix-skip-compaction-data-while-remove-original-ledger branch November 3, 2021 11:31
hangc0276 pushed a commit that referenced this pull request Nov 4, 2021
* Fix the reader skips compacted data which original ledger been removed

* Fix checkstyle

* Fix tests.

* Fix test

(cherry picked from commit 74dd9b9)
@hangc0276 hangc0276 added the cherry-picked/branch-2.8 label Nov 4, 2021
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Nov 29, 2021
@eolivelli eolivelli added the cherry-picked/branch-2.9 label Dec 13, 2021
eolivelli pushed a commit that referenced this pull request Dec 13, 2021
(cherry picked from commit 74dd9b9)
eolivelli pushed a commit to eolivelli/pulsar that referenced this pull request Feb 25, 2022
(cherry picked from commit 74dd9b9)
(cherry picked from commit df68493)
@codelipenghui codelipenghui restored the penghui/fix-skip-compaction-data-while-remove-original-ledger branch May 17, 2022 01:23
@codelipenghui codelipenghui deleted the penghui/fix-skip-compaction-data-while-remove-original-ledger branch May 17, 2022 01:29
@Technoboy- Technoboy- added the cherry-picked/branch-2.7 label Jul 6, 2022
Labels
area/broker, area/compaction, cherry-picked/branch-2.7, cherry-picked/branch-2.8, cherry-picked/branch-2.9, doc-not-needed, release/2.7.5, release/2.8.2, release/2.9.1, type/bug
7 participants