
HDDS-9039. Added a test to verify that no compaction log entry is added to compactionLogTable and DAG when tarball creation is in progress. #6171

Closed

Conversation

hemantk-12 (Contributor)

What changes were proposed in this pull request?

There are two major changes in this PR.

  1. Added a new test to validate that no new compaction log entry is added to compactionLogTable and DAG while tarball creation is in progress.
  2. Made waitForTarballCreation a synchronized method so that the waiting and notifying threads use the same object's monitor.

What is the link to the Apache JIRA

HDDS-9039

How was this patch tested?

Ran the newly added test locally and on a fork branch.

hemantk-12 added the snapshot (https://issues.apache.org/jira/browse/HDDS-6517) label on Feb 5, 2024
swamirishi (Contributor) left a comment

The test case doesn't really test the race condition explained in the comments.

@@ -587,7 +587,7 @@ void addToCompactionLogTable(CompactionLogEntry compactionLogEntry) {
   /**
    * Check if there is any in_progress tarball creation request and wait till
    * all tarball creation finish, and it gets notified.
    */
-  private void waitForTarballCreation() {
+  private synchronized void waitForTarballCreation() {
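
For context, a minimal sketch (not the actual Ozone code) of the coordination this one-line change is meant to fix, assuming the tarballRequestCount AtomicInteger discussed later in this thread; the class name and the finishTarballRequest helper are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch, not the Ozone implementation: tarballRequestCount tracks
// in-flight tarball creation requests (see the discussion below).
class TarballCoordinationSketch {
  private final AtomicInteger tarballRequestCount = new AtomicInteger(0);

  // Called before adding a compaction log entry. Must be synchronized so
  // that wait() runs while holding this object's monitor.
  private synchronized void waitForTarballCreation() {
    while (tarballRequestCount.get() > 0) {
      try {
        wait(); // releases the monitor until notifyAll() below
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }

  // Hypothetical counterpart on the tarball-creation path; notifyAll()
  // also requires holding the same monitor.
  private synchronized void finishTarballRequest() {
    if (tarballRequestCount.decrementAndGet() == 0) {
      notifyAll();
    }
  }
}
```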
swamirishi (Contributor)

How would this change help in any way? I don't think RocksDB runs multiple compactions in parallel, and the onCompactionCompleted method will commit the compaction.

swamirishi (Contributor)

I believe we would need a lock on tarballRequestCount when incrementing and decrementing it, and the compaction log entry should be added within the same lock.

swamirishi (Contributor)

waitForTarballCreation wouldn't really ensure the count is zero after the method returns. There could be a race condition here.

hemantk-12 (Contributor, Author)

tarballRequestCount is an AtomicInteger, so incrementing and decrementing it is atomic.

waitForTarballCreation is synchronized because wait/notify must be called while holding the same object's monitor; otherwise the JVM throws IllegalMonitorStateException. Doc: wait/notifyAll.

There is more discussion on JIRA HDDS-9039 about why we are adding the test this way.
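
To illustrate that monitor rule in isolation (a standalone snippet, not Ozone code), calling notifyAll() without holding the object's monitor fails at runtime:

```java
public class MonitorDemo {
  public static void main(String[] args) {
    Object lock = new Object();
    try {
      lock.notifyAll(); // monitor not held: throws IllegalMonitorStateException
    } catch (IllegalMonitorStateException e) {
      System.out.println("without monitor: " + e);
    }
    synchronized (lock) {
      lock.notifyAll(); // fine: monitor is held
    }
  }
}
```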

swamirishi (Contributor)

What I didn't understand is that the getCheckpoint() method does not take the synchronized lock. So even after waitForTarballCreation() passes, compaction can still run. Isn't there a race condition here? While compaction is running, we could still end up starting tarball creation for the checkpoint.

hemantk-12 (Contributor, Author) commented Apr 17, 2024

I think you would be right if we were appending compaction log entries to a text file, but we don't use that anymore; entries are added to the compactionLog column family.

I gave it more thought, and I think this locking is no longer needed because we now use a RocksDB column family for the compaction log. A compaction entry will either be present in the table or absent from the ActiveFS snapshot, depending on the order of the entry append and the checkpoint creation. Hence we can simply remove this locking code and rely on RocksDB.

Original discussion to add lock: #4680 (comment)
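
A minimal sketch of that ordering argument using the RocksJava API (the paths and keys are illustrative, and the default column family stands in for Ozone's compactionLog column family):

```java
import org.rocksdb.Checkpoint;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

// Sketch, not Ozone code: a RocksDB checkpoint is a consistent point-in-time
// view, so a compaction log entry either lands before the checkpoint (and is
// included) or after it (and is excluded). There is no torn intermediate
// state that an external lock would need to guard against.
public class CheckpointOrderingDemo {
  public static void main(String[] args) throws RocksDBException {
    RocksDB.loadLibrary();
    try (Options options = new Options().setCreateIfMissing(true);
         RocksDB db = RocksDB.open(options, "/tmp/demo-db")) {
      db.put("compaction-entry-1".getBytes(), "...".getBytes());
      // The checkpoint sees entry 1 but not entry 2.
      try (Checkpoint cp = Checkpoint.create(db)) {
        cp.createCheckpoint("/tmp/demo-checkpoint");
      }
      db.put("compaction-entry-2".getBytes(), "...".getBytes());
    }
  }
}
```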

swamirishi (Contributor)

Yeah, I agree we don't need the locks since we are writing into a RocksDB table. This is fine as long as we are using RocksDB checkpoints to take a checkpoint of the RocksDB instance.

hemantk-12 (Contributor, Author)

Closing it in favor of #6552

hemantk-12 closed this on Apr 17, 2024
hemantk-12 deleted the HDDS-9039 branch on October 28, 2024