Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Remote Compaction] Fix Compaction Stats #13071

Closed
wants to merge 4 commits into from

Conversation

jaykorean
Copy link
Contributor

@jaykorean jaykorean commented Oct 15, 2024

Summary

Compaction stats code is not so straightforward to understand. Here's a bit of context for this PR and why this change was made.

  • CompactionStats (compaction_stats_.stats): Internal stats about the compaction used for logging and public metrics.
  • CompactionJobStats (compaction_job_stats_): The public stats at job level. It's part of Compaction event listener and included in the CompactionResult.
  • CompactionOutputsStats: output stats only. resides in CompactionOutputs. It gets aggregated toward the CompactionStats (internal stats).

The internal stats, compaction_stats_.stats, has the output information recorded from the compaction iterator, but it does not have any input information (input records, input output files) until UpdateCompactionStats() gets called. We cannot simply call UpdateCompactionStats() to fill in the input information in the remote compaction (which is a subcompaction of the primary host's compaction) because the compaction->inputs() have the full list of input files and UpdateCompactionStats() takes the entire list of records in all files. num_input_records gets double-counted if multiple sub-compactions are submitted to the remote worker.

The job level stats (in the case of remote compaction, it's subcompaction level stat), compaction_job_stats_, has the correct input records, but has no output information. We can use UpdateCompactionJobStats(compaction_stats_.stats) to set the output information (num_output_records, num_output_files, etc.) from the compaction_stats_.stats, but it also sets all other fields including the input information which sets all back to 0.

Therefore, we are overriding UpdateCompactionJobStats() in remote worker only to update job level stats, compaction_job_stats_, with output information of the internal stats.

Baiscally, we are merging the aggregated output info from the internal stats and aggregated input info from the compaction job stats.

In this PR we are also fixing how we are setting is_remote_compaction in CompactionJobStats.

  • OnCompactionBegin event, if options.compaction_service is set, is_remote_compaction=true for all compactions except for trivial moves
  • OnCompactionCompleted event, if any of the sub_compactions were done remotely, compaction level stats's is_remote_compaction will be true

Other minor changes

  • num_output_records is already available in CompactionJobStats. No need to store separately in CompactionResult.
  • total_bytes is not needed.
  • Renamed SubcompactionState::AggregateCompactionStats() to SubcompactionState::AggregateCompactionOutputStats() to make it clear that it's only aggregating output stats.
  • Renamed SetTotalBytes() to AddBytesWritten() to make it more clear that it's adding total written bytes from the compaction output.

Test Plan

Unit Tests added and updated

./compaction_service_test

@jaykorean jaykorean force-pushed the fix_remote_compaction_stats branch 2 times, most recently from 7b2e5d1 to 7e3efc5 Compare October 16, 2024 07:27
@jaykorean jaykorean changed the title Fix Remote Compaction Stats [Remote Compaction] Fix Compaction Stats Oct 16, 2024
@jaykorean jaykorean force-pushed the fix_remote_compaction_stats branch from 7e3efc5 to 543c04c Compare October 16, 2024 07:33
@jaykorean jaykorean force-pushed the fix_remote_compaction_stats branch from 14c7824 to 4a4a8f2 Compare October 16, 2024 17:04
@@ -89,6 +89,8 @@ void CompactionJobStats::Add(const CompactionJobStats& stats) {

num_single_del_fallthru += stats.num_single_del_fallthru;
num_single_del_mismatch += stats.num_single_del_mismatch;

is_remote_compaction |= stats.is_remote_compaction;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This basically means, if any of the subcompaction was done remotely, parent job level stats' is_remote_compaction = true.

@jaykorean jaykorean marked this pull request as ready for review October 16, 2024 17:29
@jaykorean jaykorean requested review from cbi42 and anand1976 October 16, 2024 17:29
@facebook-github-bot
Copy link
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jaykorean has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@anand1976 anand1976 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although the whole thing is a bit confusing. Maybe at some point, we should refactor into separate CompactionInputStats and CompactionOutputStats.

// Build compaction result
// Build Compaction Job Stats

// 1. Aggregate CompactionOutputStats into Internal Compaction Stats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Comment is a bit confusing as CompactionOutputStats is not a defined type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inside compact_->AggregateCompactionStats() it calls sc.AggregateCompactionOutputStats(compaction_stats); (which I renamed from AggregateCompactionStats()), and AggregateCompactionOutputStats() calls compaction_stats.stats.Add(compaction_outputs_.stats_); and compaction_outputs_.stats_ is actually InternalStats::CompactionOutputsStats type.

InternalStats::CompactionOutputsStats stats_;

As you've mentioned, it's honestly very confusing. I do agree that we should invest our time to re-organize and improve this. There was one comment Siying left two years ago and I hope to address it at some point. (#9964 (review))

@facebook-github-bot
Copy link
Contributor

@jaykorean merged this pull request in f225578.

@jaykorean jaykorean deleted the fix_remote_compaction_stats branch October 17, 2024 02:33
jaykorean added a commit that referenced this pull request Oct 17, 2024
Summary:
Compaction stats code is not so straightforward to understand. Here's a bit of context for this PR and why this change was made.

- **CompactionStats (compaction_stats_.stats):** Internal stats about the compaction used for logging and public metrics.
- **CompactionJobStats (compaction_job_stats_)**: The public stats at job level. It's part of Compaction event listener and included in the CompactionResult.
- **CompactionOutputsStats**: output stats only. resides in CompactionOutputs. It gets aggregated toward the CompactionStats (internal stats).

The internal stats, `compaction_stats_.stats`, has the output information recorded from the compaction iterator, but it does not have any input information (input records, input output files) until `UpdateCompactionStats()` gets called. We cannot simply call `UpdateCompactionStats()` to fill in the input information in the remote compaction (which is a subcompaction of the primary host's compaction) because the `compaction->inputs()` have the full list of input files and `UpdateCompactionStats()` takes the entire list of records in all files. `num_input_records` gets double-counted if multiple sub-compactions are submitted to the remote worker.

The job level stats (in the case of remote compaction, it's subcompaction level stat), `compaction_job_stats_`, has the correct input records, but has no output information. We can use `UpdateCompactionJobStats(compaction_stats_.stats)` to set the output information (num_output_records, num_output_files, etc.) from the `compaction_stats_.stats`, but it also sets all other fields including the input information which sets all back to 0.

Therefore, we are overriding `UpdateCompactionJobStats()` in remote worker only to update job level stats, `compaction_job_stats_`, with output information of the internal stats.

Baiscally, we are merging the aggregated output info from the internal stats and aggregated input info from the compaction job stats.

In this PR we are also fixing how we are setting `is_remote_compaction` in CompactionJobStats.
- OnCompactionBegin event, if options.compaction_service is set, `is_remote_compaction=true` for all compactions except for trivial moves
- OnCompactionCompleted event, if any of the sub_compactions were done remotely, compaction level stats's `is_remote_compaction` will be true

Other minor changes
- num_output_records is already available in CompactionJobStats. No need to store separately in CompactionResult.
- total_bytes is not needed.
- Renamed `SubcompactionState::AggregateCompactionStats()` to `SubcompactionState::AggregateCompactionOutputStats()` to make it clear that it's only aggregating output stats.
- Renamed `SetTotalBytes()` to `AddBytesWritten()` to make it more clear that it's adding total written bytes from the compaction output.

Pull Request resolved: #13071

Test Plan:
Unit Tests added and updated
```
./compaction_service_test
```

Reviewed By: anand1976

Differential Revision: D64479657

Pulled By: jaykorean

fbshipit-source-id: a7a776a00dc718abae95d856b661bcbafd3b0ed5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants