Read performance degradation during compaction workload

We ([go-ethereum](https://github.com/ethereum/go-ethereum/)) are experiencing a significant degradation in database read performance
whenever a compaction is running.

- Version: `github.com/cockroachdb/pebble v1.1.2`
- Hardware: 32GB memory, Samsung 980 PRO 2TB SSD, Intel Core i7-14700K (28 threads)

The database configuration is shown below:

```go
	opt := &pebble.Options{
		// Pebble has a single combined cache area and the write
		// buffers are taken from this too. Assign all available
		// memory allowance for cache.
		Cache:        pebble.NewCache(int64(2 * 1024 * 1024 * 1024)),
		MaxOpenFiles: 524288,

		// The size of a memory table (which is also the write buffer).
		// Note that there may be more than two memory tables in the system.
		MemTableSize: uint64(512 * 1024 * 1024),

		MemTableStopWritesThreshold: 2,

		// The default compaction concurrency is 1 thread.
		// Use all available CPUs here for faster compaction.
		MaxConcurrentCompactions: runtime.NumCPU,

		// Per-level options. Options for at least one level must be specified. The
		// options for the last level are used for all subsequent levels.
		Levels: []pebble.LevelOptions{
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
		},
	}
```
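As a rough sanity check on the memory settings above, the following sketch works out how much of the cache allowance the memtables could claim. It assumes, as the comment in the configuration states, that write buffers are taken from the same 2 GiB area, and that at most `MemTableStopWritesThreshold` memtables of `MemTableSize` are queued at once. The helper `memTableBudget` is hypothetical and only illustrates the arithmetic.

```go
package main

import "fmt"

const (
	cacheBytes          = 2 * 1024 * 1024 * 1024 // Cache size from the options above
	memTableBytes       = 512 * 1024 * 1024      // MemTableSize from the options above
	stopWritesThreshold = 2                      // MemTableStopWritesThreshold from the options above
)

// memTableBudget is a hypothetical helper: it returns the upper bound on
// memory held by queued memtables before writes are stopped.
func memTableBudget(size, threshold int64) int64 {
	return size * threshold
}

func main() {
	mt := memTableBudget(memTableBytes, stopWritesThreshold)
	fmt.Printf("memtables: up to %d MiB\n", mt>>20)                   // memtables: up to 1024 MiB
	fmt.Printf("left for cached blocks: %d MiB\n", (cacheBytes-mt)>>20) // left for cached blocks: 1024 MiB
}
```

Under these assumptions, roughly half of the 2 GiB allowance may be unavailable for cached data blocks while both memtables are full.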

---

The read performance without the compaction workload is stable. The average time to 
load a single data block (~4KB) from disk (not in cache) during normal read operations 
is **40µs**. (This data was obtained by injecting debug code into Pebble.)

However, when the compaction process starts, the average time to load a single data 
block (~4KB) from disk (not in cache) increases to **80µs**, roughly 2x slower.

Meanwhile, the average time to load a single data block (~4KB) during compaction is 
significantly faster, around **8µs**. I suspect this discrepancy may be related to the following
factors:

- Files involved in compaction are opened with the `FADV_SEQUENTIAL` flag, which 
  optimizes the OS’s read-ahead mechanism.
- Data blocks of the files being compacted are likely to be found in the OS
  page cache, whereas normal reads often target data in the bottom-most level, where 
  blocks are less likely to be cached (**although I have no evidence to prove it**).

--- 

What I don't really understand is why loading a data block from disk becomes
2x slower while compaction is actively running.

At first I suspected that too many concurrent reads (compaction is concurrent, 
so there may be many concurrent disk reads in the system) were reducing file read 
efficiency. However, only the data loading in normal Get operations slowed down, not the loading done by compaction. 

Moreover, after I changed all concurrent reads to single-threaded sequential reads, the same phenomenon
still occurred.
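To try to isolate the effect outside of Pebble, a standalone experiment along the following lines might help. It measures the mean latency of random 4 KiB `ReadAt` calls on one file, first alone and then while a single goroutine scans a second file sequentially. This is only a sketch: the function names are hypothetical, and the small temporary files it creates will sit in the OS page cache, so the printed numbers will not reproduce the disk behaviour described above. A realistic run would need files larger than RAM (or dropped caches) on the same SSD.

```go
package main

import (
	"fmt"
	"io"
	"math/rand"
	"os"
	"time"
)

const blockSize = 4096

// makeFile creates a file of the given number of 4 KiB blocks in dir.
func makeFile(dir string, blocks int) (string, error) {
	f, err := os.CreateTemp(dir, "blocks-*.dat")
	if err != nil {
		return "", err
	}
	defer f.Close()
	buf := make([]byte, blockSize)
	for i := 0; i < blocks; i++ {
		if _, err := f.Write(buf); err != nil {
			return "", err
		}
	}
	return f.Name(), nil
}

// randomReads issues n random block-aligned reads and returns the mean latency.
func randomReads(path string, blocks, n int) (time.Duration, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	buf := make([]byte, blockSize)
	rng := rand.New(rand.NewSource(1))
	var total time.Duration
	for i := 0; i < n; i++ {
		off := int64(rng.Intn(blocks)) * blockSize
		start := time.Now()
		if _, err := f.ReadAt(buf, off); err != nil {
			return 0, err
		}
		total += time.Since(start)
	}
	return total / time.Duration(n), nil
}

// sequentialScan reads path front to back in a loop until stop is closed,
// standing in for a single-threaded compaction read stream.
func sequentialScan(path string, stop <-chan struct{}) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	buf := make([]byte, 1<<20)
	for {
		select {
		case <-stop:
			return nil
		default:
		}
		if _, err := f.Read(buf); err == io.EOF {
			if _, err := f.Seek(0, io.SeekStart); err != nil {
				return err
			}
		} else if err != nil {
			return err
		}
	}
}

// compare returns the mean random-read latency without and with a background scan.
func compare(blocks, reads int) (idle, busy time.Duration, err error) {
	dir, err := os.MkdirTemp("", "readbench")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	target, err := makeFile(dir, blocks)
	if err != nil {
		return 0, 0, err
	}
	scanned, err := makeFile(dir, blocks)
	if err != nil {
		return 0, 0, err
	}
	if idle, err = randomReads(target, blocks, reads); err != nil {
		return 0, 0, err
	}
	stop, done := make(chan struct{}), make(chan error, 1)
	go func() { done <- sequentialScan(scanned, stop) }()
	busy, err = randomReads(target, blocks, reads)
	close(stop)
	if scanErr := <-done; err == nil {
		err = scanErr
	}
	return idle, busy, err
}

func main() {
	idle, busy, err := compare(1024, 2000)
	if err != nil {
		panic(err)
	}
	fmt.Println("mean random read, idle:", idle)
	fmt.Println("mean random read, with background scan:", busy)
}
```

If the slowdown also appears in a setup like this with cold data, that would point at the OS or the device rather than at Pebble itself.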

Do you have any insights into this strange behavior, or any suggestions for
addressing it?

---

The branch I used for debugging: https://github.com/rjl493456442/pebble/commits/gary-debug/



Jira issue: PEBBLE-286

Read performance degradation during compaction workload #4109