Optimize branch creation #2101
The speedup can be measured by running the new branch creation perf tests.
This doesn't look safe to me. This can happen: in the beginning, there is only one branch, 'main'. Thread A starts creating a new branch off 'main' while thread B runs a GC iteration; the last step of that interleaving removes data that is still needed by the new timeline. In a nutshell, GC needs to be careful not to remove data that is still needed by child branches. There's a race condition between new branch creation and GC: if a new branch is created after GC has collected its list of branches and their branch-points, GC can remove data that the new branch still depends on.
pageserver/src/layered_repository.rs (outdated)
// grab mutex to prevent new timelines from being created here.
let _gc_cs = self.gc_cs.lock().unwrap();
This is the part that prevented the race condition I mentioned in my comment.
So I think we'll still need the gc_cs lock. The first phase is pretty fast: it only needs to look at some in-memory data structures, and it can run concurrently with compaction. We almost have that split already.
Thanks for taking a look at this. I have separated the GC code into two phases as you described in the later comment, and also added a test. Still need to do some cleanup, such as adding more comments and updating the documentation.
pageserver/src/layered_repository.rs (outdated)
    .ok_or_else(|| anyhow::anyhow!("unknown timeline id: {}", &src))?
};

let layer_removal_cs = src_timeline.layer_removal_cs.lock().unwrap();
It's OK if layers are removed during branch creation; I don't think this lock is needed here.
}
gc_info.pitr_cutoff = pitr_cutoff_lsn;
I think you also need to update the timeline's latest_gc_cutoff_lsn here. Otherwise essentially the same race condition can still happen:
- B starts GC and scans the list of timelines. The new timeline doesn't exist yet, so it only sees 'main'
- B calls update_gc_info() on the source timeline, setting pitr/horizon_cutoff to 1/22220000
- B releases gc_cs
- A acquires gc_cs
- A reads latest_gc_cutoff_lsn as 1/11110000.
- A finishes creating the new timeline
- B runs second phase of GC, updates latest_gc_cutoff_lsn to 1/22220000, and removes layers older than that.
I don't update the GC cutoff there because it's possible that GC doesn't actually run even after updating the GC info. For example, that can happen when we get a pageserver shutdown request: https://github.com/neondatabase/neon/pull/2101/files#diff-2dee9987054422607c1ab7369e102927d8477dea6d75c4c7f3891e3d65b15d1eR971-R975
The situation above should not happen, because I also added a check for the GC info in branch_timeline: https://github.com/neondatabase/neon/pull/2101/files#diff-2dee9987054422607c1ab7369e102927d8477dea6d75c4c7f3891e3d65b15d1eR306-R312
Also the new test simulates that situation, so it should fail if there is a race condition.
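Roughly, the guard being described works as in the following self-contained sketch. All types and names here (`Lsn`, `GcInfo`, the lock fields) are simplified stand-ins for illustration, not the PR's actual code:

```rust
use std::cmp::min;
use std::sync::{Mutex, RwLock};

// Hypothetical stand-ins for the pageserver types, for illustration only.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Lsn(u64);

struct GcInfo {
    horizon_cutoff: Lsn, // planned cutoff from the GC horizon
    pitr_cutoff: Lsn,    // planned cutoff from the PITR interval
}

struct Timeline {
    latest_gc_cutoff_lsn: RwLock<Lsn>, // cutoff already applied by past GC runs
    gc_info: RwLock<GcInfo>,           // cutoffs planned for the next GC run
}

struct Repository {
    gc_cs: Mutex<()>,
}

impl Repository {
    fn branch_timeline(&self, src: &Timeline, start_lsn: Lsn) -> Result<(), String> {
        // Hold gc_cs across the check and the creation, so GC's planning phase
        // cannot slip in between the validation and the new timeline appearing.
        let _gc_cs = self.gc_cs.lock().unwrap();

        // Reject start LSNs already removed by past GC iterations.
        let applied = *src.latest_gc_cutoff_lsn.read().unwrap();
        if start_lsn < applied {
            return Err(format!("{start_lsn:?} is below the GC cutoff {applied:?}"));
        }

        // Also reject start LSNs that a planned-but-not-yet-run GC iteration
        // (one that published its GcInfo before we took gc_cs) will remove.
        let info = src.gc_info.read().unwrap();
        let planned = min(info.horizon_cutoff, info.pitr_cutoff);
        if start_lsn < planned {
            return Err(format!("{start_lsn:?} would be removed by the pending GC"));
        }

        // ... create the new timeline while still holding gc_cs ...
        Ok(())
    }
}
```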
> The situation above should not happen, because I also added a check for the GC info in branch_timeline: https://github.com/neondatabase/neon/pull/2101/files#diff-2dee9987054422607c1ab7369e102927d8477dea6d75c4c7f3891e3d65b15d1eR306-R312
Ah, gotcha, I missed that
force-pushed from 87973e3 to ac973ed
-    let mut pg_install_dir: PathBuf;
-    if let Some(postgres_install_dir) = env::var_os("POSTGRES_INSTALL_DIR") {
-        pg_install_dir = postgres_install_dir.into();
+    let mut pg_install_dir = if let Some(postgres_install_dir) = env::var_os("POSTGRES_INSTALL_DIR")
Not related to this PR: this change fixes a clippy error. I can revert and move it to another PR if needed.
force-pushed from 7af02e4 to 7050eb4
@@ -110,8 +110,8 @@ jobs:
             target/
           # Fall back to older versions of the key, if no cache for current Cargo.lock was found
           key: |
-            v2-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-${{ hashFiles('Cargo.lock') }}
-            v2-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-
+            v3-${{ runner.os }}-${{ matrix.build_type }}-cargo-${{ matrix.rust_toolchain }}-${{ hashFiles('Cargo.lock') }}
Update the cache key to fix the incremental compilation error. See https://neondb.slack.com/archives/C0277TKAJCA/p1658169404632249
I purged the cache, so this hopefully won't happen again.
I guess it's safe to revert the CI changes for now.
Update: my bad, it's not safe to revert the changes.
@@ -101,7 +101,7 @@ jobs:
             !~/.cargo/registry/src
             ~/.cargo/git
             target
-          key: ${{ runner.os }}-cargo-${{ hashFiles('./Cargo.lock') }}-rust-${{ matrix.rust_toolchain }}
+          key: v1-${{ runner.os }}-cargo-${{ hashFiles('./Cargo.lock') }}-rust-${{ matrix.rust_toolchain }}
Update cache key to fix the incremental compilation error.
    horizon_cutoff: Lsn,

-    /// In addition to 'retain_lsns', keep everything newer than 'SystemTime::now()'
-    /// minus 'pitr_interval'
-    pitr: Duration,
+    /// In addition to 'retain_lsns' and 'horizon_cutoff', keep everything newer than this
+    /// point.
+    ///
+    /// This is calculated by finding a number such that a record is needed for PITR
+    /// if and only if its LSN is larger than 'pitr_cutoff'.
+    pitr_cutoff: Lsn,
}
You can combine horizon_cutoff and pitr_cutoff into one value. The code in gc prints different debug messages and uses separate counters depending on which one was smaller, but I don't think that distinction is needed. Instead, you could just print a debug message in update_gc_info to indicate which was dominant.
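A sketch of what that combination could look like; `combined_cutoff` is a hypothetical helper, not code from the PR (which, per the reply below, defers this change):

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Lsn(u64);

// Hypothetical: fold the two cutoffs into one at update_gc_info time and log
// which bound was dominant, instead of tracking them separately in gc().
fn combined_cutoff(horizon_cutoff: Lsn, pitr_cutoff: Lsn) -> Lsn {
    // GC removes everything older than the cutoff, so honoring both bounds
    // means taking the smaller (more conservative) of the two LSNs.
    if pitr_cutoff < horizon_cutoff {
        println!("debug: pitr_cutoff {pitr_cutoff:?} is dominant");
    } else {
        println!("debug: horizon_cutoff {horizon_cutoff:?} is dominant");
    }
    std::cmp::min(horizon_cutoff, pitr_cutoff)
}
```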
Yeah, I agree. It's not a trivial change, though; I think it's better to leave that to a separate PR / follow-up.
pageserver/src/layered_repository.rs (outdated)
// Besides GC lock, branch creation task doesn't need to hold the `layer_removal_cs` lock,
// hence doesn't need to wait for compaction/GC because it ensures that the starting LSN
// of the child branch is not out of scope in the middle of the creation task by
// 1. holding the GC lock to prevent overwriting timeline GC data
// 2. checking both the latest GC cutoff LSN and latest GC info of the source timeline
//    to avoid initializing the new branch using data removed by past GC iterations
//    or in-queue GC iterations.
It's a bit weird to explain in great detail why this doesn't need to hold the layer_removal_cs lock. Also, this seems like a run-on sentence.
force-pushed from cd37135 to eb2a2d7
… `layer_removal_cs` lock
1. Collect the list of timelines and their branchpoints and gc-horizons.
2. Scan the timeline directories to remove files.
force-pushed from eb2a2d7 to a933902
One example result when running the new branch creation perf tests, compared with the latest main:

This patch:

main:

In general, this looks like a decent improvement. Will merge once tests pass.
Resolves #2054
Context: branch creation needs to wait for GC to acquire the `gc_cs` lock, which prevents creating new timelines during GC. However, because an individual timeline's GC iteration also requires the `compaction_cs` lock, branch creation may also need to wait for compactions of multiple timelines. This results in large latency when creating a new branch, which we advertise as "instant".

This PR optimizes the latency of branch creation by separating GC into two phases:
1. Collect the list of timelines and their branchpoints and gc-horizons.
2. Scan the timeline directories to remove files.

The GC bottleneck comes from step 2, which must wait for compaction of multiple timelines. This PR modifies the branch creation and GC functions so that GC holds the GC lock only during step 1. As a result, branch creation no longer waits for compaction to finish, only for the GC data-collection step, which is fast. A rough sketch of the resulting structure is shown below.
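The sketch below illustrates the two-phase split under simplified, hypothetical types: `planned_cutoff` stands in for the PR's per-timeline GcInfo, and the real code also tracks branch points and separate horizon/PITR cutoffs.

```rust
use std::sync::{Arc, Mutex, RwLock};

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Lsn(u64);

struct Timeline {
    // Cutoff planned by GC phase 1 (the PR's GcInfo, simplified to one LSN).
    planned_cutoff: RwLock<Lsn>,
    // Cutoff of the last GC pass that actually removed files.
    latest_gc_cutoff_lsn: RwLock<Lsn>,
    // Held while removing layer files; compaction takes it too.
    layer_removal_cs: Mutex<()>,
}

struct Repository {
    gc_cs: Mutex<()>,
    timelines: Mutex<Vec<Arc<Timeline>>>,
}

impl Repository {
    fn gc_iteration(&self, new_cutoff: Lsn) {
        // Phase 1: plan. gc_cs is held only while reading in-memory state and
        // publishing the planned cutoffs. This is the only part branch
        // creation has to wait for, and it is fast.
        let plan: Vec<Arc<Timeline>> = {
            let _gc_cs = self.gc_cs.lock().unwrap();
            let timelines = self.timelines.lock().unwrap();
            for tl in timelines.iter() {
                *tl.planned_cutoff.write().unwrap() = new_cutoff;
            }
            timelines.clone()
        }; // gc_cs released here, before any slow work

        // Phase 2: remove files. This may block on compaction through
        // layer_removal_cs, but branch creation is no longer held up,
        // because gc_cs has already been released.
        for tl in plan {
            let _layer_removal = tl.layer_removal_cs.lock().unwrap();
            *tl.latest_gc_cutoff_lsn.write().unwrap() = new_cutoff;
            // ... scan the timeline directory and delete layers older than
            //     the cutoff published in phase 1 ...
        }
    }
}
```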