Optimize WAL storage in safekeeper #1318
Conversation
force-pushed from fd866e6 to 10dbac2
Here are the results of the test from #1144, to compare with #1266. These results are from a local run on my machine, which has relatively slow fsync (2364 ops/sec, 423 usecs/op) and a not very powerful 8-core CPU. I've added "per fsync" metrics, which are useful for comparing disk usage.
These results look the same as in #1266.
force-pushed from 10dbac2 to af29557
In a more usual 1wp+3sk EC2 test (#799), the results are also similar to the previous PR #1266: safekeepers are faster than synchronous replication.
It's interesting that the fsync call count differs a lot between postgres synchronous replicas when pgbench -i is run, which is visible in the gist report.
force-pushed from af29557 to ac9dccd
if let Some(mut unflushed_file) = self.file.take() {
    self.fdatasync_file(&mut unflushed_file)?;
The Option can be matched with the ref keyword to borrow its contents, avoiding taking it out and putting it back.
I need to call self.fdatasync_file inside this if, so with a ref borrow I get an error:

cannot borrow `*self` as immutable because it is also borrowed as mutable
immutable borrow occurs here
rustc [E0502](https://doc.rust-lang.org/error-index.html#E0502)
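For illustration, here is a minimal, self-contained sketch of the borrow conflict being discussed; the struct and method names mirror the snippet above but are stand-ins, not the actual safekeeper code:

```rust
use std::fs::File;
use std::io;

struct PhysicalStorage {
    file: Option<File>,
}

impl PhysicalStorage {
    fn fdatasync_file(&self, file: &mut File) -> io::Result<()> {
        // Needs a borrow of `self` (e.g. to record metrics), which is what
        // conflicts with a still-live `ref mut` borrow of `self.file`.
        file.sync_data()
    }

    fn flush(&mut self) -> io::Result<()> {
        // Rejected with E0502: the pattern keeps `self.file` (and thus `*self`)
        // mutably borrowed across a call that also needs `&self`:
        //
        // if let Some(ref mut unflushed_file) = self.file {
        //     self.fdatasync_file(unflushed_file)?;
        // }

        // Taking the file out of the Option ends the field borrow first,
        // so calling a method on `self` afterwards is fine.
        if let Some(mut unflushed_file) = self.file.take() {
            self.fdatasync_file(&mut unflushed_file)?;
            self.file = Some(unflushed_file);
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut storage = PhysicalStorage { file: Some(File::create("wal.tmp")?) };
    storage.flush()
}
```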
let mut partial;
let mut start_pos = startpos;
const ZERO_BLOCK: &[u8] = &[0u8; XLOG_BLCKSZ];
if self.write_lsn != pos {
How can this be true?
It shouldn't be; it's possible only if someone uses the private API directly, functions like write_exact.
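For context, a minimal sketch of the kind of guard being discussed, assuming a simplified storage type where Lsn is just a u64 alias and write_exact stands in for the real private entry point:

```rust
use anyhow::{bail, Result};

type Lsn = u64;

struct Storage {
    write_lsn: Lsn,
}

impl Storage {
    /// Private low-level write: callers are expected to pass the current
    /// write position, so a mismatch means the API was misused.
    fn write_exact(&mut self, pos: Lsn, buf: &[u8]) -> Result<()> {
        if self.write_lsn != pos {
            bail!(
                "unexpected write at {}, expected sequential write at {}",
                pos,
                self.write_lsn
            );
        }
        // ... append `buf` to the current segment file here ...
        self.write_lsn += buf.len() as Lsn;
        Ok(())
    }
}

fn main() -> Result<()> {
    let mut storage = Storage { write_lsn: 0 };
    storage.write_exact(0, b"record")?; // ok: sequential
    assert!(storage.write_exact(100, b"record").is_err()); // rejected: non-sequential
    Ok(())
}
```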
force-pushed from ac9dccd to 47bbe29
I've updated the PR to use fdatasync in most places and ran this test again. It seems my machine also has much slower fsync compared to fdatasync, and now the results are almost the same.
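For anyone who wants to reproduce the comparison locally, here is a rough micro-benchmark sketch (not part of the PR); it relies on std's sync_all/sync_data mapping to fsync/fdatasync on Linux, and the file name and iteration count are arbitrary:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::time::Instant;

fn usec_per_sync(data_only: bool, iters: u32) -> io::Result<f64> {
    let mut f = OpenOptions::new()
        .append(true)
        .create(true)
        .open("sync_test.tmp")?;
    let start = Instant::now();
    for _ in 0..iters {
        f.write_all(&[0u8; 8192])?;
        if data_only {
            f.sync_data()?; // fdatasync: skips flushing unchanged metadata
        } else {
            f.sync_all()?; // fsync: also flushes file metadata
        }
    }
    Ok(start.elapsed().as_secs_f64() * 1e6 / iters as f64)
}

fn main() -> io::Result<()> {
    println!("fsync:     {:.0} usec/op", usec_per_sync(false, 200)?);
    println!("fdatasync: {:.0} usec/op", usec_per_sync(true, 200)?);
    Ok(())
}
```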
When several AppendRequests can be read from the socket without blocking, they are processed together and fsync() on the segment file is called only once. The segment file is no longer opened for every write request; the last opened file is now cached inside PhysicalStorage. A new metric for WAL flushes, FLUSH_WAL_SECONDS, was added to the storage. More errors were added to the storage for non-sequential WAL writes; write_lsn can now be moved only by calls to truncate_lsn(new_lsn).
New messages have been added to the ProposerAcceptorMessage enum. They can't be deserialized directly and are currently used only for optimizing flushes. The existing protocol wasn't changed, and flush will still be called for every AppendRequest, as before.
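A self-contained sketch of the batching idea described above, with illustrative names rather than the safekeeper's actual control flow:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

fn append_batch(path: &str, requests: &[Vec<u8>]) -> io::Result<()> {
    // In the real code the segment file is cached in PhysicalStorage;
    // here it is simply reopened for brevity.
    let mut file = OpenOptions::new().append(true).create(true).open(path)?;
    for req in requests {
        file.write_all(req)?; // no fsync per AppendRequest
    }
    file.sync_data()?; // one fdatasync for the whole batch
    Ok(())
}

fn main() -> io::Result<()> {
    // Pretend three AppendRequests were read from the socket without blocking.
    let batch: Vec<Vec<u8>> = (0..3).map(|i| vec![i as u8; 8192]).collect();
    append_batch("segment.wal", &batch)
}
```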
This PR replaces #1266 as a cleaner version of the same optimization. Closes #1144.
I'll post test results here when they're ready.
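For reference, a hedged sketch of what a flush-duration histogram like FLUSH_WAL_SECONDS could look like using the prometheus crate directly; the metric name and the way metrics are actually wired up in the repository are assumptions:

```rust
use lazy_static::lazy_static;
use prometheus::{register_histogram, Histogram};

lazy_static! {
    static ref FLUSH_WAL_SECONDS: Histogram = register_histogram!(
        "flush_wal_seconds",
        "Time spent syncing WAL to disk"
    )
    .unwrap();
}

fn main() {
    // Time a flush-like operation; the timer records the elapsed duration.
    let timer = FLUSH_WAL_SECONDS.start_timer();
    std::thread::sleep(std::time::Duration::from_millis(5)); // stand-in for fdatasync
    timer.observe_duration();
}
```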