Auxiliary directory for supporting the multi-directory configuration. #294
Conversation
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Codecov Report
@@ Coverage Diff @@
## master #294 +/- ##
==========================================
+ Coverage 97.79% 97.88% +0.08%
==========================================
Files 30 30
Lines 11454 11998 +544
==========================================
+ Hits 11202 11744 +542
- Misses 252 254 +2
src/file_pipe_log/pipe.rs
Outdated
/// Main directory path id.
pub const DEFAULT_PATH_ID: PathId = 0;
/// Auxiliary directory path id.
pub const AUXILIARY_PATH_ID: PathId = 1;
I doubt this is necessary. What if there are 100 directories? Maybe just iterate over them all whenever a dir is needed.
I hold a different perspective on this point.
- Firstly, the original design is to support an EXTRA dir to store logs if and only if the main dir is full, ideally placed on another DEVICE. It serves the goal that the directory storing raft logs can be placed on an individual device, keeping its IO isolated from other dirs' IO operations.
- Secondly, in normal cases, two directories, specified by `dir` and `auxiliary_dir`, are enough for practical production environments. And we do not expect to support lists of 3, 4, or more dirs for users.

So, I still recommend using this design.
It contradicts the whole point of using a vector for paths. If you need to hardcode the path ID, it is clearly better to just use two variables `dir1` and `dir2`, because that checks the code statically instead of at runtime (via vector indexing). And my original idea of using a vector was to not restrict the number of directories. For example, if in the future we support #258, users can configure multiple disks to increase write parallelism.
src/file_pipe_log/pipe.rs
Outdated
if idx >= paths.len() {
    break;
}
let disk_stats = match fs2::statvfs(&paths[idx]) {
Ignore the error, and avoid the syscall when there's only one dir.
debug_assert!(!paths.is_empty());
if paths.len() > 1 {
    for (i, p) in paths.iter().enumerate() {
        if let Ok(stats) = fs2::statvfs(p) {
            if stats.space > target {
                return i;
            }
        }
    }
}
return 0;
IMO, we need `Err`, which means that there is no free space left for generating new files.
Two reasons: (1) `statvfs` is only a hint. There could be another user deleting or creating files concurrently, so the result of `fetch_dir` doesn't promise anything at all, and it is wrong to rely on it. (2) All the error handling logic is already in place; adding another code path that handles the error of `fetch_dir` is redundant. Even if they (`fetch_dir().is_err()` and `write().is_err()`) share the same code path, it is always better to have fewer exit points.
…pace_left. Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
src/file_pipe_log/pipe.rs
Outdated
if let Err(te) = writer.truncate() {
    panic!(
        "error when truncate {} after error: {}, get: {}",
        seq, e, te
    );
}
return Err(e);
if is_no_space_err(&e) {
    // To avoid incorrect `rotate_imp` on the same dir when the size of
I don't understand. If the available space is less than `buf_bytes.len()`, it should panic when actually writing that `buf_bytes`, instead of panicking here too early.
In this case, it won't panic but will fail on `engine.write(...)` unexpectedly.
Suppose we have a two-directory setting with `dir` and `spill-dir`: `dir` has free space for rotating several new log files but not enough space for dumping the given buf, while `spill-dir` has enough free space for dumping this buf.
That is:
- `buf_bytes.len() > target_file_size`;
- `n * target_file_size < dir.spare_space()` && `buf_bytes.len() > dir.spare_space()` && `buf_bytes.len() < spill-dir.spare_space()`.

Then, with the previous `rotate_imp(self.target_file_size)`:
- First attempt at writing: it fails on `self.write()` and tries to rotate.
- As `target_file_size < dir.spare_space()`, `rotate_imp(target_file_size)` succeeds and returns `TryAgain(...)` to trigger the second write attempt. (Call this newly rotated file `rotate-1`.)
- Second, it tries to dump this buf into `rotate-1`. But `self.write()` fails again because `dir.spare_space() < buf_bytes.len()`, and it returns the `NOSPC` error. The buffer still cannot be dumped into the current `rotate-1` file. Then it performs the next `rotate_imp()`, which unfortunately succeeds again because `dir` still has enough space for rotating a new file, so it returns the `TryAgain()` error again. This leads to the final failure of `engine.write(...)`.
- The whole write of this LogBatch fails because the number of write attempts exceeds the retry limit, even though `spill-dir` still has enough space for it.

For example, given `target_file_size == 1MB`, `dir.spare_space() == 1.5MB`, `spill-dir.spare_space() == 100MB` and `buf_bytes.len() == 3MB`, the above steps would be executed.
Compiling raft-engine v0.3.0 (/nvme1n1/lucasliang/workspace/raft-engine)
Finished test [unoptimized + debuginfo] target(s) in 13.33s
Running tests/failpoints/mod.rs (target/debug/deps/failpoints-23610de2336687d7)
thread 'test_engine::test_build_engine_with_spill_dir' panicked at 'called `Result::unwrap()` on an `Err` value:
TryAgain("Failed to write logbatch, exceed max_retry_count: (2),
err: failed to write 41 file, get IO Error: Custom { kind: StorageFull, error: \"nospace\" } try to flush it to another dir")',
tests/failpoints/test_engine.rs:37:40
src/file_pipe_log/pipe.rs
Outdated
@@ -258,7 +263,7 @@ impl<F: FileSystem> SinglePipe<F> {
     fail_point!("file_pipe_log::append");
     let mut writable_file = self.writable_file.lock();
     if writable_file.writer.offset() >= self.target_file_size {
-        if let Err(e) = self.rotate_imp(&mut writable_file) {
+        if let Err(e) = self.rotate_imp(&mut writable_file, self.target_file_size) {
Let's ignore this case temporarily (assume `buf.len()` is always smaller than `target_file_size`). You can improve it in another PR, because it's a bit more complicated than this: you also have to consider recycled files. E.g. one recycled file + available_space is not enough, but all recycled_files + available_space is enough.
That's right. I've considered it and will patch it in this PR. But I'm struggling with how to add a unit test for this.
I'll update it in the next commit; thanks for any supplementary comments.
…ll exists recycled files. Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
src/log_batch.rs
Outdated
// `BufState::Sealed` means that `LogBatch` is under a repeated state of dumping.
BufState::Encoded(header_offset, entries_len)
| BufState::Sealed(header_offset, entries_len) => {
    self.item_batch.prepare_write(&mut self.buf, file_context)?;
What if item batch is empty?
-    self.item_batch.prepare_write(&mut self.buf, file_context)?;
+    self.item_batch.prepare_write(&mut self.buf[header_offset + HEADER_LEN + entries_len ..], file_context)?;
What if item batch is empty?

In this case, it will return `Ok(0)` directly when calling `engine.write(...)`.
Lines 138 to 141 in 404e3fe:
pub fn write(&self, log_batch: &mut LogBatch, mut sync: bool) -> Result<usize> {
    if log_batch.is_empty() {
        return Ok(0);
    }
Better to make the function more robust. Skipping the write when empty should only be an optimization, not something to be depended on.
Got.
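The robustness point can be illustrated standalone (a minimal sketch; the `encode` helper and its length-prefixed framing are made up for this example and are not the crate's `prepare_write`):

```rust
// Sketch of the principle: a serializer that is safe on empty input,
// so an early-return in the caller is an optimization, not a requirement.
fn encode(records: &[Vec<u8>], out: &mut Vec<u8>) {
    if records.is_empty() {
        return; // robust: writing nothing is valid, not an error
    }
    for r in records {
        // Length-prefix each record with a little-endian u32.
        out.extend_from_slice(&(r.len() as u32).to_le_bytes());
        out.extend_from_slice(r);
    }
}

fn main() {
    let mut buf = Vec::new();
    encode(&[], &mut buf); // no caller-side guard needed
    assert!(buf.is_empty());
    encode(&[vec![1, 2, 3]], &mut buf);
    assert_eq!(buf.len(), 4 + 3); // 4-byte length prefix + payload
}
```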
src/file_pipe_log/pipe.rs
Outdated
// current buffer is too huge to be appended directly. So, we should free
// enough space by clearing several recycled log files until this buffer
// could be dumped.
let mut expected_size = expected_file_size;
(1) A recycled file could be on a different disk than the full one.
(2) You should not modify the `expected_file_size` passed into `fetch_dir`. There could be another writer occupying the space you just released.
(3) If deleting recycled files is enough, there's no need to rotate; rotating here makes the rotated file smaller than usual and hurts future performance.
This branch needs a test.
Yep, these cases have already been considered in this branch.
Firstly, about:

(1) recycled file could be on a different disk than the full one

That's right. This case has three corner cases (suppose `main-dir` is full; the write enters this branch if and only if the first attempt by `engine.write(...)` fails):
- `main-dir` has no recycled logs, and `spill-dir` has several recycled logs.
  As `main-dir` is full, the following `find_available_dir` must skip `main-dir`, and `spill-dir` should prepare enough space for this write. Otherwise, the whole write progress (`engine.write(...)`) will finally fail with `Err::TryAgain`. So, in this case, the deletion is necessary to prepare enough space for the write. And if there is not enough space for writing this `LogBatch`, it will fail as expected, because there truly is not enough space on either disk.
- `main-dir` has several recycled logs with `sum(recycled_logs.size()) < expected_file_size`, and no recycled logs exist in `spill-dir`.
  After deletion, `main-dir` still does not have enough space for the second attempt, so the following `find_available_dir` must skip `main-dir` as expected. Whether this second write attempt succeeds depends on whether there is enough space in `spill-dir`, which is expected. So in this case the deletion is also necessary.
- Both `main-dir` and `spill-dir` have several recycled logs.
  The deletion progress tries to release free space on one dir until it is enough to dump this LogBatch into that dir. The second write fails if and only if free space is still not enough, that is, `deleted_recycled_logs.size() + free_space.size() < expected_file_size`. So the deletion should be kept.

Secondly, about:

(2) should not modify the expected_file_size passed into fetch_dir. There could be another writer occupying the space you just released.

Here, `expected_file_size` is not modified; it is just copied to `expected_size` in this PR.
And for:

(3) if deleting recycled file is enough, there's no need to rotate, which makes the rotated file smaller than usual and hurt future performance.

Yep, if `expected_file_size <= self.target_file_size`, it reuses recycled logs via `rotate`. Otherwise, it tries to free enough space and then finds an available dir to place this LogBatch via the following `find_available_dir` directly.
As for testing this branch, I've done it with the following manual case on CentOS 7:
#[test]
fn test_build_engine_with_spill_dir() {
fn mount_tmpfs(path: &str) {
debug_assert!(!path.is_empty());
#[cfg(target_os = "linux")]
{
use std::process::Command;
let mut mk_device = Command::new("sudo");
mk_device.args([
"mount",
"-t",
"tmpfs",
"-o",
"size=128M",
"tmpfs",
path,
]);
mk_device.status().expect("process failed to execute");
}
}
fn umount_tmpfs(path: &str) {
debug_assert!(!path.is_empty());
#[cfg(target_os = "linux")]
{
use std::process::Command;
let mut mk_device = Command::new("sudo");
mk_device.args(["umount", path]);
mk_device.status().expect("process failed to execute");
}
}
let main_dir = tempfile::Builder::new()
.prefix("test_build_engine_with_spill_dir_main")
.tempdir()
.unwrap();
let spill_dir = tempfile::Builder::new()
.prefix("test_build_engine_with_spill_dir_spill")
.tempdir()
.unwrap();
{
// Mount a tmpfs with limited size.
mount_tmpfs(main_dir.path().to_str().unwrap());
}
let cfg = Config {
dir: main_dir.path().to_str().unwrap().to_owned(),
spill_dir: Some(spill_dir.path().to_str().unwrap().to_owned()),
target_file_size: ReadableSize::mb(1),
purge_threshold: ReadableSize::mb(128),
enable_log_recycle: true,
prefill_for_recycle: true,
..Default::default()
};
let data = vec![b'x'; 17371];
let engine = Engine::open(cfg.clone()).unwrap();
let _f1 = FailGuard::new("file_pipe_log::force_choose_dir", "return");
for rid in 1..12000 {
assert!(catch_unwind_silent(|| append(&engine, rid, 1, 100, Some(&data))).is_ok());
}
{
// TODO: this fails because the current pid still occupies this dir.
umount_tmpfs(main_dir.path().to_str().unwrap());
}
}
But I'm wondering how to release the temporarily mounted device and make it approved by CI/CD.
Honestly, I don't even know how to respond to such a long comment. If the situation is indeed so complicated, you should comment it in the code so that we can discuss line by line there.
(1.i) So, in this case, the deletion is necessary for preparing enough space for write.
You didn't check the available space of spill_dir. It might not need deletion at all.
(1.iii) The deletion progress will try to release free space on one dir until it's enough to dump this LogBatch into this dir.
Same thing as before: you don't know if the other directory has enough available space so that it can avoid deletion. In addition, your code won't work if the sum of deleted files from dir1 and dir2 reaches target_size, but the two individual directories still haven't each deleted enough.
(2) Here, the expected_file_size do not be modified, just copy it to expected_size in this pr.
Well, it's a misread, but you shouldn't copy something into another with nearly the same name.
Yep, unless expected_file_size <= self.target_file_size, it will do reuse recycled logs by rotate. Otherwise, it will try to free enough space and find an available dir to place this LogBatch by the following find_available_dir directly.
I think there's some misunderstanding. I meant you don't need to rotate the active file, i.e. no need to close the file writer and open a new one.
As for testing on this branch, I've do it by the following manual case on centos7:
Just some failpoint unit tests are enough. Hardware doesn't really matter to this particular function. The only reason I asked before to test on hardware is that I don't know if truncating a file on an already full disk can succeed.
My suggestion is to move the whole deletion outside of `rotate`. Like this:
fn try_make_space(target) {
    let mut used = vec![0; self.paths.len()];
    for f in recycled_files {
        used[f.path_id] += f.handle.file_size()?;
    }
    for i in 0..self.paths.len() {
        if used[i] + self.paths[i].available_space > target + HEADER_LEN {
            // remove some.
            break;
        }
    }
}
You didn't check the available space of spill_dir. It might not need deletion at all.
IMO, the "too early and unified" deletion focuses on reducing the extra checking of `available_space` on each dir, and unifies the code into one branch, `expected_file_size > target_file_size`. For example, the `available_space` consists of recycled logs plus free space on each dir respectively, and the checking and deletion of recycled files would otherwise have to be handled separately.
As this PR has already spent too many long comments on this point, making it hard to review, I accept the previous advice that we should open another issue to focus on optimizing this.
…ospace error. Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
is #294 (comment) tested?
for (i, file_pair) in files.windows(2).enumerate() {
    // If there exists a black hole or duplicate scenario on FileSeq, these
    // files should be skipped and cleared.
    if file_pair[1].seq - file_pair[0].seq != 1 {
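The intent of the `windows(2)` scan can be illustrated standalone (a sketch with a made-up `seq` list; keeping only the last contiguous run is an assumed policy for this example, not necessarily the crate's exact recovery behavior):

```rust
// Detect "black holes" (gaps) and duplicates in a sorted FileSeq list by
// scanning adjacent pairs; files before the last break are skipped.
fn first_valid_index(seqs: &[u64]) -> usize {
    let mut start = 0;
    for (i, pair) in seqs.windows(2).enumerate() {
        if pair[1].wrapping_sub(pair[0]) != 1 {
            // Gap (diff > 1) or duplicate (diff == 0): restart after it.
            start = i + 1;
        }
    }
    start
}

fn main() {
    // 12 is duplicated and 14 is missing; only 15..=17 form a clean run.
    let seqs = [11, 12, 12, 13, 15, 16, 17];
    assert_eq!(first_valid_index(&seqs), 4);
    assert_eq!(&seqs[first_valid_index(&seqs)..], &[15, 16, 17]);
}
```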
is duplicate file tested?
Yep, this case has already been supplemented in `src/engine.rs` (line 2735), as the following shows:
// Case 4: abnormal case - duplicate FileSeq among different dirs.
{
// Prerequisite: choose several files and duplicate them to main dir.
let mut file_count = 0;
std::fs::read_dir(paths[1]).unwrap().for_each(|e| {
let p = e.unwrap().path();
let file_name = p.file_name().unwrap().to_str().unwrap();
if let Some(FileId {
queue: LogQueue::Append,
seq: _,
}) = FileId::parse_file_name(file_name)
{
let mut dst_path = PathBuf::from(&paths[0]);
dst_path.push(file_name);
if file_count % 2 == 0 {
std::fs::copy(p, dst_path).unwrap();
}
file_count += 1;
}
});
}
let engine = engine.reopen();
// Duplicate log files will be skipped and cleared.
assert!(engine.file_span(LogQueue::Append).0 > start_5);
Update changelog.
Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>
Done for it in the last commit.
ping
Also done in the last commit: * Add a new configuration `spill-dir` to allow automatic placement of logs into an auxiliary directory when `dir` is full.
Brief Introduction
Close #257.
This PR supports the auxiliary directory feature, referring to the previous PR #261. It mainly contains:
- A new configuration `auxiliary-dir` to enable this feature. The default is `None`; if set to `Some(...)`, this directory will be used when the main dir, specified by the `dir` option, is full.

IMO, `auxiliary-dir` is more conspicuous for users to understand and use, compared to the previous name `secondary-dir`.
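For a concrete picture of switching the feature on, here is a hypothetical configuration fragment (field names are taken from the test earlier in this thread, where the option appears as `spill_dir`; the paths and exact names are assumptions, not the final API):

```rust
// Hypothetical sketch, not the crate's verbatim API: open an engine whose
// writes spill into an auxiliary directory once the main `dir` is full.
let cfg = Config {
    dir: "/data/raft-engine".to_owned(),              // main directory
    spill_dir: Some("/spill/raft-engine".to_owned()), // used only when `dir` is full
    target_file_size: ReadableSize::mb(128),
    ..Default::default()
};
let engine = Engine::open(cfg)?;
```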