-
Notifications
You must be signed in to change notification settings - Fork 456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vfs, table_cache: fadvise FADV_RANDOM on sstable files #215
Conversation
Some comparative benchmarks, command run (on a c5d.4xlarge) with 26gb malloc'd away by another program (see issue #198 for repro steps):
Pebble (before change):
Pebble (after change):
RocksDB (run 1):
RocksDB (run 2):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 9 files reviewed, 2 unresolved discussions (waiting on @ajkr, @itsbilal, and @petermattis)
table_cache.go, line 348 at r1 (raw file):
} if fd, err := c.fs.Fd(f); err == nil { _ = vfs.FadviseRandom(fd)
I'm not sure about this structure as it exposes Fd
which is in an implementation detail. An alternative would be to add vfs.File.AdviseRandom
or add an option to vfs.VFS.Open
.
vfs/fadvise_linux.go, line 14 at r1 (raw file):
func FadviseRandom(f uintptr) error { return unix.Fadvise(int(f), 0, 0, unix.FADV_RANDOM) }
Nit: reviewable is complaining that this file is missing a newline at the end of the file.
16f5866
to
2593217
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about this for a bit, and ended up adding a bool argument to vfs.FS.Open
. Options I considered:
-
Borrow
fileType
frompebble.filename
and passing that as an argument to Open - but that would mean exporting the type across parent/child packages and it wasn't very clear whether vfs should own that type (or if it deserved to have its own iota type defining file types) -
Making a new
vfs.FileType
- see above -
Adding a method to
vfs.File
- but this would take away the efficiency of just using*os.File
as avfs.File
.
Reviewable status: 0 of 23 files reviewed, 1 unresolved discussion (waiting on @ajkr and @petermattis)
table_cache.go, line 348 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I'm not sure about this structure as it exposes
Fd
which is in an implementation detail. An alternative would be to addvfs.File.AdviseRandom
or add an option tovfs.VFS.Open
.
Done.
2593217
to
1dbf01c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the extra parameter more than the other options you mention. There is a nuance to the parameter approach which could shrink down the size of this PR, though with slightly higher overhead to the Open
call. That approach adds an ...OpenOption
parameter. Something like:
type OpenOpen interface {
apply(f File)
}
type randomReadsOption struct{}
func (randomReadsOption) apply(f File) {
// call posix_fadvise
}
var RandomReads randomReadsOption
type FS interface {
Open(name string, opts ...OpenOption)
}
This pattern comes from https://commandcenter.blogspot.com/2014/01/self-referential-functions-and-design.html and we use it in a handful of places in CockroachDB. The downside is that it may involve an extra allocation at the caller, but for opening a file that isn't going to be a problem at all. The upside is that you don't have to specify the option at most callers. That is, fs.Open("foo")
still works.
Reviewable status: 0 of 23 files reviewed, 1 unresolved discussion (waiting on @ajkr and @petermattis)
1dbf01c
to
1804676
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the way that patterns lets us preserve the simpler Open(filename)
call. Made the change - PTAL!
Reviewable status: 0 of 10 files reviewed, 1 unresolved discussion (waiting on @ajkr and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to test that this is working? Ideally we'd want to know that tableCache
is passing this option and that specifying this option on linux is causing an fadvise
system call. The reason for wanting a test is to make sure this doesn't rot. I think making sure tableCache
is passing the option can be done in tableCacheTestFS.Open
. Not sure about ensuring the option results in an fadvise
system call on linux.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @ajkr, @itsbilal, and @petermattis)
vfs/vfs.go, line 148 at r3 (raw file):
// random reads, by calling fadvise() with POSIX_FADV_RANDOM on Linux systems // to disable readahead. Only works when specified to defaultFS. type RandomReadsOption struct{}
Rather than exporting the RandomReadsOption
type, you can make this private and export a RandomReadsOption
variable:
type randomReadsOption struct{}
var RandomReadsOption OpenOption = randomReadsOption{}
This makes the usage of the option every so slightly prettier.
This change calls fadvise with FADV_RANDOM on sstable file descriptors, to ensure that readahead is disabled. This reduces wasted I/Os when reading from sstables, since sstable reads especially for short to medium range scans do not read large enough contiguous blocks to be able to take advantage of readahead. Instead, readahead ends up reducing user-visible I/O performance. See cockroachdb#198 .
1804676
to
dba16cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I've made the tableCacheTest
change as well as the type/var export one.
Can't think of a way to be able to unit test the posix_fadvise
call itself, but the hope is that a change with a performance impact this significant will hopefully show up in our benchmarks/testing elsewhere.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @ajkr, @itsbilal, and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't think of a way to be able to unit test the posix_fadvise call itself, but the hope is that a change with a performance impact this significant will hopefully show up in our benchmarks/testing elsewhere.
Ack.
Reviewable status: 0 of 10 files reviewed, 2 unresolved discussions (waiting on @ajkr, @itsbilal, and @petermattis)
This change calls fadvise with FADV_RANDOM on sstable file descriptors,
to ensure that readahead is disabled. This reduces wasted I/Os when
reading from sstables, since sstable reads especially for short to
medium range scans do not read large enough contiguous blocks to be
able to take advantage of readahead. Instead, readahead ends up reducing
user-visible I/O performance.
See #198 .