Immutable tiered storage #962
Codecov Report
@@ Coverage Diff @@
## master #962 +/- ##
==========================================
+ Coverage 44.24% 44.74% +0.49%
==========================================
Files 140 147 +7
Lines 11570 11850 +280
==========================================
+ Hits 5119 5302 +183
- Misses 5800 5867 +67
- Partials 651 681 +30
Nice!
Will use this to store commits.
5b8f6b0 to b353aae
b353aae to 856e506
99cb63e to b9ac0b4
Thanks!
Probably need to unit-test everything here; in particular I suspect file creation might not do what we want (comments marked).
f26f5b2 to 77ed7cb
add7902 to 21c6826
834af95 to fe134d8
Thanks!
Sorry it's taking so long. As discussed, the largest change is probably to handle files being written separately from files being read. Specifically, we need to be able to close a to-be-written-to-S3 file without copying it out to S3, e.g. to abort it when Pebble has a write error. That can even happen while Pebble is closing the file, at which point it is committed to closing the file it received.
0f4617f to 9acd925
This is really nice, and a lot simpler!
Some issues:
- I don't think size-based LRU will make good decisions (I am not familiar with any literature about it either way, though). I think that it always makes cache thrashing worse. This is a criticism of our fork of the LRU cache; I just think it is particularly relevant to our use case: most user workloads will have a mix of very large and very small files (say, commits vs. trees), and frequent access to a small file should prefer to displace small files rather than large ones. We can partly overcome this with multiple pyramids, but then we need to configure each one separately.
- Need to delete empty directories from the cache. Otherwise a long-lived system can end up with high fan-out, which is even less efficient on POSIX than on S3.
- Can we handle paths with zero-length subdirectories? In S3 the paths s3://foo/a////b and s3://foo/a///b are distinct, whereas in POSIX /a////b, /a///b and /a/b all point to the same place.
If we never use pyramid to cache general user files on disk, then we can just fail pathnames with an empty directory component. Otherwise we will need metadata.
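A minimal sketch of the "just fail such pathnames" option follows. The helper name hasEmptyComponent and its strict policy (leading and trailing slashes also count as empty components) are assumptions for illustration, not code from this PR. The main function also shows the POSIX-style collapsing that makes the two keys collide locally.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// hasEmptyComponent reports whether a relative object key contains a
// zero-length component, such as the empty names between the slashes in
// "a//b". Hypothetical helper: the name and policy are assumptions.
func hasEmptyComponent(key string) bool {
	for _, part := range strings.Split(key, "/") {
		if part == "" {
			return true
		}
	}
	return false
}

func main() {
	// Distinct keys on S3, but cleaning them the way a POSIX path is
	// resolved yields the same local path.
	fmt.Println(path.Clean("/a////b") == path.Clean("/a///b"))

	// A cache that cannot store metadata could reject such keys up front.
	for _, key := range []string{"a/b", "a///b", "a////b"} {
		fmt.Printf("%q rejected: %v\n", key, hasEmptyComponent(key))
	}
}
```

Rejecting at the boundary keeps the local layout a faithful mirror of the keys; accepting such keys would need a side table mapping keys to local names.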
}

func (f *ROFile) Stat() (os.FileInfo, error) {
	return f.fh.Stat()
I do want to touch here. But... as long as we collect useful metrics from pyramid (not necessarily in this PR!), that's OK either way.
Added a touch - metrics in next PR.
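For readers unfamiliar with the term, a "touch" here means telling the eviction cache that the file was just used. The sketch below shows the idea under stated assumptions: touchCounter stands in for the real cache, and the rPath and eviction fields, as well as touching on Read, are guesses at the shape rather than the code in this PR.

```go
package main

import (
	"fmt"
	"os"
)

// touchCounter is a hypothetical stand-in for the eviction cache: it only
// counts how often each path was touched.
type touchCounter map[string]int

func (c touchCounter) touch(rPath string) { c[rPath]++ }

// ROFile mirrors the name of the reviewed type; the rPath and eviction
// fields are assumptions made for this sketch.
type ROFile struct {
	fh       *os.File
	rPath    string
	eviction touchCounter
}

// Read marks the file as recently used before delegating to the handle.
func (f *ROFile) Read(p []byte) (int, error) {
	f.eviction.touch(f.rPath)
	return f.fh.Read(p)
}

// Stat also counts as a use of the file.
func (f *ROFile) Stat() (os.FileInfo, error) {
	f.eviction.touch(f.rPath)
	return f.fh.Stat()
}

func (f *ROFile) Close() error { return f.fh.Close() }

// demoTouches calls Stat and Read once each on an empty temp file and
// returns how many touches were recorded.
func demoTouches() (int, error) {
	fh, err := os.CreateTemp("", "rofile-sketch-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(fh.Name())
	counter := touchCounter{}
	f := &ROFile{fh: fh, rPath: "ns/file", eviction: counter}
	defer f.Close()
	if _, err := f.Stat(); err != nil {
		return 0, err
	}
	_, _ = f.Read(make([]byte, 8)) // empty file: io.EOF is expected here
	return counter["ns/file"], nil
}

func main() {
	n, err := demoTouches()
	fmt.Println("touches:", n, "err:", err)
}
```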
	return nil
}

func (tfs *TierFS) removeFromLocal(rPath relativePath) {
	removeFromLocal(tfs.fsLocalBaseDir, rPath)
	// This will be called by the cache eviction mechanism during entry insert.
	// We don't want to wait while the file is being removed from the local disk.
Neat! But it does mean we will need to be very careful how we handle errors there.
Nothing much we can do, right?
For now logging it, will add metrics later on.
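The trade-off discussed above (do not block eviction, but do not lose errors either) could look roughly like this. It is a sketch only: asyncRemover, its failure counter, and the wait method are hypothetical names, and the counter merely stands in for the metrics promised for a later PR.

```go
package main

import (
	"log"
	"os"
	"path/filepath"
	"sync"
)

// asyncRemover deletes evicted files in the background so that the cache
// eviction callback returns immediately. Hypothetical type for this sketch.
type asyncRemover struct {
	baseDir  string
	wg       sync.WaitGroup
	mu       sync.Mutex
	failures int // stand-in for a real metric
}

// remove schedules the deletion and returns without waiting for the disk.
func (r *asyncRemover) remove(rPath string) {
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		if err := os.Remove(filepath.Join(r.baseDir, rPath)); err != nil {
			// Nobody is waiting for this result, so log and count it.
			log.Printf("remove evicted file %q: %v", rPath, err)
			r.mu.Lock()
			r.failures++
			r.mu.Unlock()
		}
	}()
}

// wait blocks until all scheduled removals finish and returns the number
// of failed ones. Useful in tests and on shutdown.
func (r *asyncRemover) wait() int {
	r.wg.Wait()
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.failures
}

func main() {
	dir, err := os.MkdirTemp("", "evict-sketch-*")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "a"), []byte("x"), 0o600); err != nil {
		log.Fatal(err)
	}
	r := &asyncRemover{baseDir: dir}
	r.remove("a")       // exists, removed in the background
	r.remove("missing") // does not exist, logged and counted
	log.Printf("failed removals: %d", r.wait())
}
```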
pyramid/tierFS.go
Outdated
@@ -320,6 +304,10 @@ func (tfs *TierFS) newLocalFileRef(namespace, filename string) localFileRef {
}

func (tfs *TierFS) objPointer(namespace, filename string) block.ObjectPointer {
	if runtime.GOOS == "windows" {
		filename = strings.ReplaceAll(filename, `\\'`, "/")
This change is fundamentally unsafe, e.g. if/when MacOS decides it has a new way of writing file paths. I think that in all files that are not local, we should avoid filepath.Join in favour of URL-based joiners or our very own, and write our own /s.
Not sure I follow - how can we prepare for a scenario when MacOS decides to use '^' as the separator? Filenames can contain the new separator.
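One way to read the "write our own /s" suggestion is sketched below. The function objectKey is a hypothetical name, not the objPointer method from the diff. It uses the standard filepath.ToSlash, which replaces the current OS separator with "/", and then joins with an explicit "/". As the reply points out, this cannot tell a separator apart from the same character used inside a filename.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// objectKey builds a remote key that uses "/" on every OS. Hypothetical
// helper: it converts whatever separator the local OS uses, then joins
// namespace and filename with an explicit "/" instead of filepath.Join.
func objectKey(namespace, filename string) string {
	return strings.Join([]string{namespace, filepath.ToSlash(filename)}, "/")
}

func main() {
	// filepath.Join yields OS-specific separators for the local name;
	// objectKey normalizes them for the remote store.
	local := filepath.Join("dir", "file")
	fmt.Println(objectKey("ns", local))
}
```

Unlike path.Join, the explicit join does not clean its result, so repeated slashes in a key are preserved rather than silently collapsed.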
pyramid/file_test.go
Outdated
)

func TestPyramidWriteFile(t *testing.T) {
	filename := uuid.Must(uuid.NewRandom()).String()
Not a fan of UUIDs or of random data in tests, but if we're going to use them, then the shorter form below is documented as exactly equivalent to the one above:
- filename := uuid.Must(uuid.NewRandom()).String()
+ filename := uuid.New().String()
Changed. We need some randomness so that runs following a failed test won't look at the same data (in case it failed to be deleted).
Great, thanks!
Two small requests for follow-on PRs:
- Rename tierFS.go to tier_fs.go.
- Monitoring. It's a cache, and we've already disagreed about lru.
	return f.fh.Read(p)
}

func (f *File) ReadAt(p []byte, off int64) (n int, err error) {
The name off confused me. Please change it to offset.
I would rather keep the same os.File terminology :)
	return f.fh.Sync()
}

func (f *File) Close() error {
I think it will be simpler to use Store instead of Close. Store will close the file.
If someone wants to close without storing, they should use os.File.
I would love to unify them. But Close may be called by a different layer, e.g. while closing sstable writer.
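To make the Close/Store split concrete, here is a rough sketch under assumptions: WRFile, its store hook, and Abort are hypothetical names, and the "store" step is a local rename rather than an upload. Close may be called early by another layer and is safe to repeat; Store closes if needed and then persists under the final name; Abort discards the temp file without storing it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// WRFile is a hypothetical stand-in for the write-side File in this PR.
type WRFile struct {
	fh     *os.File
	closed bool
	// store persists the finished temp file under its final name.
	// In this sketch it is a local rename, not an upload.
	store func(tmpPath, finalName string) error
}

func (f *WRFile) Write(p []byte) (int, error) { return f.fh.Write(p) }

// Close only releases the handle; repeated calls are no-ops.
func (f *WRFile) Close() error {
	if f.closed {
		return nil
	}
	f.closed = true
	return f.fh.Close()
}

// Store closes the file if nobody did yet, then persists it.
func (f *WRFile) Store(finalName string) error {
	if err := f.Close(); err != nil {
		return err
	}
	return f.store(f.fh.Name(), finalName)
}

// Abort discards the temp file without storing it.
func (f *WRFile) Abort() error {
	_ = f.Close()
	return os.Remove(f.fh.Name())
}

// demo writes a file, lets "another layer" close it, then stores it.
func demo() error {
	dir, err := os.MkdirTemp("", "pyramid-sketch-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)
	fh, err := os.CreateTemp(dir, "pending-*")
	if err != nil {
		return err
	}
	f := &WRFile{fh: fh, store: func(tmp, name string) error {
		return os.Rename(tmp, filepath.Join(dir, name))
	}}
	if _, err := f.Write([]byte("data")); err != nil {
		_ = f.Abort() // write error: close without storing
		return err
	}
	if err := f.Close(); err != nil { // e.g. called by an sstable writer
		_ = f.Abort()
		return err
	}
	if err := f.Store("final.sst"); err != nil {
		return err
	}
	got, err := os.ReadFile(filepath.Join(dir, "final.sst"))
	if err != nil {
		return err
	}
	if string(got) != "data" {
		return fmt.Errorf("unexpected content %q", got)
	}
	return nil
}

func main() {
	fmt.Println("demo error:", demo())
}
```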
pyramid/file_test.go
Outdated
func TestPyramidWriteFile(t *testing.T) {
	filename := uuid.Must(uuid.NewRandom()).String()
	filepath := path.Join("/tmp", filename)
	defer os.Remove(filepath)
Should be after create
outdated comment.
// ROFile is pyramid wrapper for os.file that implements io.ReadCloser
// with hooks for updating evictions.
type ROFile struct {
File should be WOFile
Why? it's also readable.
)

// File is pyramid wrapper for os.file that triggers pyramid hooks for file actions.
type File struct {
Where is the file name kept?
Temp file name, which is only relevant during the write itself, is held by the os.File handle.
The final filename is passed during Store.
type relativePath string

// localFileRef consists of all possible local file references
type localFileRef struct {
I suggest making the last two properties methods on localFileRef.
Good point. But then I'll have to keep a pointer to tfs, which is a bit messy.
I don't feel strongly about it, willing to change if you want.
Initial draft of the tiered storage.