-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More Robust GC #3712
More Robust GC #3712
Conversation
@whyrusleeping this is very much a work in progress, just want some early feedback. |
This is based on #3700 use this link for a better comparison until that is merged: kevina/enumerate-children-refactor...kevina/more-robust-gc |
cc305ec
to
886ac2f
Compare
This took longer than I would of like and I am not sure I am happy with the approach but it should work. Everything is tested, but the test could also use some improvements in order to document why what they are testing is important. |
pin/gc/gc.go
Outdated
} | ||
} | ||
if errors { | ||
errOutput <- CouldNotDeleteSomeBlocksError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ErrCouldNotDeleteSomeBlocks
pin/gc/gc.go
Outdated
@@ -22,58 +25,73 @@ var log = logging.Logger("gc") | |||
// | |||
// The routine then iterates over every block in the blockstore and | |||
// deletes any block that is not found in the marked set. | |||
func GC(ctx context.Context, bs bstore.GCBlockstore, ls dag.LinkService, pn pin.Pinner, bestEffortRoots []*cid.Cid) (<-chan *cid.Cid, error) { | |||
// | |||
func GC(ctx context.Context, bs bstore.GCBlockstore, ls dag.LinkService, pn pin.Pinner, bestEffortRoots []*cid.Cid) (<-chan *cid.Cid, <-chan error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a cleaner return value would be to return a channel of 'results'. Where a result would be defined as something like:
type Result struct {
Cid *cid.Cid
Err error
}
Its always simpler to just have to deal with a single channel in cases like these, it can cause tricky race conditions (though the race conditions probably arent an issue here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that is one of the things I didn't like also. Working on fixing that today.
pin/gc/gc.go
Outdated
} | ||
} | ||
|
||
var CoundNotFetchAllLinksError = errors.New("garbage collection aborted: could not retrieve some links") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure errors like this have an Err
prefix
test_description="Test ipfs pinning operations" | ||
|
||
. lib/test-lib.sh | ||
set -e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use set -e, sharness handles these sorts of things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(you can run the tests with -i to stop when a test fails)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove it if you want, but note it does serve a very useful purpose. Please see #3721. I already know about -i
and that is not why i am using it.
# MIT Licensed; see the LICENSE file in this repository. | ||
# | ||
|
||
test_description="Test ipfs pinning operations" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably want to update the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah :)
LEAF4=zb2rhnvAVvfDDnndcfQkwqNgUm94ba3zBSXyJKCfVXwU4FXx | ||
|
||
test_expect_success "remove a leaf node from the repo manually" ' | ||
rm $LEAF1FILE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quotes around the variable, just in case it changes
|
||
test_expect_success "create a permission problem" ' | ||
chmod 500 `dirname $LEAF2FILE` && | ||
test_must_fail ipfs block rm $LEAF2 2>&1 | tee err && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably fine just to redirect this to 'err'. May also want to name 'err' something a tiny bit more descriptive like 'block_rm_err'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason I do it this was is so rather than a simple '2> err' is so that I can see the output of the command when developing the test. It is far easier than having to do a cat "trash directory XXX/err"
. The sharness framework will suppress both stdout and stderr from a test unless '-v' is given so I don't see extra noise as a problem. If the test does fail for some reason then it is helpful to see the output of the command without having to track down the output file name. If a test fails on a build server the output file may not even be accessible.
Nevertheless, I understand it is not the convention used in the other tests, so if you still don't like I it I will reluctantly change it after I am done developing the tests and the code. Let me know.
4a71e51
to
cdccbae
Compare
pin/gc/gc.go
Outdated
} else { | ||
return gcs, nil | ||
} | ||
} | ||
|
||
var CoundNotFetchAllLinksError = errors.New("garbage collection aborted: could not retrieve some links") | ||
var ErrCoundNotFetchAllLinks = errors.New("garbage collection aborted: could not retrieve some links") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here (not sure if you fixed already, going through commit by commit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -37,6 +39,23 @@ test_gc_robust_part1() { | |||
ipfs repo gc | |||
' | |||
|
|||
test_expect_success "corrupt the root node of 1MB file" ' | |||
ls -l "$HASH1FILE" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ls here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was seeing something. Will remove.
grep -q "aborted" gc_err | ||
' | ||
|
||
test_expect_success "leaf nodes where not removed after gc" ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo, s/where/were/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
test_expect_success "leaf nodes where not removed after gc" ' | ||
ipfs cat $LEAF3 > /dev/null && | ||
ipfs cat $LEAF4 > /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this test run both with and without a daemon? If yes, we should add a --timeout
flag here, if no, we should make sure to run this test with and without a daemon ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping Um, that is what the "--offline" flag is for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Touche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved via irc, no need for --timeout
flag.
core/commands/repo.go
Outdated
|
||
outChan := make(chan interface{}) | ||
outChan := make(chan interface{}, len(gcOutChan)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does len on a channel tell us? Is it the capacity? or the number of items currently waiting in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, should be cap
. Sorry. Will fix.
core/commands/repo.go
Outdated
default: | ||
res.SetError(corerepo.NewMultiError(errors...), cmds.ErrNormal) | ||
return | ||
err := corerepo.CollectResult(gcOutChan, func(k *cid.Cid) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though you probably want to select with a context here to make sure there are no accidental hangs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping I am never sure when I should be using a context and when it is okay to just read from a channel. When is checking for ctx.Done() important in the pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(discussed in IRC, putting here for posterity)
Any time youre sending or receiving on a channel that is outside the control of your immediate goroutine, its a good idea to use a context. In this case, the reader is the commands lib, and the commands lib will stop reading from the channel if the user cancels the operation.
@@ -51,6 +56,7 @@ order to reclaim hard disk space. | |||
}, | |||
Options: []cmds.Option{ | |||
cmds.BoolOption("quiet", "q", "Write minimal output.").Default(false), | |||
cmds.BoolOption("stream-errors", "Stream errors.").Default(false), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I also like this. It is a good option.
pin/gc/gc.go
Outdated
@@ -66,7 +66,7 @@ func GC(ctx context.Context, bs bstore.GCBlockstore, ls dag.LinkService, pn pin. | |||
err := bs.DeleteBlock(k) | |||
if err != nil { | |||
errors = true | |||
output <- Result{Error: &CouldNotDeleteBlockError{k, err}} | |||
output <- Result{Error: &ErrCouldNotDeleteBlock{k, err}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you had originally was good. It seems the go convention for error types is to suffix them with the word Error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Will revert that commit.
cdccbae
to
9cfba7d
Compare
@whyrusleeping Okay, I address all your comments. I think this is ready. Will rebase once #3700 is merged. |
a40d15c
to
b95e413
Compare
@kevina great, thanks! |
Thanks. |
core/commands/repo.go
Outdated
@@ -87,26 +110,29 @@ order to reclaim hard disk space. | |||
return nil, err | |||
} | |||
|
|||
marshal := func(v interface{}) (io.Reader, error) { | |||
obj, ok := v.(*corerepo.KeyRemoved) | |||
for v := range outChan { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should still use the marshal func construction like it previously did. Output for stdout can go through the same style of returned pipe, and we can print to stderr within that.
The reason for this is to avoid the double error printing that you can see in @Kubuxu's sample output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure I am following, but I attempted to do what you want. It did not seam to help with the duplicate error reporting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping for example here is the output from a test case when run with -v
expecting success:
test_must_fail ipfs repo gc --stream-errors 2>&1 | tee repo_gc_out &&
grep -q "Error: could not retrieve links for $LEAF1" repo_gc_out &&
grep -q "Error: could not retrieve links for $LEAF2" repo_gc_out &&
grep -q "Error: garbage collection aborted" repo_gc_out
Error: could not retrieve links for QmSijovevteoY63Uj1uC5b8pkpDU5Jgyk2dYBqz3sMJUPc: merkledag: not found
Error: could not retrieve links for QmTbPEyrA1JyGUHFvmtx1FNZVzdBreMv8Hc8jV9sBRWhNA: The block referred to by 'QmTbPEyrA1JyGUHFvmtx1FNZVzdBreMv8Hc8jV9sBRWhNA' was not a valid merkledag node
Error: garbage collection aborted: could not retrieve some links
01:35:09.788 ERROR commands/h: encountered errors during gc run client.go:247
Error: encountered errors during gc run
ok 48 - 'ipfs repo gc --stream-errors' should abort and report each error separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whyrusleeping since using the marshal func construction didn't fix the double error reporting, do you still want to use the marshal func construction, I think it makes the code more complicated to understand. But I can leave it if you want.
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
if !ok { | ||
return nil, u.ErrCast() | ||
} | ||
|
||
buf := new(bytes.Buffer) | ||
if obj.Error != "" { | ||
fmt.Fprintf(res.Stderr(), "Error: %s\n", obj.Error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use fmt.Fprintf and not use a standard ipfs logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent was for the output to go to the client's stderr. The logger won't do thus as far as I know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this part of code runs on Client CLI (IIRC) not on the daemon so we are handling logging/errors manually here.
core/commands/repo.go
Outdated
@@ -39,6 +40,11 @@ var RepoCmd = &cmds.Command{ | |||
}, | |||
} | |||
|
|||
type GcResult struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every type exported and every attribute that's also exported should have a well described comment above.
core/corerepo/gc.go
Outdated
|
||
func CollectResult(ctx context.Context, gcOut <-chan gc.Result, cb func(*cid.Cid)) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add here also a doc string above the function name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CollectResult collects the output of a garbage collection run and calls the given callback for each object removed. It also collects all errors into a MultiError which is returned after the gc is completed
core/corerepo/gc.go
Outdated
return &MultiError{errs[:len(errs)-1], errs[len(errs)-1]} | ||
} | ||
|
||
type MultiError struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also you could tell that the type implements the Error interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioning that it implements the error interface is pretty redundant and not standard go convention
core/corerepo/gc.go
Outdated
roots, err := BestEffortRoots(n.FilesRoot) | ||
if err != nil { | ||
return nil, err | ||
func NewMultiError(errs ...error) *MultiError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// NewMultiError creates a new MultiError object from a given slice of errors
pin/gc/gc.go
Outdated
@@ -14,6 +16,11 @@ import ( | |||
|
|||
var log = logging.Logger("gc") | |||
|
|||
type Result struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Result represents an incremental output from a garbage collection run. It contains either an error, or the cid of a removed object
pin/gc/gc.go
Outdated
@@ -83,35 +103,74 @@ func Descendants(ctx context.Context, getLinks dag.GetLinks, set *cid.Set, roots | |||
return nil | |||
} | |||
|
|||
func ColoredSet(ctx context.Context, pn pin.Pinner, ls dag.LinkService, bestEffortRoots []*cid.Cid) (*cid.Set, error) { | |||
func ColoredSet(ctx context.Context, pn pin.Pinner, ls dag.LinkService, bestEffortRoots []*cid.Cid, output chan<- Result) (*cid.Set, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// ColoredSet computes the set of nodes in the graph that are pinned by the pins in the given pinner
pin/gc/gc.go
Outdated
} | ||
} | ||
|
||
var ErrCouldNotFetchAllLinks = errors.New("garbage collection aborted: could not retrieve some links") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported fields should have a doc string. Also, can we have the name ErrCannotFetchLinks
?
pin/gc/gc.go
Outdated
|
||
var ErrCouldNotFetchAllLinks = errors.New("garbage collection aborted: could not retrieve some links") | ||
|
||
var ErrCouldNotDeleteSomeBlocks = errors.New("garbage collection incomplete: could not delete some blocks") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string and can we have ErrCannotDeleteBlocks
?
pin/gc/gc.go
Outdated
|
||
var ErrCouldNotDeleteSomeBlocks = errors.New("garbage collection incomplete: could not delete some blocks") | ||
|
||
type CouldNotFetchLinksError struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
pin/gc/gc.go
Outdated
Err error | ||
} | ||
|
||
func (e *CouldNotFetchLinksError) Error() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
pin/gc/gc.go
Outdated
return fmt.Sprintf("could not retrieve links for %s: %s", e.Key, e.Err) | ||
} | ||
|
||
type CouldNotDeleteBlockError struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc string.
pin/gc/gc.go
Outdated
|
||
if errors { | ||
return nil, ErrCouldNotFetchAllLinks | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can dropthe else
stmt.
This should look like
if errors {
return nil, ErrCouldNotFetchAllLinks
}
return gcs, nil
bfaba39
to
bdd0674
Compare
core/commands/repo.go
Outdated
@@ -39,6 +40,12 @@ var RepoCmd = &cmds.Command{ | |||
}, | |||
} | |||
|
|||
// GcResult is the result returned by "repo gc" command | |||
type GcResult struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can actually remove this whole type in favor of the one in the gc package (just change the field names so the json marshaling matches up correctly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is we can't pass the error type through the API. I get:
02:40:33.946 ERROR commands/h: json: cannot unmarshal object into Go value of type error client.go:247
Error: json: cannot unmarshal object into Go value of type error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, lets figure that out later then
39b2f74
to
fbc4f90
Compare
@Kubuxu the circleCI failure here is the iptb failed to connect problem again. Didnt you have an idea on why that was happening? |
@whyrusleeping nope, couldn't figure it out. It happens much more on Travis and Circle than our CI which means it might be some race or instability. I tried to replicate it locally (1000 runs) didn't find even one failure. |
License: MIT Signed-off-by: Kevin Atkinson <k@kevina.org>
fbc4f90
to
d39d9ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Travis flakiness aside, this LGTM.
Thanks @kevina, making GC better has been needed for a while now.
@Kubuxu did you ever do this. I am very surprised you can't just remove the broken pins with the latest IPFS version. See #3796 (comment) |
No description provided.