Refactor ipfs get #1558
Conversation
Force-pushed 75e353e to 709ee29
```diff
@@ -31,12 +32,18 @@ func DagArchive(ctx cxt.Context, nd *mdag.Node, name string, dag mdag.DAGService
	// use a buffered writer to parallelize task
	bufw := bufio.NewWriterSize(pipew, DefaultBufSize)
```
This is not necessary if https://github.com/ipfs/go-ipfs/blob/master/unixfs/tar/writer.go#L123 is replaced with `dagr.WriteTo(w.TarW)`? It doesn't have a `Flush` method, however. Should one be added?
Rather, `unixfs/io/dagreader.go` should be rewritten with `bufio` to get `Flush` automatically.
why would the dagreader need flush?
Ok, it is not necessary for the dagreader's `WriteTo` method.
Force-pushed 2db63df to 6de8b7c
```go
w := &Writer{
	Dag: dag,
	GzW: gzw,
	ctx: ctx,
```
Yeah, I debated putting the ctx here too. I opted for passing it to the function as it is only needed there -- otherwise the reader can run indefinitely.
I tried putting it here because it (and dag) doesn't change after initialization. To compensate, there is `w.Close()` at L56.
@jbenet Is `w.Close()` redundant with `pipew.Close()`? The problem is, in no-tar mode (gz only), there is no Close method. There could also be a problem with ctx.
I appreciate the intent to simplify, and am always happy to see code removed, but I think some changes here make the codebase more convoluted (particularly the …)
```go
	return
}
	return
}
```
:( I moved all this stuff out of this function and into its own because it was very confusing here.
Simplified.

Also, please tend towards decomposition. Larger functions with lots of symbols floating in the same scope are much more error prone as people come in and modify things; smaller, clearer functions are easier to understand and survive modification by many people better. This PR is putting some stuff back into large functions, after I decomposed it out recently.
Force-pushed 2eae919 to bb69b77
I will revert the DagFS back to DagArchive | tar.extract.
Force-pushed e161655 to 89ece0e
Force-pushed 45f99bd to db7e0dc
RFCR
```go
w.HandleFile = func(dagr *uio.DagReader, fpath string) error {
	//unixSize := pb.GetFilesize()
	unixSize := dagr.Size()
```
TODO: this is expensive?
Force-pushed 8ed16d5 to 178b568
```go
if err := w.NodeWalk(nd, ""); err != nil {
	return 0, err
}
return size, nil
```
This is going to write out the whole thing (including pulling out all the objects) just to get the size. Is this worth it?

It seems like it can also be achieved outside (i.e. not with internal manipulation of the writer) with something like this:
```go
func GetTarSize(...) (uint64, error) {
	cw := CountingWriter(io.Discard) // and discard all the written output
	w := tar.Writer(..., cw)
	w.WriteDag(root) // do all the writes
	size := cw.Count()
}
```
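A runnable version of that counting-writer idea, for concreteness (names like `countingWriter` and `tarSize` are illustrative, not the go-ipfs API; it tars one in-memory file instead of a dag):

```go
package main

import (
	"archive/tar"
	"fmt"
)

// countingWriter discards everything but remembers how many bytes
// passed through -- enough to learn the final tar size without
// keeping the stream around.
type countingWriter struct{ n uint64 }

func (c *countingWriter) Write(p []byte) (int, error) {
	c.n += uint64(len(p))
	return len(p), nil
}

// tarSize does a throwaway write of one file into a tar stream and
// returns the byte count, mirroring the GetTarSize sketch above.
func tarSize(name string, data []byte) (uint64, error) {
	cw := &countingWriter{}
	tw := tar.NewWriter(cw)
	hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(data))}
	if err := tw.WriteHeader(hdr); err != nil {
		return 0, err
	}
	if _, err := tw.Write(data); err != nil {
		return 0, err
	}
	// Close writes the padding and the two end-of-archive blocks.
	if err := tw.Close(); err != nil {
		return 0, err
	}
	return cw.n, nil
}

func main() {
	n, err := tarSize("a.txt", []byte("hello"))
	fmt.Println(n, err)
}
```

Note the catch raised below still applies: all the writes happen once just to count, and again for real.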
But the size needs to be computed before the actual write `w.WriteDag(root)`.
Is this just for the progress bar? I don't think forcing the sender to look through the entire dag before sending a size is a good idea; it might be huge and take many minutes just to compute the size.

If there's no other way, perhaps the progress bar can estimate and make it clear it's an estimate of the size. As long as it doesn't look borked (i.e. negative numbers, or bigger/smaller than the real size) and we say "approx X MB" or "est X MB" or even "~X MB", it can be ok.

(Relatedly, @krl had suggested calculating a lot of information about dags locally and keeping it around: things like depth, # of nodes, fanout, etc. But I think that's too far off for this; an approximate result may be better.)
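The "~X MB" display could be as simple as the following helper (illustrative only, not ipfs code):

```go
package main

import "fmt"

// approxMB renders a byte count in the "~X MB" style suggested
// above, so users can see the size is an estimate, not exact.
func approxMB(n uint64) string {
	return fmt.Sprintf("~%.1f MB", float64(n)/(1000*1000))
}

func main() {
	fmt.Println(approxMB(1500000))
}
```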
@rht did we run into trouble just using the 'size' of the dag (as opposed to the true file size)? I know it's not as accurate as we want, but it's free information.
@whyrusleeping for a dag with 2000 files, this will be ~1 MB of inaccuracy, and it will be further off with dag deduplication.

An alternative is to send a dag archive instead of tar. That way the sender doesn't have to undo the deduplication just to create the tar stream, and the size is accurate. The transported dag archive is then converted to tar locally.
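For what it's worth, the ~1 MB figure matches tar's per-file framing, assuming one 512-byte header block per file (a back-of-envelope illustration, not go-ipfs code; padding each file to a 512-byte boundary adds more on top):

```go
package main

import "fmt"

// tarHeaderOverhead estimates the bytes of tar framing that a raw
// dag size would not account for: one 512-byte header block per file.
func tarHeaderOverhead(files int) int {
	return files * 512
}

func main() {
	fmt.Println(tarHeaderOverhead(2000)) // the ~1 MB figure for 2000 files
}
```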
Basically, use option 1 for now, then switch to option 5 later? What about option 3: how hard is it to add metadata? As of today, everything in ipfs begins with `ipfs add`, so the unixfs size is already known at that point. This is a matter of reusing the cable.
1 -> 5 sounds good to me!

I would like to not depend more on unixfs; metadata is not trivial. Also, changing the archive/transfer format would obviate the work on that.
This would mean the transport format is not swappable? How easy is it to change such that e.g. both the client transport (currently tar) and the repo format (flatfs) are gz files (with no deduplication), or git packfiles?
No, what I mean is that calculating the "tar-ed size" for every single possible root will be obviated when we do use the dag archive as a transport.

It's easy to change the transport; the repo format, not so easy (it needs migrations). (But the repo is separate from the dag archive.)
For metadata of the tar-ed (and even tar.gz) unixfs size, what about #612? And what about using ipfs-ld?

By swapping the transport/repo format, I meant whether the other parts of the libs can be repurposed (e.g. git on an ipfs repo, or ipfs on git packfiles).

Ok, even if it is easy to change/upgrade the client transport, the entire network must use only one format at a time (the same format for both `ipfs add` and `ipfs get`).
Force-pushed c929c6d to 32eed98
The refactor has been made much smaller, so that the PR is safe now.
```go
	return
if !archive && compression != gzip.NoCompression {
	// the case when the node is a file
	dagr, err := uio.NewDagReader(w.ctx, nd, w.Dag)
```
Do we actually know it's a file here, and not a directory?
`uio.NewDagReader` will check and return an error if it's not a file.
(I think the naming could have been clearer, but there is also `uio.NewDataFileReader`, which doesn't check.)
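The guard being described can be sketched with a toy stand-in (everything here is hypothetical; the real `uio.NewDagReader` inspects the node's unixfs type): constructing a file reader over a directory node should fail up front.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeKind is a toy substitute for the unixfs node type tag.
type nodeKind int

const (
	kindFile nodeKind = iota
	kindDirectory
)

var errIsDir = errors.New("this dag node is a directory")

// newDagReader mirrors the guard: it refuses to build a file
// reader over a directory node and reports an error instead.
func newDagReader(k nodeKind) (string, error) {
	if k == kindDirectory {
		return "", errIsDir
	}
	return "reader", nil
}

func main() {
	_, err := newDagReader(kindDirectory)
	fmt.Println(err)
}
```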
@rht RFM? (If so, just add the label "ready to merge" and assign it to me.)
@rht: also, maybe you can start pushing branches + PRing from within this repo? That way we can collab directly on branches?
(let's try this out as a way to do "hand off" and proper signaling -- cc @whyrusleeping)
Me gusta. (I like it.)
License: MIT Signed-off-by: rht <rhtbot@gmail.com>
No description provided.