Refactor ipfs get #1558

Merged
merged 2 commits into ipfs:master on Aug 23, 2015

Conversation


@rht rht commented Aug 10, 2015

No description provided.

@jbenet jbenet added the status/in-progress label Aug 10, 2015
@rht rht force-pushed the cleanup-get branch 4 times, most recently from 75e353e to 709ee29 on August 10, 2015 03:10
@@ -31,12 +32,18 @@ func DagArchive(ctx cxt.Context, nd *mdag.Node, name string, dag mdag.DAGService
// use a buffered writer to parallelize task
bufw := bufio.NewWriterSize(pipew, DefaultBufSize)
Contributor Author:

Would this still be necessary if https://github.com/ipfs/go-ipfs/blob/master/unixfs/tar/writer.go#L123 were replaced with dagr.WriteTo(w.TarW)?

It doesn't have a Flush method, however. Should one be added?
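(For illustration, a minimal sketch of the idea -- a hypothetical helper, standard library only: the buffering and Flush can live on the destination side instead of on the reader.)

import (
	"bufio"
	"io"
)

// writeDagTo is a hypothetical helper: the buffering (and Flush) live on the
// destination side, so the source -- a DagReader implementing io.WriterTo --
// never needs a Flush method of its own.
func writeDagTo(src io.WriterTo, dst io.Writer) error {
	bufw := bufio.NewWriterSize(dst, 1<<20) // buffer size chosen arbitrarily here
	if _, err := src.WriteTo(bufw); err != nil {
		return err
	}
	return bufw.Flush()
}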

Contributor Author:

Rather, unixfs/io/dagreader.go should be rewritten with bufio to get Flush automatically.

Member:

why would the dagreader need flush?

Contributor Author:

Ok, it is not necessary for the dagreader's WriteTo method.

@rht rht force-pushed the cleanup-get branch 3 times, most recently from 2db63df to 6de8b7c on August 10, 2015 06:33
w := &Writer{
Dag: dag,
GzW: gzw,
ctx: ctx,
Member:

Yeah, I debated putting the ctx here too. I opted for the function since it is only needed there -- otherwise the reader can run indefinitely.

Contributor Author:

I tried putting it here because it (and dag) doesn't change after initialization.
To compensate, there is w.Close() at L56.

Contributor Author:

@jbenet Is w.Close() redundant with pipew.Close()?

The problem is that in no-tar mode (gz only) there is no Close method.
There could also be a problem with ctx.
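(For reference, a rough sketch of the pattern under discussion -- illustrative names and standard-library types only, not the actual unixfs/tar Writer: the ctx is stored at construction, and Close cancels it.)

import (
	"archive/tar"
	"context"
	"io"
)

// Writer (sketch): the ctx is fixed at construction because it never changes
// afterwards; Close cancels it so DAG reads cannot run indefinitely, and then
// terminates the tar stream.
type Writer struct {
	TarW   *tar.Writer
	ctx    context.Context
	cancel context.CancelFunc
}

func NewWriter(ctx context.Context, out io.Writer) *Writer {
	cctx, cancel := context.WithCancel(ctx)
	return &Writer{TarW: tar.NewWriter(out), ctx: cctx, cancel: cancel}
}

func (w *Writer) Close() error {
	w.cancel()            // stop in-flight reads tied to w.ctx
	return w.TarW.Close() // flush and close the tar stream
}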

@jbenet jbenet commented Aug 11, 2015

I appreciate the intent to simplify, and I'm always happy to see code removed, but I think some changes here make the codebase more convoluted (particularly the DagWriter changes). Can they be done as part of a separate thing? It is trivial and more flexible to stack io.Readers and io.Writers (and often they're zero-copy).
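(For reference, the stacking looks roughly like this in Go -- a hedged sketch using only the standard library, not the actual go-ipfs code:)

import (
	"archive/tar"
	"compress/gzip"
	"io"
)

// newArchivePipe stacks an io.Pipe, a gzip.Writer, and a tar.Writer: entries
// written to the tar layer stream out of the returned reader as .tar.gz,
// with each layer only buffering what it needs.
func newArchivePipe() (*io.PipeReader, *tar.Writer, func() error) {
	piper, pipew := io.Pipe()
	gzw := gzip.NewWriter(pipew)
	tarw := tar.NewWriter(gzw)
	closeAll := func() error {
		// close from the innermost layer outwards so each one can flush its trailer
		if err := tarw.Close(); err != nil {
			return err
		}
		if err := gzw.Close(); err != nil {
			return err
		}
		return pipew.Close()
	}
	return piper, tarw, closeAll
}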

return
}
return
}
Member:

:( I moved all this stuff out of this function and into its own because it was very confusing here.

Contributor Author:

simplified

@jbenet jbenet commented Aug 11, 2015

Also, please tend towards decomposition. Larger functions with lots of symbols floating in the same scope are much more error-prone as people come in and modify things. Smaller, clearer functions are easier to understand and survive modification by many people better. This PR is putting some stuff back into large functions, after I decomposed it out recently.

@rht rht force-pushed the cleanup-get branch 3 times, most recently from 2eae919 to bb69b77 on August 11, 2015 07:31
@rht rht commented Aug 11, 2015

I will revert the DagFS back to DagArchive | tar.extract.
The reason is that even if a merkledag node walker exists, it would force ipfs-shell to depend on go-ipfs.
I.e., tar is necessary as a transport archive format that is available everywhere.

@rht rht force-pushed the cleanup-get branch 2 times, most recently from e161655 to 89ece0e on August 11, 2015 08:03
@rht rht changed the title from "Remove thirdparty/tar/extractor" to "Refactor ipfs get" on Aug 11, 2015
@rht rht force-pushed the cleanup-get branch 5 times, most recently from 45f99bd to db7e0dc on August 11, 2015 10:29
@rht rht commented Aug 11, 2015

RFCR


w.HandleFile = func(dagr *uio.DagReader, fpath string) error {
//unixSize := pb.GetFilesize()
unixSize := dagr.Size()
Contributor Author:

TODO: is this expensive?

@rht rht force-pushed the cleanup-get branch 2 times, most recently from 8ed16d5 to 178b568 on August 11, 2015 12:16
if err := w.NodeWalk(nd, ""); err != nil {
return 0, err
}
return size, nil
Member:

This is going to write out the whole thing (including pulling out all the objects) just to get the size. Is this worth it?

It seems like it can also be achieved outside (i.e. not with internal manipulation of the writer) with something like this:

func GetTarSize(...) (uint64, error) {
  cw := CountingWriter(io.Discard) // and discard all the written output.
  w := tar.Writer(..., cw)
  w.WriteDag(root) // do all the writes
  size := cw.Count()
  return size, nil
}
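(A counting writer like the one sketched above isn't in the standard library; a minimal version, with a hypothetical name, is only a few lines:)

import "io"

// countingWriter forwards writes to an underlying writer (e.g. io.Discard)
// and tallies how many bytes passed through.
type countingWriter struct {
	w io.Writer
	n uint64
}

func (cw *countingWriter) Write(p []byte) (int, error) {
	n, err := cw.w.Write(p)
	cw.n += uint64(n)
	return n, err
}

func (cw *countingWriter) Count() uint64 { return cw.n }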

Contributor Author:

But the size needs to be computed before the actual write w.WriteDag(root).

Member:

Is this just for the progress bar? I don't think forcing the sender to look through the entire dag before sending a size is a good idea. It might be huge and take many minutes just to compute the size.

If there's no other way, perhaps the progress bar can estimate and make it clear it's an estimate of the size. As long as it doesn't look borked (i.e. negative numbers, or bigger/smaller) and we say "approx X MB" or "est X MB" or even "~X MB", it can be ok.

(Relatedly, @krl had suggested calculating a lot of information about dags locally and keeping it around, things like depth, # of nodes, fanout, etc. But I think that's too far off for this. An approximate result may be better.)

Member:

@rht did we run into trouble just using the 'size' of the dag (as opposed to the true file size)? I know it's not as accurate as we want, but it's free information.

Contributor Author:

@whyrusleeping for a dag with 2000 files, this will be ~1 MB of inaccuracy (presumably from tar's ~512-byte per-entry headers, which the dag size doesn't account for). And it will be even further off with dag deduplication.

An alternative is to send a dag archive instead of tar. That way the sender doesn't have to undo the deduplication just to create the tar stream, and the size is accurate.
The transported dag archive is then converted to tar locally.

Contributor Author:

Basically, use option 1 for now, then switch to option 5 later?

What about option 3, how hard is it to add metadata?
As of today, everything in ipfs begins with ~ipfs add, so the unixfs size is already known at that point. It's a matter of reusing that information.

Member:

1 -> 5 sounds good to me!

I would like to not depend more on unixfs; metadata is not trivial. Also, changing the archive/transfer format would obviate the work on that.

Contributor Author:

Would this mean the transport format is not swappable?

How easy would it be to change things such that, e.g., both the client transport (currently tar) and the repo format (flatfs) are gz files (with no deduplication) or git packfiles?

Member:

No, what I mean is that calculating the "tar-ed size" for every single possible root will be obviated when we do use the dag archive as a transport.

It's easy to change the transport. The repo format is not so easy (it needs migrations). (But the repo is separate from the dag archive.)

Contributor Author:

For metadata of the tar-ed (and even tar.gz) unixfs size, what about #612? And what about using ipfs-ld?

By swapping the transport/repo format, I meant whether the other parts of the libs can be repurposed (e.g. git on an ipfs repo, or ipfs on git packfiles).
Ok, even if it is easy to change/upgrade the client transport, the entire network must use only one format at a time (the same format for both ipfs add and ipfs get).

@rht rht force-pushed the cleanup-get branch 6 times, most recently from c929c6d to 32eed98 on August 13, 2015 08:07
@rht rht commented Aug 13, 2015

The refactor has been made much smaller, so the PR is safe now.

return
if !archive && compression != gzip.NoCompression {
// the case when the node is a file
dagr, err := uio.NewDagReader(w.ctx, nd, w.Dag)
Member:

Do we actually know it's a file here, and not a directory?

Contributor Author:

uio.NewDagReader will check and return err if it's not a file.

Contributor Author:

(I think the naming could be clearer; there is also uio.NewDataFileReader, which doesn't check.)
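(A sketch of the calling pattern this relies on; variable names are illustrative:)

// Let uio.NewDagReader do the check; it returns an error when the node
// isn't a file, so the caller only has to propagate it.
dagr, err := uio.NewDagReader(ctx, nd, dagserv)
if err != nil {
	return err // e.g. nd is a directory rather than a regular file
}
// ... stream the file contents from dagr ...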

@jbenet jbenet commented Aug 15, 2015

@rht RFM? (if so, just add the label "ready to merge" and assign it to me.)

@jbenet jbenet commented Aug 15, 2015

@rht: also, maybe you can start pushing branches + PRing from within this repo? That way we can collab directly on branches.

@jbenet jbenet commented Aug 15, 2015

(if so, just add the label "ready to merge" and assign it to me.)

(let's try this out as a way to do "hand off" and proper signaling -- cc @whyrusleeping)

@whyrusleeping (Member):

(let's try this out as a way to do "hand off" and proper signaling -- cc @whyrusleeping)

I like it.

rht added 2 commits August 20, 2015 14:56
License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
@jbenet jbenet mentioned this pull request Aug 22, 2015
jbenet added a commit that referenced this pull request Aug 23, 2015
@jbenet jbenet merged commit 11a66b3 into ipfs:master Aug 23, 2015
@jbenet jbenet removed the status/in-progress label Aug 23, 2015
3 participants