Refactor ipfs get #1558

Merged
merged 2 commits into ipfs:master on Aug 23, 2015

Conversation


@rht rht commented Aug 10, 2015

No description provided.

@jbenet jbenet added the status/in-progress label Aug 10, 2015
@rht rht force-pushed the cleanup-get branch 4 times, most recently from 75e353e to 709ee29 on August 10, 2015 03:10
@@ -31,12 +32,18 @@ func DagArchive(ctx cxt.Context, nd *mdag.Node, name string, dag mdag.DAGService
// use a buffered writer to parallelize task
bufw := bufio.NewWriterSize(pipew, DefaultBufSize)
Contributor Author:

Would this still be necessary if https://github.com/ipfs/go-ipfs/blob/master/unixfs/tar/writer.go#L123 were replaced with dagr.WriteTo(w.TarW)?

It doesn't have a Flush method, however. Should one be added?
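(For illustration, a minimal sketch of the idea -- a hypothetical helper, standard library only: the buffering and Flush can live on the destination side instead of on the reader.)

import (
	"bufio"
	"io"
)

// writeDagTo is a hypothetical helper: the buffering (and Flush) live on the
// destination side, so the source -- a DagReader implementing io.WriterTo --
// never needs a Flush method of its own.
func writeDagTo(src io.WriterTo, dst io.Writer) error {
	bufw := bufio.NewWriterSize(dst, 1<<20) // buffer size chosen arbitrarily here
	if _, err := src.WriteTo(bufw); err != nil {
		return err
	}
	return bufw.Flush()
}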

Contributor Author:

Rather, unixfs/io/dagreader.go should be rewritten with bufio to get Flush automatically.

Member:

why would the dagreader need flush?

Contributor Author:

Ok, it is not necessary for the dagreader's WriteTo method.

@rht rht force-pushed the cleanup-get branch 3 times, most recently from 2db63df to 6de8b7c on August 10, 2015 06:33
w := &Writer{
Dag: dag,
GzW: gzw,
ctx: ctx,
Member:

Yeah, I debated putting the ctx here too. I opted for the function since it is only needed there -- otherwise the reader can run indefinitely.

Contributor Author:

I tried putting it here because it (and dag) doesn't change after initialization.
To compensate, there is w.Close() at L56.

Contributor Author:

@jbenet Is w.Close() redundant with pipew.Close()?

The problem is that in no-tar mode (gz only) there is no Close method.
There could also be a problem with ctx.
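(For reference, a rough sketch of the pattern under discussion -- illustrative names and standard-library types only, not the actual unixfs/tar Writer: the ctx is stored at construction, and Close cancels it.)

import (
	"archive/tar"
	"context"
	"io"
)

// Writer (sketch): the ctx is fixed at construction because it never changes
// afterwards; Close cancels it so DAG reads cannot run indefinitely, and then
// terminates the tar stream.
type Writer struct {
	TarW   *tar.Writer
	ctx    context.Context
	cancel context.CancelFunc
}

func NewWriter(ctx context.Context, out io.Writer) *Writer {
	cctx, cancel := context.WithCancel(ctx)
	return &Writer{TarW: tar.NewWriter(out), ctx: cctx, cancel: cancel}
}

func (w *Writer) Close() error {
	w.cancel()            // stop in-flight reads tied to w.ctx
	return w.TarW.Close() // flush and close the tar stream
}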

@jbenet jbenet commented Aug 11, 2015

I appreciate the intent to simplify, and I'm always happy to see code removed, but I think some changes here make the codebase more convoluted (particularly the DagWriter changes). Can they be done as part of a separate thing? It is trivial and more flexible to stack io.Readers and io.Writers (and often they're zero-copy).
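(For reference, the stacking looks roughly like this in Go -- a hedged sketch using only the standard library, not the actual go-ipfs code:)

import (
	"archive/tar"
	"compress/gzip"
	"io"
)

// newArchivePipe stacks an io.Pipe, a gzip.Writer, and a tar.Writer: entries
// written to the tar layer stream out of the returned reader as .tar.gz,
// with each layer only buffering what it needs.
func newArchivePipe() (*io.PipeReader, *tar.Writer, func() error) {
	piper, pipew := io.Pipe()
	gzw := gzip.NewWriter(pipew)
	tarw := tar.NewWriter(gzw)
	closeAll := func() error {
		// close from the innermost layer outwards so each one can flush its trailer
		if err := tarw.Close(); err != nil {
			return err
		}
		if err := gzw.Close(); err != nil {
			return err
		}
		return pipew.Close()
	}
	return piper, tarw, closeAll
}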

return
}
return
}
Member:

:( I moved all this stuff out of this function and into its own because it was very confusing here.

Contributor Author:

simplified

@jbenet jbenet commented Aug 11, 2015

Also, please tend towards decomposition. Larger functions with lots of symbols floating in the same scope are much more error-prone as people come in and modify things. Smaller, clearer functions are easier to understand and survive modification by many people better. This PR is putting some stuff back into large functions, after I decomposed it out recently.

@rht rht force-pushed the cleanup-get branch 3 times, most recently from 2eae919 to bb69b77 on August 11, 2015 07:31
@rht rht commented Aug 11, 2015

I will revert the DagFS back to DagArchive | tar.extract.
The reason is that even if a merkledag node walker exists, it would force ipfs-shell to depend on go-ipfs.
I.e., tar is necessary as a transport archive format that is available everywhere.

@rht rht force-pushed the cleanup-get branch 2 times, most recently from e161655 to 89ece0e on August 11, 2015 08:03
@rht rht changed the title from "Remove thirdparty/tar/extractor" to "Refactor ipfs get" on Aug 11, 2015
@rht rht force-pushed the cleanup-get branch 5 times, most recently from 45f99bd to db7e0dc on August 11, 2015 10:29
@rht rht commented Aug 11, 2015

RFCR


w.HandleFile = func(dagr *uio.DagReader, fpath string) error {
//unixSize := pb.GetFilesize()
unixSize := dagr.Size()
Contributor Author:

TODO: is this expensive?

@rht rht force-pushed the cleanup-get branch 2 times, most recently from 8ed16d5 to 178b568 on August 11, 2015 12:16
if err := w.NodeWalk(nd, ""); err != nil {
return 0, err
}
return size, nil
Member:

This is going to write out the whole thing (including pulling out all the objects) just to get the size. Is this worth it?

It seems like it can also be achieved outside (i.e. not with internal manipulation of the writer) with something like this:

func GetTarSize(...) (uint64, error) {
  cw := CountingWriter(io.Discard) // and discard all the written output.
  w := tar.Writer(..., cw)
  w.WriteDag(root) // do all the writes
  size := cw.Count()
  return size, nil
}
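(A counting writer like the one sketched above isn't in the standard library; a minimal version, with a hypothetical name, is only a few lines:)

import "io"

// countingWriter forwards writes to an underlying writer (e.g. io.Discard)
// and tallies how many bytes passed through.
type countingWriter struct {
	w io.Writer
	n uint64
}

func (cw *countingWriter) Write(p []byte) (int, error) {
	n, err := cw.w.Write(p)
	cw.n += uint64(n)
	return n, err
}

func (cw *countingWriter) Count() uint64 { return cw.n }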

Contributor Author:

But the size needs to be computed before the actual write w.WriteDag(root).

Member:

Is this just for the progress bar? I don't think forcing the sender to look through the entire dag before sending a size is a good idea. It might be huge and take many minutes just to compute the size.

If there's no other way, perhaps the progress bar can estimate and make it clear it's an estimate of the size. As long as it doesn't look borked (i.e. negative numbers, or bigger/smaller) and we say "approx X MB" or "est X MB" or even "~X MB", it can be ok.

(Relatedly, @krl had suggested calculating a lot of information about dags locally and keeping it around, things like depth, # of nodes, fanout, etc. But I think that's too far off for this. An approximate result may be better.)

Member:

@rht did we run into trouble just using the 'size' of the dag (as opposed to the true file size)? I know it's not as accurate as we want, but it's free information.

Contributor Author:

@whyrusleeping for a dag with 2000 files, this will be ~1 MB of inaccuracy (presumably from tar's ~512-byte per-entry headers, which the dag size doesn't account for). And it will be even further off with dag deduplication.

An alternative is to send a dag archive instead of tar. That way the sender doesn't have to undo the deduplication just to create the tar stream, and the size is accurate.
The transported dag archive is then converted to tar locally.

Contributor Author:

Basically, use option 1 for now, then switch to option 5 later?

What about option 3, how hard is it to add metadata?
As of today, everything in ipfs begins with ~ipfs add, so the unixfs size is already known at that point. It's a matter of reusing that information.

Member:

1 -> 5 sounds good to me!

I would like to not depend more on unixfs; metadata is not trivial. Also, changing the archive/transfer format would obviate the work on that.

Contributor Author:

Would this mean the transport format is not swappable?

How easy would it be to change things such that, e.g., both the client transport (currently tar) and the repo format (flatfs) are gz files (with no deduplication) or git packfiles?

Member:

No, what I mean is that calculating the "tar-ed size" for every single possible root will be obviated when we do use the dag archive as a transport.

It's easy to change the transport. The repo format is not so easy (it needs migrations). (But the repo is separate from the dag archive.)

Contributor Author:

For metadata of the tar-ed (and even tar.gz) unixfs size, what about #612? And what about using ipfs-ld?

By swapping the transport/repo format, I meant whether the other parts of the libs can be repurposed (e.g. git on an ipfs repo, or ipfs on git packfiles).
Ok, even if it is easy to change/upgrade the client transport, the entire network must use only one format at a time (the same format for both ipfs add and ipfs get).

@rht rht force-pushed the cleanup-get branch 6 times, most recently from c929c6d to 32eed98 on August 13, 2015 08:07
@rht rht commented Aug 13, 2015

The refactor has been made much smaller, so the PR is safe now.

return
if !archive && compression != gzip.NoCompression {
// the case when the node is a file
dagr, err := uio.NewDagReader(w.ctx, nd, w.Dag)
Member:

Do we actually know it's a file here, and not a directory?

Contributor Author:

uio.NewDagReader will check and return err if it's not a file.

Contributor Author:

(I think the naming could be clearer; there is also uio.NewDataFileReader, which doesn't check.)
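(A sketch of the calling pattern this relies on; variable names are illustrative:)

// Let uio.NewDagReader do the check; it returns an error when the node
// isn't a file, so the caller only has to propagate it.
dagr, err := uio.NewDagReader(ctx, nd, dagserv)
if err != nil {
	return err // e.g. nd is a directory rather than a regular file
}
// ... stream the file contents from dagr ...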

@jbenet jbenet commented Aug 15, 2015

@rht RFM? (if so, just add the label "ready to merge" and assign it to me.)

@jbenet jbenet commented Aug 15, 2015

@rht: also, maybe you can start pushing branches + PRing from within this repo? That way we can collab directly on branches.

@jbenet jbenet commented Aug 15, 2015

(if so, just add the label "ready to merge" and assign it to me.)

(let's try this out as a way to do "hand off" and proper signaling -- cc @whyrusleeping)

@whyrusleeping (Member):

(let's try this out as a way to do "hand off" and proper signaling -- cc @whyrusleeping)

I like it.

rht added 2 commits August 20, 2015 14:56
License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
@jbenet jbenet mentioned this pull request Aug 22, 2015
jbenet added a commit that referenced this pull request Aug 23, 2015
@jbenet jbenet merged commit 11a66b3 into ipfs:master Aug 23, 2015
@jbenet jbenet removed the status/in-progress label Aug 23, 2015
3 participants