Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use tar reader for '-C' flag #1354

Merged
merged 1 commit into from
Jul 1, 2015
Merged

Don't use tar reader for '-C' flag #1354

merged 1 commit into from
Jul 1, 2015

Conversation

rht
Copy link
Contributor

@rht rht commented Jun 11, 2015

Currently ipfs get -C <hash> returns error even if is a file.
This PR is for the case when the compress flag is enabled, always
archive the output before the compression.

@jbenet jbenet added the backlog label Jun 11, 2015
@whyrusleeping
Copy link
Member

I think this LGTM.

@jbenet
Copy link
Member

jbenet commented Jun 12, 2015

@rht i'm not sure about this. what if I want a gzipped raw file?

> echo hello | gzip | gzip -d
hello

is possible. so should:

> ipfs cat <file> | gzip
<compressed-file-contents>
> ipfs cat <file> | gzip | gzip -d
<file-contents>

and thus i would expect

> ipfs get -C <file>
<compressed-file-contents>

not

> ipfs get -C -o out <file> && cat out
<tarred-and-compressed-file-contents>

@jbenet
Copy link
Member

jbenet commented Jun 12, 2015

other people's thoughts?

@rht
Copy link
Contributor Author

rht commented Jun 12, 2015

Ok my starting point is to investigate why ipfs get -C <file> always returns error.
Then I figured from the stream that:

  1. tarwriter and gzipwriter are soldered in (less buffer needed) in unixfs/tar/reader.go, turning the stream into a .tar.gz one.
  2. The .tar.gz stream is un-gzipped in https://github.com/ipfs/go-ipfs/blob/master/core/commands/get.go#L139 into a .tar stream
  3. The .tar stream is extracted in https://github.com/ipfs/go-ipfs/blob/master/core/commands/get.go#L148
  4. Unexpected EOF because the tar header expects a compressed .tar.gz size.

To bypass the problem, since it is already a .tar.gz stream, why not just use .tar.gz extension?

Alternatively the only way for > ipfs get -C <file> to not use the .tar.gz stream is to use NewDAGReader directly like in ipfs cat and pipe it to a gzip writer. Where beforehand the node has to be identified as a file. And the output should have .gz extension to avoid ambiguity (if someone ipfs add this gzipped file then it will turn into a different hash).

@jbenet
Copy link
Member

jbenet commented Jun 17, 2015

@rht ah, interesting.

To bypass the problem, since it is already a .tar.gz stream, why not just use .tar.gz extension?

that makes sense to me. if a .tar is compressed it should use .tar.gz

I'm still not sold on -C always meaning -a -C -- i may want a gzipped file. imagine:

ipfs get -o=file.gzip <huge-file>

or something.

the output should have .gz extension to avoid ambiguity

agree when -o is not specified. if the user specifies the filename directly with -o, then we should use what we're given exactly.

@rht rht changed the title Short-circuit '-C' flag to mean '-a -C' Don't use tar reader for '-C' flag Jun 21, 2015
@rht rht force-pushed the get-flag branch 3 times, most recently from 338a413 to 7508e1f Compare June 22, 2015 13:55
@rht
Copy link
Contributor Author

rht commented Jun 22, 2015

This is now fixed. '-C' uses NewDagReader directly, then piped to gzip processor.

Though I don't know which one is more effective: bufio.NewReader + WriteTo, or just io.Copy(reader, gw). I use the former.

@rht
Copy link
Contributor Author

rht commented Jun 22, 2015

I figure that in unixfs/tar/reader.go, NewDagReader -> syncCopy -> TarWriter -> GzipWriter -> buffer -> io.Reader, the syncCopy & buffer pattern are quite general (async read/write). I wonder if there is already a stdlib for this async glue between reader and writer into another reader?

So, NewDagReader -> syncCopy -> [a Writer] -> buffer -> io.Reader, can be reused where [a Writer] is DagArchiveWriter / 'TarWriter -> Bzip2Writer'

@jbenet
Copy link
Member

jbenet commented Jun 25, 2015

Could someone else CR this? i'm behind. maybe @wking @whyrusleeping @cryptix ?


// getZip is equivalent to `ipfs getdag $hash | gzip`
func getZip(ctx context.Context, node *core.IpfsNode, p string, compression int) (io.Reader, error) {
pathToResolve := path.Path(p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should call path.ParsePath so we can validate the input path

@jbenet
Copy link
Member

jbenet commented Jun 30, 2015

one comment above, otherwise LGTM.

// Validate path string
p, err := path.ParsePath(req.Arguments()[0])
if err != nil {
res.SetError(fmt.Errorf("failed to validate path: %v", err), cmds.ErrNormal)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added path validation here.

@whyrusleeping
Copy link
Member

LGTM

@jbenet
Copy link
Member

jbenet commented Jul 1, 2015

@rht circleci didn't like it. have you rebased on new master? (it's always good to do so as keeps our branches closer)

Currently `ipfs get -C <hash>` returns error even if <hash> is a file.
This PR is for the case when the compress flag is enabled, use the
dagreader directly and pipe to a gzip processor.

License: MIT
Signed-off-by: rht <rhtbot@gmail.com>
@rht
Copy link
Contributor Author

rht commented Jul 1, 2015

path.Path(pathString) is used in several other places as well. Put validations on those?

@rht
Copy link
Contributor Author

rht commented Jul 1, 2015

Now it passed circleci, but failed travis go-test because of ctx issue. The PR doesn't change any ctx related code, so should be fine.

@jbenet
Copy link
Member

jbenet commented Jul 1, 2015

Great, thanks! LGTM

jbenet added a commit that referenced this pull request Jul 1, 2015
Don't use tar reader for '-C' flag
@jbenet jbenet merged commit 22c653c into ipfs:master Jul 1, 2015
@jbenet jbenet removed the backlog label Jul 1, 2015
@rht
Copy link
Contributor Author

rht commented Jul 1, 2015

Want to add path.Path(pathString) validation everywhere? This validation stuff is in favor for making the struct private https://github.com/ipfs/go-ipfs/blob/master/path/path.go#L17

@jbenet
Copy link
Member

jbenet commented Jul 8, 2015

Want to add path.Path(pathString) validation everywhere?

Probably?

@rht
Copy link
Contributor Author

rht commented Jul 8, 2015

Already in dfde18e. All usage of path.Path(str) in the code will be validated later on.

@rht rht deleted the get-flag branch July 8, 2015 12:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants