Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: extract specific path, accept stdin as streaming input #384

Merged
merged 7 commits into from
Mar 7, 2023

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Mar 6, 2023

(WIP: needs better testing, it works with the very limited test data I have at hand)

  • Adds a --path (-p) to extract just a specific unixfs path (and its children)
  • Adds - as a possible input for --file (-f) to read from stdin - the catch here is that it reads it into memory, so a large CAR is going to be a problem; but it will at least extract as the CAR blocks come in.

So cat foo.car | car extract -f - -p /my/specific/file.txt is a thing .. which will be even neater with lassie fetch <cid>/my/specific/file.txt -o -; when I make that a thing.

I think I might have also fixed the macos-12 flaky test problem, maybe it's not a github.com/rogpeppe/go-internal problem after all but related to how macos-12 handles buffered file reads.

@rvagg rvagg force-pushed the rvagg/path-extract-w-stdin-streaming branch from bb2448a to e072d34 Compare March 6, 2023 11:29
@rvagg
Copy link
Member Author

rvagg commented Mar 6, 2023

Latest commit adds - as an argument option that will extract a file to stdout. It'll error if it has to do anything more than a single file.

An alternative would be to provide --stdout as an argument rather than - since the output path is optional already and - is a valid (although very odd) output path that someone make want.

cat foo.car | car extract -f - -p /my/specific/file.txt - - now spits the contents of file.txt to stdout.

@willscott
Copy link
Member

this looks good! let me know when you want a full read through the code

Comment on lines 215 to 217
if nf, ok := err.(interface{ NotFound() bool }); ok && nf.NotFound() {
fmt.Fprintf(c.App.ErrWriter, "data for directory entry not found: %s (skipping...)\n", name)
return 0, nil
Copy link
Member Author

@rvagg rvagg Mar 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighting this while I think about it and so it doesn't get lost in the code move—in the case where you want to extract all, rather than a specific path, instead of erroring where a block load fails, it skips whatever it's failed to load. This needs more testing, and perhaps it's not feasable, because I've got an example where this partially works but then fails with an iterator overrun error which I don't yet understand. I'm guessing it's to do with sharded directory incompleteness but I need to work that out.

@rvagg rvagg marked this pull request as ready for review March 7, 2023 10:01
@rvagg
Copy link
Member Author

rvagg commented Mar 7, 2023

Added a big 'ol test script and some fixture data; ready for review @willscott

cmd/car/car.go Outdated Show resolved Hide resolved
cmd/car/extract.go Outdated Show resolved Hide resolved
@rvagg rvagg merged commit 2d54909 into master Mar 7, 2023
@rvagg rvagg deleted the rvagg/path-extract-w-stdin-streaming branch March 7, 2023 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants