Offline data preparation tools #820
Conversation
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
c.Message("Lotus offline-deal command:")
c.Message("lotus client deal --manual-piece-cid=%s --manual-piece-size=%d %s <miner> <price> <duration>", pieceCid, pieceSize, dataCid)
When we add offline-deals support in Powergate, we'll add some Powergate CLI commands here.
Short: "Provides commands to prepare data for Filecoin onboarding",
Long: `Provides commands to prepare data for Filecoin onboarding`,
}
The command logic doesn't have much of interest apart from `io.Pipe` wiring and similar plumbing. The `pow offline prepare` command runs the complete pipeline, while `pow offline commp` and `pow offline car` let power users run partial steps. As I mentioned in the docs, `pow offline prepare` performs better than simply running both commands separately, since it starts calculating Piece(size/cid) at the same time it generates the CAR file. Anyway, the important logic is extracted into a lib that I'll comment on.
if err != nil {
	c.Fatal(fmt.Errorf("parsing json flag: %s", err))
}
dataCid, dagService, cls, err := prepareDAGService(cmd, args, quiet)
This `prepareDAGService` is a temporary Badger-backed `ipld.DAGService` that's used while DAGifying the data.
prCommP, pwCommP := io.Pipe()
teeCAR := io.TeeReader(prCAR, pwCommP)
We do some piping magic here: while the CAR file is being streamed out and saved to the output file (or stdout), a copy of the same stream is piped into the Piece(size/cid) calculation.
type closeFunc func() error

func prepareDAGService(cmd *cobra.Command, args []string, quiet bool) (cid.Cid, ipld.DAGService, closeFunc, error) {
So here we create the DAGService, which is the storage layer for the blocks of the DAG. Depending on the `--ipfs-api` flag, we create a Badger-based one (temporary), or we rely on a remote go-ipfs node that already has the DAG. From the POV of the commands, they simply receive an `ipld.DAGService`; whether it's Badger- or go-ipfs-backed is quite irrelevant to the rest of the logic.
fmt.Fprint(jsonOutput, string(out))
}

func dagify(ctx context.Context, dagService ipld.DAGService, path string, quiet bool) (cid.Cid, error) {
So, this leverages a go-ipfs package to dagify the data, show progress bars, and all that pretty stuff.
// NOTE: Testing only the `prepare` subcommand will indirectly test
// the `car` and `commp` subcommands. This test simply prepares
// some data and compares the final piece-size and piece-cid to a
// known correct value. If anything in the process (DAGification, CARing)
// misbehaves, it will result in a different PieceCID since, at the end of
// the day, PieceCID is a fingerprint of the prepared data.
func TestOfflinePreparation(t *testing.T) {
	testCases := []struct {
		size int
		json string
	}{
		{size: 10000, json: `{"piece_size":16384,"piece_cid":"baga6ea4seaqjuk4uh5g7cu5znbvrr7wvfsn2l3xj47rbymvi63uiiroya44lkiy"}`},
		{size: 1000, json: `{"piece_size":2048,"piece_cid":"baga6ea4seaqadahcx4ct54tlbvgkqlhmif7kxxkvxz3yf3vr2e4puhvsxdbrgka"}`},
		{size: 100, json: `{"piece_size":256,"piece_cid":"baga6ea4seaqd4hgfl6texpf377k7igx2ga2mfwn3lb4c4kdpaq3g3oao2yftuki"}`},
	}

	for _, test := range testCases {
		test := test
		t.Run(strconv.Itoa(test.size), func(t *testing.T) {
			out, err := run(t, test.size)
			require.NoError(t, err)
			require.Equal(t, test.json, out)
		})
	}
}
We basically test the `pow offline prepare` CLI command directly. I've prepared some deterministic test cases with various sizes and defined the expected output. As mentioned in the comment, since `pow offline prepare` does everything (dagify, CAR file, and CommP of it), it's already testing everything important that could potentially lead to a wrong result.
stdbuf := bytes.NewBuffer(nil)
jsonOutput = stdbuf
Cmd.SetArgs([]string{"prepare", "--json", f.Name()})

if _, err := Cmd.ExecuteC(); err != nil {
	return "", fmt.Errorf("executing command: %s", err)
}
Here we override the `jsonOutput` writer of the CLI, so in the test we can plug in a buffer and later inspect what the output was.
@@ -29,14 +29,18 @@ var (
	CmdTimeout = time.Second * 60
)

// FmtOutput allows to configure where Message(), Success(), and
Just adding flexibility here to let helper functions write to other places. By default they still do the usual thing: write to stdout. But commands that write generated assets to stdout send their other metadata output (loading bars, JSON output, etc.) to stderr. So this allows the CLI command to set `FmtOutput = os.Stderr`, and then you know all message helpers will write to stderr and not mess with the stdout that's being used for something else.
@@ -0,0 +1,103 @@
package dataprep
In this package we have the functions that compute the Piece(size/cid) of data, and the DAGification. The CLI commands use these functions, with some extra piping wiring.
github.com/ipfs/go-ds-badger2 v0.1.1-0.20200708190120-187fc06f714e
github.com/ipfs/go-graphsync v0.7.0 // indirect
github.com/ipfs/go-ipfs v0.8.0
I might regret adding this dependency... mostly because of the indirect dependencies that Lotus also brings in. If at some point in the future this becomes too much of a mess, we might need to extract the subcommand into some other binary with its own mod. Doesn't seem like a problem now.
At some point, the adder logic could probably be pulled out of `coreunix.NewAdder`; I think that's what I did for the local bucket repo stuff... but yeah, nbd.
Yeah, actually go-ipfs ends up using that package, so it can be narrowed down.
Nice! LGTM
This PR adds new Powergate CLI commands to prepare data for making offline deals.
We'll soon add offline-deal support in the Powergate APIs, but these preparation tools are useful for preparing data for that stage without having to run a Lotus or go-ipfs daemon. Also, the prepared data can be used with any Filecoin client.
Might be helpful to read the docs PR that explains how to use these new features: textileio/community#281
Also, it updates the Lotus client, devnet, and docker-image to v1.6.0.