Offline data preparation tools #820

Merged
merged 20 commits into from
Apr 7, 2021

Conversation

jsign
Contributor

@jsign jsign commented Apr 7, 2021

This PR adds new Powergate CLI commands to prepare data to make offline deals.

We'll soon add offline-deal support to the Powergate APIs. In the meantime, these tools let you prepare data for that stage without having to run a Lotus or go-ipfs daemon. The prepared data can also be used with any Filecoin client.

It may help to read the docs PR that explains how to use these new features: textileio/community#281

This PR also updates the Lotus client, devnet, and Docker image to v1.6.0.

@jsign jsign added the rd-minor label Apr 7, 2021
@jsign jsign self-assigned this Apr 7, 2021
jsign added 16 commits April 7, 2021 11:10
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign added 4 commits April 7, 2021 11:13
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Comment on lines +259 to +260
c.Message("Lotus offline-deal command:")
c.Message("lotus client deal --manual-piece-cid=%s --manual-piece-size=%d %s <miner> <price> <duration>", pieceCid, pieceSize, dataCid)
Contributor Author

When we add offline-deals support in Powergate, we'll add some Powergate CLI commands here.

	Short: "Provides commands to prepare data for Filecoin onboarding",
	Long:  `Provides commands to prepare data for Filecoin onboarding`,
}

Contributor Author

The command logic has little of interest beyond io.Pipe wiring and similar plumbing.

The pow offline prepare command runs the complete pipeline, whereas pow offline commp and pow offline car let power users run individual steps. As I mentioned in the docs, pow offline prepare performs better than running both commands separately, since it starts calculating the Piece(size/cid) while it is still generating the CAR file.

Anyway, the important logic is extracted into a library, which I comment on below.

if err != nil {
	c.Fatal(fmt.Errorf("parsing json flag: %s", err))
}
dataCid, dagService, cls, err := prepareDAGService(cmd, args, quiet)
Contributor Author

prepareDAGService returns a temporary Badger-backed ipld.DAGService that is used while DAGifying the data.

Comment on lines +225 to +226
prCommP, pwCommP := io.Pipe()
teeCAR := io.TeeReader(prCAR, pwCommP)
Contributor Author

Here we do some piping magic: while the CAR file is being streamed out and saved to the output file (or stdout), a copy of the same stream is piped to the Piece(Size/Cid) calculation.


type closeFunc func() error

func prepareDAGService(cmd *cobra.Command, args []string, quiet bool) (cid.Cid, ipld.DAGService, closeFunc, error) {
Contributor Author

Here we create the DAGService, which is the storage layer for the blocks of the DAG.
Depending on the --ipfs-api flag, we either create a temporary Badger-based one or rely on a remote go-ipfs node that already has the DAG.

From the commands' point of view, they simply receive an ipld.DAGService. Whether it is Badger-based or go-ipfs-based is irrelevant to the rest of the logic.
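The selection described above can be sketched roughly as follows. This is an illustration only: `blockStore`, `tempStore`, `remoteStore`, and `newStore` are hypothetical names, and `blockStore` is a much smaller stand-in for ipld.DAGService so that the example stays within the standard library. Only the `closeFunc` type is taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// blockStore is a hypothetical, reduced stand-in for ipld.DAGService.
type blockStore interface {
	Put(key string, data []byte) error
	Get(key string) ([]byte, error)
}

type closeFunc func() error

// tempStore plays the role of the temporary Badger-backed store.
type tempStore struct{ blocks map[string][]byte }

func (s *tempStore) Put(key string, data []byte) error {
	s.blocks[key] = data
	return nil
}

func (s *tempStore) Get(key string) ([]byte, error) {
	b, ok := s.blocks[key]
	if !ok {
		return nil, errors.New("block not found")
	}
	return b, nil
}

// remoteStore plays the role of a go-ipfs node that already holds the DAG.
// The network calls are deliberately left out of this sketch.
type remoteStore struct{ addr string }

func (s *remoteStore) Put(string, []byte) error {
	return errors.New("not implemented in this sketch")
}

func (s *remoteStore) Get(string) ([]byte, error) {
	return nil, errors.New("not implemented in this sketch")
}

// newStore picks the implementation from the flag value. Callers only see
// the interface and a cleanup function.
func newStore(ipfsAPI string) (blockStore, closeFunc, error) {
	if ipfsAPI != "" {
		return &remoteStore{addr: ipfsAPI}, func() error { return nil }, nil
	}
	s := &tempStore{blocks: map[string][]byte{}}
	// Closing drops the temporary data, as removing a temporary directory would.
	return s, func() error { s.blocks = nil; return nil }, nil
}

func main() {
	store, cls, err := newStore("") // empty flag value: temporary local store
	if err != nil {
		panic(err)
	}
	defer cls()
	if err := store.Put("root", []byte("data")); err != nil {
		panic(err)
	}
	b, err := store.Get("root")
	fmt.Println(string(b), err)
}
```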

fmt.Fprint(jsonOutput, string(out))
}

func dagify(ctx context.Context, dagService ipld.DAGService, path string, quiet bool) (cid.Cid, error) {
Contributor Author

Here we leverage a go-ipfs package to dagify the data, show progress bars, and all that pretty stuff.

Comment on lines +14 to +38
// NOTE: Testing only the `prepare` subcommand will indirectly test
// the `car` and `commp` subcommands. This test simply prepares
// some data and compares the final piece-size and piece-cid to a
// known correct value. If anything in the process (DAGification, CARing)
// misbehaves, it will result in a different PieceCID since, at the end of
// the day, PieceCID is a fingerprint of the prepared data.
func TestOfflinePreparation(t *testing.T) {
	testCases := []struct {
		size int
		json string
	}{
		{size: 10000, json: `{"piece_size":16384,"piece_cid":"baga6ea4seaqjuk4uh5g7cu5znbvrr7wvfsn2l3xj47rbymvi63uiiroya44lkiy"}`},
		{size: 1000, json: `{"piece_size":2048,"piece_cid":"baga6ea4seaqadahcx4ct54tlbvgkqlhmif7kxxkvxz3yf3vr2e4puhvsxdbrgka"}`},
		{size: 100, json: `{"piece_size":256,"piece_cid":"baga6ea4seaqd4hgfl6texpf377k7igx2ga2mfwn3lb4c4kdpaq3g3oao2yftuki"}`},
	}

	for _, test := range testCases {
		test := test
		t.Run(strconv.Itoa(test.size), func(t *testing.T) {
			out, err := run(t, test.size)
			require.NoError(t, err)
			require.Equal(t, test.json, out)
		})
	}
}
Contributor Author

We basically test the pow offline prepare CLI command directly.
I've prepared some deterministic test cases with various sizes and defined the expected output for each.

As mentioned in the comment, since pow offline prepare does everything (dagify, generate the CAR file, and calculate its CommP), it already tests everything important that could lead to a wrong result.

Comment on lines +59 to +65
stdbuf := bytes.NewBuffer(nil)
jsonOutput = stdbuf
Cmd.SetArgs([]string{"prepare", "--json", f.Name()})

if _, err := Cmd.ExecuteC(); err != nil {
	return "", fmt.Errorf("executing command: %s", err)
}
Contributor Author

Here we override the CLI's jsonOutput writer so that the test can plug in a buffer and later inspect the output.

@@ -29,14 +29,18 @@ var (
CmdTimeout = time.Second * 60
)

// FmtOutput allows to configure where Message(), Success(), and
Contributor Author

This just adds flexibility so the helper functions can write to other places.
By default they still do the usual thing: write to stdout.

However, commands that write generated assets to stdout send their other metadata output (loading bars, JSON output, etc.) to stderr. The CLI command can now set FmtOutput = os.Stderr, so all message helpers write to stderr and do not interfere with stdout, which is being used for something else.

@@ -0,0 +1,103 @@
package dataprep
Contributor Author

This package contains the functions that calculate the Piece(size/cid) of the data and perform the DAGification.

The CLI commands use these functions, with some extra pipe wiring.

@jsign jsign marked this pull request as ready for review April 7, 2021 18:59
@jsign jsign requested a review from asutula April 7, 2021 19:10
github.com/ipfs/go-ds-badger2 v0.1.1-0.20200708190120-187fc06f714e
github.com/ipfs/go-graphsync v0.7.0 // indirect
github.com/ipfs/go-ipfs v0.8.0
Contributor Author

I might regret adding this dependency, mostly because of the indirect dependencies that Lotus also brings in.
If at some point this becomes too much of a mess, we might need to extract the subcommand into a separate binary with its own mod.

It doesn't seem like a problem now.

Member

At some point, the adder logic could probably be pulled out of coreunix.NewAdder; I think that's what I did for the local bucket repo stuff... but yeah, nbd.

Contributor Author

Yeah, go-ipfs actually ends up using that package, so the dependency can be narrowed down.

Member

@sanderpick sanderpick left a comment

Nice! LGTM

@jsign jsign merged commit a304bdc into master Apr 7, 2021
@jsign jsign deleted the jsign/lcl branch April 7, 2021 19:31