feat: dag import --stats (#8237) #8237

rvagg · 2021-07-01T12:23:19Z

I'm just taking a stab at this and am unsure if I'm following the right pattern here so feedback would be appreciated.

Currently dag import only reports pinned CIDs and an error on pinning. If you set pin-roots=false then you get nothing. I'm working on the JS version of this, including the HTTP client, and I'd really like to get a bit more information out of this process. The minimum, I think, is simply a report of how many blocks were imported (although in the current form it's not counting unique blocks, just total processed blocks).

So we end up with this slightly awkward event: {BlockCount:uint64, Root:*RootMeta}. The first event is expected to have Root==nil, with BlockCount telling you how many blocks were imported, and any remaining events should have Root!=nil and report pin status if you asked for pinning.

Is there a better pattern to follow than this? Can I make two separate event types? Can I clean up the struct a bit somehow or maybe use two separate structs rather than overloading one?

core/commands/dag/dag.go

core/commands/dag/import.go

aschmahmann · 2021-07-09T16:05:49Z

@rvagg is the idea here basically to implement --progress like we have in ipfs pin add and ipfs add? If so we should also hide this behind a flag such that the output is unchanged by default

ribasushi · 2021-07-09T16:07:27Z

is the idea here basically to implement --progress

No. This implements a single "end-summary" output as the very very last printout, no intermediate progress updates.

lidel

Old response looks like this:

> ipfs dag import foo.car
Pinned root	QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR	success

behind the scenes, JSON (HTTP API at /api/v0/dag/import) looks like this:

> ipfs dag import foo.car --enc=json | jq
{
  "Root": {
    "Cid": {
      "/": "QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR"
    },
    "PinErrorMsg": ""
  }
}

While it is pretty safe to add new fields to JSON, changing the very first line printed as the text output could break people's scripts 🙈 👀

TODO before merge

We discussed this during triage today and it looks good, but we need to tweak it a bit:

- we want Stats struct with Stats.BlockCount next to the Root so we can add other metrics in the future, e.g.:
  - payloadBytesCount suggested in #8237 (comment)
  - time import took
- print Stats only when --stats is passed (making this a backward-compatible opt-in feature)

BigLep · 2021-08-20T15:26:25Z

@rvagg : are you able to do the open TODOs?

rvagg · 2021-08-23T04:59:56Z

@BigLep I'll put it on my personal backlog. Having it behind a flag makes me less enthusiastic, though I understand the reasoning.

BigLep · 2021-09-03T15:10:35Z

@gammazero : given your tribute next week, can you please incorporate @lidel's comments and get this over the line?

gammazero · 2021-09-07T23:26:17Z

Done:

Only print stats when --stats flag is passed. This applies to both text and json output encodings.
Stats data is now contained within a Stats data structure
Stats are printed after root so that the first line of output is the same as previously, even when stats are output
Updated sharness test to use new stats output and --stats flag
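The behaviour in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the type names (`rootMeta`, `importStats`, `importEvent`) and the `renderText` helper are hypothetical, the CID is simplified to a string, and the failure wording is made up. Only the "Pinned root" and "Imported N blocks (M bytes)" line shapes come from the output quoted in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical shapes, loosely following the names used in this thread.
type rootMeta struct {
	Cid         string
	PinErrorMsg string
}

type importStats struct {
	BlockCount      uint64
	BlockBytesCount uint64
}

// importEvent carries either a Root or Stats; the unused pointer stays nil.
type importEvent struct {
	Root  *rootMeta
	Stats *importStats
}

// renderText writes one line per root first and the stats line last, and
// only includes stats when showStats is true (the --stats opt-in). With
// showStats false the output is exactly the pre-existing root lines.
func renderText(events []importEvent, showStats bool) string {
	var sb strings.Builder
	var stats *importStats
	for _, ev := range events {
		if ev.Stats != nil {
			stats = ev.Stats // hold back until all roots are printed
			continue
		}
		if ev.Root == nil {
			continue
		}
		status := "success"
		if ev.Root.PinErrorMsg != "" {
			status = "failed: " + ev.Root.PinErrorMsg // wording is an assumption
		}
		fmt.Fprintf(&sb, "Pinned root\t%s\t%s\n", ev.Root.Cid, status)
	}
	if showStats && stats != nil {
		fmt.Fprintf(&sb, "Imported %d blocks (%d bytes)\n", stats.BlockCount, stats.BlockBytesCount)
	}
	return sb.String()
}

func main() {
	events := []importEvent{
		{Root: &rootMeta{Cid: "QmbWqxBEKC3P8tqsKc98xmWNzrzDtRLMiMPL8wBuTGsMnR"}},
		{Stats: &importStats{BlockCount: 383, BlockBytesCount: 27676801}},
	}
	fmt.Print(renderText(events, true))
}
```

Keeping stats in their own struct means further counters can be added later without touching the Root shape.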

rvagg · 2021-09-08T01:00:28Z

would Approve if this wasn't my PR, but 👌 this is sweet, thanks so much @gammazero

gammazero · 2021-09-08T07:00:20Z

I added PayloadBytesCount to stats. Please check that nd.Size(), done here, is the correct way to get the payload size. Was not sure if something from nd.Stat() was needed instead.

I did not add an elapsed time stat because I do not know what is the desired format for that duration value (seconds, milliseconds, microseconds, time.Duration string, something else). If the format is known, it will be trivial to add that stat, but the sharness tests will be more complicated because they will have to ignore the changing elapsed time value. Unless elapsed time is really useful in the stats, I would suggest omitting it and letting that be determined by external means.

lidel

Sounds good, perfectly fine to keep stats limited to info around imported blocks.
See comment below.

lidel · 2021-09-09T15:25:05Z

core/commands/dag/dag.go

@@ -53,9 +54,15 @@ type ResolveOutput struct {
 	RemPath string
 }

+type CarImportStats struct {
+	BlockCount        uint64
+	PayloadBytesCount uint64


Payload can be confusing here: is this raw blocks (CAR-metadata) or actual data in each block (CAR-metadata-dagmetadata)?

Perhaps renaming it to BlockBytesCount and setting this to the sum of nd.Stat().BlockSize is a way to remove confusion while keeping this useful no matter what codecs are used inside of blocks?

Renamed PayloadBytesCount to BlockBytesCount

It happens that nd.Size() and nd.Stat().BlockSize return the same value, so I think it is better to use nd.Size(), given the comment here.

Hm.. if they are the same, it makes sense, but something feels off when I compare Size imported from CAR with value reported by ipfs dag stat:

$ ipfs dag stat bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva
Size: 27676801, NumBlocks: 383
$ ipfs dag export bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva > test.car
 0s  26.41 MiB / ? [--------------------------------------------------------------------------------=-----------------------] 390.25 MiB/s 0s
$ ipfs dag import --stats test.car
Pinned root	bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva	success
Imported 383 blocks (125832269 bytes)

125832269 bytes is ~125 MB which is way more than 26MB

@lidel What I did previously (using nd.Size()) worked for the car files in sharness/t0054-dag-car-import-export-data/ but does not work for your example. Using nd.Stat().DataSize + nd.Stat().LinksSize works for your example, but not for the test cars/dags. The test cars/dags have stats with all zeros, for almost all blocks.

It appears that nd.Size() returns nd.Stat().CumulativeSize if a block has stat values. Otherwise, nd.Size() is set to len(nd.RawData()). This makes both nd.Size() and nd.Stat() completely unreliable across different dags.

Apparently, the way to get a reliable size is to always use len(nd.RawData()). So, that is what the latest change does.

DataSize and LinksSize sounds like a dag-pb thing, which probably won't be represented in the fixture files.

What stat are we actually after here? @ribasushi, what's your expectation of what this sizing is going to report? I would think that it's the size of the output, which includes the CAR header, CID lengths and even varint section size prefixes. But I could find that by measuring the size of the output myself, so the utility doesn't seem great. But what is the utility of reporting just the block sizes? What is that useful for?

I would expect the ipfs dag stat number. It is useful in terms of "this is the amount of IPLD-data these blocks hold"

thanks @ribasushi, len(nd.RawData()) is probably the one then, but in this case maybe just use len(block.RawData()) then you get to use the block as it comes out of the CAR rather than whatever happens to it through a Decode cycle (probably the same, but seems safer to use the one closer to the original)
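The approach the thread converges on, summing the raw byte length of each block as it is read, can be sketched like this. It is a simplified illustration, not the PR's code: `rawBlock` is a hypothetical one-method interface standing in for whatever block type the CAR reader yields, and `memBlock` is a made-up in-memory implementation for the example.

```go
package main

import "fmt"

// rawBlock is a hypothetical stand-in for the single method this sketch
// needs from a block read out of a CAR file.
type rawBlock interface {
	RawData() []byte
}

// memBlock is an in-memory block used only for this example.
type memBlock []byte

func (b memBlock) RawData() []byte { return b }

type importStats struct {
	BlockCount      uint64
	BlockBytesCount uint64
}

// tally counts every block it is given (duplicates included, as noted in
// the PR description) and sums the length of each block's raw bytes. It
// never decodes the block, so the result does not depend on the codec.
func tally(blocks []rawBlock) importStats {
	var s importStats
	for _, b := range blocks {
		s.BlockCount++
		s.BlockBytesCount += uint64(len(b.RawData()))
	}
	return s
}

func main() {
	blocks := []rawBlock{memBlock("hello"), memBlock{0x01, 0x02, 0x03}}
	s := tally(blocks)
	fmt.Printf("Imported %d blocks (%d bytes)\n", s.BlockCount, s.BlockBytesCount)
}
```

Because only the raw bytes are measured, the figure excludes the CAR header, CIDs and varint length prefixes, which matches the "amount of IPLD data these blocks hold" expectation stated above.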

lidel

Remaining thing: #8237 (comment)

rvagg · 2021-09-13T05:03:50Z

I hope you don't mind @gammazero but I pushed the len(block.RawData()) change. With or without that change the import size is good for @lidel's example data so I think we're good to go on this PR.

Pinned root     bafybeihcyruaeza7uyjd6ugicbcrqumejf6uf353e5etdkhotqffwtguva     success
Imported 383 blocks (27676801 bytes)

lidel

LGTM.

We tested everything with --stats so I've added basic regression tests for dag import without --stats (default output).
This should be ready for merge.

* feat: report block count on `dag import`
* fix: clean-up dag import message format
* Only print stats when --stats flag is passed

  This applies to both text and json output encoding.
  - Stats data is now contained within a Stats datastructure
  - Stats are printed after root so that first line of output is the same as previously, even when stats are output using --stats
* fix sharness test
* Add PayloadBytesCount to stats
* Attempt to stabilize flaky tests
* Rename PayloadBytesCount to BlockBytesCount
* Correctly calculate size or imported dag
* Use RawSize of original block for import bytes calc
* test: dag import without --stats

  basic regression tests for the default output (text and json)

Co-authored-by: gammazero <gammazero@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
(cherry picked from commit 0057199)

rvagg requested review from ribasushi and aschmahmann July 1, 2021 12:23

rvagg force-pushed the rvagg/import-block-count-output branch from ae4855c to 44433f3 Compare July 1, 2021 12:25

ribasushi reviewed Jul 1, 2021

core/commands/dag/dag.go (outdated)

ribasushi reviewed Jul 2, 2021

core/commands/dag/import.go

achingbrain mentioned this pull request Jul 20, 2021

feat: implement dag import/export ipfs/js-ipfs#3728

Merged

lidel requested changes Jul 30, 2021

aschmahmann added the need/author-input Needs input from the original author label Aug 6, 2021

BigLep assigned rvagg and gammazero and unassigned rvagg Sep 3, 2021

BigLep removed the need/author-input Needs input from the original author label Sep 3, 2021

rvagg and others added 3 commits September 7, 2021 14:45

feat: report block count on dag import

6f0b5c2

fix: clean-up dag import message format

daab857

Only print stats when --stats flag is passed

875c3f0

This applies to both text and json output encodings.
- Stats data is now contained within a Stats datastructure
- Stats are printed after root so that first line of output is the same as previously, even when stats are output using --stats

gammazero force-pushed the rvagg/import-block-count-output branch from edbbb78 to 875c3f0 Compare September 7, 2021 21:46

fix sharness test

52c970f

Add PayloadBytesCount to stats

c14d1bc

Attempt to stabilize flaky tests

56b87df

gammazero requested a review from lidel September 8, 2021 09:54

lidel requested changes Sep 9, 2021

Rename PayloadBytesCount to BlockBytesCount

2636d56

gammazero requested a review from lidel September 9, 2021 16:34

lidel requested changes Sep 9, 2021

Correctly calculate size or imported dag

160b9a6

gammazero requested a review from lidel September 10, 2021 13:33

Use RawSize of original block for import bytes calc

ed9b606

BigLep added this to the go-ipfs 0.11 milestone Sep 21, 2021

lidel assigned lidel and unassigned gammazero Sep 21, 2021

test: dag import without --stats

02fe735

basic regression tests for the default output (text and json)

lidel approved these changes Sep 21, 2021

lidel changed the title ~~feat: report block count on dag import~~ feat: dag import --stats (#8237) Sep 23, 2021

lidel merged commit 0057199 into master Sep 23, 2021

lidel deleted the rvagg/import-block-count-output branch September 23, 2021 14:23

aschmahmann mentioned this pull request Sep 27, 2021

v0.10.0-rc2 #8459

Merged

5 tasks

aschmahmann mentioned this pull request Sep 30, 2021

Release v0.10 #8176

Closed

62 tasks

rvagg mentioned this pull request Oct 5, 2021

The ipfs.dag.import API ipfs/js-ipfs#3910

Closed

guseggert mentioned this pull request Nov 23, 2021

Release v0.11 #8343

Closed

80 tasks
