-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix malformed CAR panics and excessive memory usage #312
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
There is no way I can make a safe implementation of the parser by slurping thing into memory, indexes people use are just too big. So I made a new API which force consumers to manage that. They can choose to use a bytes.Reader, *os.File, mmaped thing, ...
Update index generation tests to assert indices are identical. Fix minor typo in the test utility name and a bug where the check was not using both index instances to assert they are identical. Also refactor the use of lock in favour of wait group for better readability of the assertion logic.
Previous changes added the `ForEach` interface to `Index` type which enables iteration through the index by multihash and offset. However, not all index types contain enough information to construct the full multihash. Namely, `CarIndexSorted` only stores the digest portion of the multihashes. In order to implement `ForEach` for this index type correctly uses the `uint64` max value as the code in the multihash and document the behaviour where the iterations over this index type should not rely on the returned code. Note that the max value is used as a code that doesn't match any existing multicodec.Code to avoid misleading users.
This index type does not store enough information to satisfy `ForEach`. It only contains the digest of mulithashes and not their code. Instead of some partial functionality simply return an error when `ForEach` is called on this function type. Because, there is no valid use for this index type and the user should ber regenerating the index to the newer `car-multihash-index-sorted` anyway. Update tests to include samples of both types and assert IO operations and index generation for both formats.
`go-cid` exposes `Sum` API that facilitates calculation of the CID from `[]byte` payload. `go-multihash` now exposes `SumStream` which can calculate digest from `io.Reader` as well as `[]byte`. But, unfortunately the equivalent API does not exist in `go-cid`. To avoid copying the entire block into memory, implement CID calculation using the streaming multihash sum during inspection of CAR payload.
This reverts the earlier changes to get the message consistent. Note, the CID we expect is the one in the CAR payload, not the calculated CID for the block.
Benchmark the `Reader.Inspect` with and without hash validation using a randomly generated CARv2 file of size 10 MiB. Results from running the benchmark in parallel locally on MacOS `Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz` repeated 10 times: ``` Reader_InspectWithBlockValidation-8 5.30ms ±48% Reader_InspectWithoutBlockValidation-8 231µs ±42% name speed Reader_InspectWithBlockValidation-8 2.08GB/s ±35% Reader_InspectWithoutBlockValidation-8 46.8GB/s ±32% name alloc/op Reader_InspectWithBlockValidation-8 10.7MB ± 0% Reader_InspectWithoutBlockValidation-8 60.7kB ± 0% name allocs/op Reader_InspectWithBlockValidation-8 4.54k ± 0% Reader_InspectWithoutBlockValidation-8 2.29k ± 0% ```
Cosmetic refactor to rename `car.CarStats` to `car.Stats`, which looks more fluent when using the API.
Return potential error when reading section error as varint. Add test to verify the error is indeed returned. Use `errors.New` instead of `fmt.Errorf` when no formatting is needed in error message.
Revert the changes to `index.Index` interface such that it is the same as the current go-car main branch head. Reverting the changes, however, means that unmarshalling untrusted indices is indeed dangerous and should not be done on untrusted files. Note, the `carv2.Reader` APIs are changed to return errors as well as readers when getting `DataReader` and `IndexReader`. This is to accommodate issues detected by fuzz testing while removing boiler plate code in internal IO reader conversion. This is a breaking change to the current API but should be straight forward to roll out. Remove index fuzz tests and change inspection to only read the index codec instead of reading the entire index.
Revert changes to serialization of `insertionindex` postponed until the streaming index work stream.
Jorropo
approved these changes
Jul 6, 2022
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🤞
Suggested version: Changes in diff --git a/go.mod b/go.mod
index f63811b..1d50e22 100644
--- a/go.mod
+++ b/go.mod
@@ -2,16 +2,58 @@ module github.com/ipld/go-car
require (
github.com/ipfs/go-block-format v0.0.3
- github.com/ipfs/go-cid v0.0.7
- github.com/ipfs/go-datastore v0.5.1 // indirect
- github.com/ipfs/go-ipfs-blockstore v1.1.2 // indirect
+ github.com/ipfs/go-cid v0.1.0
github.com/ipfs/go-ipld-cbor v0.0.5
github.com/ipfs/go-ipld-format v0.2.0
github.com/ipfs/go-merkledag v0.5.1
- github.com/ipld/go-codec-dagpb v1.3.0
- github.com/ipld/go-ipld-prime v0.14.3-0.20211207234443-319145880958
+ github.com/ipld/go-codec-dagpb v1.3.1
+ github.com/ipld/go-ipld-prime v0.16.0
github.com/multiformats/go-multihash v0.1.0
github.com/stretchr/testify v1.7.0
)
-go 1.16
+require (
+ github.com/davecgh/go-spew v1.1.1 // indirect
+ github.com/gogo/protobuf v1.3.2 // indirect
+ github.com/google/uuid v1.2.0 // indirect
+ github.com/hashicorp/golang-lru v0.5.4 // indirect
+ github.com/ipfs/bbloom v0.0.4 // indirect
+ github.com/ipfs/go-blockservice v0.2.1 // indirect
+ github.com/ipfs/go-datastore v0.5.1 // indirect
+ github.com/ipfs/go-ipfs-blockstore v1.1.2 // indirect
+ github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+ github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+ github.com/ipfs/go-ipfs-exchange-offline v0.1.1 // indirect
+ github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+ github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+ github.com/ipfs/go-log v1.0.5 // indirect
+ github.com/ipfs/go-log/v2 v2.3.0 // indirect
+ github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+ github.com/ipfs/go-verifcid v0.0.1 // indirect
+ github.com/jbenet/goprocess v0.1.4 // indirect
+ github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+ github.com/mattn/go-isatty v0.0.13 // indirect
+ github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+ github.com/minio/sha256-simd v1.0.0 // indirect
+ github.com/mr-tron/base58 v1.2.0 // indirect
+ github.com/multiformats/go-base32 v0.0.3 // indirect
+ github.com/multiformats/go-base36 v0.1.0 // indirect
+ github.com/multiformats/go-multibase v0.0.3 // indirect
+ github.com/multiformats/go-varint v0.0.6 // indirect
+ github.com/opentracing/opentracing-go v1.2.0 // indirect
+ github.com/pmezard/go-difflib v1.0.0 // indirect
+ github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+ github.com/spaolacci/murmur3 v1.1.0 // indirect
+ github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+ go.uber.org/atomic v1.7.0 // indirect
+ go.uber.org/multierr v1.6.0 // indirect
+ go.uber.org/zap v1.16.0 // indirect
+ golang.org/x/crypto v0.0.0-20210506145944-38f3c27a63bf // indirect
+ golang.org/x/sys v0.0.0-20210426080607-c94f62235c83 // indirect
+ golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+ google.golang.org/protobuf v1.27.1 // indirect
+ gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c // indirect
+ lukechampine.com/blake3 v1.1.6 // indirect
+)
+
+go 1.17diff --git a/cmd/go.mod b/cmd/go.mod
index 0d933ec..a7b9580 100644
--- a/cmd/go.mod
+++ b/cmd/go.mod
@@ -1,20 +1,74 @@
module github.com/ipld/go-car/cmd
-go 1.16
+go 1.17
require (
github.com/dustin/go-humanize v1.0.0
github.com/ipfs/go-block-format v0.0.3
github.com/ipfs/go-cid v0.1.0
- github.com/ipfs/go-ipfs-blockstore v1.0.3
- github.com/ipfs/go-unixfsnode v1.1.4-0.20211105121048-b9b6e9dc571e
- github.com/ipld/go-car v0.3.2-0.20211001222544-c93f5367a75c
- github.com/ipld/go-car/v2 v2.1.0
- github.com/ipld/go-codec-dagpb v1.3.0
- github.com/ipld/go-ipld-prime v0.12.4-0.20211014180653-3ba656a3bc6b
- github.com/multiformats/go-multicodec v0.3.1-0.20210902112759-1539a079fd61
- github.com/multiformats/go-multihash v0.0.17-0.20211012170226-654b06d7912a
+ github.com/ipfs/go-ipfs-blockstore v1.1.2
+ github.com/ipfs/go-unixfsnode v1.4.0
+ github.com/ipld/go-car v0.3.3
+ github.com/ipld/go-car/v2 v2.1.2-0.20220121091436-7e10f104f452
+ github.com/ipld/go-codec-dagpb v1.3.1
+ github.com/ipld/go-ipld-prime v0.16.1-0.20220324131737-6e219a02ca16
+ github.com/multiformats/go-multicodec v0.4.1
+ github.com/multiformats/go-multihash v0.1.0
github.com/multiformats/go-varint v0.0.6
- github.com/rogpeppe/go-internal v1.8.1-0.20210923151022-86f73c517451
+ github.com/rogpeppe/go-internal v1.8.1
github.com/urfave/cli/v2 v2.3.0
)
+
+require (
+ github.com/Stebalien/go-bitfield v0.0.1 // indirect
+ github.com/cpuguy83/go-md2man/v2 v2.0.0-20190314233015-f79a8a8ca69d // indirect
+ github.com/gogo/protobuf v1.3.2 // indirect
+ github.com/google/uuid v1.2.0 // indirect
+ github.com/hashicorp/golang-lru v0.5.4 // indirect
+ github.com/ipfs/bbloom v0.0.4 // indirect
+ github.com/ipfs/go-bitfield v1.0.0 // indirect
+ github.com/ipfs/go-blockservice v0.2.1 // indirect
+ github.com/ipfs/go-datastore v0.5.1 // indirect
+ github.com/ipfs/go-ipfs-chunker v0.0.1 // indirect
+ github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+ github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+ github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+ github.com/ipfs/go-ipld-cbor v0.0.5 // indirect
+ github.com/ipfs/go-ipld-format v0.2.0 // indirect
+ github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+ github.com/ipfs/go-log v1.0.5 // indirect
+ github.com/ipfs/go-log/v2 v2.3.0 // indirect
+ github.com/ipfs/go-merkledag v0.5.1 // indirect
+ github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+ github.com/ipfs/go-verifcid v0.0.1 // indirect
+ github.com/jbenet/goprocess v0.1.4 // indirect
+ github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+ github.com/libp2p/go-buffer-pool v0.0.2 // indirect
+ github.com/mattn/go-isatty v0.0.13 // indirect
+ github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+ github.com/minio/sha256-simd v1.0.0 // indirect
+ github.com/mr-tron/base58 v1.2.0 // indirect
+ github.com/multiformats/go-base32 v0.0.3 // indirect
+ github.com/multiformats/go-base36 v0.1.0 // indirect
+ github.com/multiformats/go-multibase v0.0.3 // indirect
+ github.com/opentracing/opentracing-go v1.2.0 // indirect
+ github.com/petar/GoLLRB v0.0.0-20210522233825-ae3b015fd3e9 // indirect
+ github.com/pkg/diff v0.0.0-20210226163009-20ebb0f2a09e // indirect
+ github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+ github.com/russross/blackfriday/v2 v2.0.1 // indirect
+ github.com/shurcooL/sanitized_anchor_name v1.0.0 // indirect
+ github.com/spaolacci/murmur3 v1.1.0 // indirect
+ github.com/whyrusleeping/cbor v0.0.0-20171005072247-63513f603b11 // indirect
+ github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+ github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f // indirect
+ go.uber.org/atomic v1.7.0 // indirect
+ go.uber.org/multierr v1.7.0 // indirect
+ go.uber.org/zap v1.16.0 // indirect
+ golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
+ golang.org/x/exp v0.0.0-20210615023648-acb5c1269671 // indirect
+ golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect
+ golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+ google.golang.org/protobuf v1.27.1 // indirect
+ gopkg.in/errgo.v2 v2.1.0 // indirect
+ lukechampine.com/blake3 v1.1.6 // indirect
+)
diff --git a/v2/go.mod b/v2/go.mod
index e3b2043..b2686a7 100644
--- a/v2/go.mod
+++ b/v2/go.mod
@@ -1,24 +1,70 @@
module github.com/ipld/go-car/v2
-go 1.16
+go 1.17
require (
github.com/ipfs/go-block-format v0.0.3
github.com/ipfs/go-cid v0.1.0
- github.com/ipfs/go-ipfs-blockstore v1.1.2
+ github.com/ipfs/go-ipfs-blockstore v1.2.0
github.com/ipfs/go-ipld-cbor v0.0.5
- github.com/ipfs/go-ipld-format v0.2.0
- github.com/ipfs/go-merkledag v0.5.1
- github.com/ipld/go-codec-dagpb v1.3.0
- github.com/ipld/go-ipld-prime v0.14.0
+ github.com/ipfs/go-ipld-format v0.3.0
+ github.com/ipfs/go-merkledag v0.6.0
+ github.com/ipfs/go-unixfsnode v1.4.0
+ github.com/ipld/go-codec-dagpb v1.3.1
+ github.com/ipld/go-ipld-prime v0.16.0
github.com/ipld/go-ipld-prime/storage/bsadapter v0.0.0-20211210234204-ce2a1c70cd73
- github.com/multiformats/go-multicodec v0.3.1-0.20210902112759-1539a079fd61
+ github.com/multiformats/go-multicodec v0.5.0
github.com/multiformats/go-multihash v0.1.0
github.com/multiformats/go-varint v0.0.6
github.com/petar/GoLLRB v0.0.0-20210522233825-ae3b015fd3e9
github.com/stretchr/testify v1.7.0
github.com/whyrusleeping/cbor v0.0.0-20171005072247-63513f603b11
- golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
golang.org/x/exp v0.0.0-20210615023648-acb5c1269671
+)
+
+require (
+ github.com/Stebalien/go-bitfield v0.0.1 // indirect
+ github.com/davecgh/go-spew v1.1.1 // indirect
+ github.com/gogo/protobuf v1.3.2 // indirect
+ github.com/google/uuid v1.2.0 // indirect
+ github.com/hashicorp/golang-lru v0.5.4 // indirect
+ github.com/ipfs/bbloom v0.0.4 // indirect
+ github.com/ipfs/go-bitfield v1.0.0 // indirect
+ github.com/ipfs/go-blockservice v0.3.0 // indirect
+ github.com/ipfs/go-datastore v0.5.0 // indirect
+ github.com/ipfs/go-ipfs-chunker v0.0.1 // indirect
+ github.com/ipfs/go-ipfs-ds-help v1.1.0 // indirect
+ github.com/ipfs/go-ipfs-exchange-interface v0.1.0 // indirect
+ github.com/ipfs/go-ipfs-exchange-offline v0.2.0 // indirect
+ github.com/ipfs/go-ipfs-util v0.0.2 // indirect
+ github.com/ipfs/go-ipld-legacy v0.1.0 // indirect
+ github.com/ipfs/go-log v1.0.5 // indirect
+ github.com/ipfs/go-log/v2 v2.3.0 // indirect
+ github.com/ipfs/go-metrics-interface v0.0.1 // indirect
+ github.com/ipfs/go-verifcid v0.0.1 // indirect
+ github.com/jbenet/goprocess v0.1.4 // indirect
+ github.com/klauspost/cpuid/v2 v2.0.9 // indirect
+ github.com/libp2p/go-buffer-pool v0.0.2 // indirect
+ github.com/mattn/go-isatty v0.0.13 // indirect
+ github.com/minio/blake2b-simd v0.0.0-20160723061019-3f5f724cb5b1 // indirect
+ github.com/minio/sha256-simd v1.0.0 // indirect
+ github.com/mr-tron/base58 v1.2.0 // indirect
+ github.com/multiformats/go-base32 v0.0.3 // indirect
+ github.com/multiformats/go-base36 v0.1.0 // indirect
+ github.com/multiformats/go-multibase v0.0.3 // indirect
+ github.com/opentracing/opentracing-go v1.2.0 // indirect
+ github.com/pmezard/go-difflib v1.0.0 // indirect
+ github.com/polydawn/refmt v0.0.0-20201211092308-30ac6d18308e // indirect
+ github.com/spaolacci/murmur3 v1.1.0 // indirect
+ github.com/whyrusleeping/cbor-gen v0.0.0-20200123233031-1cdf64d27158 // indirect
+ github.com/whyrusleeping/chunker v0.0.0-20181014151217-fe64bd25879f // indirect
+ go.uber.org/atomic v1.7.0 // indirect
+ go.uber.org/multierr v1.7.0 // indirect
+ go.uber.org/zap v1.16.0 // indirect
+ golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97 // indirect
golang.org/x/sys v0.0.0-20210630005230-0f9fa26af87c // indirect
+ golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
+ google.golang.org/protobuf v1.27.1 // indirect
+ gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
+ lukechampine.com/blake3 v1.1.6 // indirect
)
|
retroactive 👍 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fix an issue where malformed CARv1 files cause a panic by introducing a maximum header and section size in CARv1 as an option.
Add fuzz tests across the repo