Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

boxo/blockstore: failing to parse keys #293

Open
m0ar opened this issue Nov 29, 2024 · 11 comments
Open

boxo/blockstore: failing to parse keys #293

m0ar opened this issue Nov 29, 2024 · 11 comments
Labels
need/author-input Needs input from the original author need/triage Needs initial labeling and prioritization

Comments

@m0ar
Copy link

m0ar commented Nov 29, 2024

Hi 👋

Problem

We are using go-ds-s3@0.11.0 bundled/built into kubo@0.24.0, but are seeing consistent failures in everything that is using AllKeysChan. It prevents reprovide from being functional, as well as all other block listing operations like ipfs repo ls.

These errors are logged at a very high speed every time that happens:

{"level":"warn","ts":"2024-11-29T10:35:13.118Z","logger":"blockstore","caller":"blockstore/blockstore.go:298","msg":"error parsing key from binary: illegal base32 data at input byte 4"}
{"level":"warn","ts":"2024-11-29T10:35:13.118Z","logger":"blockstore","caller":"blockstore/blockstore.go:298","msg":"error parsing key from binary: illegal base32 data at input byte 4"}
{"level":"warn","ts":"2024-11-29T10:35:13.118Z","logger":"blockstore","caller":"blockstore/blockstore.go:298","msg":"error parsing key from binary: illegal base32 data at input byte 4"}
{"level":"warn","ts":"2024-11-29T10:35:13.118Z","logger":"blockstore","caller":"blockstore/blockstore.go:298","msg":"error parsing key from binary: illegal base32 data at input byte 4"}

The source of the error is this check in boxo/blockstore.go (v0.15.0 for go-ds-s3@v0.11.0): https://github.com/ipfs/boxo/blob/v0.15.0/blockstore/blockstore.go#L296-L300

This is quite bad as it prevents the node from being a useful network participant. While it can serve most direct queries, it prevents efficient content discovery from other peers.

It seems related to #198, as this has the same failure mode. (cc @Stebalien @obo20 )
Not sure if fcbd2ef alone would do the trick, or if it relies on other changes in that branch? 🤔

Analysis

We have a few other nodes running go-ds-s3@kubo-0.26.0 with kubo@release-0.26.0, that show the same issues, but on byte 6 instead:

error parsing key from binary: illegal base32 data at input byte **6**

Given those two datapoint, I suspect it's choking on the rootDirectory being included in the response when comparing the datastore specs. The node erroring out at byte 4 has rootDirectory: "ipfs", and the other has desci-s3-public-ipfs-production. When looking at the S3 paths, I think it chokes on special chars in the path, but this shouldn't happen as boxo should only be getting the naked flatfs keys:

ipfs/CIQA2274WAKEDP4ICBKMSTA5QHDI63OA4NM25GJELROEKYZMBC2QGNI
1234⬆

public-ceramic-ipfs-dev-1/CIQA222FAE4B3WMWDZ5DJU5EKE32XMA2GNJDHEBE7SKG5VIYMAYYQXA
123456⬆
Datastore specs 👈 (click to expand)
// go-ds-s3@0.11.0 + kubo@0.24.0
"Spec": {
  "mounts": [
    {
      "child": {
        "accessKey": "LEAKED",
        "bucket": "desci-s3-public-ipfs-production",
        "region": "us-east-2",
        "rootDirectory": "ipfs",
        "secretKey": "LEAKED",
        "type": "s3ds"
      },
      "mountpoint": "/blocks",
      "prefix": "s3.datastore",
      "type": "measure"
    },
    {
      "child": {
        "compression": "none",
        "path": "datastore",
        "type": "levelds"
      },
      "mountpoint": "/",
      "prefix": "leveldb.datastore",
      "type": "measure"
    }
  ],
  "type": "mount"
},
// go-ds-s3@kubo-0.26.0 + kubo@release-0.26.0
"Spec": {
  "mounts": [
    {
      "child": {
        "accessKey": "[redacted]",
        "bucket": "public-ceramic-ipfs-dev",
        "region": "us-east-2",
        "rootDirectory": "public-ceramic-ipfs-dev-1",
        "secretKey": "[redacted]",
        "type": "s3ds"
      },
      "mountpoint": "/blocks",
      "prefix": "s3.datastore",
      "type": "measure"
    },
    {
      "child": {
        "compression": "none",
        "path": "datastore",
        "type": "levelds"
      },
      "mountpoint": "/",
      "prefix": "leveldb.datastore",
      "type": "measure"
    }
  ],
  "type": "mount"
},

Questions

  1. Is there any workaround we can use?
  2. Can this be solved by merging fcbd2ef from Fix query logic #39 ?
  3. Is this solved in later versions of this plugin?
  4. In general, do we need to be particularly careful when upgrading kubo when using this plugin? Are there points in the kubo release history that require particular migration steps for this datastore?
@m0ar m0ar added the need/triage Needs initial labeling and prioritization label Nov 29, 2024
Copy link

welcome bot commented Nov 29, 2024

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@hsanjuan
Copy link
Contributor

Hello @m0ar , we received an alert about leaked AWS keys. I believe you edited your message etc. but about 13 repositories admin got notified and can see them. Please rotated them.

@hsanjuan
Copy link
Contributor

Regarding your issue, Kubo is at v0.32.1 and the version you are using is over a year old. Please upgrade. Your analysis is probably correct though, would you like to send a patch?

@m0ar
Copy link
Author

m0ar commented Nov 29, 2024

Hello @m0ar , we received an alert about leaked AWS keys. I believe you edited your message etc. but about 13 repositories admin got notified and can see them. Please rotated them.

Woah, thanks a lot! So sloppy of me, but bound to leak some secrets at some point ig 😅

We've handled the issue, appreciate the heads up 🧁

@m0ar
Copy link
Author

m0ar commented Nov 29, 2024

@hsanjuan

Regarding your issue, Kubo is at v0.32.1 and the version you are using is over a year old. Please upgrade. Your analysis is probably correct though, would you like to send a patch?

All right, that's fair. Are there kubo versions where it needs to perform migrations within the datastore, and if so what does that process look like? Or, can we assume the bucket storage format is stable and freely migrate kubo versions?

@m0ar
Copy link
Author

m0ar commented Nov 29, 2024

Re. the linked commit, not sure if it's enough after having a closer look. The change uses ds.NewKey when preparing the prefix. The point seems to handle leading slashes and reduntant path segments.

image

// NewKey constructs a key from string. it will clean the value.
func NewKey(s string) Key {
	k := Key{s}
	k.Clean()
	return k
}

// Clean up a Key, using path.Clean.
func (k *Key) Clean() {
	switch {
	case len(k.string) == 0:
		k.string = "/"
	case k.string[0] == '/':
		k.string = path.Clean(k.string)
	default:
		k.string = path.Clean("/" + k.string)
	}
}

As I understand it after reading path.Clean, this only makes sure the prefix we pass to s.S3.ListObjectsV2 is sane, i.e. is a valid path.

We know blockstore.go doesn't pass a prefix with the query [source] (https://github.com/ipfs/boxo/blob/v0.15.0/blockstore/blockstore.go#L272-L273) :

q := dsq.Query{KeysOnly: true}
res, err := bs.datastore.Query(ctx, q)
[...]

Following the logic in https://github.com/ipfs/go-ds-s3/blob/master/s3.go#L186-L243, we then know Prefix is set to path.Join(s.RootDirectory, '').

This should be the equivalent S3 API query:

❯ aws s3api list-objects-v2 --bucket=desci-ipfs-staging --max-keys 10 --prefix 'ipfs/' --delimiter '/' | jq --raw-output ".Contents[] | .Key"
ipfs/
ipfs/CIQA2223KXGBREW6S5HXAVLGJ2QGYQVA6CC4SE4PTNLJUSQ76R7LBQY
ipfs/CIQA2224437OEF75774FFOCEVPR5VXCM26W6N4N3WPZB4I5WTKP6HEI
ipfs/CIQA222HN4FOUCPIJVL255DMG2I53W6KPCUPRXVRXE6CJCCKGF3QKAQ
ipfs/CIQA222QZZZCHAHNZT2R77AXCYPYQV6EFEBRD6M37X7P6DACGZZTRPI
ipfs/CIQA222RKPSYT7YS56QSC77D7OUW2EHEZXHVBYVGUPLMAFTU6P24IIQ
ipfs/CIQA224PFVXWDIZ6KRZWOG4AFSWXW7ZV3CE3VEMIKX2HU2W4DJPHWEQ
ipfs/CIQA2274WAKEDP4ICBKMSTA5QHDI63OA4NM25GJELROEKYZMBC2QGNI
ipfs/CIQA227AE5IEBPVBCZ7ILEDSVDVEYKUV2RJW6DHV6NM77CW5HXT5UBY
ipfs/CIQA22A7CZPEBSYZFJ4MAQS3F47EPJXPMMWJV2T7OMZ7AY5ES66RYRI

Then, the module only packages those results up as-is:

entry := dsq.Entry{
    Key:  ds.NewKey(*resp.Contents[index].Key).String(),
    Size: int(*resp.Contents[index].Size),
}

...in which case this error makes total sense to me, because we never chop off the rootDirectory from the S3 keys!
I think we may need to:

  1. Remove the first key iff it equals the rootDirectory
  2. Do the golang equivalent of sed 's|$rootDirectory||' to chop it off the head of each key

@hsanjuan
Copy link
Contributor

Isn't it what this comment is doing: #198 (comment)

@gammazero
Copy link
Contributor

Were you previously running an old version of go-ipfs (e.g. v0.8) which created block keys in an old format? If so the migration to the new format of of blockstore keys is found in v0.12. https://github.com/ipfs/kubo/releases/tag/v0.12.0

Or, is this an installation of a recent version of kubo and s3-plugin which is somehow creating new keys in a non-supported format?

@gammazero gammazero added the need/author-input Needs input from the original author label Dec 10, 2024
@hsanjuan
Copy link
Contributor

If I understood correctly, the problem affects only listing all the keys. Wrong key format would affect any GET.

@hsanjuan
Copy link
Contributor

Or maybe we still have the old hack in place for backwards compatibility with old keys on GET.

@hsanjuan
Copy link
Contributor

Or maybe we still have the old hack in place for backwards compatibility with old keys on GET.

No I don't think that exists.

@lidel lidel added need/author-input Needs input from the original author need/triage Needs initial labeling and prioritization and removed need/author-input Needs input from the original author need/triage Needs initial labeling and prioritization labels Dec 17, 2024
@lidel lidel changed the title Boxo failing to parse blockstore keys boxo/blockstore: failing to parse blockstore keys Dec 17, 2024
@lidel lidel changed the title boxo/blockstore: failing to parse blockstore keys boxo/blockstore: failing to parse keys Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/author-input Needs input from the original author need/triage Needs initial labeling and prioritization
Projects
None yet
Development

No branches or pull requests

4 participants