feat: store pins in datastore instead of a DAG #2771
Ad hoc testing script. Add a buffer without pinning it, time how long it takes to pin it. Store the times and work out the average time taken every 1,000 pins:

```javascript
'use strict'

const last = require('it-last')
const drain = require('it-drain')
const { createController } = require('ipfsd-ctl')

async function main () {
  const ipfs = (await createController({
    type: 'go',
    ipfsBin: require('go-ipfs-dep').path(),
    ipfsHttpModule: require('ipfs-http-client'),
    disposable: false
  }))

  await ipfs.init()
  await ipfs.start()

  let times = []
  let chunk = 0

  for (let i = 0; i < 83000; i++) {
    const buf = Buffer.from(`${Math.random()}`)

    // add the block without pinning it so only pin.add is timed
    const result = await last(ipfs.api.add(buf, {
      pin: false
    }))

    const start = Date.now()
    const res = await ipfs.api.pin.add(result.cid)

    // some versions of the API return an async iterable from pin.add;
    // drain it so the whole pin operation is included in the timing
    if (res[Symbol.asyncIterator]) {
      await drain(res)
    }

    const mem = process.memoryUsage()

    times.push({
      ...mem,
      elapsed: Date.now() - start
    })

    chunk++

    if (chunk === 1000) {
      // print the averages for this chunk as CSV, then reset
      const sum = times.reduce((acc, curr) => {
        acc.elapsed += curr.elapsed
        acc.rss += curr.rss
        acc.heapTotal += curr.heapTotal
        acc.heapUsed += curr.heapUsed
        acc.external += curr.external
        return acc
      }, { elapsed: 0, rss: 0, heapTotal: 0, heapUsed: 0, external: 0 })

      console.info(`${i + 1}, ${sum.elapsed / times.length}, ${sum.rss / times.length}, ${sum.heapTotal / times.length}, ${sum.heapUsed / times.length}, ${sum.external / times.length}`)

      chunk = 0
      times = []
    }
  }

  await ipfs.stop()
}

main()
```

Results: 10k pins, DAG vs datastore, ranges from a 20-300x speedup in the time taken to add a single pin. After 100k pins there doesn't seem to be much performance degradation when storing in the datastore, whereas the DAG method degrades significantly after 8192 pins (see #2197 for discussion of that). The next significant performance jump vs DAGs would probably be after the first layer of buckets is full, e.g. 256 buckets of 8192 pins = 2,097,152 pins. That'll probably take a bit of time to benchmark...
Next steps:
That's a very cool speed improvement! Some observations:
I guess you could store only the CID version/codec in the pin? I was thinking of changing the pin type to be an integer too, so there are definitely some improvements that can be made, this is just a first pass.
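For illustration, a minimal sketch of what collapsing the string `type` into an integer might look like, along the lines of the `depth` field the updated PR description below ends up using; the helper names here are hypothetical, not part of this PR:

```javascript
// Hypothetical helpers: map the string pin type to/from a numeric depth,
// as in the pin record format described later (0 = direct pin,
// Infinity = recursive pin, 1+ = pinned to a fixed depth).
function typeToDepth (type) {
  return type === 'recursive' ? Infinity : 0
}

function depthToType (depth) {
  if (depth === Infinity) return 'recursive'
  if (depth === 0) return 'direct'
  return `depth-${depth}`
}

console.log(typeToDepth('recursive')) // Infinity
console.log(depthToType(0)) // 'direct'
```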
@Stebalien has talked about making a similar change to this too, so it's only slightly ahead of the go-ipfs repo. At any rate, go-ipfs is switching to badger by default, which js-ipfs can't read, so I'm not sure how much of a priority that is any more.
I guess you can't share your entire list of pins by sharing one CID, but also now you don't have to share your entire list of pins, you can share individual ones. Grouping multiple pins as pinsets could be added back in as a new feature, and the human-readable names would make this nicer to work with. Something like:

```console
$ ipfs pin add Qmfoo
pinned Qmfoo recursively
$ ipfs pin-set add my-super-fun-pinset Qmfoo
$ ipfs pin-set list my-super-fun-pinset
my-super-fun-pinset Qmqux
Qmfoo
Qmbar
Qmbaz
```

You could even have the root of a pinset be an IPNS name to allow pulling updates from the network. That'd be neat.
This can be fixed :).
At the moment, I think this is causing strictly more harm than help. It's been 6 years and I have yet to see someone use this. Ideally, everything would be stored in an IPLD-backed graph database. However, we aren't there yet in terms of tooling. We could get part way there by creating an IPLD-backed datastore (datastore -> IPLD HAMT -> datastore) but that will throw away the type information.
Any reason to store the CID?
Base64url? go-ipfs, at least, now has hyper-optimized base58. However, it's still slower than base64 (and takes more space).
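For a rough sense of the trade-off, a sketch comparing encoded key lengths, assuming the `multibase` package; the multihash here is fabricated by hand from a sha2-256 digest just for the comparison:

```javascript
'use strict'

const multibase = require('multibase')
const crypto = require('crypto')

// build a sha2-256 multihash by hand: 0x12 = sha2-256, 0x20 = 32-byte digest
const digest = crypto.createHash('sha256').update('hello').digest()
const multihash = Buffer.concat([Buffer.from([0x12, 0x20]), digest])

for (const name of ['base58btc', 'base64url', 'base32']) {
  const encoded = Buffer.from(multibase.encode(name, multihash))
  console.log(name, encoded.length, encoded.toString())
}

// base64url and base32 re-pack bits at a fixed radix, so they encode in a
// single pass; base58 needs repeated bignum-style division, which is why
// it stays slower even when hyper-optimized
```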
Force-pushed from 43316eb to 80986f9
Some more graphs. I pinned 83k single blocks using the test script above (originally intended to be 100k but the js-dag benchmark took too long to run and I had to get on an aeroplane). The initial hump at 8192 pins is there, then a consistent performance degradation over time. At 83k pins, js is taking 2.5s to add a pin. Go has the same degradation but it is significantly less pronounced. The js-dag implementation stores the pinsets in memory, js-datastore does not. There is an increase in memory usage over time but it may not be hitting the v8 GC threshold, or there's a leak somewhere...
The sizes appear to be comparable, or perhaps the differences are statistically insignificant compared to the block size. After completing the benchmark and running `repo gc` I see:

```console
# js-dag
.jsipfs $ du -hs
367M    .

# go-dag
.ipfs $ du -hs
353M    .

# js-datastore
.jsipfs $ du -hs
344M    .
```
Yes, this is the idea behind storing them CBOR encoded rather than protobufs.
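To illustrate why CBOR helps here, a minimal sketch of the round trip (not the PR's actual code), assuming the `borc` CBOR codec and the in-memory datastore from `interface-datastore`; because CBOR records carry no fixed schema, optional fields can be added later without migrating existing records:

```javascript
'use strict'

const cbor = require('borc')
const { Key, MemoryDatastore } = require('interface-datastore')

async function main () {
  const pins = new MemoryDatastore()

  // a pin record; unlike a protobuf there is no fixed schema, so optional
  // fields like `comments` can simply be omitted or added later
  const record = {
    cid: Buffer.from('0170deadbeef', 'hex'), // placeholder bytes for the full CID
    type: 'recursive',
    comments: 'my first pin'
  }

  await pins.put(new Key('/some-pin-key'), cbor.encode(record))

  const loaded = cbor.decode(await pins.get(new Key('/some-pin-key')))
  console.log(loaded.type) // 'recursive'
}

main()
```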
Good suggestion, names are not unique so `comments` might be a better field name.
If we're not going to let the user query by name we probably shouldn't do this.
My thinking was that by using the multihash of a block as the pin identifier (not the full CID), it becomes cheap to calculate if a given block has already been pinned (assuming the user has hashed it with the same algorithm). The full CID is stored so we can show the user what they used to pin the block when they do a `pin ls`.
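A sketch of that idea, assuming the `cids`, `multibase` and `interface-datastore` packages (`pinKey` is a hypothetical helper, not this PR's code): the v0 and v1 forms of the same CID wrap the same multihash, so they resolve to one datastore key and the "already pinned?" check is a single lookup:

```javascript
'use strict'

const CID = require('cids')
const multibase = require('multibase')
const { Key } = require('interface-datastore')

// hypothetical helper: derive the datastore key from the multihash only,
// so different CID wrappers of the same block collide on purpose
function pinKey (cid) {
  const encoded = Buffer.from(multibase.encode('base32', cid.multihash))
  return new Key(`/${encoded.toString()}`)
}

const v0 = new CID('QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG')
const v1 = v0.toV1()

console.log(pinKey(v0).toString() === pinKey(v1).toString()) // true
```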
cc @hsanjuan
Adds a `.pins` datastore to `ipfs-repo` and uses that to store pins as cbor binary keyed by base64 stringified multihashes (n.b. not CIDs). Each pin has several fields:

```javascript
{
  cid: // buffer, the full CID pinned
  type: // string, 'recursive' or 'direct'
  comments: // string, human-readable comments for the pin
}
```

BREAKING CHANGES:

* pins are now stored in a datastore, a repo migration will be necessary
* ipfs.pins.add now returns an async generator
* ipfs.pins.rm now returns an async generator

Depends on:

- [ ] ipfs/js-ipfs-repo#221
Force-pushed from a5ec30e to 582be49
The changes in ipfs/js-ipfs#2771 mean that the input/output of `ipfs.pins.add` and `ipfs.pins.rm` are now streaming so this PR updates to the new API.
Adds a `.pins` datastore to `ipfs-repo` and uses that to store pins as cbor binary keyed by base32 encoded multihashes (n.b. not CIDs).

### Format

As stored in the datastore, each pin has several fields:

```javascript
{
  codec: // optional Number, the codec from the CID that this multihash was pinned with, if omitted, treated as 'dag-pb'
  version: // optional Number, the version number from the CID that this multihash was pinned with, if omitted, treated as v0
  depth: // Number, Infinity = recursive pin, 0 = direct, 1+ = pinned to a depth
  comments: // optional String, user-friendly description of the pin
  metadata: // optional Object, user-defined data for the pin
}
```

Notes: `.codec` and `.version` are stored so we can recreate the original CID when listing pins.

### Metadata

The intention is for us to be able to add extra fields that have technical meaning to the root of the object, and the user can store application-specific data in the `metadata` field.

### CLI

```console
$ ipfs pin add bafyfoo --metadata key1=value1,key2=value2
$ ipfs pin add bafyfoo --metadata-format=json --metadata '{"key1":"value1","key2":"value2"}'
$ ipfs pin list
bafyfoo
$ ipfs pin list -l
CID      Name    Type       Metadata
bafyfoo  My pin  Recursive  {"key1":"value1","key2":"value2"}
$ ipfs pin metadata Qmfoo --format=json
{"key1":"value1","key2":"value2"}
```

### HTTP API

* `/api/v0/pin/add` route adds new `metadata` argument, accepts a json string
* `/api/v0/pin/metadata` returns metadata as json

### Future tech

* Pins could be stored in namespaces, e.g. `/default/C19A797...`, `/my-namespace/C19A797...`, listed with `ipfs pin ls --namespace=my-namespace`
* Metadata could be queried, e.g. `ipfs pin query metadata.key1=value1`

### Core API

* `ipfs.pin.addAll` accepts and returns an async iterator

```javascript
// pass a cid or IPFS Path with options
const { cid } = await ipfs.pin.add(new CID('/ipfs/Qmfoo'), { recursive: false, metadata: { key: 'value' }, timeout: 2000 })

// pass an iterable of CIDs
const [{ cid: cid1 }, { cid: cid2 }] = await all(ipfs.pin.addAll([
  new CID('/ipfs/Qmfoo'),
  new CID('/ipfs/Qmbar')
], { timeout: '2s' }))

// pass an iterable of objects with options
const [{ cid: cid1 }, { cid: cid2 }] = await all(ipfs.pin.addAll([
  { cid: new CID('/ipfs/Qmfoo'), recursive: true, comments: 'A recursive pin' },
  { cid: new CID('/ipfs/Qmbar'), recursive: false, comments: 'A direct pin' }
], { timeout: '2s' }))
```

* `ipfs.pin.rmAll` accepts and returns an async generator (other input types are available)

```javascript
// pass an IPFS Path or CID
const { cid } = await ipfs.pin.rm(new CID('/ipfs/Qmfoo/file.txt'))

// pass options
const { cid } = await ipfs.pin.rm(new CID('/ipfs/Qmfoo'), { recursive: true })

// pass an iterable of CIDs or objects with options
const [{ cid }] = await all(ipfs.pin.rmAll([{ cid: new CID('/ipfs/Qmfoo'), recursive: true }]))
```

Bonus: lets us pipe the output of one command into another:

```javascript
await pipe(
  ipfs.pin.ls({ type: 'recursive' }),
  (source) => ipfs.pin.rmAll(source)
)

// or

await all(ipfs.pin.rmAll(ipfs.pin.ls({ type: 'recursive' })))
```

BREAKING CHANGES:

* pins are now stored in a datastore, a repo migration will occur on startup
* All deps of this module now use Uint8Arrays in place of node Buffers