This repository was archived by the owner on Feb 8, 2023. It is now read-only.
Sharding IPLD objects #76
This note gathers the constraints for, and will drive toward a design of, object sharding in IPFS and IPLD. Object sharding is the set of algorithms and formats used to represent a single (virtual) large object out of many smaller ones. Think of this like the way large directories are represented in modern filesystems (RB-Trees, B-Trees, HTrees, etc).
Sharding IPLD objects in general is a useful thing. Instead of implementing it for unixfs and other datastructs each time, we could implement it once. It could be a datastruct the others employ, or maybe, if it is simple enough, it belongs as part of IPLD itself.
Constraints to support:
- efficient in the small case (1 to 5 nodes)
- allows user-chosen sharding (e.g. for small numbers of nodes, one may want a specific construction)
- large fanouts (millions or billions)
- efficient access
- minimize insertion re-writes (shadowing/cloning)
- upgradeable algorithms (can signal which sharding algo via version, or even with a key/val)
- union style fanouts
- hierarchical style fanouts (patricia tries)
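To make a few of these constraints concrete, here is a minimal sketch (not a proposed format) of a sharded map that stays a single node in the small case, fans out by hash prefix once it grows, and carries an algorithm tag so the scheme could be upgraded later. Everything named here is an assumption for illustration: the `INLINE_LIMIT` threshold, the 16-way fanout, the `"sha256-nibble/v0"` tag, the node layout, and the in-memory `store` dict that stands in for a content-addressed block store.

```python
import hashlib
import json

INLINE_LIMIT = 4            # hypothetical: max entries kept in one unsharded node
ALGO = "sha256-nibble/v0"   # hypothetical tag naming the sharding algorithm

store = {}  # stand-in for a content-addressed block store


def put(node):
    """Store a node under the hash of its canonical JSON form; return the key."""
    data = json.dumps(node, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    store[key] = node
    return key


def bucket(name, depth):
    """Child index (0..15) for `name` at `depth`: one hex digit of its hash."""
    return int(hashlib.sha256(name.encode()).hexdigest()[depth], 16)


def build(entries, depth=0):
    """Build a (possibly sharded) map from name -> link; return the root key."""
    if len(entries) <= INLINE_LIMIT:
        # Small case: a single node, no sharding overhead.
        return put({"kind": "inline", "entries": entries})
    groups = {}
    for name, link in entries.items():
        groups.setdefault(bucket(name, depth), {})[name] = link
    children = {format(i, "x"): build(group, depth + 1)
                for i, group in groups.items()}
    return put({"kind": "shard", "algo": ALGO, "children": children})


def lookup(root, name):
    """Follow one child per level until an inline node is reached."""
    node, depth = store[root], 0
    while node["kind"] == "shard":
        child = node["children"].get(format(bucket(name, depth), "x"))
        if child is None:
            return None
        node, depth = store[child], depth + 1
    return node["entries"].get(name)
```

A lookup touches one node per level, so access cost grows with the depth of the tree rather than with the total number of entries. User-chosen sharding, union-style fanouts, and patricia-trie layouts are not covered by this sketch.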
For large fanouts, look at:
- all filesystems research into indirect block topologies
- ext4 dir entries (list + tree) https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Directory_Entries
- HTree https://en.wikipedia.org/wiki/HTree
- PHTree http://phunq.net/pipermail/tux3/2013-January/000026.html
- B-trees + shadowing + clones http://liw.fi/larch/ohad-btrees-shadowing-clones.pdf
- djb's cdb http://cr.yp.to/cdb.html
- @tv42's set + multiset on ipfs (already used for the 0.4.0+ pinset): "pinning index (as ipfs objects) and cdb discussion", #4 (comment)
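The shadowing idea from the B-trees reference, combined with the "minimize insertion re-writes" constraint, can be illustrated with a much simpler structure than a B-tree. The sketch below uses a fixed-depth hash trie with path copying: an insert creates new nodes only along the path from the root to the affected leaf, and every other subtree is shared between the old and new roots. The `FANOUT` and `DEPTH` values and the tuple/dict node representation are assumptions chosen for brevity, not anything taken from the linked material.

```python
import hashlib

FANOUT = 16  # hypothetical: one hex digit of the hash per level
DEPTH = 3    # hypothetical fixed depth: 16**3 = 4096 leaf buckets


def path_for(name):
    """Leading hex digits of the name's hash, used as the path to its leaf."""
    digest = hashlib.sha256(name.encode()).hexdigest()
    return [int(c, 16) for c in digest[:DEPTH]]


def insert(node, path, name, value):
    """Return a new root. Interior nodes are tuples, leaves are dicts,
    and None marks an absent subtree. Nodes off the path are reused as-is."""
    if not path:
        leaf = dict(node or {})   # copy the leaf, never mutate the old one
        leaf[name] = value
        return leaf
    children = list(node) if node is not None else [None] * FANOUT
    children[path[0]] = insert(children[path[0]], path[1:], name, value)
    return tuple(children)


def get(root, name):
    """Walk the path for `name`; return its value or None if absent."""
    node = root
    for index in path_for(name):
        if node is None:
            return None
        node = node[index]
    return None if node is None else node.get(name)
```

Each insert rewrites at most `DEPTH + 1` nodes no matter how many entries the structure holds, and the previous root remains a valid snapshot, which is the property that makes cheap clones possible.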
Case for supporting it on top of IPLD:
- It is nice that the IPLD spec is very simple. Finding a nice way to support this without complicating it much will be hard; the constraints above do not bode well for this.
- can define it as a different datastruct, should not be hard for other datastructs to extend it
- flexible algorithms for sharding may complicate IPLD
Case for supporting it in IPLD:
- we could have a very powerful datastructure if sharding were available everywhere
- merkle-linking in IPLD is already like hierarchical fanout sharding of a single massive tree; this is just sharding within a single level.
- IPLD already has flexible algos in multicodec
- could use a directive like @shard or something; could be an IPLD extension if not properly in core spec.
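As a rough illustration of how such a directive might be consumed, the sketch below treats an IPLD object as a plain Python dict and looks for a `"@shard"` key that names the algorithm and fanout. The key layout (`algo`, `fanout`), the algorithm names, and the `SHARD_ALGOS` registry are all hypothetical; nothing here is defined by the IPLD spec.

```python
import hashlib
import zlib

# Hypothetical registry: algorithm name -> function(name, fanout) -> child index.
SHARD_ALGOS = {
    "crc32-mod/v0": lambda name, fanout: zlib.crc32(name.encode()) % fanout,
    "sha256-mod/v0": lambda name, fanout: int.from_bytes(
        hashlib.sha256(name.encode()).digest()[:4], "big") % fanout,
}


def child_index(node, name):
    """Return which child of `node` would hold `name`,
    or None if the node carries no @shard directive."""
    directive = node.get("@shard")
    if directive is None:
        return None  # ordinary, unsharded object
    algo = SHARD_ALGOS.get(directive["algo"])
    if algo is None:
        raise ValueError(f"unknown sharding algorithm: {directive['algo']}")
    return algo(name, directive["fanout"])


# Example object; the link values are placeholders, not real hashes.
node = {
    "@shard": {"algo": "crc32-mod/v0", "fanout": 4},
    "children": ["<link 0>", "<link 1>", "<link 2>", "<link 3>"],
}
index = child_index(node, "readme.txt")
```

Because the algorithm is named in the object itself, a reader that does not recognize it can fail loudly instead of misreading the layout, which is one way to meet the "upgradeable algorithms" constraint.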
cc @whyrusleeping @lgierth @diasdavid @cryptix @ion1 @mildred @tv42 @wking