From 3f0baaaf4371af820073672d5df28abfd75806df Mon Sep 17 00:00:00 2001 From: Juan Benet Date: Mon, 13 Feb 2017 08:16:32 -0800 Subject: [PATCH] IPFS Repo Spec update (#43) * repo spec (old) update --- repo/README.md | 88 +++++++++++++++++++++++++----------------- repo/fs-repo/README.md | 81 +++++++++++++++++++++++++++++++------- 2 files changed, 120 insertions(+), 49 deletions(-) diff --git a/repo/README.md b/repo/README.md index 65d8d2f83..67ab9f9bd 100644 --- a/repo/README.md +++ b/repo/README.md @@ -17,7 +17,6 @@ A `repo` is the storage repository of an IPFS node. It is the subsystem that actually stores the data ipfs nodes use. All IPFS objects are stored in in a repo (similar to git). - There are many possible repo implementations, depending on the storage media used. Most commonly, ipfs nodes use an [fs-repo](fs-repo). @@ -32,46 +31,35 @@ Repo Implementations: ## Repo Contents -The Repo stores: -- version - the repo version, required for safe migrations +The Repo stores a collection of [IPLD](../merkledag/ipld.md) objects that represent: + - keys - cryptographic keys, including node's identity - config - node configuration and settings -- datastore - locally stored ipfs objects and indexing data +- datastore - content stored locally, and indexing data - logs - debugging and usage event logs -- locks - process semaphores - hooks - scripts to run at predefined times (not yet implemented) -![](ipfs-repo-contents.png?) +Note that the IPLD objects a repo stores are divided into: +- **state** (system, control plane) used for the node's internal state +- **content** (userland, data plane) which represent the user's cached and pinned data. -### version +Additionally, the repo state must determine the following. These need not be IPLD objects, though it is of course encouraged: -Repo implementations may change over time, thus they must all be recognizable. -For example, the `fs-repo` simply includes a `version` file with the contents. 
- -### keys +- version - the repo version, required for safe migrations +- locks - process semaphores for correct concurrent access -A Repo holds the keys a node has access to, for signing xor encryption. -This includes: -- a special (private, public) key pair that defines the node's identity -- (private, public) key pairs -- symmetric keys - -TODO: perhaps support ssh-agent style delegation. +![](ipfs-repo-contents.png?) -### config +### version -The node's config is a tree of variables, used to configure various aspects -of operation. For example: -- the set of bootstrap peers IPFS uses to connect to the network -- the Swarm, API, and Gateway network listen addresses +Repo implementations may change over time, thus they MUST include a `version` recognizable across versions. Meaning that a tool MUST be able to read the `version` of a given repo type. +For example, the `fs-repo` simply includes a `version` file with the version number. This way, the repo contents can evolve over time but the version remains readable the same way across versions. ### datastore -IPFS nodes stores some merkledag objects locally. These are either pinned -(stored until they are unpinned) or cached (stored until the next repo garbage -collection). +IPFS nodes store some IPLD objects locally. These are either (a) **state objects** required for local operation -- such as the `config` and `keys` -- or (b) **content objects** used to represent data locally available. **Content objects** are either _pinned_ (stored until they are unpinned) or _cached_ (stored until the next repo garbage collection). The name "datastore" comes from [go-datastore](https://github.com/jbenet/go-datastore), a library for @@ -86,26 +74,53 @@ feature swappable datastores, for example: This makes it easy to change properties or performance characteristics of a repo without an entirely new implementation. + +### keys (state) + +A Repo typically holds the keys a node has access to, for signing and for encryption. 
This includes: + +- a special (private, public) key pair that defines the node's identity +- (private, public) key pairs +- symmetric keys + +Some repos MAY support key-agent delegation, instead of storing the keys directly. + +Keys are structured using the [multikey](https://github.com/jbenet/multikey) format, and are part of the [keychain](../keychain) data structure. This means all keys are IPLD objects, and that they link to all the data needed to make sense of them, including parent keys, identities, and certificates. + +### config (state) + +The node's `config` (configuration) is a tree of variables, used to configure various aspects of operation. For example: +- the set of bootstrap peers IPFS uses to connect to the network +- the Swarm, API, and Gateway network listen addresses + +It is recommended that `config` files avoid identifying information, so that they may be re-shared across multiple nodes. + +**CHANGES**: today, implementations like go-ipfs store the peer-id and private key directly in the config. These will be moved out of the config. + ### logs -A full IPFS node is complex. Many events can happen, and thus ipfs +A full IPFS node is complex. Many events can happen, and thus some ipfs implementations capture event logs and (optionally) store them for user review or debugging. +Logs MAY be stored directly as IPLD objects along with everything else, but this may be a problem if the logs + +**NOTE**: go-ipfs no longer stores logs. It only emits them at a given route. This section is kept here in case other implementations may wish to store logs, though it may be removed in the future. + ### locks IPFS implementations may use multiple processes, or may disallow multiple -processes from running simultaneously on the same repo. This synchronization -is accomplished via locks on the repo itself. +processes from using the same repo simultaneously. Others may disallow using +the same repo but may allow sharing _datastores_ simultaneously. 
This +synchronization is accomplished via _locks_. All repos contain the following standard locks: -- `repo.lock` - prevents concurrent access to the repo. - Must be held to read or write. +- `repo.lock` - prevents concurrent access to the repo. Must be held to _read_ or _write_. ### hooks (TODO) -Like git, IPFS will have `hooks`, a set of user configurable scripts that -can be run at predefined moments in ipfs operations. This makes it easy +Like git, IPFS nodes will allow `hooks`, a set of user configurable scripts +to run at predefined moments in ipfs operations. This makes it easy to customize the behavior of ipfs nodes without changing the implementations themselves. @@ -114,14 +129,15 @@ themselves. #### A Repo uniquely identifies an IPFS Node A repository uniquely identifies a node. Running two different ipfs programs -with identical repositories -- and thus identical identities -- will cause +with identical repositories -- and thus identical identities -- WILL cause problems. +Datastores MAY be shared -- with proper synchronization -- though note that sharing datastore access MAY erode privacy. #### Repo implementation changes MUST include migrations -DO NOT BREAK USERS' DATA. It is critical. Thus, any changes to a repo's -implementation must be accompanied by a migration tool. +**DO NOT BREAK USERS' DATA.** This is critical. Thus, any changes to a repo's implementation **MUST** be accompanied by a **SAFE** migration tool. + See https://github.com/jbenet/go-ipfs/issues/537 and https://github.com/jbenet/random-ideas/issues/33 diff --git a/repo/fs-repo/README.md b/repo/fs-repo/README.md index 684f98f85..f0fb7c3ad 100644 --- a/repo/fs-repo/README.md +++ b/repo/fs-repo/README.md @@ -42,10 +42,12 @@ The repo interface is defined [here](../). ### api -`api` is a file that exists only if there is currently a live api listening -for requests. This is used when the `repo.lock` prevents access. 
Clients may -opt to use the api service, or wait untill the process holding `repo.lock` -exits. The file's content is the api multiaddr +`./api` is a file that exists to denote an API endpoint to listen to. +- It MAY exist even if the endpoint is no longer live (i.e. it is a _stale_ or left-over `./api` file). + +In the presence of an `./api` file, ipfs tools (e.g. go-ipfs `ipfs daemon`) MUST attempt to delegate to the endpoint, and MAY remove the file if reasonably certain the file is stale (e.g. the endpoint is local, but no process is live). + +The `./api` file is used in conjunction with the `repo.lock`. Clients may opt to use the api service, or wait until the process holding `repo.lock` exits. The file's content is the api endpoint as a [multiaddr](https://github.com/jbenet/multiaddr) ``` > cat .ipfs/api @@ -57,6 +59,31 @@ Notes: - It is not enough to use the `config` file, as the API addr of a daemon may have been overridden via ENV or flag. +#### api file for remote control + +One use case of the `api` file is to have a repo directory like: + +``` +> tree $IPFS_PATH +/Users/jbenet/.ipfs +└── api + +0 directories, 1 files + +> cat $IPFS_PATH/api +/ip4/1.2.3.4/tcp/5001 +``` + +In go-ipfs, this has the same effect as: + +``` +ipfs --api /ip4/1.2.3.4/tcp/5001 +``` + +This means that ipfs tools use the ipfs node at the given endpoint, instead of using the local directory as a repo. + +In this use case, the rest of the `$IPFS_PATH` may be completely empty, and no other information is necessary. Such a directory cannot be said to be a _repo_ per se. (TODO: come up with a good name for this). + ### blocks/ The `block/` component contains the raw data representing all IPFS objects @@ -119,9 +146,9 @@ timestamp of their creation. For example: ### repo.lock -`repo.lock` prevents concurrent access to the repo. Its content is the PID -of the process currently holding the lock. This allows clients to detect -a failed lock cleanup. +`repo.lock` prevents concurrent access to the repo. 
Its content SHOULD be the +PID of the process currently holding the lock. This allows clients to detect +a failed lock and clean up. ``` > cat .ipfs/repo.lock @@ -130,17 +157,32 @@ a failed lock cleanup. 42 ttys000 79:05.83 ipfs daemon ``` +**TODO, ADDRESS DISCREPANCY:** the go-ipfs implementation does not currently store the PID in the file, which on some systems causes problems after a failure or a teardown. This SHOULD NOT require any manual intervention: a present lock should give new processes enough information to recover. Doing this correctly in a portable, safe way, with good UX is very tricky. We must be careful with TOCTTOU bugs, and with multiple concurrent processes capable of running at any moment. The goal is for all processes to operate safely, to avoid bothering the user, and for the repo to always remain in a correct, consistent state. + ### version -The `version` file contains the repo implementation name and version +The `version` file contains the repo implementation name and version. This format has changed over time: ``` -> cat version -fs-repo: 1 +# in version 0 +> cat $repo-at-version-0/version +cat: /Users/jbenet/.ipfs/version: No such file or directory + +# in versions 1 and 2 +> cat $repo-at-version-1/version +1 +> cat $repo-at-version-2/version +2 + +# in versions >=3 +> cat $repo-at-version-3/version +fs-repo/3 ``` -_Any_ fs-repo implementation of _any_ versions MUST be able to read the -`version` file. It MUST NOT change between versions. +_Any_ fs-repo implementation of _any_ version `>0` MUST be able to read the +`version` file. It MUST NOT change format between versions. The sole exception is version 0, which had no file. + +**TODO: ADDRESS DISCREPANCY:** versions 1 and 2 of the go-ipfs implementation use just the integer number. They SHOULD have used the `fs-repo/` prefix. We could either change the spec and always just use the int, or change go-ipfs in version `>3`. We will have to be backwards compatible. 
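
As an illustration of the backwards-compatibility concern above, the following is a minimal sketch of a reader that accepts all three `version` layouts shown in this section: no file (version 0), a bare integer, and the `fs-repo/` prefix followed by an integer. It is not taken from go-ipfs. The function name `parse_fs_repo_version` and the convention of passing `None` for a missing file are assumptions made for this example.

```python
import re
from typing import Optional


def parse_fs_repo_version(contents: Optional[str]) -> int:
    """Return the fs-repo version number for the given `version` file contents.

    Hypothetical helper, not part of any ipfs implementation.
    `contents` is None when the `version` file does not exist, which this
    sketch treats as version 0.
    """
    if contents is None:
        return 0

    text = contents.strip()

    # Bare integer layout, e.g. "1" or "2".
    if re.fullmatch(r"[0-9]+", text):
        return int(text)

    # Prefixed layout, e.g. "fs-repo/3".
    match = re.fullmatch(r"fs-repo/([0-9]+)", text)
    if match:
        return int(match.group(1))

    raise ValueError(f"unrecognized version file contents: {text!r}")
```

The sketch is deliberately lenient: it accepts either layout for any version number, because the discrepancy above is still open and a reader cannot know in advance which layout a given repo uses.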
## Datastore @@ -188,8 +230,21 @@ For example: filesystems are case insensitive. - the multihash prefix is two bytes, which would waste two directory levels, thus these are combined into one. -- the git `idx` and `pack` file could be used to coalesce objects +- the git `idx` and `pack` file formats could be used to coalesce objects + +**TODO: ADDRESS DISCREPANCY:** + +the go-ipfs fs-repo in version 2 uses a different `blocks/` dir layout: + +``` +/Users/jbenet/.ipfs/blocks +├── 12200007 +│   └── 12200007d4e3a319cd8c7c9979280e150fc5dbaae1ce54e790f84ae5fd3c3c1a0475.data +├── 1220000f +│   └── 1220000fadd95a98f3a47c1ba54a26c77e15c1a175a975d88cf198cc505a06295b12.data +``` +We MUST address whether we should change the fs-repo spec to match go-ipfs in version 2, or we should change go-ipfs to match the fs-repo spec (more tiers). We MUST also address whether the levels are a repo version parameter or a config parameter. There are filesystems in which a different fanout will have wildly different performance. These are mostly networked and legacy filesystems. ### Reading without the `repo.lock`