
Added some useful overviews #38

Merged: 3 commits, Dec 7, 2015

overviews/README.md

# quick-overview docs

These are simple documents meant to provide quick overviews of various parts of the protocol.

- [Implementing HTTP API bindings](implement-api-bindings.md)
- [Implementing IPFS itself](implement-ipfs.md)

overviews/implement-api-bindings.md

# IPFS API Implementation Doc

This short document aims to give a quick guide to anyone implementing API bindings for IPFS implementations, in particular go-ipfs.

Sections:
- IPFS Types
- API Transports
- API Commands
- Implementing bindings for the HTTP API

## IPFS Types

IPFS uses a set of value types that are useful to enumerate up front:

- `<ipfs-path>` is a unix-style path, beginning with `/ipfs/<hash>/...`, `/ipns/<hash>/...`, or `/ipns/<domain>/...`.
- `<hash>` is a base58-encoded [multihash](https://github.com/jbenet/multihash) (there are [many implementations](https://github.com/jbenet/multihash#implementations)). It is usually the hash of an ipfs object (or merkle dag node).
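As a rough illustration, the Go sketch below splits an `<ipfs-path>` into its namespace, root and remaining segments. `IPFSPath` and `parseIPFSPath` are hypothetical names and are not part of go-ipfs. The function only checks the shape of the path. It does not decode or verify the multihash, and under `/ipns/` it accepts any root, whether a hash or a domain.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// IPFSPath is a hypothetical, already-split form of an <ipfs-path>.
type IPFSPath struct {
	Namespace string   // "ipfs" or "ipns"
	Root      string   // a <hash>, or (under /ipns/) possibly a <domain>
	Rest      []string // remaining path segments, possibly empty
}

// parseIPFSPath checks only the overall shape of p; it does not decode
// or validate the multihash or domain in the root segment.
func parseIPFSPath(p string) (IPFSPath, error) {
	if !strings.HasPrefix(p, "/") {
		return IPFSPath{}, errors.New("path must start with /")
	}
	parts := strings.Split(strings.Trim(p, "/"), "/")
	if parts[0] != "ipfs" && parts[0] != "ipns" {
		return IPFSPath{}, fmt.Errorf("unknown namespace %q", parts[0])
	}
	if len(parts) < 2 || parts[1] == "" {
		return IPFSPath{}, fmt.Errorf("path %q has no root", p)
	}
	return IPFSPath{Namespace: parts[0], Root: parts[1], Rest: parts[2:]}, nil
}

func main() {
	p, err := parseIPFSPath("/ipfs/QmNtpA5TBNqHrKf3cLQ1AiUKXiE4JmUodbG5gXrajg8wdv/foo")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Namespace, p.Root, p.Rest)
}
```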

A note on streams: IPFS is a streaming protocol, and everything about it can be streamed. When importing files, API requests should aim to stream the data in and handle back-pressure correctly, so that the IPFS node can process it sequentially without too much memory pressure. (If using HTTP, this is typically handled for you, because writes to the request body block until the data is consumed.)

## API Transports

As with everything else, IPFS aims to be flexible regarding API transports. Currently, the [go-ipfs](https://github.com/ipfs/go-ipfs) implementation supports both an in-process API and an HTTP API. More can be added easily by mapping the API functions over a transport. (This is similar to how gRPC is also _mapped on top of transports_, like HTTP.)

Mapping to a transport involves leveraging the transport's features to express function calls. For example:

### CLI API Transport

In the commandline, IPFS uses a traditional flag and arg-based mapping, where:
- the first arguments select the command, as in git - e.g. `ipfs object get`
- the flags specify options - e.g. `--enc=protobuf -q`
- the rest are positional arguments - e.g. `ipfs object patch <hash1> add-link foo <hash2>`
- files are specified by filename, or through stdin

(NOTE: when a go-ipfs daemon is running, CLI commands are actually converted to HTTP calls to it. Otherwise, they execute in the same process.)

### HTTP API Transport

In HTTP, our API layering uses a REST-like mapping, where:
- the URL path selects the command - e.g. `/object/get`
- the URL query string implements option arguments - e.g. `&enc=protobuf&q=true`
- the URL query also implements positional arguments - e.g. `&arg=<hash1>&arg=add-link&arg=foo&arg=<hash2>`
- the request body streams file data - reads files or stdin
- multiple streams are muxed with multipart (todo: add tar stream support)


## API Commands

There is a "standard IPFS API" with a set of commands, which we plan to document clearly soon, but it has not yet been extracted into its own document. Perhaps, as part of this API bindings effort, we can document it all. It is currently defined as "all the commands exposed by the go-ipfs implementation". You can see [a listing here](https://github.com/ipfs/go-ipfs/blob/916f987de2c35db71815b54bbb9a0a71df829838/core/commands/root.go#L82-L111), or by running `ipfs commands` locally. **The good news is: we should be able to easily write a program that outputs a markdown API specification!**

(Note: the go-ipfs [commands library](https://github.com/ipfs/go-ipfs/tree/916f987de2c35db71815b54bbb9a0a71df829838/commands) also makes sure to keep the CLI and the HTTP API exactly in sync.)

## Implementing bindings for the HTTP API

As mentioned above, the API commands map to HTTP with:
- the URL path selects the command - e.g. `/object/get`
- the URL query string implements option arguments - e.g. `&enc=protobuf&q=true`
- the URL query also implements positional arguments - e.g. `&arg=<hash1>&arg=add-link&arg=foo&arg=<hash2>`
- the request body streams file data - reads files or stdin
- multiple streams are muxed with multipart (todo: add tar stream support)
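A small Go sketch of this mapping follows. `buildAPIURL` is a hypothetical helper, the daemon address is only an example, and the `/api/v0/` path prefix is taken from the request dumps further down. `url.Values.Encode` sorts parameters by key, so options and `arg` values may come out in a different order than they were given. It does keep the relative order of repeated `arg` values, which is what matters for positional arguments.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildAPIURL (hypothetical) maps a command onto a URL: the command words
// become the path, options become query parameters, and each positional
// argument becomes its own repeated "arg" parameter.
func buildAPIURL(base string, cmd []string, opts map[string]string, args []string) string {
	q := url.Values{}
	for k, v := range opts {
		q.Set(k, v)
	}
	for _, a := range args {
		q.Add("arg", a) // repeated key; order among "arg" values is preserved
	}
	return base + "/api/v0/" + strings.Join(cmd, "/") + "?" + q.Encode()
}

func main() {
	fmt.Println(buildAPIURL("http://127.0.0.1:5001", []string{"object", "get"},
		map[string]string{"enc": "protobuf"}, []string{"QmExampleHash"}))
	// http://127.0.0.1:5001/api/v0/object/get?arg=QmExampleHash&enc=protobuf
}
```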

To date, we have two different HTTP API clients:

- [js-ipfs-api](https://github.com/ipfs/js-ipfs-api) - simple JavaScript wrapper, and the best one to look at
- [go-ipfs/commands/http](https://github.com/ipfs/go-ipfs/tree/916f987de2c35db71815b54bbb9a0a71df829838/commands/http) - generalized transport based on the [command definitions](https://github.com/ipfs/go-ipfs/tree/916f987de2c35db71815b54bbb9a0a71df829838/core/commands)

The Go implementation is good for answering harder questions, like how multipart is handled, or what headers should be set in edge conditions. But the JavaScript implementation is very concise and easy to follow.

### Anatomy of js-ipfs-api

Currently, js-ipfs-api has three main files:
- [src/index.js](https://github.com/ipfs/js-ipfs-api/blob/66d1462bd02181d46e8baf4cd9d476b213426ad8/src/index.js) defines the functions that clients of the API module use. It uses `RequestAPI` and translates function call parameters to the API almost directly.
- [src/get-files-stream.js](https://github.com/ipfs/js-ipfs-api/blob/66d1462bd02181d46e8baf4cd9d476b213426ad8/src/get-files-stream.js) implements the hardest part: file streaming. This one uses multipart.
- [src/request-api.js](https://github.com/ipfs/js-ipfs-api/blob/66d1462bd02181d46e8baf4cd9d476b213426ad8/src/request-api.js) is the generic function that performs the actual HTTP requests.

### Note on multipart + inspecting requests

Despite all the generalization discussed above, the IPFS API is actually very simple. You can inspect all the requests made with `nc` and the `--api` option (available as of [this PR](https://github.com/ipfs/go-ipfs/pull/1598), i.e. go-ipfs `0.3.8`):

```
> nc -l 5002 &
> ipfs --api /ip4/127.0.0.1/tcp/5002 swarm addrs local --enc=json
POST /api/v0/version?enc=json&stream-channels=true HTTP/1.1
Host: 127.0.0.1:5002
User-Agent: /go-ipfs/0.3.8/
Content-Length: 0
Content-Type: application/octet-stream
Accept-Encoding: gzip


```

The only hard part is getting the file streaming right. It is (now) fairly easy to stream files to go-ipfs using multipart. Basically, we end up with HTTP requests like this:

```
> nc -l 5002 &
> ipfs --api /ip4/127.0.0.1/tcp/5002 add -r ~/demo/basic/test
POST /api/v0/add?encoding=json&progress=true&r=true&stream-channels=true HTTP/1.1
Host: 127.0.0.1:5002
User-Agent: /go-ipfs/0.3.8/
Transfer-Encoding: chunked
Content-Disposition: form-data: name="files"
Content-Type: multipart/form-data; boundary=2186ef15d8f2c4f100af72d6d345afe36a4d17ef11264ec5b8ec4436447f
Accept-Encoding: gzip

1
-
e5
-2186ef15d8f2c4f100af72d6d345afe36a4d17ef11264ec5b8ec4436447f
Content-Disposition: form-data; name="file"; filename="test"
Content-Type: multipart/mixed; boundary=acdb172fe12f25e8ffae9981ce6f4580abdefb0cae3ceebe464d802866be


9c
--acdb172fe12f25e8ffae9981ce6f4580abdefb0cae3ceebe464d802866be
Content-Disposition: file; filename="test%2Fbar"
Content-Type: application/octet-stream


4
bar

dc

--acdb172fe12f25e8ffae9981ce6f4580abdefb0cae3ceebe464d802866be
Content-Disposition: file; filename="test%2Fbaz"
Content-Type: multipart/mixed; boundary=2799ac77a72ef7b8a0281945806b9f9a28f7681145aa8e91b052d599b2dd


a0
--2799ac77a72ef7b8a0281945806b9f9a28f7681145aa8e91b052d599b2dd
Content-Type: application/octet-stream
Content-Disposition: file; filename="test%2Fbaz%2Fb"


4
bar

a2

--2799ac77a72ef7b8a0281945806b9f9a28f7681145aa8e91b052d599b2dd
Content-Disposition: file; filename="test%2Fbaz%2Ff"
Content-Type: application/octet-stream


4
foo

44

--2799ac77a72ef7b8a0281945806b9f9a28f7681145aa8e91b052d599b2dd--

9e

--acdb172fe12f25e8ffae9981ce6f4580abdefb0cae3ceebe464d802866be
Content-Disposition: file; filename="test%2Ffoo"
Content-Type: application/octet-stream


4
foo

44

--acdb172fe12f25e8ffae9981ce6f4580abdefb0cae3ceebe464d802866be--

44

--2186ef15d8f2c4f100af72d6d345afe36a4d17ef11264ec5b8ec4436447f--

0

```

Which produces: http://gateway.ipfs.io/ipfs/QmNtpA5TBNqHrKf3cLQ1AiUKXiE4JmUodbG5gXrajg8wdv

overviews/implement-ipfs.md

# IPFS Implementation Doc

This short document aims to give a quick guide to anyone implementing IPFS. It is modelled after go-ipfs and serves as a template for js-ipfs and py-ipfs.

Sections:
- Libraries First
- Core Pieces
- IPFS Core
- IPFS Datastructures and Data Handling

## Libraries First

A number of libraries that are not IPFS-specific have been built for IPFS, and IPFS depends on them. Implement these first.

### The Multis

There are a number of self-describing protocols/formats in use all over ipfs.

- [multiaddr](https://github.com/jbenet/multiaddr)
- [multihash](https://github.com/jbenet/multihash)
- [multicodec](https://github.com/jbenet/multicodec)
- [multistream](https://github.com/jbenet/multistream)
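As a small taste of the multis, the Go sketch below builds a SHA2-256 multihash (function code `0x12`, digest length `0x20`) and base58-encodes it, which yields the familiar `Qm...` form. It hashes raw bytes only, so the result is not the hash `ipfs add` would print for the same content, because that command hashes an encoded DAG node. The function names are illustrative and do not come from the multihash libraries.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/big"
)

// Bitcoin-style base58 alphabet (no 0, O, I, l).
const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// sha256Multihash prefixes a SHA2-256 digest with its multihash code
// (0x12) and its digest length in bytes (0x20 = 32).
func sha256Multihash(data []byte) []byte {
	sum := sha256.Sum256(data)
	return append([]byte{0x12, 0x20}, sum[:]...)
}

// base58Encode treats b as a big-endian number and writes it in base 58;
// each leading zero byte becomes a leading '1'.
func base58Encode(b []byte) string {
	n := new(big.Int).SetBytes(b)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod)
		out = append(out, base58Alphabet[mod.Int64()])
	}
	for _, c := range b {
		if c != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits were produced least-significant first; reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(base58Encode(sha256Multihash([]byte("hello"))))
}
```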

### libp2p

All complex peer-to-peer protocols for IPFS have been abstracted out into a separate library called libp2p. libp2p is a thin veneer over a wealth of modules that interface well with each other.

Implementations:
- [go-libp2p](https://github.com/ipfs/go-libp2p)
- [js-libp2p](https://github.com/ipfs/js-libp2p)


libp2p may in fact be _the bulk_ of an ipfs implementation. The rest is very simple.

## Core Pieces

### IPLD

IPLD is the format for IPFS objects, but it can be used outside of ipfs (hence a separate module). It's layered on top of multihash and multicodec, and provides the heart of ipfs: the merkledag.

Implementations:
- [go-ipld](https://github.com/ipfs/go-ipld)
- [js-ipld](https://github.com/ipfs/js-ipld)
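To show what the merkledag buys you, here is a toy Go sketch in which each node links to its children by hash, so changing any leaf changes every hash above it. `Node`, `Link` and the JSON encoding were chosen for brevity. They are stand-ins, not IPLD's actual data model or wire format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Link points at another node by the hash of that node's encoding.
type Link struct {
	Name string `json:"name"`
	Hash string `json:"hash"`
}

// Node is a hypothetical merkledag node: some data plus named links.
type Node struct {
	Data  string `json:"data"`
	Links []Link `json:"links"`
}

// Hash returns the hex SHA-256 of the node's JSON encoding. JSON is only
// a stand-in here; it is not the encoding IPLD or go-ipfs uses.
func (n Node) Hash() string {
	b, err := json.Marshal(n)
	if err != nil {
		panic(err) // cannot happen for these field types
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

func main() {
	leafA := Node{Data: "hello"}
	leafB := Node{Data: "world"}
	root := Node{Links: []Link{{"a", leafA.Hash()}, {"b", leafB.Hash()}}}

	leafB.Data = "WORLD" // change one leaf...
	root2 := Node{Links: []Link{{"a", leafA.Hash()}, {"b", leafB.Hash()}}}
	fmt.Println(root.Hash() != root2.Hash()) // ...and the root hash changes: true
}
```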

### IPRS

IPRS is the record system for IPFS, but it can be used outside of ipfs (hence a module). This deals with p2p system records -- it is also used by libp2p.

Implementations:
- [go-iprs](https://github.com/ipfs/go-iprs)
- [js-iprs](https://github.com/ipfs/js-iprs)

### IPNS

IPNS provides name resolution on top of IPRS and a choice of record routing system.

### IPFS-Repo

The IPFS-Repo is an IPFS node's "local storage" or "database", though the storage may be neither a database nor local at all (e.g. s3-repo). There are common formats so that multiple implementations can read and write the same repos. Though today we only have one repo format, more are easy to add, so that we can create IPFS nodes on top of other storage solutions.

Implementations:
- [go-ipfs-repo](https://github.com/ipfs/go-ipfs-repo)
- [go-ipfs-repo-fs](https://github.com/ipfs/go-ipfs-repo/fs) - in filesystem
- [go-ipfs-repo-s3](https://github.com/ipfs/go-ipfs-repo/s3) - in amazon s3
- [js-ipfs-repo](https://github.com/ipfs/js-ipfs-repo)
- [js-ipfs-repo-fs](https://github.com/ipfs/js-ipfs-repo/fs) - in filesystem
- [js-ipfs-repo-browser](https://github.com/ipfs/js-ipfs-repo/browser) - in local storage
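Interchangeable backends could sit behind one repo abstraction, as this hypothetical minimal Go interface with an in-memory implementation shows. The method set is an assumption made for this sketch. It is not the interface of go-ipfs-repo or js-ipfs-repo. A filesystem, s3 or browser backend would each be another type satisfying `Repo`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Repo is a hypothetical minimal repo interface: a key/value block store
// plus access to the node's raw config.
type Repo interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Config() ([]byte, error)
}

// ErrNotFound is returned by Get for unknown keys.
var ErrNotFound = errors.New("repo: not found")

// memRepo keeps everything in memory, e.g. for tests.
type memRepo struct {
	mu     sync.RWMutex
	blocks map[string][]byte
	config []byte
}

func newMemRepo(config []byte) *memRepo {
	return &memRepo{blocks: make(map[string][]byte), config: config}
}

func (r *memRepo) Put(key string, value []byte) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.blocks[key] = append([]byte(nil), value...) // copy: caller may reuse value
	return nil
}

func (r *memRepo) Get(key string) ([]byte, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	v, ok := r.blocks[key]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), v...), nil // copy: keep stored data immutable
}

func (r *memRepo) Config() ([]byte, error) { return r.config, nil }

func main() {
	var repo Repo = newMemRepo([]byte(`{}`))
	if err := repo.Put("QmExample", []byte("block data")); err != nil {
		panic(err)
	}
	v, err := repo.Get("QmExample")
	fmt.Println(string(v), err)
}
```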

## IPFS Core

The Core of IPFS is an interface of functions layered over all of the other pieces.

### IPFS Node

The IPFS Node is an entity that bundles all the other pieces, and implements the interface (below). In its most basic sense, an IPFS node is really just:

```go
// In package ipfs, so other packages refer to it as ipfs.Node.
type Node struct {
	Config      // has a configuration
	repo.Repo   // has a Repo for storing all the local data
	libp2p.Node // has an embedded libp2p.Node, and thus a peer.ID, and keys
	dag.Store   // has a DAG Store (over the repo + network)
}
```

IPFS itself is very, very simple. The complexity lies within `libp2p.Node` and how the different IPFS commands should run depending on the `libp2p.Node` configuration.

### IPFS Node Config

IPFS nodes can be configured. The basic configuration format is a JSON file, so converters to other formats can naturally be made. Eventually, the configuration will be an ipfs object itself.

The config is stored in the IPFS Repo, but is kept separate because some implementations may give it knowledge of other packages (like routing, http, etc.).
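As a sketch of consuming the JSON config from Go, the code below decodes a few fields. The field names (`Addresses.API`, `Addresses.Gateway`, `Addresses.Swarm`, `Bootstrap`) are assumed to match what go-ipfs writes, and they are only a small subset of its config. `encoding/json` simply ignores unknown fields, so other implementations could extend the file without breaking this reader. The addresses are multiaddrs, the same form passed to `--api`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config decodes a small, assumed subset of a go-ipfs style config file.
type Config struct {
	Addresses struct {
		API     string
		Gateway string
		Swarm   []string
	}
	Bootstrap []string
}

func parseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

// sample is illustrative; "SomethingElse" shows unknown keys are ignored.
const sample = `{
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Gateway": "/ip4/127.0.0.1/tcp/8080",
    "Swarm": ["/ip4/0.0.0.0/tcp/4001"]
  },
  "Bootstrap": [],
  "SomethingElse": {"ignored": true}
}`

func main() {
	c, err := parseConfig([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Addresses.API, c.Addresses.Swarm)
}
```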

### IPFS Interface or API

The IPFS Interface or API (not to be confused with the IPFS HTTP API) is the set of functions that IPFS Nodes must support. These are classified into sections, like _node, network, data, util_ etc.

The IPFS Interface can be implemented:
- as a library - first and foremost
- as a commandline toolchain, so users can use it directly
- as an RPC API, so that other programs can use it
- over HTTP (the IPFS HTTP API)
- over unix domain sockets
- over IPC

One goal for the core interface libraries is to produce an interface that could operate on a local or a remote node. This means that, for example:

```go
func Cat(n ipfs.Node, p ipfs.Path) io.Reader { ... }
```
should be able to work whether `n` represents a local node (in-process, local storage), or a remote node (over an RPC API, say HTTP).

_**For now, I list these from the commandline, but the goal is to produce a proper typed function interface/API that we can all agree on.**_

#### Node Commands

These are for the node itself.

- ipfs init
- ipfs config
- ipfs repo
- ipfs repo gc
- ipfs stats
- ipfs diag

#### Data Commands

- ipfs block
- ipfs object
- ipfs {cat, ls, refs}
- ipfs pin
- ipfs files
- ipfs tar
- ipfs resolve

#### Network Commands

These are carried over from libp2p, so ideally the libp2p implementations do the heavy lifting here.

- ipfs id
- ipfs ping
- ipfs swarm
- ipfs exchange
- ipfs routing
- ipfs bitswap
- ipfs bootstrap

#### Naming commands

These are carried over from IPNS (which could be made its own tool/lib).

- ipfs dns
- ipfs name

#### Tool Commands

- ipfs log
- ipfs update
- ipfs version
- ipfs tour
- ipfs daemon

## IPFS Datastructures and Data Handling

There are many useful datastructures on top of IPFS, such as `unixfs`, `tar`, and `keychain`. And there are a number of ways of importing data, whether posix files or not.

### IPLD Data Importing

Importing data into IPFS can be done in a variety of ways. These are use-case specific, produce different datastructures, produce different graph topologies, and so on. These are not _strictly_ needed in an IPFS implementation, but definitely make it more useful. They are really tools on top of IPLD though, so these can be generic and separate from IPFS itself.

- graph topologies - shape of the graphs
  - balanced - dumb, dead simple
  - trickledag - optimized for seeking
  - live stream
  - database indices
- file chunking - how to split a continuous stream/file
  - fixed size
  - rabin fingerprinting
  - format chunking (use knowledge of formats, e.g. audio, video, etc)
- special format datastructures
  - tar
  - document formats - pdf, doc, etc
  - audio and video formats - ogg, mpeg, etc
  - container and vm images
  - and many more

### `unixfs` datastructure

It's worth mentioning the `unixfs` datastructure, as it provides support for representing unix (posix) files in ipfs. It's simple but powerful, and it is first class, in that several basic commands make use of it.

### Interesting Data Structure questions

**interfacing with a variety of data structures**

We are still figuring out good ways to make all the different data structures play well with various commands. There is some complexity when it comes to implementing things like `ipfs cat`: it currently outputs the data of a `unixfs.File`, but it could do something for other graph objects too. Ideally, we could figure out common ways of making this work. If you have ideas, please discuss.

**graph mapping**

Sometimes one graph maps to another, for example a unixfs graph shards big files and big directories into smaller units and transparently presents them to the user for commands such as `ipfs cat` and `ipfs ls`.

**mixing data structures**

Some data structures are meant to be interspersed with others, meaning that they provide meaning to arbitrary things. One example is a `keychain.Signature`, a cryptographic signature on any other object. Another example is a `versioning.Commit`, which represents a specific revision in a version history over any other object. It is still not entirely clear how to build nice tooling that handles these transparently.