This repository has been archived by the owner on Mar 10, 2020. It is now read-only.

RFC: unite the files APIs #552

Draft · wants to merge 9 commits into base: master
Conversation


@alanshaw alanshaw commented Nov 5, 2019

Not finished yet!


@Stebalien Stebalien left a comment


I really like these changes!


#### 1.2 Changes to returned values

1. Importing a single file will now yield two entries: one for the imported file and one for the containing directory. Note this change can be considered almost backwards compatible; in the current API you receive an array of one value, which you access like `files[0]`. If you collect the entries in the new API, you can still access them like that.
Contributor

Note: in terms of backwards compat, I'd like to just switch to a new API version and create a shim to maintain support for v0.

2. Instead of a `hash` property, entries will have a `cid` property. In entries yielded from core it will be a CID instance, not a string (as agreed in [ipfs/interface-js-ipfs-core#394](https://github.com/ipfs/interface-js-ipfs-core/issues/394)). In the HTTP API/CLI it will necessarily be a string, encoded in base32 by default or in whatever `?cid-base`/`--cid-base` option value was requested.
Contributor

I'd like to consider returning namespaced paths instead of CIDs:

  1. In theory, it allows us to import into other filesystem formats (e.g., swarm, git?, etc.).
  2. If we support embedding small files in directory entries, not all files will have CIDs (unixfs 2.0 should, ideally, support that).
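As a toy illustration of this shape (the helper and its names are hypothetical, not an existing API), entries could carry a namespaced path rather than a bare CID, so formats without CIDs still fit the same return shape:

```javascript
// Hypothetical sketch: build a namespaced path for a returned entry.
// Non-IPFS namespaces (swarm, git, ...) and CID-less embedded files
// would use the same '/namespace/id[/name]' shape.
function toNamespacedPath (namespace, id, name) {
  return `/${namespace}/${id}` + (name ? `/${name}` : '')
}

// e.g. toNamespacedPath('ipfs', 'QmHash', 'a.txt') → '/ipfs/QmHash/a.txt'
```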

| `ipfs stat` | ✅ | ✅ |
| `ipfs write` | ❌ | ✅ |

The problem of an `/ipfs` directory in MFS can simply be avoided either by assuming an IPFS path (the current solution) or by denying writes to a directory of this name.
Contributor

If we're going to join MFS with everything else, I'd like to namespace MFS under something like `/local` or `/files`. That way, we can continue adding new namespaces without clobbering user-defined directories.

  • `/ipfs`
  • `/ipns`
  • `/ipld`?
  • `/p2p`?? (might be nice to expose this as sockets when mounted with FUSE).
  • `/git/Cid/...` ???

@lidel lidel Nov 5, 2019

I would also like clean namespacing, so we don't have to worry about edge cases and vulnerabilities in this area.

I'd go with `/local` because:

  • it carries no extra cognitive overhead
  • it emphasizes data being "local to the node"
  • names like localhost and .local already exist in other systems, and that familiarity helps


The file system APIs will be streaming by default. Due to the way we store and retrieve data, it makes sense for our API methods to stream content when retrieving it locally or over the network. Buffering APIs can cause OOM issues and give no feedback to the user on progress, while streaming APIs can be trivially wrapped to collect all items, achieving the same effect as a buffering API.

Streaming APIs will use a language-native / standard-library feature that is supported in all runtimes that IPFS is actively targeting. This prevents bloat, and by supporting only one streaming mechanism it reduces API surface area.
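To illustrate the "trivially wrapped" point, a sketch assuming the streaming APIs return async iterables (the `all` helper name is mine, not part of the proposal):

```javascript
// Collect every entry from an async-iterable streaming API into an array,
// recovering buffering behaviour on top of a streaming one.
async function all (source) {
  const entries = []
  for await (const entry of source) {
    entries.push(entry)
  }
  return entries
}
```

With this, something like `const files = await all(ipfs.import(...))` behaves like the old buffering call.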
Contributor

Note: Let's also stop streaming bytes. If we switch to only streaming objects, we can stream metadata (progress, etc.).
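A toy sketch of that object-streaming idea (the shape of the yielded objects is illustrative only): by yielding typed objects instead of raw byte chunks, metadata such as progress can interleave with the data itself.

```javascript
// Wrap a stream of byte chunks so each chunk is preceded by a
// progress object, rather than streaming bare bytes.
async function * withProgress (source) {
  let bytes = 0
  for await (const chunk of source) {
    bytes += chunk.length
    yield { type: 'progress', bytes }
    yield { type: 'data', chunk }
  }
}
```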

| Current | Proposed |
| --- | --- |
| `ipfs files read` | `ipfs read` |
| `ipfs files rm` | `ipfs rm` |
| `ipfs files stat` | `ipfs stat` |
| `ipfs files write` | `ipfs write` |
@lidel lidel Nov 5, 2019

❤️ this proposal, and have only one concern:

While I like the symmetry of `import` / `export`, I have a problem with `write` if it uses a different DAG builder than `import`, because people will continue to be confused about why a file added with `import` has a different CID than one added with `write`.

The current `ipfs add` creates a balanced DAG, while `ipfs files write` creates a trickle DAG that supports adding data at the end, so it is more like an append operation.

Loose idea: does it make sense to have a `write` that creates the same DAG as `import`, plus an additional `append` command that is a special version of `write` which only adds data at the end and builds a trickle DAG?

Collaborator

The js `files.write` implementation takes a `strategy` parameter that can be `balanced` or `trickle` (because `ipfs.add` and `ipfs.files.write` both use the same `unixfs-importer`, which receives the parameter).

If you `files.write` a file, it'll get converted into a trickle DAG, unless you pass `strategy=balanced`, in which case it'll be converted into a balanced DAG (rebalancing as necessary).

I think we could standardise on one and allow the user to override as they see fit.

If we could then extend the UnixFS v1.5 metadata proposal to include the originally selected DAG strategy, we'd be able to optimise for appending to trickle DAGs.
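As a sketch of that standardisation, assuming the `strategy` option is forwarded to the importer as described above (the wrapper is mine; exact option plumbing may differ):

```javascript
// Write with an explicit DAG-building strategy. Per the discussion above,
// files.write defaults to 'trickle' while add builds 'balanced' DAGs.
async function writeBalanced (ipfs, path, content) {
  return ipfs.files.write(path, content, {
    create: true,
    strategy: 'balanced' // override the trickle default
  })
}
```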

@daviddias

YES! ❤️ Will review more carefully soon. Previous work for context: ipfs/specs#98 and https://github.com/ipfs/interface-js-ipfs-core/issues/284


Adding files to MFS removes the performance overhead of creating and maintaining pinset DAG nodes, unburdens the user from understanding pinning (for the most part), improves the visibility of added files, and makes it significantly easier to find or remove files that were previously imported.

The `ipfs import` command _optionally_ takes an MFS path option `--dest`, a directory into which imported files are placed. Note the destination directory is automatically created (but not any of its parents). If the destination directory already exists, an error is thrown unless the `--overwrite` flag is set, in which case any existing files with the same name as the imported files are overwritten.
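A sketch of those `--dest` semantics in terms of today's MFS methods (`ipfs.files.stat`/`mkdir`/`cp` exist in js-ipfs; the wiring and function name here are mine):

```javascript
// Sketch of the proposed --dest behaviour: create the destination directory
// (but not its parents), and error if it already exists unless overwrite
// was requested, then copy each imported entry into it.
async function importToMfs (ipfs, entries, dest, { overwrite = false } = {}) {
  let exists = true
  try {
    await ipfs.files.stat(dest)
  } catch (err) {
    exists = false
  }
  if (exists && !overwrite) {
    throw new Error(`destination ${dest} already exists`)
  }
  if (!exists) {
    // created automatically, but not its parents
    await ipfs.files.mkdir(dest, { parents: false })
  }
  for (const entry of entries) {
    await ipfs.files.cp(`/ipfs/${entry.cid}`, `${dest}/${entry.name}`)
  }
}
```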
Collaborator

Small point, but maybe --force instead of --overwrite? Feels a bit more unixy.

5 participants