This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Please make ipfs.files API more ergonomic #2894

Closed
Gozala opened this issue Jan 9, 2018 · 8 comments

Gozala commented Jan 9, 2018

I find the ipfs.files API somewhat painful to use; here is why:

  1. There are multiple versions of cat (catReadableStream, catPullStream), which is fine, except error handling is painful. ipfs.files.catReadableStream(path) returns a stream, but there is no simple way to know if path is incorrect (for example, I was using /QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme, which was failing silently before I realized that the leading / was causing an issue). One could listen to error events on the stream, but that is not ideal in pipelines where you would prefer only actual stream errors to be passed through.

  2. As things stand, detection of the content type is pretty complicated, as it involves a pass-through stream etc...

I would propose an alternative API to make dealing with this more convenient. I'll write up the interface in Flow / TS just for clarity. Also, even though both define Promise<a>, I'll use Promise<x, a> where x represents the error type.

interface files {
   // ...
  readEntry(EntryPath):Promise<EntryReadError, Entry>
  readFile(EntryPath):Promise<FileReadError, FileEntry>
  readDirectory(EntryPath):Promise<DirectoryReadError, DirectoryEntry>
}

// A string including an optional /ipfs prefix, a cid, and a path to traverse to, i.e.:
//  '/ipfs/QmXEmhrMpbVvTh61FNAxP9nU7ygVtyvZA8HZDUaqQCAb66'
//  '/ipfs/QmXEmhrMpbVvTh61FNAxP9nU7ygVtyvZA8HZDUaqQCAb66/a.txt'
//  'QmXEmhrMpbVvTh61FNAxP9nU7ygVtyvZA8HZDUaqQCAb66/a.txt'
type IPFSPath = string

type EntryPath =
  | CID
  | Buffer // the raw Buffer of the cid
  | string // the base58 encoded version of the cid
  | IPFSPath

type Entry =
  | FileEntry
  | DirectoryEntry

type EntryReadError =
   | InvalidCIDBuffer // incorrectly encoded cid buffer
   | InvalidCIDString // incorrectly encoded cid string
   | EntryNotFound // incorrect / unavailable path

type FileReadError =
  | EntryReadError
  | NotAFileEntry

type DirectoryReadError =
  | EntryReadError
  | NotADirectoryEntry

interface FileEntry {
  entryType: "file",
  size: number,
  path: IPFSPath,
  cid(): CID,
  contentType():Promise<IOError, string>, // fetches just enough to detect content type
  buffer():Promise<IOError, Buffer>, // Equivalent of cat
  string():Promise<IOError, string>, // (await this.buffer()).toString()
  readableStream():ReadableStream<IOError, Buffer>,
  pullStream():PullStream<IOError, Buffer>
}

interface DirectoryEntry {
  entryType: "directory",
  cid(): CID,
  path: IPFSPath,
  size: number, // number of entries it contains
  array():Promise<IOError, Entry[]>,
  readableStream():ReadableStream<IOError, Entry>,
  pullStream():PullStream<IOError, Entry>
}
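A minimal sketch of how the proposed API could behave, backed by a plain object instead of a real IPFS DAG. Everything here is an assumption for illustration: the `store` shape, the `lookup` helper, and the `/ipfs/QmDemo/...` keys are all hypothetical, standing in for the DAG traversal a real node would perform.

```javascript
class EntryNotFound extends Error {}
class NotAFileEntry extends Error {}

class FileEntry {
  constructor (path, content) {
    this.entryType = 'file'
    this.path = path
    this._content = content
  }
  buffer () { return Promise.resolve(Buffer.from(this._content)) }
  async string () { return (await this.buffer()).toString() }
}

class DirectoryEntry {
  constructor (path, names) {
    this.entryType = 'directory'
    this.path = path
    this.size = names.length
  }
}

// Synchronous core: resolve a path to an entry or throw EntryNotFound.
function lookup (store, path) {
  const record = store[path]
  if (!record) throw new EntryNotFound(path)
  return record.type === 'file'
    ? new FileEntry(path, record.data)
    : new DirectoryEntry(path, record.data)
}

// The proposed promise-returning wrappers: readEntry rejects with lookup
// errors; readFile additionally rejects with NotAFileEntry.
const readEntry = (store, path) =>
  new Promise(resolve => resolve(lookup(store, path)))

const readFile = (store, path) =>
  readEntry(store, path).then(entry => {
    if (entry.entryType !== 'file') throw new NotAFileEntry(path)
    return entry
  })
```

With this shape, one component can resolve the path (and handle lookup errors) while another decides how to consume the resulting entry, via buffer(), string(), or a stream, without re-resolving the path.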

Also, are there any plans to support standard streams (https://streams.spec.whatwg.org/)? They support async iteration, allowing an even nicer way to do IO:

const entry = await ipfs.files.readFile(path)
for await (const chunk of entry.content()) {
  console.log(chunk);
}
@Gozala
Contributor Author

Gozala commented Jan 9, 2018

I ended up implementing something along those lines for my use case:
https://gist.github.com/Gozala/e52439ba42fcad635667cfacee5d8271

@daviddias
Member

Hi @Gozala, thank you for sharing this feedback with us 🙌🏽

(for example, I was using /QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme, which was failing silently before I realized that the leading / was causing an issue)

This is a bug, it should not fail silently. Would you like to contribute a fix or a failing test for it? We keep the tests for the interface at https://github.com/ipfs/interface-ipfs-core

One could listen to error events on the stream, but that is not ideal in pipelines where you would prefer only actual stream errors to be passed through.

We can either throw the errors that come from input validation and/or accept an optional callback argument for those cases, which is a common pattern in Node.js APIs.

One could listen to error events on the stream, but that is not ideal in pipelines where you would prefer only actual stream errors to be passed through.

Yeah, agreed. Input validation errors should not come on the stream itself.

As things stand, detection of the content type is pretty complicated, as it involves a pass-through stream etc...

This is a separate issue. It is indeed complicated to know what the content type is, and as you can see by --

const stream = ipfs.files.catReadableStream(data.multihash)
stream.once('error', (err) => {
  if (err) {
    log.error(err)
    return reply(err.toString()).code(500)
  }
})

if (ref.endsWith('/')) {
  // remove trailing slash for files
  return reply
    .redirect(PathUtils.removeTrailingSlash(ref))
    .permanent(true)
} else {
  if (!stream._read) {
    stream._read = () => {}
    stream._readableState = {}
  }
  // response.continue()
  let filetypeChecked = false
  let stream2 = new Stream.PassThrough({ highWaterMark: 1 })
  let response = reply(stream2).hold()

  pull(
    toPull.source(stream),
    pull.through((chunk) => {
      // Check file type. Do this once.
      if (chunk.length > 0 && !filetypeChecked) {
        log('got first chunk')
        let fileSignature = fileType(chunk)
        log('file type: ', fileSignature)
        filetypeChecked = true
        const mimeType = mime.lookup(fileSignature
          ? fileSignature.ext
          : null)
        log('ref ', ref)
        log('mime-type ', mimeType)
        if (mimeType) {
          log('writing mimeType')
          response
            .header('Content-Type', mime.contentType(mimeType))
            .send()
        } else {
          response.send()
        }
      }
      stream2.write(chunk)
    }),
    pull.onEnd(() => {
      log('stream ended.')
      stream2.end()
    })
  )
}
--, we use content inspection to guess what type it is. This does not work for all files and it is a known shortcoming of unixfs.
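The content-inspection approach in the quoted handler, reduced to its core: sniff the magic bytes of the first chunk. This sketch checks only PNG and JPEG signatures as an illustration (the `sniffMime` name is hypothetical); the handler above delegates to the file-type package for broad coverage.

```javascript
// Guess a MIME type from the first chunk's magic bytes, or return null.
function sniffMime (chunk) {
  if (chunk.length >= 4 &&
      chunk[0] === 0x89 && chunk[1] === 0x50 &&
      chunk[2] === 0x4e && chunk[3] === 0x47) {
    return 'image/png' // PNG signature: \x89PNG
  }
  if (chunk.length >= 3 &&
      chunk[0] === 0xff && chunk[1] === 0xd8 && chunk[2] === 0xff) {
    return 'image/jpeg' // JPEG SOI marker
  }
  return null // unknown: unixfs v1 stores no content-type metadata
}
```

Because unixfs v1 carries no metadata, any scheme like this is a guess, which is exactly why the proposed contentType() method on a file entry would be friendlier than having every caller rebuild this pipeline.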

We do have a WIP unixfs V2 proposal that will add that kind of metadata and more.

Today, you can go around this by wrapping your files in a directory and preserving file extensions, similar to what happens in a regular OS filesystem.

Also, are there any plans to support standard streams (https://streams.spec.whatwg.org/)? They support async iteration, allowing an even nicer way to do IO:

Haven't looked into that spec recently, what is the best implementation of that spec out there?

By support async iteration you mean async/await iteration and not just asynchronous iteration, correct? We currently do not use async/await in the js-ipfs and js-libp2p codebases as we support Node.js 6 too.

We host explorations for these kinds of discussions. We had two big ones, Streams and Exposed APIs (#362 + #557), which led to the API we have today.

Given that Node.js 8 is the current LTS and that Node.js 6 is going into Maintenance in April, I think it is a good time to consider the usage of async/await :)

@RangerMauve
Contributor

With regards to async iteration, node core is working on adding it to streams, and I'm sure that pull streams will add it eventually, too.

@Gozala
Contributor Author

Gozala commented Jan 10, 2018

(for example, I was using /QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme, which was failing silently before I realized that the leading / was causing an issue)

This is a bug, it should not fail silently. Would you like to contribute a fix or a failing test for it? We keep the tests for the interface at https://github.com/ipfs/interface-ipfs-core

Hmm, I don't see any tests; am I overlooking something? Could you provide a pointer to an example? Also, to be absolutely clear, there was an error, it just did not show up because I was passing the created stream back to electron's protocol.registerStreamProtocol response handler, which presumably did not know what to do with it. But then again, that just illustrates my point.

We can either throw the errors that come from input validation and/or accept an optional callback argument for those cases, which is a common pattern in Node.js APIs.

I think the problem I'm trying to point out is that there are multiple kinds of errors:

  1. Failing to find a referenced file (invalid path, incorrect encoding, and file-not-found all fall here to some degree).
  2. Failing to read contents of the file.

The problem here is that one component might be dealing with finding a file and another with its contents. I would prefer it if the exposed API drew that distinction. If I understand what you wrote correctly, you suggest something like this:

interface files {
  catReadableStream(
    path: string,
    callback: (error: ?EntryReadError, stream: ReadableStream<IOError, Buffer>) => void
  ): void
}

If so, that would be an improvement, although I still think the API proposed in the description has certain advantages. As I mentioned earlier, finding a certain file might be the responsibility of one component of the app and handling its content that of another; my proposal decouples finding a file from the decision of how to access its content, meaning I don't have to know whether the content handler will use a readable stream, a pull stream, or a buffer.

In fact, I would prefer to take it a step further: of the errors InvalidCIDBuffer, InvalidCIDString, and EntryNotFound, arguably only the third one should be possible. The first two should occur when trying to encode the CID in the first place. In other words, instead of accepting a Buffer as an argument, I'd suggest taking a CIDBuffer instead, and creating one could raise InvalidCID errors.
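A sketch of that "validate at construction" idea. The CIDString and InvalidCID names come from the proposal above, but this implementation is hypothetical: it checks only the base58btc alphabet, whereas a real implementation would also verify the decoded multihash length and codec.

```javascript
class InvalidCID extends Error {}

// base58btc alphabet: digits and letters excluding 0, O, I, and l
const BASE58 = /^[1-9A-HJ-NP-Za-km-z]+$/

class CIDString {
  constructor (value) {
    if (!BASE58.test(value)) {
      throw new InvalidCID(`not a base58btc string: ${value}`)
    }
    this.value = value
  }
}
```

APIs that accept a CIDString instead of a raw string can then drop InvalidCIDString from their error unions entirely, leaving EntryNotFound as the only lookup failure. Note that a leading `/`, the exact mistake from the original report, is rejected at construction time.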

This is a separate issue. It is indeed complicated to know what the content type is, and as you can see by the quoted code, we use content inspection to guess what type it is. This does not work for all files and it is a known shortcoming of unixfs.

We do have a WIP unixfs V2 proposal that will add that kind of metadata and more.

I understand, and I'm glad V2 will have a fix, but my point was that changing the interface could greatly reduce this complexity. This is kind of what I ended up with:

const path = require("path")
const url = require("url")
const IPFS = require("ipfs")
const mime = require("mime-types")
const fileType = require("file-type")
const unixfs = require("ipfs-unixfs-engine")
const pull = require("pull-stream")
const toStream = require("pull-stream-to-stream")
const peek = require("pull-peek")

class FileSystem {
  constructor(node) {
    this.node = node
    this.ipldResolver = node._ipldResolver
  }
  static normalizePath(path) {
    const path2 = path.startsWith("/ipfs/") ? path.substr(6) : path
    const path3 = path2.endsWith("/")
      ? path2.substr(0, path2.length - 1)
      : path2
    return path3
  }
  entry(ipfsPath) {
    return new Promise((resolve, reject) => {
      const entryPath = FileSystem.normalizePath(ipfsPath)
      const maxDepth = entryPath.split("/").length

      return pull(
        unixfs.exporter(entryPath, this.ipldResolver, { maxDepth }),
        pull.collect((error, entries) => {
          if (error) {
            if (error.message.startsWith("multihash length inconsistent")) {
              reject(new InvalidCID(error))
            } else if (error.message.startsWith("Non-base58 character")) {
              reject(new InvalidCID(error))
            } else {
              reject(new IOError(error))
            }
          } else {
            switch (entries.length) {
              case 0:
                return reject(new EntryNotFound(entryPath))
              case 1: {
                const [node] = entries
                return resolve(new FileEntry(path.dirname(entryPath), node))
              }
              default: {
                const [node, ...nodes] = entries
                return resolve(
                  new DirectoryEntry(this, path.dirname(entryPath), node, nodes)
                )
              }
            }
          }
        })
      )
    })
  }
}

class Entry {
  static from(fs, parentPath, node) {
    if (node.type === "file") {
      return new FileEntry(parentPath, node)
    } else {
      return new DirectoryEntry(fs, parentPath, node, null)
    }
  }
}

class FileEntry {
  constructor(parentPath, node) {
    this.node = node
    this.name = node.name
    this.path = `/ipfs/${parentPath}/${this.name}`
    this.hash = node.hash
    this.type = "file"
  }
  peek() {
    if (this._nodeContent == null) {
      this._fileType = new Promise((resolve, reject) => {
        this._nodeContent = pull(
          this.node.content,
          peek((end, data) => {
            resolve(fileType(data))
          })
        )
      })
    }
    return this
  }

  fileType() {
    return this.peek()._fileType
  }
  contentStream() {
    return toStream.source(this.peek()._nodeContent)
  }
  contentBuffer() {
    return new Promise((resolve, reject) => {
      pull(
        this.peek()._nodeContent,
        pull.collect((error, chunks) => {
          if (error) reject(error)
          else resolve(Buffer.concat(chunks))
        })
      )
    })
  }
}

class DirectoryEntry {
  constructor(fs, parentPath, node, nodes) {
    this.fs = fs
    this.nodes = nodes
    this.node = node
    this.name = node.name
    this.path = path.normalize(`/ipfs/${parentPath}/${this.name}/`)
    this.hash = node.hash
    this.type = "directory"
  }
  async entries() {
    if (this._entries) {
      return this._entries
    } else if (this.nodes) {
      this._entries = this.nodes.map(node =>
        Entry.from(this.fs, this.path, node)
      )
      return this._entries
    } else {
      const directory = await this.fs.entry(this.path)
      this._entries = await directory.entries()
      return this._entries
    }
  }
}

class IOError {
  constructor(error) {
    this.code = 500
    this.reason = error
    this.stack = error.stack
    this.message = error.message
  }
}

class InvalidCID {
  constructor(error) {
    this.code = 404
    this.reason = error
    this.stack = error.stack
    this.message = error.message
  }
}

class EntryNotFound {
  constructor(path) {
    this.code = 404
    this.path = path
    this.stack = new Error().stack
    this.message = this.toString()
  }
  toString() {
    return `Entry \`/ipfs/${this.path}\` not found`
  }
}

Haven't looked into that spec recently, what is the best implementation of that spec out there?

I have not really looked at the implementations, but they are available natively in Chrome and Safari already, and we have them behind a flag in SpiderMonkey. I'm actually surprised node is not just switching to them, but then again they don't like native modules either, so who knows...

By support async iteration you mean async/await iteration and not just asynchronous iteration, correct? We currently do not use async/await in the js-ipfs and js-libp2p codebases as we support Node.js 6 too.

I meant this:

const entry = await ipfs.files.readFile(path)
for await (const chunk of entry.content()) {
  console.log(chunk);
}

To be clear, I'm not proposing to use async/await nor for/await in the js-ipfs codebase, but allowing users of js-ipfs to use those features would be extremely valuable, and that is what I'd be most interested in.

We host explorations for these kinds of discussions. We had two big ones, Streams and Exposed APIs (#362 + ipfs-inactive/interface-js-ipfs-core#557), which led to the API we have today.

I think I've read through that one. But then again, I'm mostly interested in allowing users to consume streams in a standard manner. It is also that I'm working on an Erlang-actor-model-inspired SpiderMonkey runtime where actors/threads are async functions that exchange messages over standard streams.

Given that Node.js 8 is the current LTS and that Node.js 6 is going into Maintenance in April, I think it is a good time to consider the usage of async/await :)

I think exposing async/await- and for/await-friendly APIs is orthogonal to using these features in the js-ipfs implementation, not to imply you shouldn't.

@Gozala
Contributor Author

Gozala commented Jan 10, 2018

With regards to async iteration, node core is working on adding it to streams, and I'm sure that pull streams will add it eventually, too.

That's cool I'm glad it's happening!

It is also worth mentioning that native binary streams allow off-main-thread processing in browser engines, and some of the standard transformations would likely be exposed by those means. Not only that: if, say, you have a binary stream from a server request and you're piping it to another IO stream, browser engines are able to do all the data transfer off the main thread. No non-native implementation would be able to accomplish this, which is another reason I'd love js-ipfs to use native streams. Anyway, the streams discussion probably belongs in a different thread.

@daviddias daviddias transferred this issue from ipfs/js-ipfs Dec 9, 2018
@lidel
Member

lidel commented Dec 9, 2018

FYSA (quoting https://github.com/ipfs/interface-ipfs-core/issues/284#issuecomment-440569314), we are upgrading the Files API in 3 milestones:

There is also an ongoing work to improve error handling in JS-land: #1746

@achingbrain achingbrain transferred this issue from ipfs-inactive/interface-js-ipfs-core Mar 10, 2020
@SgtPooki SgtPooki self-assigned this May 17, 2023
@SgtPooki SgtPooki moved this to 🥞 Todo in js-ipfs deprecation May 17, 2023
@SgtPooki
Member

SgtPooki commented May 26, 2023

js-ipfs is being deprecated in favor of Helia. You can follow #4336 and read the migration guide.

Please feel free to reopen with any comments by 2023-06-02. We will do a final pass on reopened issues afterwards (see #4336).

I believe the work documented in this issue should be looked at by @achingbrain and @BigLep prior to us closing it for good.

@SgtPooki SgtPooki assigned achingbrain and unassigned SgtPooki May 26, 2023
@SgtPooki SgtPooki moved this from 🥞 Todo to 🛑 Blocked in js-ipfs deprecation May 26, 2023
@achingbrain
Member

Helia has no strong opinions on what the files API should look like; @helia/unixfs is one approach, but at the althing @Gozala also volunteered (only slightly under duress) to implement an alternative that uses web streams everywhere, so I look forward to seeing this implemented!

@github-project-automation github-project-automation bot moved this from 🛑 Blocked to ✅ Done in js-ipfs deprecation Jun 2, 2023