Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add normalise input function #5

Merged
merged 3 commits into from
Sep 4, 2019
Merged

feat: add normalise input function #5

merged 3 commits into from
Sep 4, 2019

Conversation

achingbrain
Copy link
Member

@achingbrain achingbrain commented Aug 31, 2019

This PR follows on from this comment and allows an input normalisation function to be shared between ipfs and the http client.

All supported types are tested, at the moment they are (from the test output):

  normalise-input
    String
      ✓ String
      ✓ { path: '', content: String }
      ✓ Iterable<{ path: '', content: String }
      ✓ AsyncIterable<{ path: '', content: String }
      ✓ PullStream<{ path: '', content: String }
    Buffer
      ✓ Buffer
      ✓ Iterable<Buffer>
      ✓ AsyncIterable<Buffer>
      ✓ PullStream<Buffer>
      ✓ { path: '', content: Buffer }
      ✓ { path: '', content: Iterable<Buffer> }
      ✓ { path: '', content: AsyncIterable<Buffer> }
      ✓ { path: '', content: PullStream<Buffer> }
      ✓ Iterable<{ path: '', content: Buffer }
      ✓ AsyncIterable<{ path: '', content: Buffer }
      ✓ PullStream<{ path: '', content: Buffer }
      ✓ Iterable<{ path: '', content: Iterable<Buffer> }>
      ✓ Iterable<{ path: '', content: AsyncIterable<Buffer> }>
      ✓ AsyncIterable<{ path: '', content: Iterable<Buffer> }>
      ✓ AsyncIterable<{ path: '', content: AsyncIterable<Buffer> }>
      ✓ PullStream<{ path: '', content: PullStream<Buffer> }>
    Blob
      ✓ Blob
      ✓ { path: '', content: Blob }
      ✓ Iterable<{ path: '', content: Blob }
      ✓ AsyncIterable<{ path: '', content: Blob }
      ✓ PullStream<{ path: '', content: Blob }
    Iterable<Number>
      ✓ Iterable<Number>
      ✓ { path: '', content: Iterable<Number> }
      ✓ Iterable<{ path: '', content: Iterable<Number> }
      ✓ AsyncIterable<{ path: '', content: Iterable<Number> }
      ✓ PullStream<{ path: '', content: Iterable<Number> }
    TypedArray
      ✓ TypedArray
      ✓ Iterable<TypedArray>
      ✓ AsyncIterable<TypedArray>
      ✓ PullStream<TypedArray>
      ✓ { path: '', content: TypedArray }
      ✓ { path: '', content: Iterable<TypedArray> }
      ✓ { path: '', content: AsyncIterable<TypedArray> }
      ✓ { path: '', content: PullStream<TypedArray> }
      ✓ Iterable<{ path: '', content: TypedArray }
      ✓ AsyncIterable<{ path: '', content: TypedArray }
      ✓ PullStream<{ path: '', content: TypedArray }
      ✓ Iterable<{ path: '', content: Iterable<TypedArray> }>
      ✓ Iterable<{ path: '', content: AsyncIterable<TypedArray> }>
      ✓ AsyncIterable<{ path: '', content: Iterable<TypedArray> }>
      ✓ AsyncIterable<{ path: '', content: AsyncIterable<TypedArray> }>
      ✓ PullStream<{ path: '', content: PullStream<TypedArray> }>

Can you think of any others? Could add tests for browser-centric typed arrays I guess?

Some notes:

  • We will now support ipfs.adding a bunch of new types for content such as strings, arrays full of numbers for a slightly more human-friendly API.
  • I didn't include Pull Streams because in the brave new future the calling code should transform them before passing it to the core ipfs.add.
  • I've tried to avoid Buffer.from unless its absolutely necessary because it involves copying buffers which is slow.
  • Detecting Iterable<Number> involves peeking at the iterable and is a little fragile as it assumes the iterable is full of the same type
  • Maybe at some point in the future we should revisit this and convert everything to BufferLists to avoid the Buffer.from copy problem

I haven't integrated this with ipfs yet, just putting this up for early feedback.

@achingbrain achingbrain added the status/in-progress In progress label Aug 31, 2019
@achingbrain achingbrain force-pushed the normalise-input branch 3 times, most recently from 103e999 to d46ba26 Compare August 31, 2019 09:20
Allows input normalisation function to be shared between ipfs and the
http client.
function toFileObject (input) {
return {
path: input.path || '',
content: toAsyncIterable(input.content || input)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both path and content are optional - i.e. you can pass { path: '/path/to/dir' } so this won't work in this case.


// Iterator<?>
if (input[Symbol.iterator]) {
if (!isNaN(input[0])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the right check here? When would you pass an [NaN]?

Suggested change
if (!isNaN(input[0])) {
if (Number.isInteger(input[0])) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've just determined this object is iterable, but that does not mean it's an array and you can access values like input[0]. Strictly speaking you should input[Symbol.iterator]().next().value

Copy link
Member Author

@achingbrain achingbrain Sep 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪. Number.isInteger is probably better in this case..

I'd hoped to avoid calling .next() on the iterator in case it consumes a value that otherwise wouldn't be re-iterated - e.g. more of a .peek()-type mechanism. Could probably wrap the iterator in another iterator(!) that does something a bit more sensible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it-peek!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not is not a number means it's a number 😉 . Of course it'll happily accept floats and negative numbers, etc which would lead to 🤪.

😉 ...and other values are are not NaN. isNaN is just the worst. For the benefit of anyone else reading this the following non-numbers would also make it through!:

!isNaN(null) // true
!isNaN('123') // true
!isNaN(' ') // true
!isNaN([]) // true
!isNaN(new Date) // true

}

throw new Error('Unexpected input: ' + typeof chunk)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the test for isBytes(chunk) you could just remove the other tests and return Buffer.from(chunk) since it'll already throw if it doesn't get one of the things you're checking for.

if (input[Symbol.asyncIterator]) {
return (async function * () { // eslint-disable-line require-await
for await (const chunk of input) {
yield toFileObject(chunk)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not support AsyncIterable<Bytes>? I think fs.createReadStream works because the stream it creates has a path property and it is assumed to be a file object earlier in the code, but not all AsyncIterable<Bytes>'s will necessarily have this property.

* AsyncIterable<{ path, content: Blob }>
* AsyncIterable<{ path, content: Iterable<Buffer> }>
* AsyncIterable<{ path, content: AsyncIterable<Buffer> }>
* ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No support for pull streams?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to not include it here. My thinking was that all ipfs.add* methods could route to ipfs._addAsyncIterator, and it would be down to those methods to convert their arguments to something this method accepts. E.g. ipfs.addPullStream would do the pull-stream source to async iterator conversion, ipfs.addFromFs would convert glob-matched paths to async iterators of file objects with paths & content readable streams, etc.

I'm working on integrating that with ipfs and ipfs-http-client so we'll see how realistic that is..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds nice, but you can call ipfs.add({ content: PullStreamSource }) so I think it's necessary, unless we want to duplicate a lot of the normalise work in both places.

* Transform one of:
*
* ```
* Buffer|ArrayBuffer|TypedArray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of simplicity and understanding maybe we should setup an alias for this:

Suggested change
* Buffer|ArrayBuffer|TypedArray
* Bytes (Buffer|ArrayBuffer|TypedArray|Iterable<Number>)
* Bloby (Blob|File)

When I see Bytes I think that they are part of a set of bytes for a single file (albeit there may only be one), when I see Bloby I think that this is a whole file.

})()
}

// Iterable<Buffer>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's werid that Iterable<Buffer> means multiple files but AsyncIterable<Buffer> is a single file...

As I mentioned in an earlier comment I reckon we should interpret sets of Bytes as a single file and sets of Blobys as multiple files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - multiple Buffers should be treated as one file, multiple Blobs are multiple files. That's the intention anyway, I guess the comment isn't clear.

Realistically you'd need { path, content } objects for multiple Blobs as they have no path themselves and the importer will fail complaining of multiple roots.

Alternatively we could widen it to accept HTML File objects which have a .name property and use that as the path? The user would still need to pass the wrapWithDirectory arg to the importer though because I think the .name property is supposed to just be the filename, not the path to the file within a directory structure as with the { path, content } object.

Urgh, what fun.

@alanshaw
Copy link
Member

alanshaw commented Sep 1, 2019

In general though I'm super happy to see a normalise function that actually defines and deals with all the cases properly. 🚀

return typeof obj === 'object' && (obj.path || obj.content)
}

function blobToAsyncGenerator (blob) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract as a module blob-to-it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just mkdir iterables && touch blob-to-it.js in this repo for less overhead

* feat: support pull streams

This PR updates the `normaliseInput` function to accept pull streams.

I've also made the following changes:

1. Update the docs for supported inputs
  * `Buffer|ArrayBuffer|TypedArray` is aliased as `Bytes`
  * `Blob|File` is aliased as `Bloby`
  * Added info for what a input "means" i.e. causes single/multiple files to be added
1. Peek the first item of an (async) iterator properly
1. Move file object check below `input[Symbol.asyncIterator]` check because Node.js streams have a path property that will false positive the `isFileObject` check
1. Fix `toFileObject` to allow objects with no `content` property
1. Simplify `toBuffer` to remove checks that `Buffer.from` already does

License: MIT
Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>

* fix: tests

License: MIT
Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
async function * readBlob (blob, options) {
options = options || {}

const reader = new global.FileReader()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const reader = new global.FileReader()
const reader = new globalThis.FileReader()

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

}

function isBloby (obj) {
return typeof Blob !== 'undefined' && obj instanceof global.Blob
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return typeof Blob !== 'undefined' && obj instanceof global.Blob
return typeof globalThis.Blob !== 'undefined' && obj instanceof globalThis.Blob

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

}

// PullStream<?>
if (typeof input === 'function') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let BLOB

if (supportsFileReader) {
BLOB = new global.Blob([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BLOB = new global.Blob([
BLOB = new globalThis.Blob([

and require https://github.com/ipfs/js-ipfs-utils/blob/master/src/globalthis.js at the top

License: MIT
Signed-off-by: Alan Shaw <alan.shaw@protocol.ai>
@hugomrdias
Copy link
Member

is this ready to be merged ?

@achingbrain
Copy link
Member Author

do-it

@hugomrdias hugomrdias merged commit b22b8de into master Sep 4, 2019
@hugomrdias hugomrdias deleted the normalise-input branch September 4, 2019 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants