-
Notifications
You must be signed in to change notification settings - Fork 37
Conversation
@lidel Yes. If not it's a bug. |
@Gozala proposes a different (simpler) API: https://gozala.hashbase.io/posts/Constraints%20of%20an%20API%20design/ It really has good points and I think we should follow this route and change the API outlined here to the proposed one. As a start we won't implement any batching yet, but just do the single item commands. |
Only operating on individual
Since a node has a single
Also yes, otherwise we're into non-deterministic territory. Again, hard to express in code but a documentation note should be sufficient?
Our datastore is immutable so rolling back puts after a failure seems a little weird, but I guess there's the I suppose you might want to control when and where to abort based on error conditions (e.g. abort after failure or continue after failure), but then you can drop down to the version without syntactic sugar: const iter = ipld.put(someIterator)
while (true) {
try {
const { done, value } = await iter.next()
// do something
} catch (err) {
// abort, continue, revert - it's up to you
}
} |
One thing:
Shouldn't this be an iterable of nodes? Also, why not an async iterable so we don't have to hold the whole list in memory? |
It is nodes. The README is already right, the commit message is wrong. And it also can be an async iterable (the code has support for it). |
@achingbrain thanks for going through questions I’ve mentioned. As I later I attempted to point out those questions are symphtom of incidental complexity. I am huge proponent of batching optimization, but again if I were to distill that post into two centences - batching API needs acareful design, optimized towards it’s user rather than imposing solution optimized for internal batching on all users. Argument I’m trying to make that while (async) iterators seem neat they’re added complexity for non-batch operations and are far from good even for batched operations. They might be adequate for single use case of streaming chunked ops and making that default interface for everyone is just makes all other uses prone to error. In other words if batching is important use case let’s deaign separate API for that |
I think you're misunderstanding my question here, I was not talking about rolling back operations (which in theory could also fail) but rather that if you I also know the answer, what I was trying to illustrate that those questions will arise and that is because in some use cases you'd want one behavior and in other the other. |
My proposal would be not to try and optimize API for batching without having support for that in first place. Once support for batching is being worked lets craft API that takes into account constraints and optimization opportunities into account as well as enables consumers to choose optimizations strategies, failure modes and some & mechanism to recovery (and maybe even unroll). |
I’m starting to get very confused about who the consumer of this API is. We’re making good improvements but as we start to narrow in on each one the expectations from the consumer of the API start to matter. It seems like we’re trying to create a new, cleaner, layering scheme but we’re also shoving all of those new layers into a single API. Right now, this single interface is responsible for:
If these were separate libraries/layers the interfaces between them would be much more obvious, but having them all together makes it unclear to me what the consumer wants to be doing. However, because the old interface combined all of this functionality it would be a much bigger shift to change it too much at a time when If the goal here is just to have a replacement for the current interface that uses async/await so that we can continue to make progress on the migration in IPFS then let’s just merge and land this as soon as possible. I don’t think it’s productive to have larger meta conversations about the ideal interface while we also try to make the migration for We can come back to better interfaces for IPLD generally after this lands. |
No it's more than this. It's a major clean-up and a major breaking change. As there are more projects that are using js-ipld than js-ipfs, it would be a bad if we would breaking the API a lot. The current implementation of the new API quite trivial. We don't do any fancy optimizations, it's really all sequential. Taking @Gozala concerns into account, the only change would be to make the So if we do this change now (chaning from "multi" to "single" operations), we can talk about all the batching/requesting several items at the same time later on, without breaking any of the API again. |
It is worth pointing out that current put / get API is unary, so if anything I’m proposing to not break that API without a really good reason |
This is not strictly true, there's .getMany(cids, callback) which IPFS makes use of and which we will lose if we switch to only one-in-one-out. The reason we use this is to not swamp the DHT (when it arrives in the next js-ipfs release).
It does for fs manipulations (e.g. copying/moving files, etc) but when reading and writing whole files it uses the unixfs exporter/importer. The exporter uses Using iterables for single-node operations is a bit clunky - can't we just have both?
In the case of errors, from an IPFS point of view it's best to fail quickly and give the user a useful error message so the default behaviour of exploding halfway through an iteration and stopping is fine. If you want fancy error handling strategies you can just fall back to the non-syntactic sugar version as I suggested earlier. If the syntax offends, you can just wrap it in a module and publish it. |
We also lose it in the current implementation I did for this API as it was too hard to do with the current callback based interface of the underlying libraries. So instead of implementing that properly I could invest my time on doing the batch API.
One of my goals is keeping the API surface small. I don't think the clunky single node operation justifies adding new 3 new API calls (get, put, remove).
One more question here. The current implementation would throw if there was an error. What would you do if you want to continue with the rest of the nodes? Would it be an option in that it returns errors instead of throwing them? PS: @achingbrain It is not clear from my replies, but you've good points here. |
You mean that the implementation of
Not sure I agree - I think the improved DX makes it worthwhile. The fact that the implementation adds non-standard
From an IPFS point of view It's hard to see why you'd want that. Imagine writing a 10TB file and the first node fails. You're not going to want to wait for the last node to be written to tell the user - Hey I'm done, by the way not all of your file made it kthxbye! I suppose you could retry the failed nodes - if it became a requirement and there was no API available I'd just use the non-syntactic sugar version and do my own error handling. |
Good point.
I'm talking from an IPLD point of view here.
I don't think you can resume an iterator that throws. Can you? Though you could manually keep track of the things you've already processed and then skip the one that failed. This is quite painful (as @Gozala explanations show), though I'm not sure how much of a problem that really is as it is certainly still possible somehow. |
Yes, if you don't use 'use strict'
const iterator = {
[Symbol.asyncIterator] () {
return {
i: -1,
values: [0, 1, 2, 3, 4],
next () {
this.i++
if (this.i === this.values.length) {
return Promise.resolve({ done: true })
}
if (this.i === 2) {
throw new Error('Nerp')
}
return Promise.resolve({ done: false, value: this.values[this.i] })
}
}
}
}
async function surviveErrors () {
console.info('--> Will continue after error')
const iter = iterator[Symbol.asyncIterator]()
while (true) {
try {
const { done, value } = await iter.next()
console.info('done', done, 'value', value)
if (done) {
return
}
} catch (err) {
console.error('got an error', err)
}
}
}
async function exitOnError () {
console.info('--> Will not continue after error')
try {
for await (const value of iterator) {
console.info('value', value)
}
} catch (err) {
console.error('got an error', err)
}
}
async function main () {
await surviveErrors()
await exitOnError()
}
main() $ node test.js
--> Will continue after error
done false value 0
done false value 1
got an error Error: Nerp
at Object.next (/Users/alex/Desktop/test.js:16:17)
at surviveErrors (/Users/alex/Desktop/test.js:32:42)
at process._tickCallback (internal/process/next_tick.js:68:7)
at Function.Module.runMain (internal/modules/cjs/loader.js:745:11)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:743:3)
done false value 3
done false value 4
done true value undefined
--> Will not continue after error
value 0
value 1
got an error Error: Nerp
at Object.next (/Users/alex/Desktop/test.js:16:17)
at exitOnError (/Users/alex/Desktop/test.js:49:28)
at process._tickCallback (internal/process/next_tick.js:68:7)
at Function.Module.runMain (internal/modules/cjs/loader.js:745:11)
at startup (internal/bootstrap/node.js:283:19)
at bootstrapNodeJSCore (internal/bootstrap/node.js:743:3) |
I am very much advocating for having unary operations and having a separate batch operations.
I think size of API surface is far less important than a good API. |
For some reason there is a bias towards (async) iterators which I really don't understand. They introduce incidental complexity for both batch on unary operations and require user to keep track of additional local state. As I illustrated in the post you can have far better representation for batch operations that remove need for keeping track of state, pairing inputs with outputs and even can offer I would really like to know what are the basis for choosing async iterators ? |
Please consider that not every one is an expert and if you're API precludes that, that's a bad smell. Best APIs are the one that don't let you plug square peg into round hole |
The reason is that at lot of IPFS/IPLD APIs are using pull-streams and async iterators are a good replacement for that. |
@vmx what's the strategy for moving forward on this? FWIW I agree with @Gozala's analysis, on top of which I'd add learnability and usability concerns. Consider the code and explanation overhead in the proto.school lessons. They'd be moving from teaching what If there's concern about burdening downstream consumers (ipfs) with having to build their own abstractions to get iterable-like functionality, just bundle those use-cases in a utility module or as helper functions (or ideally as entirely separate packages) and leave the basic API clean and singular. |
My current plan would be to add single and multi versions. So we would have:
This way, we have a simple interface, for single operations (like @Gozala suggested). We also need Implementation wise, the logic will be put into the single item versions and the "many"-versions will call into those. As a start it will be just sequential operations. But it leaves it open for the future to have more optimised "many"-calls. How does that sound @achingbrain @alanshaw @Gozala @mikeal @rvagg? |
@vmx sounds good. I’ll note that, long term, we’ll still end up with more usage of async iterators than straightforward single methods because even simple mutations of a graph will produce multiple block changes. |
@vmx I think it's a good compromise. Longer term I would still encourage to design a batch operations tailored for specific use cases instead of generic async iterator. |
I still think that less is best for core APIs, particularly when they are trivial to build on top of outside of the core offering. The additional *Many assumes you know enough about all the eventual multiple-item use-cases (as @Gozala has pointed out). But since you all know more about the actual use-cases than I do so I'll defer to your better judgement! |
I've updated the document. Please see the commit message as well as the new README. There's now also Can I please get an approval from @achingbrain @alanshaw @Gozala @mikeal @rvagg |
README.md
Outdated
- [`options.formats`](#optionsformats) | ||
- [`options.loadFormat(codec)`](#optionsloadformatcodec) | ||
- [`.put(node, format, options)`](#putnode-format-options) | ||
- [`.putMany(node, format, options)`](#putmanynode-format-options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably node
should be plural putMany(nodes, format, options)
README.md
Outdated
Returns a Promise with the CID of the serialized IPLD Node. | ||
|
||
|
||
### `.putMany(node, format, options)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/node/nodes/
README.md
Outdated
|
||
> Stores the given IPLD Nodes of a recognized IPLD Format. | ||
|
||
- `nodes` (`Iterable<Object>`): deserialized IPLD nodes that should be inserted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I recall it was also accepted AsyncIterable<a>
which is worth mentioning here if still the case.
- `hashAlg` (`multicodec`, default: hash algorithm of the given multicodec): the hashing algorithm that is used to calculate the CID. | ||
- `cidVersion` (`number`, default: 1): the CID version to use. | ||
- `onlyHash` (`boolean`, default: false): if true the serialized form of the IPLD Node will not be passed to the underlying block store. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth specifying what happens if error occurs in the middle of the iteration.
|
||
- `iter.first()`: Return the first item only | ||
- `iter.last()`: Return the last item only | ||
- `iter.all()`: Return all items as array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it this is confusing way to communicate. I would suggest to just define a type / class e.g Batch
and say that it implements async iteration interface along with methods first
, last
, all
.
That would make communication clear:
- It's clear that
putMany
takes async iterable, while it does returnBatch
, otherwise it leaves reader wondering if whetherputMany
should also be provided async iterable that has those extensions. Batch
could be exposed and could haveBatch.from(asyncIterable)
in case users need to create own API compatible instances without needing to do it on their own.- In the future
Batch
API can be further extended with other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #196 as kind of a "catch all" issue for batch API related things.
README.md
Outdated
Example: | ||
|
||
```js | ||
const result = ipld.get([cid1, cid2]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be getMany
> Stores the given IPLD Node of a recognized IPLD Format. | ||
|
||
- `node` (`Object`): the deserialized IPLD node that should be inserted. | ||
- `format` (`multicodec`, required): the multicodec of the format that IPLD Node should be encoded in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is general question / feedback:
I find passing name / code of the codec counter intuitive & I think just providing a codec instance might be better for following reasons:
- Type checkers / linters can catch mismatching / invalid references, but are useless with names.
- If codec has not being installed you get an error, while with say
dag.put(node, dag.codec[name])
you are encourage to see if format is known. - Don't necessarily need to install codec to use it.
I understand that without installing a codec / format decode
will not work, however get
could optionally take codec as well. In case of putMany / getMany
it's less clear but maybe optional codec dict can be provided as in {[name]:Codec}
.
Not a big deal, just in my code I find it more convenient to pass actual codecs than names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #197 so that we don't lose that idea.
|
||
### `.getMany(cids, callback)` | ||
### `.tree(cid, [path], [options])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: I find tree
to be a confusing name especially since it just returns paths. I think paths
or list
would have being far more clear.
I also kind of wish there was a way to traverse without following every branch .traverse(cid, path, {follow:path => boolean, order:path[] => path[] })
that way traverse(cid, path)
is shallow, and .traverse(cid, path, {follow:Boolean})
is recursive, but it also provides an opportunity to avoid traversing irrelevant paths. As of order
it can provide a way to choose traversal scheme e.g. depth-first could be as simple as returning path with most path components first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted into #198.
The whole IPLD APIs get a review to make them more consistent and easier to use. Changes to compared to the current spec: - No more callbacks, everything is Promises. - `get()`: - Is renamed to `resolve()` to indicate that it also takes a mandatory path argument. - Doesn’t have a `cid` in the return value anymore. That use case is covered by returning all objects that were traversed. Their `value` will always be the CID of the next one to traverse. So if you want to know the CID of the current IPLD Node, just look at the `value` of the previously returned IPLD Node. - Doesn’t return a single value, but an iterator. - `localResolve` option is removed. If you want to resolve a single IPLD Node only, just stop the iterator after the first item. - `getStream()` is removed without replacement. Use `resolve()` which uses async iterators instead. - `getMany()` takes an iterable as input now. - `put()`: - Doesn’t take `cid` as an option anymore. The CID is always calculated (#175). - The options don’t take the `format` (which is a string), but the `codec` (which is a `multicodec`) (#175). - the `cidVersion` option always defaults to `1`. - `.treeStream()` is renamed to `.tree()` and returns an async iterator. - `.support.add()` is renamed to `.addFormat()`. - `.support.rm()` is renamed to `.removeFormat()`. - `options.loadFormat()` is no longer callback based but is using an async function. - `putMany()` and `removeMany()` are introduced which both take an iterable as input. Almost all open issues in regards to the IPLD API got adressed. One bigger thing is the Patch API, which also isn’t part of the current specification. Here’s an overview of the issues: - #66 - [ ] IPLD Resolver `patch` API: There is no patch API yet - #70 - [x] lazy value lookups: Can be done as IPLD Formats is pure JavaScript - [x] get external / local paths only: @vmx doesn’t know what this is, considers it a “won’t fix” - [x] toJSON + fromJSON - roundtrip: A roundtrip is not a goal anymore => won’t fix - [ ] put + patch api #66: Patch API is still missing - [x] text summary: @vmx doesn’t see a strong use case for this => “won’t fix” - [x] globbing: Out of scope for js-ipld - #175 - [x] Deprecate passing a CID to `ipld.put`?: Current spec doesn’t allow `put()` with a CID - #182 - [x] API Review: Links to many other issues, I list the individual issues below - #183 - [x] getting the merkle proof with resolver.resolve: `resolve()` returns all CIDs along the traversed path - #184 - [ ] Pass down the `options` object to resolver.resolve(): This needs a custom resolver. @vmx isn’t sure if the js-ipld API should make this possible, or of it should just be easy enough to create your own resolver. - https://github.com/ipfs/interface-ipfs-core/issues/81 - [x] The `dag` API - One API to manipulate all the IPLD Format objects: Interesting discussion, but not relevant anymore. - ipfs-inactive/interface-js-ipfs-core#121 - [x] dag api: add {ls, tree}: `tree()` is not part of js-ipld anymore as the IPLD Nodes are just JavaScript objects which you can easily traverse if you like. Though `tree()`-like functionality might be added as an example or separate module. - ipfs-inactive/interface-js-ipfs-core#125 - [x] `dag get --localResolve` vs `dag resolve`: This is solved by the new `resolve()` API - ipfs-inactive/interface-js-ipfs-core#137 - [x] add `level` option to ipfs.dag.tree: `tree()` is not part of js-ipld anymore. It should be considered if `tree()` is implemented (probably as an example or separate module) Closes #70, #175, #182, #183 Closes ipfs-inactive/interface-js-ipfs-core#121 Closes ipfs-inactive/interface-js-ipfs-core#125 Closed ipfs-inactive/interface-js-ipfs-core#137
Thanks a lot @Gozala for the valuable comments. I opened a few issues based on your comments so that we discuss them outside this PR and hopefully get this one merged (together with the implementation) soon. |
The whole IPLD APIs get a review to make them more consistent and
easier to use.
Changes to compared to the current spec:
get()
:resolve()
to indicate that it also takes a mandatorypath argument.
cid
in the return value anymore. That use case iscovered by returning all objects that were traversed. Their
value
will always be the CID of the next one to traverse. So if you want
to know the CID of the current IPLD Node, just look at the
value
of the previously returned IPLD Node.
localResolve
option is removed. If you want to resolve a single IPLDNode only, just stop the iterator after the first item.
getStream()
is removed without replacement. Useresolve()
which usesasync iterators instead.
getMany()
is renamed toget()
:put()
:cid
as an option anymore. The CID is always calculated(Deprecate passing a CID to
ipld.put
? #175).format
(which is a string), but thecodec
(which is a
multicodec
) (Deprecate passing a CID toipld.put
? #175).cidVersion
option always defaults to1
..treeStream()
is renamed to.tree()
and returns an async iterator.remove()
:.support.add()
is renamed to.addFormat()
..support.rm()
is renamed to.removeFormat()
.options.loadFormat()
is no longer callback based but is using an asyncfunction.
Almost all open issues in regards to the IPLD API got adressed. One bigger
thing is the Patch API, which also isn’t part of the current specification.
Here’s an overview of the issues:
patch
API #66patch
API: There is no patch API yetconsiders it a “won’t fix”
=> won’t fix
patch
API #66: Patch API is still missingipld.put
? #175ipld.put
?: Current spec doesn’t allowput()
with a CIDbelow
resolve()
returnsall CIDs along the traversed path
options
object to resolver.resolve() #184options
object to resolver.resolve(): This needs acustom resolver. @vmx isn’t sure if the js-ipld API should make this
possible, or of it should just be easy enough to create your own
resolver.
dag
API - One API to manipulate all the IPLD Format objects:Interesting discussion, but not relevant anymore.
tree()
is not part of js-ipld anymore asthe IPLD Nodes are just JavaScript objects which you can easily traverse
if you like. Though
tree()
-like functionality might be added as anexample or separate module.
dag get --localResolve
vsdag resolve
ipfs-inactive/interface-js-ipfs-core#125dag get --localResolve
vsdag resolve
: This is solved by the newresolve()
APIlevel
option to ipfs.dag.tree ipfs-inactive/interface-js-ipfs-core#137level
option to ipfs.dag.tree:tree()
is not part of js-ipldanymore. It should be considered if
tree()
is implemented (probablyas an example or separate module)
Closes #70, #175, #182, #183
Closes ipfs-inactive/interface-js-ipfs-core#121
Closes ipfs-inactive/interface-js-ipfs-core#125
Closed ipfs-inactive/interface-js-ipfs-core#137