Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: content-type response header hints how to process response #426

Merged
merged 15 commits into from
Feb 16, 2024

Conversation

achingbrain
Copy link
Member

Description

DAG-JSON and DAG-CBOR can have embeded CIDs which are deserialized to CID objects by the @ipld/dag-json and @ipld/dag-cbor modules.

This fixes a bug whereby we were JSON.stringifying them which lead to rubbish being returned from .json().

Notes & open questions

This overwrites the .json method on the returned Response object. Is that bad?

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if necessary (this includes comments as well)
  • I have added tests that prove my fix is effective or that my feature works

@achingbrain achingbrain requested a review from a team as a code owner February 9, 2024 16:03
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly believe we should not leak custom types onto object returned by .json(). See https://github.com/ipfs/helia/pull/426/files#r1485686844

@achingbrain achingbrain changed the base branch from main to fix/remove-stubs-from-tests February 12, 2024 14:31
The extensive stubbing in the `@helia/verified-fetch` tests have some
baked-in assumptions about how the codecs work which are not easy
to unpick.

It's quick to test using the actual codecs if the block data is already
present so remove the stubs and use a network-less Helia node to make
the tests more reliable.
DAG-JSON and DAG-CBOR can have embeded CIDs which are
deserialized to CID objects by the `@ipld/dag-json` and
`@ipld/dag-cbor` modules.

This fixes a bug whereby we were JSON.stringifying them
which lead to rubbish being returned from `.json()`.
@achingbrain
Copy link
Member Author

achingbrain commented Feb 12, 2024

I strongly believe we should not leak custom types onto object returned by .json()

I agree, though we now have a weird DX niggle where these return incompatible types even though you are interacting with the same ecosystem of modules:

import { verifiedFetch } from '@helia/verified-fetch'
import { dagJson } from '@helia/dag-json'

const cid = CID.parse('QmDAG-JSONwithCID')

// obj has CID as `{ "/": "QmFoo" }`
const res = await verifiedFetch(`ipfs://${cid}`)
const obj = await res.json()

// obj has CID as object
const d = dagJson(helia)
const obj = await d.get(CID.parse('QmDAG-JSONwithCID')

DAG-CBOR is more problematic when trying to treat is as plain-old-JSON because it supports several unserializable types:

const obj = {
  hello: 'world',
  bigInt: 12345n,
  bytes: Uint8Array.from([0, 1, 2, 3, 4])
}

const d = dagCbor(helia)
const cid = await d.add(obj)

const res = await verifiedFetch(`ipfs://${cid}`)
const output = await res.json() // 💥

I guess the thing to do here is to convert CIDs to { "/": "Qmfoo"}, BigInts to strings, Uint8Array to plain arrays, and so on - then .json() would work.

We could add a .object() method to the response type that returns rich object types?

It wouldn't survive caching, but we can't cache ipfs://QmFoo URLs without browser support anyway.

It'd be nice to have arrayBuffer() return the raw CBOR, then the user could pass it to @ipld/dag-cbor to get a rich object back, but we can't have that and have .json() return an object as for that to work the response body has to be set as a JSON string so arrayBuffer() would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array etc will have been lost so there's no way to convert it back to the original object.

@SgtPooki
Copy link
Member

It wouldn't survive caching, but we can't cache ipfs://QmFoo URLs without browser support anyway.

With verified-fetch in a SW (helia-service-worker-gateway), the returned responses are cached because its the Response object itself that's cached by the service-worker, based on requested URL. I'm not 100% sure of the intricacies there, but the returned object will be whatever is in the body i'm sure, so we need to make sure the stream/arrayBuffer given to the Response constructor is what we're depending on being cached.

This also brings up a few targeted use cases:

  1. applications that expect the browser to cache the response object of verified-fetch (SW & other gateway-esque solutions)
  2. applications that expect an easy fetch interface and subsequent consumption of the returned data.

It'd be nice to have arrayBuffer() return the raw CBOR, then the user could pass it to @ipld/dag-cbor to get a rich object back, but we can't have that and have .json() return an object as for that to work the response body has to be set as a JSON string so arrayBuffer() would return a array buffer containing that JSON string and not CBOR - at that point all the CBOR codepoints that delimit types like Uint8Array etc will have been lost so there's no way to convert it back to the original object.

I think having .arrayBuffer() for dagCbor be the priority makes the most sense, given that the data in the response should be unique per requested URL, and consistent. If someone calls .json on dagCbor, it should fail, right? Afterall, it's not actual JSON ™. If we need a codec that is actual JSON, not some psuedo-json-like-object, we should have a different codec, shouldn't we?

Maybe we could provide some interface for verifiedFetch similar to hashers for Helia, where we can extend how we process returned content? I imagine handling all of the possible scenarios could easily become painful.

@lidel
Copy link
Member

lidel commented Feb 13, 2024

I think getting DAG-CBOR experience with .json() and .arrayBuffer() is extra important, because it is a common pattern for people. We've been telling devs to build apps that use a functional equivalent of ipfs dag put for a long time. By default, the command turns JSON into CBOR: accepts json string, parses it as dag-json, and stores it as dag-cbor:

$ echo -n '{"hello":"world"}' | ipfs dag put # [--store-codec=dag-cbor] [--input-codec=dag-json]
bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae

So if someone builds an app, and wants to read that dag-cbor back with verified-fetch, they would want .json() to return JSON similar to what was passed to ipfs dag put:

> resp.json()
{"hello":"world"}

For practical purposes the roundtrip (json→cbor→json) works fine, as long it is limited to the safe DAG- subset of CBOR and JSON, listed in

I think if we have to choose between .arrayBuffer() and .json() for a DAG-CBOR CID, having JSON result will be more useful to devs.

@achingbrain Is my assumption correct that if someone wants to get raw dag-cbor bytes to parse with @ipld/dag-cbor, we could implement verified-fetch so they can make an explicit request with Accept header set to application/vnd.ipld.dag-cbor (or application/vnd.ipld.raw), and then .arrayBuffer() would work fine and return CBOR bytes, ano not JSON string, right?

@achingbrain
Copy link
Member Author

Yes, they can request the raw block and decode it themselves, then they can use the rich types and we don't have to worry about breaking .json().


Notes from the dApps working group call:

We can try to deserialize the CBOR as JSON and error if it fails.

The user can then re-request with the header Accept: application/vnd.ipld.raw to get raw bytes and use @ipld/dag-cbor to decode the .arrayBuffer() output.

Presumably if they request with Accept: application/vnd.ipld.dag-cbor, application/vnd.ipld.raw (e.g. both types) we'll try to decode and if it fails, fall back to letting them use .arrayBuffer().

@achingbrain
Copy link
Member Author

achingbrain commented Feb 13, 2024

That said, what if we just try to decode and just let the (familiar) content type of the response dictate how it can be consumed, to save the user making two requests and so they don't have to care about application/vnd.ipld.whatnow?

If the block decodes as JSON-compliant DAG-CBOR it's application/json, otherwise it's application/octet-stream.

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const response = await verifiedFetch('ipfs://bafyDagCbor')

let obj

if (response.headers.get('Content-Type')?.includes('application/json')) {
  obj = await response.json()
} else {
  obj = dagCbor.decode(await response.arrayBuffer())
}

vs

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

let response = await verifiedFetch('ipfs://bafyDagCbor')
let obj

try {
  obj = await response.json() // 💥
} catch (err) {
  if (!err.message.includes('some JSON error') {
    throw err
  }

  // try again
  response = await verifiedFetch('ipfs://bafyDagCbor', {
    headers: {
      accept: 'application/vnd.ipld.raw'
    }
  })

  try {
    obj = dagCbor.decode(await response.arrayBuffer())
  }  catch (err) {
    console.error('giving up', err)
    throw err
  }
}

I think I prefer the version without exceptions-as-control-flow.

Base automatically changed from fix/remove-stubs-from-tests to main February 13, 2024 18:18
@achingbrain
Copy link
Member Author

Ok, I think we're in a good place now.

JSON codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as json from 'multiformats/codecs/json'

const res = await verifiedFetch('ipfs://bafyJSONCodec')

// get object representation of JSON body
console.info(await res.json()) // { ... }

// alternative way to do the same thing
const obj = json.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }

DAG-JSON codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagJson from '@ipld/dag-json'

const res = await verifiedFetch('ipfs://bafyDAGJSONCodec')

// get spec-compliant plain-old-JSON object that happens to contain a CID
console.info(await res.json()) // { cid: { '/': 'Qmfoo' } }

// ...or use @ipld/dag-json to get rich-object version
const obj = dagJson.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { cid: CID('Qmfoo') }

DAG-CBOR codec

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const res = await verifiedFetch('ipfs://bafyDAGCBORCodec')

if (res.headers.get('content-type') === 'application/json') {
  // CBOR was JSON-friendly
  console.info(await res.json()) // { ... }
} else {
  // CBOR was not JSON-friendly, must use @ipld/dag-cbor to decode
  const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
  console.info(obj) // { ... }
}

If the DAG-CBOR block contains anything that cannot round-trip to JSON (e.g. CIDs, Uint8Arrays, BigInts, etc), the content type will be application/octet-stream - .json() will throw and so @ipld/dag-cbor must be used to decode the resolved value of .arrayBuffer().


I've added README docs explaining the above, please comment if they muddy the waters.

Copy link
Member

@SgtPooki SgtPooki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but I imagine we'll wait on multiformats/js-multiformats#287. Looks amazing, fantastic work as always.

*
* [DAG-CBOR](https://ipld.io/docs/codecs/known/dag-cbor/) uses the [Concise Binary Object Representation](https://cbor.io/) format for serialization instead of JSON.
*
* This supports more datatypes in a safer way than JSON and is smaller on the wire to boot so is usually preferable to JSON or DAG-JSON.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

private readonly pathWalker: PathWalkerFn
private readonly log: Logger
private readonly contentTypeParser: ContentTypeParser | undefined

constructor ({ helia, ipns, unixfs, dagJson, json, dagCbor, pathWalker }: VerifiedFetchComponents, init?: VerifiedFetchInit) {
constructor ({ helia, ipns, unixfs, pathWalker }: VerifiedFetchComponents, init?: VerifiedFetchInit) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great. The more we can stay out of pulling in all of the extra codecs, the better.

Comment on lines 468 to 483
// N.b. this is not possible because the incoming block is turned into JSON
// and returned as the response body, so `.arrayBufer()` returns a string
// encoded into a Uint8Array which we can't parse as CBOR
it.skip('can round trip JSON-compliant dag-cbor via .arrayBuffer()', async () => {
const obj = {
hello: 'world'
}
const c = dagCbor(helia)
const cid = await c.add(obj)

const resp = await verifiedFetch.fetch(cid)
expect(resp.headers.get('content-type')).to.equal('application/json')

const output = ipldDagCbor.decode(new Uint8Array(await resp.arrayBuffer()))
await expect(c.add(output)).to.eventually.deep.equal(cid)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to call this out in packageDocumentation (or is it already?)

Is this forever not-possible, or something we hope to support in the future? (we could remove this test, or assert that it isn't equal)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this in because it's the logical next test give the input/output matrix, but it won't work for reasons outlined in the comment. We can remove it if we think it's redundant.

packages/verified-fetch/test/verified-fetch.spec.ts Outdated Show resolved Hide resolved
@2color
Copy link
Member

2color commented Feb 15, 2024

I opened a separate PR with some small docs refinements #436

The one challenge here, which isn't a blocker to me, is that for projects that use dag-cbor extensively, they might have some CIDs that can roundtrip to json and some that don't (with BigInts and Uint8Arrays). In its current state, there's no way to opt out of the automatic JSON parsing, so depending on their data, they either have to pay the cost of attempting to parse as json or deal with the difference cases (json/binary) in their code.

@achingbrain
Copy link
Member Author

achingbrain commented Feb 15, 2024

In its current state, there's no way to opt out of the automatic JSON parsing

If they know it's going to fail they can add an Accept: application/vnd.ipld.raw to the request and it'll just be a raw block with the content type application/octet-stream which they can await res.arrayBuffer() into @ipld/dag-cbor?

E.g.:

import { verifiedFetch } from '@helia/verified-fetch'
import * as dagCbor from '@ipld/dag-cbor'

const res = await verifiedFetch('ipfs://bafyDAGCBORCodec', {
  headers: {
    accept: 'application/vnd.ipld.raw'
  }
})

const obj = dagCbor.decode(new Uint8Array(await res.arrayBuffer()))
console.info(obj) // { ... }

@2color
Copy link
Member

2color commented Feb 15, 2024

If they know it's going to fail they can add an Accept: application/vnd.ipld.raw to the request and it'll just be a raw block with the content type application/octet-stream which they can await res.arrayBuffer() into @ipld/dag-cbor?

From a look at the code, this isn't implemented and the example you provided returns a response object with status 501.

for (const [format, acceptHeader] of formatsAndAcceptHeaders) {
// eslint-disable-next-line no-loop-func
it(`returns 501 for ${acceptHeader}`, async () => {
const resp = await verifiedFetch.fetch(`ipns://mydomain.com?format=${format}`)
expect(resp).to.be.ok()
expect(resp.status).to.equal(501)
const resp2 = await verifiedFetch.fetch(testCID, {
headers: {
accept: acceptHeader
}
})
expect(resp2).to.be.ok()
expect(resp2.status).to.equal(501)
})
}
})

Another thing which might be a bug is that we pass the options object to a method which expects the body as the first parameter:

function notSupportedResponse (body?: BodyInit | null): Response {
return new Response(body, {
status: 501,
statusText: 'Not Implemented'
})
}

response = await formatHandler.call(this, { cid, path, options })

I also tried the example and it doesn't work

Co-authored-by: Daniel N <2color@users.noreply.github.com>
achingbrain and others added 2 commits February 16, 2024 15:57
Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
@achingbrain
Copy link
Member Author

I'm going to merge this so we can proceed with the tests for the different data types in the examples.

@achingbrain achingbrain changed the title fix: round-trip DAG-JSON and DAG-CBOR with embedded CIDs fix: content-type response header hints how to process response Feb 16, 2024
@achingbrain achingbrain merged commit 754c7af into main Feb 16, 2024
18 checks passed
@achingbrain achingbrain deleted the fix/round-trip-data-with-cids branch February 16, 2024 16:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants