Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CESU-8 support in XS to address UTF-8/UTF-16 confusion risk #2118

Closed
dckc opened this issue Dec 22, 2020 · 30 comments · Fixed by #4920
Closed

CESU-8 support in XS to address UTF-8/UTF-16 confusion risk #2118

dckc opened this issue Dec 22, 2020 · 30 comments · Fixed by #4920
Assignees
Labels
audit-zestival Vulnerability assessment of ERTP + Zoe documentation Improvements or additions to documentation security xsnap the XS execution tool
Milestone

Comments

@dckc
Copy link
Member

dckc commented Dec 22, 2020

What is the Problem Being Solved?

update: @erights noted our kernel runs on v8 with UTF-16 and xsnap uses UTF-8 and wondered about a risk of one side confusing the other -- especially if the xsnap child can confuse the kernel.

Description of the Design

Evaluate confusion risk as strings etc. go between the parent node process and the child xsnap process.

Document it as a known difference between our platform and the JS spec.

Security Considerations

not clear; we're evaluating a confusion risk

Test Plan

Unit tests to check border crossings.

cc @kriskowal @warner @erights

@phoddie
Copy link

phoddie commented Jan 15, 2021

No question - XS stores strings internally as UTF-8. How is that visible to JavaScript code?

Our conformance document describes the places where strings in XS do not conform to ECMA-262. But, it isn't necessarily the case that they are unsolvable when strings are stored as UTF-8 or that those issues make the fact that strings are stored as UTF-8 visible to scripts.

Apologies if this seems overly precise. I only want to be certain what this issue is referring to.

@dckc
Copy link
Member Author

dckc commented Jan 16, 2021

No question - XS stores strings internally as UTF-8. How is that visible to JavaScript code?

I just mean things like '𝌆'.length === 1 on XS vs 2 on node and my browser. Also:

xs> '\uD834' + '\uDF06'
"\ud834\udf06"
Welcome to Node.js v14.2.0.
> '\uD834' + '\uDF06'
'𝌆'

It doesn't necessarily allow JS code to tell that the internal encoding is UTF-8, but it's a difference from other implementations that is visible from JS code.

Apologies if this seems overly precise. I only want to be certain what this issue is referring to.

Precision is exactly what we need here.

I don't know for certain that this difference represents an exploitable vulnerability. I just have nightmares from reading The Tangled Web , which is full of situations where differences in implementations turned into security vulnerabilities. For example, on p 30:

let’s have a peek at this syntax instead:
http://example.com\@coredump.cx/
In Firefox, that URL will take the user to coredump.cx, because example.com\
will be interpreted as a valid value for the login field. In almost all other browsers, “\” will be interpreted as a path delimiter, and the user will land on example.com instead.

  • This particular @-based trick was quickly embraced to facilitate all sorts of online fraud
    targeted at casual users. ...

I can imagine some JS code that splits at the 10th character, which could lead to problems like the ones derived from browsers splitting URLs at @ differently.

@erights
Copy link
Member

erights commented Jan 16, 2021

Just to emphasize, we do not expect this to be a problem and are not asking it to be fixed. Rather, as @dckc says, it is a difference we need to stay aware of, in case it does lead to an unpleasant surprise. Just being thorough.

@phoddie
Copy link

phoddie commented Jan 16, 2021

@erights - understood.

To state the obvious: every engine has variances from the ECMA-262. Scripts can detect those. But beyond that, there's nothing here which is obviously exploitable - though the title suggests there might be.

@dckc dckc added the xsnap the XS execution tool label Apr 28, 2021
@dckc
Copy link
Member Author

dckc commented May 21, 2021

Some tangible impact: endojs/endo#737 , endojs/endo#739

cc @kriskowal

@dckc
Copy link
Member Author

dckc commented May 21, 2021

@michaelfig what is this UTF-16 widget in our IBC stuff?

@michaelfig
Copy link
Member

michaelfig commented May 21, 2021

@michaelfig what is this UTF-16 widget in our IBC stuff?

I rely on the accuracy of conversions in https://github.com/Agoric/agoric-sdk/tree/master/SwingSet/src/vats/network/bytes.js. @phoddie, does this hold under XS?

const octets = new Array(256).fill(0).map((_, i) => i);
// We receive some external JSON data (from Golang) in the following format:
const bstring1 = String.fromCharCode.apply(null, octets);
const bstring2 = JSON.parse(JSON.stringify(bstring1));
assert(bstring1 === bstring2);
// We want to extract the octets:
const octets2 = bstring2.split('').map(c => c.charCodeAt(0));
// And to be able to reencode them:
const bstring3 = String.fromCharCode.apply(null, octets2);
assert(bstring2 === bstring3);

I'm not saying that XS needs to support this usage, only that we need a coherent way to encode and transport bytes through our marshalling from Golang JSON. The existing implementation in bytes.js is hacky, but works from GoJSON->Node.js and back. Now I need to know how to get from GoJSON->XS->GoJSON.

Again, this should be properly solved with the correct (and hopefully ES or Web JS standard) abstractions/APIs. Choosing how to do this is for people who understand XS and the standards.

I don't have much choice on the Golang side, short of exploding the Golang []byte into an array of individual octets (wow, much transport overhead), or using a different codec than the default Golang "encoding/json" one and JS (both Node.JS and XS) globalThis.JSON. Ideally, I don't want to change the Golang side, since we have much more flexibility finding a solution to this constrained problem.

But if we have a good solution on the JS side that needs a change, I will adopt it and change Golang to match. I just don't want to exchange one hack for another.

@michaelfig
Copy link
Member

Sorry, didn't mean to tag you, Peter when one of us could easily check this equivalence on XS.

@phoddie
Copy link

phoddie commented May 21, 2021

No problem. I hadn't caught up with this issue yet anyway. FWIW (and only tangentially relevant) - if the patch I shared with @kriskowal early today to String.fromArrayBuffer is a worthwhile direction, we will need to make corresponding changes to ArrayBuffer.fromString to allow consistent roundtrips.

@dckc
Copy link
Member Author

dckc commented May 21, 2021

@michaelfig what is this UTF-16 widget in our IBC stuff?

I rely on the accuracy of conversions ... does this hold under XS?

Thanks for the test sketch. Yes, it holds.

$ yarn test -sv test/test-xsnap.js -m 'round-*'
  ✔ round-trip byte sequences from golang for dynamic IBC

I tweaked the test sketch to run with our other xsnap tests: 658830e.

please excuse the temporary lack of branch discipline. Expect a cleaned up PR in due course.

p.p.s. I wonder if using ava-xs would be more straightforward than using vat.evaluate() in tests like this one.

@dckc dckc added this to the XS Memory Safety Audit milestone Jul 21, 2021
@dckc dckc added documentation Improvements or additions to documentation and removed needs-design labels Sep 9, 2021
@dckc dckc changed the title UTF-8 in XS: compatibility, security impact? UTF-8 in XS: clearly document Sep 9, 2021
@erights
Copy link
Member

erights commented Nov 18, 2021

Ok, but what about the JS spec? Is it compat with the JSON spec?

@FUDCo
Copy link
Contributor

FUDCo commented Nov 18, 2021

It looks like the relevant piece of the ECMA-262 spec is https://262.ecma-international.org/12.0/#sec-quotejsonstring

If read this right the spec says to do some quoting stuff so the whacky bits of the JavaScript string will round-trip through JSON, but the strings in the JSON itself won't contain the whacky bits. Not 100% sure I'm reading it right though.

@mhofman
Copy link
Member

mhofman commented Nov 18, 2021

I have to admit I don't understand the JS spec on that. It seems mostly to create a new utf-16 string, escaping codepoints that not valid or some of the ASCII chars that need to be escaped. I suppose that it's then used somewhere else and converted to utf-8, since it can assume it's a valid utf-16 encoded string?

@dckc dckc changed the title UTF-8 in XS: clearly document UTF-8 in XS: risk of confusion with UTF-16 Nov 18, 2021
@dckc dckc unassigned warner Nov 18, 2021
@erights erights added the audit-zestival Vulnerability assessment of ERTP + Zoe label Jan 10, 2022
@Tartuffo Tartuffo added the MN-1 label Jan 20, 2022
@dckc dckc changed the title UTF-8 in XS: risk of confusion with UTF-16 CESU-8 support in XS to address UTF-8/UTF-16 confusion risk Jan 20, 2022
@dckc
Copy link
Member Author

dckc commented Jan 20, 2022

Jan 10 update from @phoddie : "our investigation into CESU-8 support has gone well. We are working through the full set of changes now. ..."

@kriskowal
Copy link
Member

Moddable has posted an xsnap release that integrates CESU-8. We either need a corresponding xst binary that we can use to run test262 tests side-by-side in endojs/endo, indirectly the same but througheshost, or to move xsnap to endojs/endo. Then we’ll be in a position to verify equivalence.

Breakdown:

This initial work would also unblock cross-testing other SES features like Compartment dynamic import.

Dan’s got a draft up for progress:
endojs/endo#681

@phoddie
Copy link

phoddie commented Feb 2, 2022

You may know all this already, but just in case:

@kriskowal
Copy link
Member

kriskowal commented Feb 2, 2022 via email

@Tartuffo
Copy link
Contributor

Tartuffo commented Feb 3, 2022

@kriskowal This does not have an area label that is covered by our weekly tech / planning meetings. Can you assign the proper label? We cover: agd, agoric-cosmos, amm, core economy, cosmic-swingset, endo, getrun, governance, installation-bundling, metering, run-protocol, staking, swingset, swingset-runner, token economy, wallet, zoe contract,

@dckc
Copy link
Member Author

dckc commented Feb 3, 2022

@Tartuffo xsnap seems like the appropriate label and we do have weekly endo / SES meetings that cover it. @kriskowal and I touched base about this yesterday, in particular.

@Tartuffo
Copy link
Contributor

Tartuffo commented Feb 3, 2022

@dckc Agreed, I just added it to the ZH filter for the Zoe/ERTP meeting.

@kriskowal
Copy link
Member

The xsnap label is more appropriate to Endo or Kernel gardening, possibly both, since the cadence of Endo grooming is not established yet.

@Tartuffo Tartuffo removed the MN-1 label Feb 7, 2022
@Tartuffo Tartuffo removed this from the XS Memory Safety Audit milestone Feb 8, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022
@dckc
Copy link
Member Author

dckc commented Mar 25, 2022

I think this is addressed by #4832. I suppose it's cost-effective to add a test or two to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit-zestival Vulnerability assessment of ERTP + Zoe documentation Improvements or additions to documentation security xsnap the XS execution tool
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants