Use fromUtf8 in web3.toHex #1398

Merged
merged 5 commits into from
Mar 13, 2018

wjmelements

Fixes: #1179

This fixes an issue I am having: when I pass Korean text as a bytes parameter through web3.js, it sends garbage. It likely fixes other UTF-8 issues as well.

Before:

web3.toHex('내가 제일 잘 나가')
"0xb0b4ac0020c81cc77c20c79820b098ac00"
web3.toUtf8(web3.toHex('내가 제일 잘 나가'))
inpage.js:14308 Uncaught Error: Invalid UTF-8 detected
    at c (inpage.js:14308)
    at Object.decode (inpage.js:14308)
    at Proxy.toUtf8 (inpage.js:14308)
    at <anonymous>:1:6

After:

web3.toHex('내가 제일 잘 나가')
"0xeb82b4eab08020eca09cec9dbc20ec9e9820eb8298eab080"
web3.toUtf8(web3.toHex('내가 제일 잘 나가'))
"내가 제일 잘 나가"

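The difference between the two outputs can be reproduced outside web3.js. This is a minimal sketch that assumes Node.js 11 or later (global `TextEncoder`/`TextDecoder`); `perCodeUnitHex`, `utf8Hex`, and `hexToUtf8Strict` are hypothetical helpers written for illustration, not web3.js APIs. The first writes one hex group per UTF-16 code unit, which reproduces the "Before" output; the second hex-encodes the string's UTF-8 bytes, which reproduces the "After" output. A strict UTF-8 decoder rejects the first and round-trips the second.

```javascript
// Hypothetical helper reproducing the "Before" output: each UTF-16 code unit
// is written as its hex value, with no UTF-8 encoding step.
function perCodeUnitHex(str) {
  let hex = '';
  for (let i = 0; i < str.length; i++) {
    const n = str.charCodeAt(i).toString(16);
    hex += n.length < 2 ? '0' + n : n; // pad single-digit values such as 0x3
  }
  return '0x' + hex;
}

// Hypothetical helper reproducing the "After" output: encode to UTF-8 bytes,
// then write each byte as exactly two hex digits.
function utf8Hex(str) {
  const bytes = new TextEncoder().encode(str);
  return '0x' + Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

// Decode a 0x-prefixed hex string as strict UTF-8; throws a TypeError on
// byte sequences that are not valid UTF-8.
function hexToUtf8Strict(hex) {
  const body = hex.startsWith('0x') ? hex.slice(2) : hex;
  const bytes = new Uint8Array(body.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(body.slice(2 * i, 2 * i + 2), 16);
  }
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

const korean = '내가 제일 잘 나가';
console.log(perCodeUnitHex(korean)); // 0xb0b4ac0020c81cc77c20c79820b098ac00
console.log(utf8Hex(korean));        // 0xeb82b4eab08020eca09cec9dbc20ec9e9820eb8298eab080
console.log(hexToUtf8Strict(utf8Hex(korean))); // 내가 제일 잘 나가
try {
  hexToUtf8Strict(perCodeUnitHex(korean));
} catch (e) {
  console.log('Before output rejected:', e instanceof TypeError);
}
```

A syllable such as 내 (U+B0B4) becomes the bytes b0 b4 in the old output, and 0xb0 is a UTF-8 continuation byte that cannot start a character, which is why decoding fails.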
@wjmelements
Author

This breaks one test case:

{ value: '\u0003\u0000\u0000\u00005èÆÕL]\u0012|<9d>ξ<9e>\u001a7«<9b>\u00052\u0011(Ð<97>Y\n<\u0010\u0000\u0000\u0000\u0000\u0000\u0000e!ßd/ñõì\f:z¦Î¦±ç·÷Í¢Ëß\u00076*<85>\b<8e><97>ñ<9e>ùC1ÉUÀé2\u001aÓ<86>B<8c>',
      expected: '0x0300000035e8c6d54c5d127c9dcebe9e1a37ab9b05321128d097590a3c100000000000006521df642ff1f5ec0c3a7aa6cea6b1e7b7f7cda2cbdf07362a85088e97f19ef94331c955c0e9321ad386428c'}

@wjmelements
Author

With ascii, the encoding is 0x0300000035e8c6d54c5d127c9dcebe9e1a37ab9b05321128d097590a3c100000000000006521df642ff1f5ec0c3a7aa6cea6b1e7b7f7cda2cbdf07362a85088e97f19ef94331c955c0e9321ad386428c.

I contend that 0x0300000035c3a8c386c3954c5d127cc29dc38ec2bec29e1a37c2abc29b05321128c390c297590a3c100000000000006521c39f642fc3b1c3b5c3ac0c3a7ac2a6c38ec2a6c2b1c3a7c2b7c3b7c38dc2a2c38bc39f07362ac28508c28ec297c3b1c29ec3b94331c38955c380c3a9321ac393c28642c28c is the desired utf8 encoding of the data and that the test case is incorrect.

This differs from what fromUtf8 currently produces, because fromUtf8 treats code point 0 as the end of the string. The test cases expect that behavior to be kept. I would expect code point 0 not to terminate the string, and to be encoded as 00.
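The two candidate encodings of the test value can be compared on a short prefix of it. This sketch assumes Node.js, where `Buffer.from(str, 'latin1')` writes one byte per character (valid here only because every character in the prefix is at most U+00FF) and `Buffer.from(str, 'utf8')` writes UTF-8. Only the leading printable part of the test value is used; the variable names are illustrative.

```javascript
// Leading part of the test value from the failing test case: 03 00 00 00,
// '5', then è Æ Õ (U+00E8, U+00C6, U+00D5), then 'L', ']', U+0012 and '|'.
const prefix = '\u0003\u0000\u0000\u00005èÆÕL]\u0012|';

// One byte per character (Latin-1), which is what the existing expectation
// encodes; only safe because every character here is <= U+00FF.
const oneBytePerChar = '0x' + Buffer.from(prefix, 'latin1').toString('hex');

// UTF-8: characters in U+0080..U+00FF take two bytes (è -> c3 a8).
const utf8Bytes = '0x' + Buffer.from(prefix, 'utf8').toString('hex');

console.log(oneBytePerChar); // 0x0300000035e8c6d54c5d127c
console.log(utf8Bytes);      // 0x0300000035c3a8c386c3954c5d127c
```

The one-byte-per-character result matches the start of the test's expected value, and the UTF-8 result matches the start of the encoding proposed above. The zero bytes survive in both because no trimming is applied here.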

@coveralls

coveralls commented Feb 24, 2018


Coverage increased (+0.01%) to 90.736% when pulling b41cdd6 on wjmelements:utf8 into 3c86456 on ethereum:develop.

@wjmelements
Author

I'm preserving the trimming behavior in fromUtf8 by adding an optional parameter, allowZero.

One advantage of not preserving this trimming behavior would be that we could assert utils.toUtf8(utils.fromUtf8(test.value)) == test.value in the test cases.

One reason to preserve the trimming behavior is that an existing client might depend on it.
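A minimal sketch of this trade-off, assuming that `allowZero = true` turns the trimming off (as the name suggests) and that decoding does not stop at zero bytes. `fromUtf8Sketch` and `toUtf8Sketch` are hypothetical stand-ins, not the web3.js implementations, and `TextEncoder`/`TextDecoder` assume Node.js 11+ or a browser. Because a zero byte appears in UTF-8 only as the encoding of code point 0, checking bytes for 0 is equivalent to checking code points.

```javascript
// Sketch of a fromUtf8-like encoder with an optional allowZero flag.
// Assumption: allowZero = true disables the legacy trimming at code point 0.
function fromUtf8Sketch(str, allowZero = false) {
  const bytes = new TextEncoder().encode(str);
  let hex = '';
  for (const b of bytes) {
    // In UTF-8 a 0x00 byte only ever encodes code point 0.
    if (b === 0 && !allowZero) break; // legacy: code point 0 ends the string
    hex += b.toString(16).padStart(2, '0');
  }
  return '0x' + hex;
}

// Hypothetical decoder that keeps zero bytes rather than stopping at them.
function toUtf8Sketch(hex) {
  const body = hex.startsWith('0x') ? hex.slice(2) : hex;
  const bytes = new Uint8Array(body.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(body.slice(2 * i, 2 * i + 2), 16);
  }
  return new TextDecoder().decode(bytes);
}

const value = '\u0003\u0000\u0000\u00005è';
console.log(fromUtf8Sketch(value));       // 0x03
console.log(fromUtf8Sketch(value, true)); // 0x0300000035c3a8
// The round-trip check proposed for the test cases:
console.log(toUtf8Sketch(fromUtf8Sketch(value, true)) === value); // true
console.log(toUtf8Sketch(fromUtf8Sketch(value)) === value);       // false
```

With the default, the round-trip assertion fails for any value containing code point 0; with `allowZero`, it holds.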

@wjmelements
Author

Until this is decided, you can work around the issue with:

contract.bytesFn(web3.fromUtf8(utf8str), (err,result)=>{...});

frozeman merged commit bd6a890 into web3:develop on Mar 13, 2018
@frozeman
Contributor

frozeman commented Mar 13, 2018

I compared it with how I handle this in the web3.js 1.0 branch, and your change seems to be correct.
Thanks.
