Fix encoder #2050
Unsafe? |
We need it to be deterministic and unambiguous. Otherwise, the consistency of each implementation cannot be maintained. |
Yes, with |
Currently parsing a With this PR, "+" will be parsed to "+" and "\u002B" will also be parsed to "+". (I assume we don't have cases where input strings include "\u...."? If we do, we should recognize it as the corresponding character, though.) |
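To illustrate the point about "+" and "\u002B", here is a minimal Python sketch. Python's standard `json` module only stands in for a conforming parser here; it is not the neo implementation, and the variable names are made up for the example. It shows that both spellings decode to the same character even though the serialized text differs.

```python
import json

# Two JSON string literals that spell the same character differently.
literal_plain = '"+"'
literal_escaped = '"\\u002B"'

decoded_plain = json.loads(literal_plain)
decoded_escaped = json.loads(literal_escaped)

# Both literals decode to the single character "+".
print(decoded_plain == decoded_escaped)

# The serialized forms are different byte sequences, which matters
# whenever the serialized text itself is compared or hashed.
print(literal_plain.encode("utf-8") == literal_escaped.encode("utf-8"))
```

So parsing is unaffected by the choice of spelling; only the serialized output depends on it.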
It seems deterministic: https://github.com/ahsonkhan/corefx/blob/master/src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/UnsafeRelaxedJavaScriptEncoder.cs but we should know whether we have the same behavior in Go ( @roman-khimov ) |
It's not easy to tell all the side effects of this change, but at the same time we don't have the neo-project/neo-modules#375 problem at all, even now:
(I don't have this contract, obviously, but the script generated has
|
C# and Go are different in encoding: https://medium.com/swlh/utf-8-encoding-in-go-14b459564ccd |
Ping |
We must completely determine the logic of the encoder, so as to ensure the consistency of all implementations. Otherwise, we have to create a custom encoder. Also, I don't like unsafe.
I'd rather merge it as is, because it solves the problem and the behavior is documented. Its "unsafety" is also well documented:
I think we're perfectly fine with this. |
I don't know, is it safe for us to allow |
It's going to be valid UTF-8 JSON in the end, so as long as
I don't think it's a problem. |
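As a rough illustration of the "valid UTF-8 JSON in the end" argument, the sketch below uses Python's `json` module with `ensure_ascii=False` as a loose analogue of a relaxed encoder. That analogy is an assumption made for the example, not a description of `UnsafeRelaxedJsonEscaping`, and the sample string is hypothetical. Both the escaped and the unescaped forms encode to valid UTF-8 and decode back to the same string.

```python
import json

# Hypothetical sample: ASCII, a non-ASCII Latin letter, and a CJK character.
sample = "a+é中"

# ensure_ascii=False leaves non-ASCII characters unescaped;
# the default escapes them as \uXXXX sequences.
relaxed = json.dumps(sample, ensure_ascii=False)
strict = json.dumps(sample)

relaxed_bytes = relaxed.encode("utf-8")
strict_bytes = strict.encode("utf-8")

# Both forms are valid UTF-8 JSON and round-trip to the original string.
print(json.loads(relaxed_bytes.decode("utf-8")) == sample)
print(json.loads(strict_bytes.decode("utf-8")) == sample)

# The bytes on the wire are nevertheless different.
print(relaxed_bytes == strict_bytes)
```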
Have you considered those control characters? |
Mimicking JObject serialization (yes, 0x20 is not in control range):

    using System;
    using System.IO;
    using System.Text.Encodings.Web;
    using System.Text.Json;

    public class Program
    {
        public static void Main()
        {
            string controls = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20";
            MemoryStream ms = new MemoryStream();
            Utf8JsonWriter writer = new Utf8JsonWriter(ms, new JsonWriterOptions
            {
                SkipValidation = true,
                Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
            });
            writer.WriteStringValue(controls);
            writer.Flush();
            byte[] res = ms.ToArray();
            string result = System.Text.Encoding.UTF8.GetString(res);
            Console.WriteLine(result);
        }
    }

Output:
For the fun of it, Go version:

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    func main() {
        var controls = "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20"
        b, err := json.Marshal(controls)
        if err != nil {
            fmt.Println(err)
            os.Exit(1)
        }
        fmt.Println(string(b))
    }

Output:
That turns out to be a little less fun than was intended as upper/lower case and escapings differ a bit... |
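The kind of difference mentioned here (hex digit case, short escapes versus `\uXXXX`) can be sketched in Python. The three literals below are hand-written examples, not captured output from the C# or Go programs. They all decode to the same two characters, yet they are three distinct serialized strings, which is why the encoder choice matters for byte-level consistency between implementations.

```python
import json

# Three spellings of the same control characters (TAB and 0x1f).
lower_hex = '"\\u0009\\u001f"'
upper_hex = '"\\u0009\\u001F"'
short_form = '"\\t\\u001f"'

spellings = (lower_hex, upper_hex, short_form)

# Decoding collapses all spellings into a single value.
decoded = {json.loads(s) for s in spellings}
print(len(decoded))

# The serialized forms remain distinct from one another.
print(len(set(spellings)))
```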
Here is the Python result:

    import json

    p = '\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20'
    print(json.dumps(p))

Output:
It's the same result as with C#. |
Except it's not exactly the same. |
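Eyeballing one long escaped line makes small differences easy to miss. One option, sketched here for Python only, is to print the escape chosen for each code point on its own line, so that the listing can be compared against equivalent listings from other implementations. `escape_table` is a hypothetical helper written for this example.

```python
import json


def escape_table(code_points):
    """Map each code point to the escape Python's json module emits for it."""
    table = {}
    for cp in code_points:
        literal = json.dumps(chr(cp))
        # Strip the surrounding double quotes to keep only the escape itself.
        table[cp] = literal[1:-1]
    return table


# Same range as the snippets above: 0x00 through 0x20 inclusive.
for cp, escaped in escape_table(range(0x00, 0x21)).items():
    print(f"0x{cp:02x} -> {escaped}")
```

In CPython, this shows short escapes for backspace, tab, newline, form feed and carriage return, and lowercase-hex `\u00xx` escapes for the remaining control characters; 0x20 is emitted as a plain space.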
And to make it even more fun try replacing C#: |
Open this again? |
Close neo-project/neo-modules#375