opt: add decoder Option NoValidateJSON for skipping JSON faster #696

Merged
merged 22 commits into from
Sep 13, 2024
Changes from 13 commits
16 changes: 7 additions & 9 deletions .github/workflows/benchmark.yml
Original file line number Diff line number Diff line change
Expand Up @@ -29,9 +29,9 @@ jobs:
- name: Benchmark Target
run: |
export SONIC_NO_ASYNC_GC=1
export SONIC_BENCH_SINGLE=1
go test -run ^$ -count=10 -benchmem -bench 'Benchmark(Encoder|Decoder)_(Generic|Binding)_Sonic' ./decoder >> /var/tmp/sonic_bench_target.out
go test -run ^$ -count=10 -benchmem -bench 'Benchmark(Get|Set)One_Sonic|BenchmarkParseSeven_Sonic' ./ast >> /var/tmp/sonic_bench_target.out
go test -run ^$ -count=10 -benchtime=100000x -benchmem -bench 'BenchmarkDecoder_(Generic|Binding)_Sonic' ./decoder >> /var/tmp/sonic_bench_target.out
go test -run ^$ -count=10 -benchtime=200000x -benchmem -bench 'BenchmarkEncoder_(Generic|Binding)_Sonic' ./encoder >> /var/tmp/sonic_bench_target.out
go test -run ^$ -count=10 -benchtime=500000x -benchmem -bench 'Benchmark(Get|Set)One_Sonic|BenchmarkParseSeven_Sonic' ./ast >> /var/tmp/sonic_bench_target.out

- name: Clear repository
run: sudo rm -fr $GITHUB_WORKSPACE && mkdir $GITHUB_WORKSPACE
Expand All @@ -44,12 +44,10 @@ jobs:
- name: Benchmark main
run: |
export SONIC_NO_ASYNC_GC=1
export SONIC_BENCH_SINGLE=1
go test -run ^$ -count=10 -benchmem -bench 'Benchmark(Encoder|Decoder)_(Generic|Binding)_Sonic' ./decoder >> /var/tmp/sonic_bench_main.out
go test -run ^$ -count=10 -benchmem -bench 'Benchmark(Get|Set)One_Sonic|BenchmarkParseSeven_Sonic' ./ast >> /var/tmp/sonic_bench_main.out
go test -run ^$ -count=10 -benchtime=100000x -benchmem -bench 'BenchmarkDecoder_(Generic|Binding)_Sonic' ./decoder >> /var/tmp/sonic_bench_main.out
go test -run ^$ -count=10 -benchtime=200000x -benchmem -bench 'BenchmarkEncoder_(Generic|Binding)_Sonic' ./encoder >> /var/tmp/sonic_bench_main.out
go test -run ^$ -count=10 -benchtime=500000x -benchmem -bench 'Benchmark(Get|Set)One_Sonic|BenchmarkParseSeven_Sonic' ./ast >> /var/tmp/sonic_bench_main.out

- name: Diff bench
run: |
go get golang.org/x/perf/cmd/benchstat && go install golang.org/x/perf/cmd/benchstat
benchstat -format=csv /var/tmp/sonic_bench_target.out /var/tmp/sonic_bench_main.out
# run: ./scripts/bench.py -t 0.05 -d /var/tmp/sonic_bench_target.out,/var/tmp/sonic_bench_main.out x
./scripts/bench.py -t 0.10 -d /var/tmp/sonic_bench_target.out,/var/tmp/sonic_bench_main.out x
1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -52,3 +52,4 @@ fuzz/testdata
*__debug_bin*
*pprof
*coverage.txt
tools/venv/*
22 changes: 11 additions & 11 deletions README.md
Expand Up @@ -380,17 +380,11 @@ type Visitor interface {
See [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go) for detailed usage. We also implement a demo visitor for `UserNode` in [ast/visitor_test.go](https://github.com/bytedance/sonic/blob/main/ast/visitor_test.go).

## Compatibility

Sonic **DOES NOT** guarantee support for all environments, due to the difficulty of developing high-performance code. For developers who use sonic to build their applications in different environments, we have the following suggestions:

- Developing on **Mac M1**: Make sure you have Rosetta 2 installed on your machine, and set `GOARCH=amd64` when building your application. Rosetta 2 can automatically translate x86 binaries to arm64 binaries and run x86 applications on Mac M1.
- Developing on **Linux arm64**: You can install qemu and use the `qemu-x86_64 -cpu max` command to translate x86 binaries to arm64 binaries for applications built with sonic. qemu can achieve a translation effect similar to Rosetta 2 on Mac M1.

For developers who want to use sonic on Linux arm64 without qemu, or those who want to handle JSON in a way that is strictly consistent with `encoding/json`, we provide some compatible APIs as `sonic.API`:

- `ConfigDefault`: sonic's default config (`EscapeHTML=false`, `SortKeys=false`...) to run in a sonic-supporting environment. It will fall back to `encoding/json` with the corresponding config, and some options like `SortKeys=false` will be invalid.
- `ConfigStd`: the std-compatible config (`EscapeHTML=true`, `SortKeys=true`...) to run in a sonic-supporting environment. It will fall back to `encoding/json`.
- `ConfigFastest`: the fastest config (`NoQuoteTextMarshaler=true`) to run in a sonic-supporting environment. It will fall back to `encoding/json` with the corresponding config, and some options will be invalid.
For developers who want to use sonic in different scenarios, we provide some integrated configs as `sonic.API`:
- `ConfigDefault`: sonic's default config (`EscapeHTML=false`, `SortKeys=false`...), which runs fast while still ensuring security.
- `ConfigStd`: the std-compatible config (`EscapeHTML=true`, `SortKeys=true`...).
- `ConfigFastest`: the fastest config (`NoQuoteTextMarshaler=true`), which runs sonic as fast as possible.
Sonic **DOES NOT** guarantee support for all environments, due to the difficulty of developing high-performance code. In environments that sonic does not support, the implementation falls back to `encoding/json`, so all of the configs above behave the same as `ConfigStd`.

## Tips

Expand Down Expand Up @@ -482,6 +476,12 @@ But `ast.Visitor` is not a very handy API. You might need to write a lot of code
### Buffer Size
Sonic uses memory pools in many places, such as `encoder.Encode` and `ast.Node.MarshalJSON`, to improve performance. This may increase in-use memory when the server's load is high. See [issue 614](https://github.com/bytedance/sonic/issues/614). Therefore, we introduce some options that let users control the behavior of the memory pool. See the [option](https://pkg.go.dev/github.com/bytedance/sonic@v1.11.9/option#pkg-variables) package.

### Faster JSON skip
For compatibility, sonic uses FSM scanning to validate JSON when decoding raw JSON or encoding `json.Marshaler`, which is much slower than SIMD-implemented skipping. If you have many redundant JSON values and DO NOT NEED to strictly validate JSON correctness, you can enable the options below:
- `Config.NoValidateJSONSkip`: skip JSON faster when decoding, for example unknown fields, mismatched values, and redundant array elements.
- `Config.NoValidateJSONMarshaler`: do not validate JSON when encoding `json.Marshaler`.
- `SearchOption.ValidateJSON`: indicates whether to validate the located JSON value in `Get`.
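
To illustrate the trade-off behind the options above, here is a minimal stdlib-only sketch. It is not sonic's implementation (which, per the text above, relies on FSM scanning and SIMD routines). The helper `skipFast` is hypothetical: it finds the end of a JSON object or array by tracking only bracket depth and string boundaries, so it also accepts malformed input that a validating pass such as `json.Valid` rejects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// skipFast returns the index just past the JSON object or array that starts
// at src[0], or -1 if the brackets never balance. It tracks only nesting depth
// and string boundaries and does NOT check the grammar of what it skips.
// Hypothetical helper for illustration; not part of sonic.
func skipFast(src string) int {
	depth := 0
	inString := false
	for i := 0; i < len(src); i++ {
		c := src[i]
		if inString {
			switch c {
			case '\\':
				i++ // skip the escaped byte
			case '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{', '[':
			depth++
		case '}', ']':
			depth--
			if depth == 0 {
				return i + 1
			}
		}
	}
	return -1
}

func main() {
	good := `{"a":[1,2,{"b":"]"}]}`
	bad := `{"a":[1,,2]}` // malformed: doubled comma

	// Both values are skipped in full, but only the first one is valid JSON.
	fmt.Println(skipFast(good) == len(good), json.Valid([]byte(good))) // true true
	fmt.Println(skipFast(bad) == len(bad), json.Valid([]byte(bad)))    // true false
}
```

The second line of output is the behavior to keep in mind: a non-validating skip is only safe when malformed values in skipped positions are acceptable to the caller.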

## Community

Sonic is a subproject of [CloudWeGo](https://www.cloudwego.io/). We are committed to building a cloud native ecosystem.
16 changes: 5 additions & 11 deletions README_ZH_CN.md
Expand Up @@ -380,17 +380,11 @@ type Visitor interface {
See [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go) for detailed usage. We also implement a demo `ast.Visitor` for `UserNode`, which you can find in [ast/visitor_test.go](https://github.com/bytedance/sonic/blob/main/ast/visitor_test.go).

## Compatibility

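Because developing high-performance code is difficult, Sonic does **NOT** guarantee support for all environments. For developers who use Sonic to build applications in different environments, we have the following suggestions:

- Developing on **Mac M1**: make sure Rosetta 2 is installed on your machine, and set `GOARCH=amd64` when building. Rosetta 2 can automatically translate x86 binaries to arm64 binaries and run x86 applications on Mac M1.
- Developing on **Linux arm64**: you can install qemu and use the `qemu-x86_64 -cpu max` command to translate x86 binaries to arm64 binaries. qemu can achieve a translation effect similar to Rosetta 2 on Mac M1.

For developers who want to use sonic without qemu, or who want to handle JSON in a way that is strictly consistent with `encoding/json`, we provide some compatibility APIs in `sonic.API`:

- `ConfigDefault`: sonic's default config (`EscapeHTML=false`, `SortKeys=false`, etc.) in a sonic-supporting environment. Its behavior is consistent with `encoding/json` under the corresponding config, and some options, such as `SortKeys=false`, will be invalid.
- `ConfigStd`: the config compatible with the standard library (`EscapeHTML=true`, `SortKeys=true`, etc.) in a sonic-supporting environment. Its behavior is consistent with `encoding/json`.
- `ConfigFastest`: the fastest config (`NoQuoteTextMarshaler=true`) in a sonic-supporting environment. Its behavior is consistent with `encoding/json` under the corresponding config, and some options will be invalid.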
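For developers who want to use sonic in different scenarios, we provide some integrated configs:
- `ConfigDefault`: sonic's default config (`EscapeHTML=false`, `SortKeys=false`…), which ensures performance while also taking security into account.
- `ConfigStd`: the config that guarantees full compatibility with `encoding/json`.
- `ConfigFastest`: the fastest config (`NoQuoteTextMarshaler=true...`), which gives the best performance but omits some safety checks (UTF-8 validation, etc.).
Sonic does **NOT** guarantee support for all environments, because developing high-performance code is difficult. In environments that do not support sonic, the implementation falls back to `encoding/json`, so all of the configs above are equal to `ConfigStd`.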

## Tips

5 changes: 5 additions & 0 deletions api.go
Expand Up @@ -84,6 +84,10 @@ type Config struct {
// NoValidateJSONMarshaler indicates that the encoder should not validate the output string
// after encoding the JSONMarshaler to JSON.
NoValidateJSONMarshaler bool

// NoValidateJSONSkip indicates that the decoder should not validate a JSON value when skipping it,
// such as unknown fields, mismatched types, and redundant elements.
NoValidateJSONSkip bool

// NoEncoderNewline indicates that the encoder should not add a newline after every message
NoEncoderNewline bool
Expand All @@ -109,6 +113,7 @@ var (
ConfigFastest = Config{
NoQuoteTextMarshaler: true,
NoValidateJSONMarshaler: true,
NoValidateJSONSkip: true,
}.Froze()
)

2 changes: 2 additions & 0 deletions decoder/decoder_compat.go
Expand Up @@ -42,6 +42,7 @@ const (
_F_use_number = types.B_USE_NUMBER
_F_validate_string = types.B_VALIDATE_STRING
_F_allow_control = types.B_ALLOW_CONTROL
_F_no_validate_json = types.B_NO_VALIDATE_JSON
)

type Options uint64
Expand All @@ -53,6 +54,7 @@ const (
OptionDisableUnknown Options = 1 << _F_disable_unknown
OptionCopyString Options = 1 << _F_copy_string
OptionValidateString Options = 1 << _F_validate_string
OptionNoValidateJSON Options = 1 << _F_no_validate_json
)
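
The `Options` constants above are bit flags: each one is `1 << position`, so several can be combined into a single `uint64`. A minimal stdlib-only sketch of that pattern follows. The bit positions and the `has` helper are hypothetical and chosen only for this example; they are not sonic's actual values or API.

```go
package main

import "fmt"

// Options is a set of decoder flags packed into one integer.
type Options uint64

// Hypothetical bit positions, chosen only for this sketch.
const (
	bitValidateString = 5
	bitNoValidateJSON = 6
)

const (
	OptionValidateString Options = 1 << bitValidateString
	OptionNoValidateJSON Options = 1 << bitNoValidateJSON
)

// has reports whether every bit of flag is set in o.
func (o Options) has(flag Options) bool { return o&flag == flag }

func main() {
	// Combine flags with bitwise OR.
	opts := OptionValidateString | OptionNoValidateJSON
	fmt.Println(opts.has(OptionNoValidateJSON)) // true

	// Clear a single flag with AND NOT.
	opts &^= OptionNoValidateJSON
	fmt.Println(opts.has(OptionNoValidateJSON)) // false
	fmt.Println(opts.has(OptionValidateString)) // true
}
```

Because each flag occupies its own bit, adding a new option such as `OptionNoValidateJSON` does not disturb the existing ones, provided its position is not already taken.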

func (self *Decoder) SetOptions(opts Options) {
1 change: 1 addition & 0 deletions decoder/decoder_native.go
Expand Up @@ -43,6 +43,7 @@ const (
OptionDisableUnknown Options = api.OptionDisableUnknown
OptionCopyString Options = api.OptionCopyString
OptionValidateString Options = api.OptionValidateString
OptionNoValidateJSON Options = api.OptionNoValidateJSON
)

// StreamDecoder is the decoder context object for streaming input.
86 changes: 79 additions & 7 deletions decoder/decoder_native_test.go
@@ -1,7 +1,6 @@
//go:build (amd64 && go1.17 && !go1.24) || (arm64 && go1.20 && !go1.24)
// +build amd64,go1.17,!go1.24 arm64,go1.20,!go1.24


/*
* Copyright 2021 ByteDance Inc.
*
Expand All @@ -21,15 +20,88 @@
package decoder

import (
`encoding/json`
_`strings`
`testing`
_`reflect`
"encoding/json"
"fmt"
_ "reflect"
"strings"
_ "strings"
"testing"
"time"

`github.com/bytedance/sonic/internal/rt`
`github.com/stretchr/testify/assert`
"github.com/bytedance/sonic/internal/rt"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)


func BenchmarkSkipValidate(b *testing.B) {
type skiptype struct {
A int `json:"a"` // mismatched
B string `json:"-"` // omitted
C [1]int `json:"c"` // fast int
D struct {} `json:"d"` // empty struct
E map[string]int `json:"e"` // mismatched elem
// Unknown
}
type C struct {
name string
json string
expTime float64
}
var sam = map[int]interface{}{}
for i := 0; i < 1; i++ {
sam[i] = _BindingValue
}
comptd, err := json.Marshal(sam)
if err != nil {
b.Fatal("invalid json")
}
compt := string(comptd)
var cases = []C{
{"mismatched", `{"a":`+compt+`}`, 5},
{"ommited", `{"b":`+compt+`}`, 5},
{"number", `{"c":[`+strings.Repeat("-1.23456e-19,", 1000)+`1]}`, 1.5},
{"unknown", `{"unknown":`+compt+`}`, 5},
{"empty", `{"d":`+compt+`}`, 5},
{"mismatched elem", `{"e":`+compt+`}`, 5},
}
_ = NewDecoder(`{}`).Decode(&skiptype{})

var avg1, avg2 time.Duration
for _, c := range cases {
b.Run(c.name, func(b *testing.B) {
b.Run("validate", func(b *testing.B) {
b.ResetTimer()
t1 := time.Now()
for i := 0; i < b.N; i++ {
var obj1 = &skiptype{}
// validate skip
d := NewDecoder(c.json)
_ = d.Decode(obj1)
}
d1 := time.Since(t1)
avg1 = d1/time.Duration(b.N)
})
b.Run("fast", func(b *testing.B) {
b.ResetTimer()
t2 := time.Now()
for i := 0; i < b.N; i++ {
var obj2 = &skiptype{}
// fast skip
d := NewDecoder(c.json)
d.SetOptions(OptionNoValidateJSON)
_ = d.Decode(obj2)
}
d2 := time.Since(t2)
avg2 = d2/time.Duration(b.N)
})
// fast skip must be expTime x faster
require.True(b, float64(avg1)/float64(avg2) > c.expTime, fmt.Sprintf("%v/%v=%v", avg1, avg2, float64(avg1)/float64(avg2)))
})
}
}
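
The benchmark above derives its speed-up ratio from manual `time.Now` measurements. As an alternative sketch (not what this PR does), the standard library's `testing.Benchmark` returns a `BenchmarkResult` that can be compared directly. The two workloads and the `speedup` helper below are hypothetical stand-ins, not sonic's decoder paths.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
	"testing"
)

// payload is a stand-in input: a large array of numbers.
var payload = []byte("[" + strings.Repeat("-1.23456e-19,", 1000) + "1]")

// benchValidate measures a full grammar check of the payload.
func benchValidate(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if !json.Valid(payload) {
			b.Fatal("payload should be valid JSON")
		}
	}
}

// benchScan measures a cheaper pass that only searches for the closing bracket.
func benchScan(b *testing.B) {
	for i := 0; i < b.N; i++ {
		if bytes.IndexByte(payload, ']') < 0 {
			b.Fatal("payload should contain a closing bracket")
		}
	}
}

// speedup returns how many times faster fast is than slow, per operation.
// Hypothetical helper; it divides total time by iterations to keep
// sub-nanosecond precision.
func speedup(slow, fast testing.BenchmarkResult) float64 {
	perOpSlow := float64(slow.T) / float64(slow.N)
	perOpFast := float64(fast.T) / float64(fast.N)
	return perOpSlow / perOpFast
}

func main() {
	slow := testing.Benchmark(benchValidate)
	fast := testing.Benchmark(benchScan)
	fmt.Printf("validate: %v\nscan:     %v\nspeedup:  %.1fx\n", slow, fast, speedup(slow, fast))
}
```

Asserting on a timing ratio, as the benchmark above does with `require.True`, can be sensitive to machine load, so a threshold like this is probably best treated as a coarse guard rather than a precise measurement.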


func TestSkipMismatchTypeAmd64Error(t *testing.T) {
// t.Run("struct", func(t *testing.T) {
// println("TestSkipError")
21 changes: 10 additions & 11 deletions decoder/decoder_test.go
Expand Up @@ -17,16 +17,16 @@
package decoder

import (
`encoding/json`
`runtime`
`runtime/debug`
`strings`
`sync`
`testing`
`time`

`github.com/stretchr/testify/assert`
`github.com/stretchr/testify/require`
"encoding/json"
"runtime"
"runtime/debug"
"strings"
"sync"
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)

func TestMain(m *testing.M) {
Expand Down Expand Up @@ -85,7 +85,6 @@ func init() {
_ = json.Unmarshal([]byte(TwitterJson), &_BindingValue)
}


func TestSkipMismatchTypeError(t *testing.T) {
t.Run("struct", func(t *testing.T) {
println("TestSkipError")
1 change: 1 addition & 0 deletions internal/decoder/api/decoder.go
Expand Up @@ -44,6 +44,7 @@ const (
OptionDisableUnknown = consts.OptionDisableUnknown
OptionCopyString = consts.OptionCopyString
OptionValidateString = consts.OptionValidateString
OptionNoValidateJSON = consts.OptionNoValidateJSON
)

type (
3 changes: 3 additions & 0 deletions internal/decoder/consts/option.go
Expand Up @@ -12,9 +12,11 @@ const (
F_disable_unknown = 3
F_copy_string = 4


F_use_number = types.B_USE_NUMBER
F_validate_string = types.B_VALIDATE_STRING
F_allow_control = types.B_ALLOW_CONTROL
F_no_validate_json = types.B_NO_VALIDATE_JSON
)

type Options uint64
Expand All @@ -26,6 +28,7 @@ const (
OptionDisableUnknown Options = 1 << F_disable_unknown
OptionCopyString Options = 1 << F_copy_string
OptionValidateString Options = 1 << F_validate_string
OptionNoValidateJSON Options = 1 << F_no_validate_json
)

const (