Skip to content

Commit

Permalink
feat: add ast.Visitor for fully decoding JSON into custom generic dat…
Browse files Browse the repository at this point in the history
…a containers without using ast.Node (#471)

Co-authored-by: zhongxinghong <zhongxinghong@bytedance.com>
  • Loading branch information
zhongxinghong and zhongxinghong authored Jul 6, 2023
1 parent 6108485 commit 78fd91e
Show file tree
Hide file tree
Showing 4 changed files with 1,074 additions and 2 deletions.
47 changes: 46 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -286,6 +286,42 @@ println(string(buf) == string(exp)) // true
- iteration: `Values()`, `Properties()`, `ForEach()`, `SortKeys()`
- modification: `Set()`, `SetByIndex()`, `Add()`

### Ast.Visitor
Sonic provides an advanced API for fully parsing JSON into non-standard types (neither `struct` not `map[string]interface{}`) without using any intermediate representation (`ast.Node` or `interface{}`). For example, you might have the following types which are like `interface{}` but actually not `interface{}`:
```go
type UserNode interface {}

// the following types implement the UserNode interface.
type (
UserNull struct{}
UserBool struct{ Value bool }
UserInt64 struct{ Value int64 }
UserFloat64 struct{ Value float64 }
UserString struct{ Value string }
UserObject struct{ Value map[string]UserNode }
UserArray struct{ Value []UserNode }
)
```
Sonic provides the following API to return **the preorder traversal of a JSON AST**. The `ast.Visitor` is a SAX style interface which is used in some C++ JSON library. You should implement `ast.Visitor` by yourself and pass it to `ast.Preorder()` method. In your visitor you can make your custom types to represent JSON values. There may be an O(n) space container (such as stack) in your visitor to record the object / array hierarchy.
```go
func Preorder(str string, visitor Visitor, opts *VisitorOptions) error

type Visitor interface {
OnNull() error
OnBool(v bool) error
OnString(v string) error
OnInt64(v int64, n json.Number) error
OnFloat64(v float64, n json.Number) error
OnObjectBegin(capacity int) error
OnObjectKey(key string) error
OnObjectEnd() error
OnArrayBegin(capacity int) error
OnArrayEnd() error
}
```

See [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go) for detailed usage. We also implement a demo visitor for `UserNode` in [ast/visitor_test.go](https://github.com/bytedance/sonic/blob/main/ast/visitor_test.go).

## Compatibility
Sonic **DOES NOT** ensure to support all environments, due to the difficulty of developing high-performance codes. For developers who use sonic to build their applications in different environments, we have the following suggestions:

Expand Down Expand Up @@ -315,7 +351,7 @@ func init() {
err := sonic.Pretouch(reflect.TypeOf(v))

// with more CompileOption...
err := sonic.Pretouch(reflect.TypeOf(v),
err := sonic.Pretouch(reflect.TypeOf(v),
// If the type is too deep nesting (nesting depth > option.DefaultMaxInlineDepth),
// you can set compile recursive loops in Pretouch for better stability in JIT.
option.WithCompileRecursiveDepth(loop),
Expand Down Expand Up @@ -362,5 +398,14 @@ Why? Because `ast.Node` stores its children using `array`:

**CAUTION:** `ast.Node` **DOESN'T** ensure concurrent security directly, due to its **lazy-load** design. However, you can call `Node.Load()`/`Node.LoadAll()` to achieve that, which may bring performance reduction while it still works faster than converting to `map` or `interface{}`

### Ast.Node or Ast.Visitor?
For generic data, `ast.Node` should be enough for your needs in most cases.

However, `ast.Node` is designed for partially processing JSON string. It has some special designs such as lazy-load which might not be suitable for directly parsing the whole JSON string like `Unmarshal()`. Although `ast.Node` is better then `map` or `interface{}`, it's also a kind of intermediate representation after all if your final types are customized and you have to convert the above types to your custom types after parsing.

For better performance, in previous case the `ast.Visitor` will be the better choice. It performs JSON decoding like `Unmarshal()` and you can directly use your final types to represents a JSON AST without any intermediate representations.

But `ast.Visitor` is not a very handy API. You might need to write a lot of code to implement your visitor and carefully maintain the tree hierarchy during decoding. Please read the comments in [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go) carefully if you decide to use this API.

## Community
Sonic is a subproject of [CloudWeGo](https://www.cloudwego.io/). We are committed to building a cloud native ecosystem.
46 changes: 45 additions & 1 deletion README_ZH_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,41 @@ println(string(buf) == string(exp)) // true
- 迭代: `Values()`, `Properties()`, `ForEach()`, `SortKeys()`
- 修改: `Set()`, `SetByIndex()`, `Add()`

### `Ast.Visitor`
Sonic 提供了一个高级的 API 用于直接全量解析 JSON 到非标准容器里 (既不是 `struct` 也不是 `map[string]interface{}`) 且不需要借助任何中间表示 (`ast.Node``interface{}`)。举个例子,你可能定义了下述的类型,它们看起来像 `interface{}`,但实际上并不是:
```go
type UserNode interface {}

// the following types implement the UserNode interface.
type (
UserNull struct{}
UserBool struct{ Value bool }
UserInt64 struct{ Value int64 }
UserFloat64 struct{ Value float64 }
UserString struct{ Value string }
UserObject struct{ Value map[string]UserNode }
UserArray struct{ Value []UserNode }
)
```
Sonic 提供了下述的 API 来返回 **“对 JSON AST 的前序遍历”**`ast.Visitor` 是一个 SAX 风格的接口,这在某些 C++ 的 JSON 解析库中被使用到。你需要自己实现一个 `ast.Visitor`,将它传递给 `ast.Preorder()` 方法。在你的实现中你可以使用自定义的类型来表示 JSON 的值。在你的 `ast.Visitor` 中,可能需要有一个 O(n) 空间复杂度的容器(比如说栈)来记录 object / array 的层级。
```go
func Preorder(str string, visitor Visitor, opts *VisitorOptions) error

type Visitor interface {
OnNull() error
OnBool(v bool) error
OnString(v string) error
OnInt64(v int64, n json.Number) error
OnFloat64(v float64, n json.Number) error
OnObjectBegin(capacity int) error
OnObjectKey(key string) error
OnObjectEnd() error
OnArrayBegin(capacity int) error
OnArrayEnd() error
}
```
详细用法参看 [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go),我们还为 `UserNode` 实现了一个示例 `ast.Visitor`,你可以在 [ast/visitor_test.go](https://github.com/bytedance/sonic/blob/main/ast/visitor_test.go) 中找到它。

## 兼容性
由于开发高性能代码的困难性, Sonic ****保证对所有环境的支持。对于在不同环境中使用 Sonic 构建应用程序的开发者,我们有以下建议:

Expand Down Expand Up @@ -331,7 +366,7 @@ func init() {
err := sonic.Pretouch(reflect.TypeOf(v))

// with more CompileOption...
err := sonic.Pretouch(reflect.TypeOf(v),
err := sonic.Pretouch(reflect.TypeOf(v),
// If the type is too deep nesting (nesting depth > option.DefaultMaxInlineDepth),
// you can set compile recursive loops in Pretouch for better stability in JIT.
option.WithCompileRecursiveDepth(loop),
Expand Down Expand Up @@ -381,6 +416,15 @@ go someFunc(user)

**注意**:由于 `ast.Node` 的惰性加载设计,其**不能**直接保证并发安全性,但你可以调用 `Node.Load()` / `Node.LoadAll()` 来实现并发安全。尽管可能会带来性能损失,但仍比转换成 `map``interface{}` 更为高效。

### 使用 `ast.Node` 还是 `ast.Visitor`
对于泛型数据的解析,`ast.Node` 在大多数场景上应该能够满足你的需求。

然而,`ast.Node` 是一种针对部分解析 JSON 而设计的泛型容器,它包含一些特殊设计,比如惰性加载,如果你希望像 `Unmarshal()` 那样直接解析整个 JSON,这些设计可能并不合适。尽管 `ast.Node` 相较于 `map``interface{}` 来说是更好的一种泛型容器,但它毕竟也是一种中间表示,如果你的最终类型是自定义的,你还得在解析完成后将上述类型转化成你自定义的类型。

在上述场景中,如果想要有更极致的性能,`ast.Visitor` 会是更好的选择。它采用和 `Unmarshal()` 类似的形式解析 JSON,并且你可以直接使用你的最终类型去表示 JSON AST,而不需要经过额外的任何中间表示。

但是,`ast.Visitor` 并不是一个很易用的 API。你可能需要写大量的代码去实现自己的 `ast.Visitor`,并且需要在解析过程中仔细维护树的层级。如果你决定要使用这个 API,请先仔细阅读 [ast/visitor.go](https://github.com/bytedance/sonic/blob/main/ast/visitor.go) 中的注释。

## 社区

Sonic 是 [CloudWeGo](https://www.cloudwego.io/) 下的一个子项目。我们致力于构建云原生生态系统。
Loading

0 comments on commit 78fd91e

Please sign in to comment.