Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(java): standardizing fury java spec #1240

Merged
merged 35 commits into from
Feb 28, 2024

Conversation

chaokunyang
Copy link
Collaborator

@chaokunyang chaokunyang commented Dec 20, 2023

What do these changes do?

This PR refine fury java serialization format spec. The cross-language object graph serialization spec is similar and will be added in a later PR, but it needs more discuss.

This PR added some new spec which hasn't been implemented in current java implementation:

Some parts has been omitted in this spec:

  • object serialization with schema evolution support by write field in a KV like pattern: this will be replaced by schema evolution mode described in this spec in the future.

Currently fury doesn't provide binary compatibility, the spec may be revised in the future.

Related issue number

Closes #1239

#1238

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@chaokunyang chaokunyang marked this pull request as draft December 20, 2023 17:08
@chaokunyang chaokunyang force-pushed the refine_fury_java_spec branch from 3317c83 to a8fce2f Compare December 21, 2023 12:14
docs/.DS_Store Outdated Show resolved Hide resolved
@chaokunyang chaokunyang marked this pull request as ready for review December 21, 2023 15:34
@chaokunyang
Copy link
Collaborator Author

chaokunyang commented Dec 21, 2023

@wangweipeng2 @PragmaTwice @bytemain Could you help review this PR?

@chaokunyang
Copy link
Collaborator Author

Hi @wangweipeng2 @PragmaTwice @bytemain, do you have other comments about this PR? I hope all concerns has been addressed.

chaokunyang and others added 3 commits February 27, 2024 20:09
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
@chaokunyang
Copy link
Collaborator Author

cc @LiangliangSui

chaokunyang and others added 2 commits February 28, 2024 10:30
Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest looks good to me.

chaokunyang and others added 4 commits February 28, 2024 23:12
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
Co-authored-by: Twice <twice@apache.org>
@chaokunyang chaokunyang merged commit 0eee00d into apache:main Feb 28, 2024
30 checks passed
@chaokunyang chaokunyang deleted the refine_fury_java_spec branch April 1, 2024 06:29
chaokunyang added a commit that referenced this pull request Apr 12, 2024
…th (#1486)

<!--
**Thanks for contributing to Fury.**

**If this is your first time opening a PR on fury, you can refer to
[CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).**

Contribution Checklist

- The **Apache Fury (incubating)** community has restrictions on the
naming of pr titles. You can also find instructions in
[CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).

- Fury has a strong focus on performance. If the PR you submit will have
an impact on performance, please benchmark it first and provide the
benchmark result here.
-->

## What does this PR do?
This PR optimize string serialization by concating coder and length into
a long and serialize it together, it can save one byte for small string
which the length is less than 32
<!-- Describe the purpose of this PR. -->


## Related issues
#1240 
<!--
Is there any related issue? Please attach here.

- #xxxx0
- #xxxx1
- #xxxx2
-->


## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/incubator-fury/issues/new/choose)
describing the need to do so and update the document if necessary.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark
Before
```
Benchmark                                  (bufferType)   (objectType)  (references)   Mode  Cnt        Score       Error  Units
UserTypeDeserializeSuite.fury_deserialize         array  MEDIA_CONTENT         false  thrpt   50  2576580.395 ± 55698.851  ops/s
```
This PR:
```java
Benchmark                                  (bufferType)   (objectType)  (references)   Mode  Cnt        Score       Error  Units
UserTypeDeserializeSuite.fury_deserialize         array  MEDIA_CONTENT         false  thrpt   50  2591695.102 ± 48174.940  ops/s
```

<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.
-->
chaokunyang added a commit that referenced this pull request Apr 15, 2024
<!--
**Thanks for contributing to Fury.**

**If this is your first time opening a PR on fury, you can refer to
[CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).**

Contribution Checklist

- The **Apache Fury (incubating)** community has restrictions on the
naming of pr titles. You can also find instructions in
[CONTRIBUTING.md](https://github.com/apache/incubator-fury/blob/main/CONTRIBUTING.md).

- Fury has a strong focus on performance. If the PR you submit will have
an impact on performance, please benchmark it first and provide the
benchmark result here.
-->

## What does this PR do?
This PR implements meta string encoding described in [fury java
serialization
spec](https://fury.apache.org/docs/specification/fury_java_serialization_spec#meta-string)
and [xlang serialization
spec](https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string)

We have `3/8` space saveing for most string:
```java
    // utf8 use 30 bytes, we use only 19 bytes
    assertEquals(encoder.encode("org.apache.fury.benchmark.data").getBytes().length, 19);
    // utf8 use 12 bytes, we use only 9 bytes.
    assertEquals(encoder.encode("MediaContent").getBytes().length, 9);
```


The integration with ClassResolver is left in another PR.

## Related issues
#1240 
#1413 


## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/incubator-fury/issues/new/choose)
describing the need to do so and update the document if necessary.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark

<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.
-->
chaokunyang added a commit that referenced this pull request Apr 27, 2024
## What does this PR do?

This PR implements meta string encoding described in [xlang
serialization
spec](https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#meta-string)


## Related issues
- #1240 
- #1413
- #1514



## Does this PR introduce any user-facing change?

- [No] Does this PR introduce any public API change?
- [No] Does this PR introduce any binary protocol compatibility change?

---------

Co-authored-by: Shawn Yang <chaokunyang@apache.org>
chaokunyang added a commit that referenced this pull request May 2, 2024
## What does this PR do?

This PR implements type meta encoding for java proposed in #1240 .

The type meta encoding in xlang spec proposed in #1413 will be finished
in another PR based on this PR.

The spec has been updated too:

type meta header
```
|      8 bytes meta header      | meta size |   variable bytes   |  variable bytes   | variable bytes |
+-------------------------------+-----------|--------------------+-------------------+----------------+
| 7 bytes hash + 1 bytes header | 1~2 bytes | current class meta | parent class meta |      ...       |
```

And the encoding for packge/class/field name has been updated to:
```
- Package name encoding(omitted when class is registered):
    - encoding algorithm: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL`
    - Header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63`  will be used to indicate size `0~62`,
      the value `63` the size need more byte to read, the encoding will encode `size - 62` as a varint next.
- Class name encoding(omitted when class is registered):
    - encoding algorithm: `UTF8/LOWER_UPPER_DIGIT_SPECIAL/FIRST_TO_LOWER_SPECIAL/ALL_TO_LOWER_SPECIAL`
    - header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63`  will be used to indicate size `1~64`,
      the value `63` the size need more byte to read, the encoding will encode `size - 63` as a varint next.
- Field info:
    - header(8
      bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`.
      Users can use annotation to provide those info.
        - 2 bits field name encoding:
            - encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID`
            - If tag id is used, i.e. field name is written by an unsigned varint tag id. 2 bits encoding will be `11`.
        - size of field name:
            - The `3 bits size: 0~7`  will be used to indicate length `1~7`, the value `6` the size read more bytes,
              the encoding will encode `size - 7` as a varint next.
            - If encoding is `TAG_ID`, then num_bytes of field name will be used to store tag id.
    - Field name: If type id is set, type id will be used instead. Otherwise meta string encoding length and data will
      be written instead.
```

## Meta size
Before this PR:
```java
class org.apache.fury.benchmark.data.MediaContent 78
class org.apache.fury.benchmark.data.Media 208
class org.apache.fury.benchmark.data.Image 114
```

With this PR:
```java
class org.apache.fury.benchmark.data.MediaContent 53
class org.apache.fury.benchmark.data.Media 114
class org.apache.fury.benchmark.data.Image 68
```

The size of class meta reduced by half, which is a great gain.

The size can be reduded more if we introduce field name hash, but it's
not related to this PR. We can discuss it in another PR.

## Related issues

#1240 
#203 
#202 


## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/incubator-fury/issues/new/choose)
describing the need to do so and update the document if necessary.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark

<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java][Doc] Java object graph specification
4 participants