feat(java): type meta encoding for java #1556

Merged: 39 commits merged on May 2, 2024

Conversation

chaokunyang
Collaborator

@chaokunyang chaokunyang commented Apr 22, 2024

What does this PR do?

This PR implements the type meta encoding for Java proposed in #1240.

The type meta encoding in the xlang spec proposed in #1413 will be implemented in a follow-up PR based on this one.

The spec has been updated too:

type meta header

|      8 bytes meta header      | meta size |   variable bytes   |  variable bytes   | variable bytes |
+-------------------------------+-----------+--------------------+-------------------+----------------+
| 7 bytes hash + 1 byte header  | 1~2 bytes | current class meta | parent class meta |      ...       |
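
The 8-byte meta header in the layout above can be pictured as one 64-bit value. The sketch below is only an illustration: the PR text does not say where the header byte sits relative to the hash, so placing the hash in the high 56 bits and the header byte in the low 8 bits is an assumption, and `TypeMetaHeaderSketch` is a hypothetical name, not a Fury class.

```java
final class TypeMetaHeaderSketch {
  private static final long HASH_MASK = 0x00FF_FFFF_FFFF_FFFFL; // low 56 bits

  // Assumed placement (not confirmed by the PR): hash in the high 56 bits,
  // header byte in the low 8 bits.
  static long pack(long hash56, int headerByte) {
    return ((hash56 & HASH_MASK) << 8) | (headerByte & 0xFFL);
  }

  // Recovers the 56-bit hash from a packed header.
  static long hashOf(long packed) {
    return packed >>> 8;
  }

  // Recovers the 1-byte header from a packed header.
  static int headerOf(long packed) {
    return (int) (packed & 0xFF);
  }
}
```

Packing both parts into a single `long` would let a reader compare the whole header with one 8-byte read, which is one plausible reason for a fixed-width header.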

And the encoding for package/class/field name has been updated to:

- Package name encoding (omitted when class is registered):
    - Encoding algorithm: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL`
    - Header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` is used to indicate size `0~62`;
      the value `63` means the size needs more bytes to read, and the encoding writes `size - 62` as a varint next.
- Class name encoding (omitted when class is registered):
    - Encoding algorithm: `UTF8/LOWER_UPPER_DIGIT_SPECIAL/FIRST_TO_LOWER_SPECIAL/ALL_TO_LOWER_SPECIAL`
    - Header: `6 bits size | 2 bits encoding flags`. The `6 bits size: 0~63` is used to indicate size `1~64`;
      the value `63` means the size needs more bytes to read, and the encoding writes `size - 63` as a varint next.
- Field info:
    - Header (8 bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`.
      Users can use annotations to provide this info.
        - 2 bits field name encoding:
            - Encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID`
            - If a tag id is used, i.e. the field name is written as an unsigned varint tag id, the 2 bits encoding will be `11`.
        - Size of field name:
            - The `3 bits size: 0~7` is used to indicate length `1~7`; the value `7` means more bytes must be read,
              and the encoding writes `size - 7` as a varint next.
            - If the encoding is `TAG_ID`, then num_bytes of the field name will be used to store the tag id.
    - Field name: if a tag id is set, the tag id will be used instead. Otherwise the meta string encoding length and data
      will be written.
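
The size-plus-flags header used for package names can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fury implementation: the size is assumed to sit in the high 6 bits and the encoding flags in the low 2 bits, the varint is assumed to be a LEB128-style unsigned varint, and `NameHeaderSketch` with all of its methods is hypothetical. The bias is a parameter because the text above gives `size - 62` for package names and `size - 63` for class names.

```java
import java.io.ByteArrayOutputStream;

final class NameHeaderSketch {
  static final int ESCAPE = 63; // 6-bit value meaning "the size continues as a varint"

  // Writes the header byte, plus a varint of (size - bias) when the size does not fit inline.
  // Assumed bit placement: size in the high 6 bits, encoding flags in the low 2 bits.
  static byte[] writeHeader(int size, int encodingFlags, int bias) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    if (size < ESCAPE) {
      out.write((size << 2) | (encodingFlags & 0b11));
    } else {
      out.write((ESCAPE << 2) | (encodingFlags & 0b11));
      writeVarUint(out, size - bias);
    }
    return out.toByteArray();
  }

  // Reads the size back from bytes produced by writeHeader with the same bias.
  static int readSize(byte[] bytes, int bias) {
    int inline = (bytes[0] & 0xFF) >>> 2;
    if (inline < ESCAPE) {
      return inline;
    }
    int value = 0;
    int shift = 0;
    for (int i = 1; i < bytes.length; i++) {
      int b = bytes[i] & 0xFF;
      value |= (b & 0x7F) << shift;
      shift += 7;
      if ((b & 0x80) == 0) {
        break; // last varint byte has the high bit clear
      }
    }
    return value + bias;
  }

  // Extracts the 2-bit encoding flags from the header byte.
  static int readEncodingFlags(byte[] bytes) {
    return bytes[0] & 0b11;
  }

  // Assumed varint format: 7 payload bits per byte, high bit set on all but the last byte.
  static void writeVarUint(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }
}
```

With this scheme a short name costs a single header byte, and only names of 63 bytes or more pay for the extra varint.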

Meta size

Before this PR:

class org.apache.fury.benchmark.data.MediaContent 78
class org.apache.fury.benchmark.data.Media 208
class org.apache.fury.benchmark.data.Image 114

With this PR:

class org.apache.fury.benchmark.data.MediaContent 53
class org.apache.fury.benchmark.data.Media 114
class org.apache.fury.benchmark.data.Image 68

The class meta size shrinks by roughly 32% to 45% across these three classes, which is a significant gain.
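
The relative savings follow directly from the two listings above. A small sketch of the arithmetic (`MetaSizeReduction` is a hypothetical helper, and the numbers are copied from the listings):

```java
final class MetaSizeReduction {
  // Percentage by which `after` is smaller than `before`.
  static double reductionPercent(int before, int after) {
    return 100.0 * (before - after) / before;
  }

  public static void main(String[] args) {
    String[] names = {"MediaContent", "Media", "Image"};
    int[] before = {78, 208, 114};
    int[] after = {53, 114, 68};
    for (int i = 0; i < names.length; i++) {
      System.out.printf(
          "%s: %d -> %d (%.1f%% smaller)%n",
          names[i], before[i], after[i], reductionPercent(before[i], after[i]));
    }
  }
}
```

The three reductions come out at roughly 32%, 45%, and 40% respectively.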

The size can be reduced further if we introduce a field name hash, but that is outside the scope of this PR and can be discussed in another PR.

Related issues

#1240
#203
#202

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change?
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@chaokunyang chaokunyang marked this pull request as ready for review April 30, 2024 16:23
@chaokunyang
Collaborator Author

@theweipeng @LiangliangSui @PragmaTwice Could you help take a look at this PR?

@chaokunyang
Collaborator Author

Hi @MrChang0, if you are using meta share mode, I believe your case can benefit from this PR too.

@MrChang0
Contributor

MrChang0 commented May 2, 2024

> Hi @MrChang0, if you are using meta share mode, I believe your case can benefit from this PR too.

I think COMPATIBLE + metaContext is the best way for me: it gives high performance and small data, and we can add/delete fields (unlike protostuff).
Btw, a shared metaContext causes some trouble in RPC, so I serialize the metaContext every time. Will this feature be merged soon, so that I can use it after the holiday?

@chaokunyang
Collaborator Author

> > Hi @MrChang0, if you are using meta share mode, I believe your case can benefit from this PR too.
>
> I think COMPATIBLE + metaContext is the best way for me: it gives high performance and small data, and we can add/delete fields (unlike protostuff). Btw, a shared metaContext causes some trouble in RPC, so I serialize the metaContext every time. Will this feature be merged soon, so that I can use it after the holiday?

This will be merged soon, and I will submit some PRs later to support setting and serializing the meta context automatically every time. Automatic meta share mode will be the default compatible mode in the end.

@chaokunyang chaokunyang merged commit 63bbe45 into apache:main May 2, 2024
33 checks passed
chaokunyang added a commit that referenced this pull request May 3, 2024
## What does this PR do?

Fix CI failure introduced in #1556:

![image](https://github.com/apache/incubator-fury/assets/12445254/5977b4ca-07b9-456b-82d8-a2779a08d01f)


## Related issues



## Does this PR introduce any user-facing change?


- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark

chaokunyang added a commit that referenced this pull request May 6, 2024
## What does this PR do?

Update type meta field info spec:
```
- field info:
    - header(8
      bits): `3 bits size + 2 bits field name encoding + polymorphism flag + nullability flag + ref tracking flag`.
      Users can use annotation to provide those info.
        - 2 bits field name encoding:
            - encoding: `UTF8/ALL_TO_LOWER_SPECIAL/LOWER_UPPER_DIGIT_SPECIAL/TAG_ID`
            - If tag id is used, i.e. field name is written by an unsigned varint tag id. 2 bits encoding will be `11`.
        - size of field name:
            - The `3 bits size: 0~7`  will be used to indicate length `1~7`, the value `7` indicates to read more bytes,
              the encoding will encode `size - 7` as a varint next.
            - If encoding is `TAG_ID`, then num_bytes of field name will be used to store tag id.
        - ref tracking: when set to 1, ref tracking will be enabled for this field.
        - nullability: when set to 1, this field can be null.
        - polymorphism: when set to 1, the actual type of field will be the declared field type even if the type is
          not `final`.
    - field name: If tag id is set, tag id will be used instead. Otherwise meta string encoding `[length]` and data will
      be written instead.
```
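
The 8-bit field header described in the spec text above can be sketched as a bit-packing routine. This is an illustration only: the bit order (size in the top 3 bits, then 2 encoding bits, then the polymorphism, nullability, and ref tracking flags) is assumed from the order in which the spec lists the parts, and `FieldHeaderSketch` and its methods are hypothetical names rather than Fury API. The varint that follows an escaped size is not written here.

```java
final class FieldHeaderSketch {
  static final int ENCODING_TAG_ID = 0b11; // per the spec text, tag id uses encoding bits `11`

  // Packs the header byte. Assumed layout, most significant bit first:
  // 3 bits size | 2 bits field name encoding | polymorphism | nullability | ref tracking.
  static int encode(
      int nameLength, int nameEncoding, boolean polymorphic, boolean nullable, boolean trackingRef) {
    if (nameLength < 1) {
      throw new IllegalArgumentException("field name must not be empty");
    }
    // Lengths 1..7 are stored inline as 0..6; 7 signals that (length - 7) follows as a varint.
    int sizeBits = nameLength <= 7 ? nameLength - 1 : 7;
    int header = sizeBits << 5;
    header |= (nameEncoding & 0b11) << 3;
    if (polymorphic) {
      header |= 0b100;
    }
    if (nullable) {
      header |= 0b010;
    }
    if (trackingRef) {
      header |= 0b001;
    }
    return header;
  }

  static int sizeBits(int header) {
    return (header >>> 5) & 0b111;
  }

  static int nameEncoding(int header) {
    return (header >>> 3) & 0b11;
  }

  static boolean isPolymorphic(int header) {
    return (header & 0b100) != 0;
  }

  static boolean isNullable(int header) {
    return (header & 0b010) != 0;
  }

  static boolean isTrackingRef(int header) {
    return (header & 0b001) != 0;
  }
}
```

For example, under these assumptions a 5-character nullable field with ref tracking and encoding `01` packs to `0b100_01_011`.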

## Related issues
#1556 


## Does this PR introduce any user-facing change?


- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?


## Benchmark
