[Java] Optimize collection serialization protocol by homogenization #923

Merged
38 commits
4328ad2
add codegen invocation annotation
chaokunyang Sep 24, 2023
1b3990c
optimize collection serialization protocol by homogeneous info
chaokunyang Sep 24, 2023
1aa85c1
implement interpreter optimized collection read/write
chaokunyang Oct 1, 2023
18f31f9
refine jit if/comparator exprs
chaokunyang Oct 2, 2023
e17f179
implement jit collection optimization
chaokunyang Oct 2, 2023
2e8dd80
add tests
chaokunyang Oct 2, 2023
ae76a30
update depth uo make generics push work
chaokunyang Oct 2, 2023
b85c57a
fix collection opt jit
chaokunyang Oct 2, 2023
b442a02
add collection nested opt tests
chaokunyang Oct 2, 2023
6f5b1a0
write decl class for meta share
chaokunyang Oct 2, 2023
9d037c7
use walkpath to reuse classinfo/holder
chaokunyang Oct 2, 2023
5bf51c5
fix get classinfo
chaokunyang Oct 2, 2023
ba66bc4
inline classinfo to get smaller code size
chaokunyang Oct 2, 2023
82f508b
split methods into small methods
chaokunyang Oct 2, 2023
114cb07
add non final object type tests
chaokunyang Oct 2, 2023
a0a0b88
misc fix
chaokunyang Oct 2, 2023
28f6224
add missing header
chaokunyang Oct 2, 2023
d56dd48
fix class resolver test
chaokunyang Oct 3, 2023
d932002
fix jit method split
chaokunyang Oct 3, 2023
b1b65d5
update classinfo only for not decl type
chaokunyang Oct 3, 2023
a9efdcb
Fix method split for collection jit
chaokunyang Oct 3, 2023
27ae673
add map with set elements test
chaokunyang Oct 3, 2023
dbbecc6
Optimize StringBuilder/StringBuffer serialization (#908)
pandalee99 Sep 27, 2023
d38a803
Bump release versin to 0.1.2 (#924)
chaokunyang Sep 27, 2023
d4cbbe1
[Doc] add basic type java format doc (#928)
chaokunyang Oct 3, 2023
a1598b8
[Java] speed test codegen speed by avoid duplicate codegen (#929)
chaokunyang Oct 3, 2023
ab82f4b
add collection serialization java design doc
chaokunyang Oct 3, 2023
3bf5f92
update doc
chaokunyang Oct 3, 2023
b7c1594
update doc
chaokunyang Oct 3, 2023
e936a0a
Merge remote-tracking branch 'ant/main' into optimize_collection_seri…
chaokunyang Oct 3, 2023
8a71b65
debug ci
chaokunyang Oct 3, 2023
6921600
Workaround G1ParScanThreadState::copy_to_survivor_space crash
chaokunyang Oct 3, 2023
6339490
add iterate array bench results
chaokunyang Oct 3, 2023
cbef05a
add benchmark suite
chaokunyang Oct 3, 2023
070675d
fix jvm g1 workaround
chaokunyang Oct 3, 2023
0f3fdab
add CollectionSuite header
chaokunyang Oct 3, 2023
f1ee12f
fix crash
chaokunyang Oct 3, 2023
1ceb201
skip unnecessary compress number
chaokunyang Oct 3, 2023
6 changes: 3 additions & 3 deletions docs/protocols/README.md
@@ -1,4 +1,4 @@
# Serialization Protocols
-- For Java Object Graph Protocol, see [java_object_graph_guide](java_object_graph.md) doc.
-- For Cross Language Object Graph Protocol, see [xlang_object_graph_guide](./xlang_object_graph.md) doc.
-- For Row Format Protocol, see [row format_guide](./row_format.md) doc.
+- For Java Object Graph Protocol, see [java_object_graph_format](java_object_graph.md) doc.
+- For Cross Language Object Graph Protocol, see [xlang_object_graph_format](./xlang_object_graph.md) doc.
+- For Row Format Protocol, see [row format](./row_format.md) doc.
30 changes: 30 additions & 0 deletions docs/protocols/java_object_graph.md
@@ -47,8 +47,38 @@ Which encoding to choose:
- For JDK9+: Fury uses the `coder` field of the `String` object to pick the encoding; `ascii`/`utf-16` will be used.
- If the string is encoded in `utf-8`, Fury will use `utf-8` to decode the data. Currently, however, Fury doesn't enable utf-8 encoding by default for Java. Cross-language string serialization in Fury uses `utf-8` by default.

## Array

## Collection
> All collection serializers must extend `io.fury.serializer.CollectionSerializers.CollectionSerializer`.

Format:
```java
length(positive varint) | collection header | elements header | elements data
```
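
The `length` prefix is a positive varint. As a rough illustration only, the sketch below assumes the common scheme of 7 data bits per byte with the high bit used as a continuation flag; Fury's real encoding lives in `MemoryBuffer` and may differ in detail. `VarIntSketch`, `encodeLength` and `decodeLength` are hypothetical names, not Fury APIs.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper, not part of Fury: illustrates a 7-bit-group varint.
final class VarIntSketch {
  /** Encodes a non-negative int, least significant 7-bit group first. */
  static byte[] encodeLength(int value) {
    if (value < 0) {
      throw new IllegalArgumentException("length must be non-negative");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream(5);
    // While more than 7 bits remain, emit a group with the continuation bit set.
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value); // last group, continuation bit clear
    return out.toByteArray();
  }

  /** Decodes a value produced by encodeLength. */
  static int decodeLength(byte[] bytes) {
    int result = 0;
    int shift = 0;
    for (byte b : bytes) {
      result |= (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        break; // continuation bit clear: this was the last group
      }
      shift += 7;
    }
    return result;
  }
}
```

Under this assumed scheme, a collection with fewer than 128 elements needs a single byte for its length, and the 1024-element list used in `CollectionSuite` needs two.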

### collection header
- For `ArrayList/LinkedArrayList/HashSet/LinkedHashSet`, this will be empty.
- For `TreeSet`, this will be the `Comparator`.
- For subclasses of `ArrayList`, this may be extra object field info.

### elements header
In most cases, all collection elements have the same type and are not null. The elements header encodes this
homogeneous information once, to avoid the cost of writing it for every element. Specifically, the elements header
encodes four kinds of information, each using one bit:
- Whether to track element refs: the first bit `0b1` of the header flags it.
- Whether the collection has null elements: the second bit `0b10` of the header flags it. If ref tracking is enabled
for this element type, this flag is invalid.
- Whether the collection element type is not the declared type: the 3rd bit `0b100` of the header flags it.
- Whether the collection elements have different types: the 4th bit `0b1000` of the header flags it.

By default, all bits are unset, which means that elements don't track refs, all elements have the same type and are
not null, and the actual element type is the declared type in the custom class field.
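
The four bits described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `CollectionSerializer`: the class name, constant names and `computeHeader` are hypothetical, and the choice to leave the 3rd bit unset when element types differ is an assumption of this sketch.

```java
import java.util.Collection;

// Hypothetical sketch, not Fury's implementation: derives the elements header bits.
final class ElementsHeaderSketch {
  static final int TRACKING_REF = 0b1;
  static final int HAS_NULL = 0b10;
  static final int NOT_DECL_ELEMENT_TYPE = 0b100;
  static final int NOT_SAME_TYPE = 0b1000;

  static int computeHeader(Collection<?> elements, Class<?> declaredType, boolean trackRef) {
    int header = trackRef ? TRACKING_REF : 0;
    Class<?> first = null; // class of the first non-null element
    boolean hasNull = false;
    boolean mixed = false;
    for (Object e : elements) {
      if (e == null) {
        hasNull = true;
      } else if (first == null) {
        first = e.getClass();
      } else if (first != e.getClass()) {
        mixed = true;
      }
    }
    // The null bit is not meaningful when ref tracking is on, so it stays unset then.
    if (hasNull && !trackRef) {
      header |= HAS_NULL;
    }
    if (mixed) {
      header |= NOT_SAME_TYPE;
    } else if (first != null && first != declaredType) {
      // Assumption: this bit is only set for homogeneous collections.
      header |= NOT_DECL_ELEMENT_TYPE;
    }
    return header;
  }
}
```

For a `List<Integer>` field holding only non-null `Integer` values with ref tracking off, this yields `0`, which matches the all-bits-unset default described above.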

### elements data
Based on the elements header, the serialization of elements data may skip `ref flag`/`null flag`/`element class info`.

See `io.fury.serializer.CollectionSerializers.CollectionSerializer#write/read` for an example.
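
To make the skipping concrete, the sketch below records what would be written for each element under a given header, as a list of labels instead of real bytes. It is an illustration only: `ElementsDataSketch` and `trace` are hypothetical names, and writing the class info once up front for a homogeneous collection of a non-declared type is an assumption of this sketch, not something the text above states.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Fury's implementation: lists what each element contributes.
final class ElementsDataSketch {
  static final int TRACKING_REF = 0b1;
  static final int HAS_NULL = 0b10;
  static final int NOT_DECL_ELEMENT_TYPE = 0b100;
  static final int NOT_SAME_TYPE = 0b1000;

  static List<String> trace(List<?> elements, int header) {
    List<String> out = new ArrayList<>();
    boolean trackRef = (header & TRACKING_REF) != 0;
    boolean hasNull = (header & HAS_NULL) != 0;
    boolean sameType = (header & NOT_SAME_TYPE) == 0;
    if (sameType && (header & NOT_DECL_ELEMENT_TYPE) != 0) {
      // Assumption: one shared class info entry covers every element.
      for (Object e : elements) {
        if (e != null) {
          out.add("shared class " + e.getClass().getSimpleName());
          break;
        }
      }
    }
    for (Object e : elements) {
      if (trackRef) {
        out.add("ref flag"); // ref tracking also covers null
      } else if (hasNull) {
        out.add(e == null ? "null flag" : "not-null flag");
      }
      if (e == null) {
        continue;
      }
      if (!sameType) {
        out.add("class " + e.getClass().getSimpleName()); // per-element class info
      }
      out.add("data " + e);
    }
    return out;
  }
}
```

With a header of `0`, `trace(Arrays.asList(1, 2), 0)` yields only `data 1` and `data 2`: no ref flag, no null flag and no class info per element, which is the saving this protocol change targets.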

## Map

@@ -16,7 +16,11 @@

package io.fury.benchmark;

import io.fury.memory.MemoryBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import org.openjdk.jmh.Main;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
@@ -44,24 +48,91 @@ public void setup() {
}
}

- @Benchmark
+ // @Benchmark
public Object clearObjectArray(ArrayState state) {
Arrays.fill(state.objects, null);
return state.objects;
}

- @Benchmark
+ // @Benchmark
public Object clearObjectArrayByCopy(ArrayState state) {
System.arraycopy(state.nilArray, 0, state.objects, 0, state.objects.length);
return state.objects;
}

- @Benchmark
+ // @Benchmark
public Object clearIntArray(ArrayState state) {
Arrays.fill(state.ints, 0);
return state.ints;
}

private static Integer[] array = new Integer[100];
private static List<Integer> list = new ArrayList<>(100);

private static MemoryBuffer buffer = MemoryBuffer.newHeapBuffer(32);

static {
Random random = new Random(7);
for (int i = 0; i < 100; i++) {
int x = random.nextInt();
array[i] = x;
list.add(i, x);
}
}

// Benchmark Mode Cnt Score Error Units
// ArraySuite.iterateArray thrpt 3 18107614.727 ± 25969433.513 ops/s
// ArraySuite.iterateList thrpt 3 9448162.588 ± 13139664.082 ops/s
// ArraySuite.iterateList2 thrpt 3 14678631.109 ± 14579521.954 ops/s
// ArraySuite.serializeList thrpt 3 1659718.571 ± 1323226.629 ops/s
@Benchmark
public Object iterateArray() {
int count = 0;
for (Integer o : array) {
if (o != null) {
count += o;
}
}
return count;
}

@Benchmark
public Object iterateList() {
int count = 0;
for (Integer o : list) {
if (o != null) {
count += o;
}
}
return count;
}

@Benchmark
public Object iterateList2() {
int count = 0;
int size = list.size();
for (int i = 0; i < size; i++) {
Integer o = list.get(i);
if (o != null) {
count += o;
}
}
return count;
}

@Benchmark
public Object serializeList() {
buffer.writerIndex(0);
int size = list.size();
for (int i = 0; i < size; i++) {
Integer o = list.get(i);
if (o != null) {
buffer.writeVarInt(o);
}
}
return buffer;
}

// Mac Monterey 12.1: 2.6 GHz 6-Core Intel Core i7
// JDK11
// Benchmark (arraySize) Mode Cnt Score Error Units
@@ -0,0 +1,68 @@
/*
* Copyright 2023 The Fury Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.fury.benchmark;

import io.fury.Fury;
import java.util.ArrayList;
import java.util.List;
import org.openjdk.jmh.Main;
import org.openjdk.jmh.annotations.Benchmark;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* Test suite for collection.
*
* @author chaokunyang
*/
public class CollectionSuite {
private static final Logger LOG = LoggerFactory.getLogger(CollectionSuite.class);

public static void main(String[] args) throws Exception {
if (args.length == 0) {
String commandLine = "io.*CollectionSuite.* -f 3 -wi 5 -i 5 -t 1 -w 2s -r 2s -rf csv";
System.out.println(commandLine);
args = commandLine.split(" ");
}
Main.main(args);
}

private static Fury fury = Fury.builder().build();
private static List<Integer> list1 = new ArrayList<>(1024);
private static byte[] list1Bytes;

static {
for (int i = 0; i < 1024; i++) {
list1.add(i % 255);
}
list1Bytes = fury.serialize(list1);
LOG.info("Size: {}", list1Bytes.length);
}

@Benchmark
public Object serializeArrayList() {
return fury.serialize(list1);
}

@Benchmark
public Object deserializeArrayList() {
return fury.deserialize(list1Bytes);
}
// Benchmark Mode Cnt Score Error Units
// CollectionSuite.deserializeArrayList thrpt 3 175281.624 ± 142913.891 ops/s
// CollectionSuite.serializeArrayList thrpt 3 137648.540 ± 158192.786 ops/s
}
25 changes: 25 additions & 0 deletions java/fury-core/src/main/java/io/fury/Fury.java
@@ -336,6 +336,17 @@ public <T> void writeRef(MemoryBuffer buffer, T obj, Serializer<T> serializer) {
}
}

/** Write object class and data without tracking ref. */
public void writeNullable(MemoryBuffer buffer, Object obj) {
if (obj == null) {
buffer.writeByte(Fury.NULL_FLAG);
} else {
buffer.writeByte(Fury.NOT_NULL_VALUE_FLAG);
writeNonRef(buffer, obj);
}
}

/** Write object class and data without tracking ref. */
public void writeNullable(MemoryBuffer buffer, Object obj, ClassInfoCache classInfoCache) {
if (obj == null) {
buffer.writeByte(Fury.NULL_FLAG);
@@ -781,6 +792,16 @@ public Object readNonRef(MemoryBuffer buffer, ClassInfoCache classInfoCache) {
return readDataInternal(buffer, classResolver.readClassInfo(buffer, classInfoCache));
}

/** Read object class and data without tracking ref. */
public Object readNullable(MemoryBuffer buffer) {
byte headFlag = buffer.readByte();
if (headFlag == Fury.NULL_FLAG) {
return null;
} else {
return readNonRef(buffer);
}
}

/** Class should be read already. */
public Object readData(MemoryBuffer buffer, ClassInfo classInfo) {
depth++;
@@ -1197,6 +1218,10 @@ public void setDepth(int depth) {
this.depth = depth;
}

public void incDepth(int diff) {
this.depth += diff;
}

// Invoked by jit
public StringSerializer getStringSerializer() {
return stringSerializer;
@@ -0,0 +1,29 @@
/*
* Copyright 2023 The Fury Authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package io.fury.annotation;

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

/**
* An annotation to mark a method will be invoked by generated method. This annotation is used for
* documentation only.
*
* @author chaokunyang
*/
@Retention(RetentionPolicy.SOURCE)
public @interface CodegenInvoke {}