Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize StringBuilder/StringBuffer serialization #908

Merged
merged 15 commits into from
Sep 27, 2023

Conversation

pandalee99
Copy link
Contributor

What do these changes do?

To optimize the code, I avoiding the unnecessary string conversion and directly operating on the internal character array and length of the StringBuffer. This would involve modifying the write() and read() methods to work directly with the StringBuffer's internal data structure.

Related issue number

Closes #94

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@chaokunyang
Copy link
Collaborator

Hi @pandalee99 , thanks for taking time to contribute to fury, I left some comments, could you please take a look at it?

@chaokunyang
Copy link
Collaborator

chaokunyang commented Sep 19, 2023

StringBuilder didn't provide a zero-copy constructor, can we optimize write in this PR only to reduce complexity.

For example, we can keep protocol consistent with StringSerializer by :

  static Tuple2<ToByteFunction, Function> builderCache;

  private static synchronized Tuple2<ToByteFunction, Function> getBuilderFunc() {
    if (builderCache == null) {
      Function getValue = (Function) Functions.makeGetterFunction(
        StringBuilder.class.getSuperclass(), "getValue");
      if (Platform.JAVA_VERSION > 8) {
        ToByteFunction getCoder = (ToByteFunction) Functions.makeGetterFunction(
          StringBuilder.class.getSuperclass(), "getCoder");
        builderCache = Tuple2.of(getCoder, getValue);
      } else {
        builderCache = Tuple2.of(null, getValue);
      }
    }
    return builderCache;
  }

  public static abstract class AbstractStringBuilderSerializer<T> extends Serializer<T> {
    protected final ToByteFunction getCoder;
    protected final Function getValue;
    protected final StringSerializer stringSerializer;

    public AbstractStringBuilderSerializer(Fury fury, Class type) {
      super(fury, type);
      Tuple2<ToByteFunction, Function> builderFunc = getBuilderFunc();
      getCoder = builderFunc.f0;
      getValue = builderFunc.f1;
      stringSerializer = fury.getStringSerializer();
    }

    @Override
    public void write(MemoryBuffer buffer, T value) {
      if (Platform.JAVA_VERSION > 8) {
        byte coder = getCoder.applyAsByte(value);
        byte[] v = (byte[]) getValue.apply(value);
        buffer.writeByte(coder);
        buffer.writeBytesWithSizeEmbedded(v);
      } else {
        char[] v = (char[]) getValue.apply(value);
        if (StringSerializer.isAscii(v)) {
          stringSerializer.writeJDK8Ascii(buffer, v, v.length);
        } else {
          stringSerializer.writeJDK8UTF16(buffer, v, v.length);
        }
      }
    }
  }

  public static final class StringBuilderSerializer extends AbstractStringBuilderSerializer<StringBuilder> {
    
    public StringBuilderSerializer(Fury fury) {
      super(fury, StringBuilder.class);
    }

    @Override
    public StringBuilder read(MemoryBuffer buffer) {
      return new StringBuilder(stringSerializer.readJavaString(buffer));
    }
  }

Same thing can be done for StringBuffer

@pandalee99
Copy link
Contributor Author

    char[] v = (char[]) getValue.apply(value);

Thank you, but I found a problem, the problem is, if using this way to get char[], then the final result is wrong, the length of char[] will be 19, the content will be "strnullnullnullnull..." This could be a jdk bug

@chaokunyang
Copy link
Collaborator

chaokunyang commented Sep 26, 2023

@pandalee99 io.fury.util.function.ToByteFunction is not defined in jdk library, we can't use it to generate accessor lambda for jdk classes, otherwise jdk will raise ClassNotFoundException. sorry for misleading comments.

Can we add a new makeFunction in Functions:

  public static Object makeGetterFunction(Method method) {
    return makeGetterFunction(method, method.getReturnType());
  }
  
  public static Object makeGetterFunction(Method method, Class<?> returnType) {
    MethodHandles.Lookup lookup = _JDKAccess._trustedLookup(method.getDeclaringClass());
    try {
      // Why `lookup.findGetter` doesn't work?
      // MethodHandle handle = lookup.findGetter(field.getDeclaringClass(), field.getName(),
      // field.getType());
      MethodHandle handle = lookup.unreflect(method);
      return _JDKAccess.makeGetterFunction(lookup, handle, returnType);
    } catch (IllegalAccessException ex) {
      throw new RuntimeException(ex);
    }
  }

then creating accessor by :

    Method getCoder = StringBuilder.class.getSuperclass().getDeclaredMethod("getCoder");
    ToIntFunction<CharSequence> o = (ToIntFunction<CharSequence>) makeGetterFunction(lookup, lookup.unreflect(getCoder), int.class);
    System.out.println(o);
    System.out.println(o.applyAsInt(new StringBuilder("abc")));
    System.out.println(o.applyAsInt(new StringBuilder("abc你好")));

@pandalee99
Copy link
Contributor Author

pandalee99 commented Sep 26, 2023

thanks.
but,Some of the method of the Functions are repeated. and the lookup in the java.lang.CharacterName is private .maybe have other way, and the ToIntFunction maybe not able to Type conversion
and then,
How to solve the problem of local build failure in jdk9+? When I compile, it tells me that sun.misc does not exist

pandalee99 and others added 2 commits September 27, 2023 11:26
commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>
commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>
@pandalee99
Copy link
Contributor Author

so good!

Copy link
Collaborator

@chaokunyang chaokunyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you so much for your contribution. It's not an easy work for such optimization

@chaokunyang chaokunyang merged commit e71cf72 into apache:main Sep 27, 2023
chaokunyang added a commit to chaokunyang/fury that referenced this pull request Oct 3, 2023
* Optimize StringBuilder/StringBuffer serialization

* try to optimize StringBuilder

* first to Check code Style

* hidden

* hidden

* bug fix and check code style

* delete excess code and add buffers to try testing

* fix

* try to fix problem

* fix function

* code fix

* code fix again

* Update java/fury-core/src/main/java/io/fury/serializer/Serializers.java

commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>

* Update java/fury-core/src/main/java/io/fury/serializer/Serializers.java

commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>

---------

Co-authored-by: pankoli <pankoli@tencent.com>
Co-authored-by: Shawn <shawn.ck.yang@gmail.com>
chaokunyang added a commit that referenced this pull request Oct 3, 2023
…923)

* add codegen invocation annotation

* optimize collection serialization protocol by homogeneous info

* implement interpreter optimized collection read/write

* refine jit if/comparator exprs

* implement jit collection optimization

* add tests

* update depth uo make generics push work

* fix collection opt jit

* add collection nested opt tests

* write decl class for meta share

* use walkpath to reuse classinfo/holder

* fix get classinfo

* inline classinfo to get smaller code size

* split methods into small methods

* add non final object type tests

* misc fix

* add missing header

* fix class resolver test

* fix jit method split

* update classinfo only for not decl type

* Fix method split for collection jit

* add map with set elements test

* Optimize StringBuilder/StringBuffer serialization (#908)

* Optimize StringBuilder/StringBuffer serialization

* try to optimize StringBuilder

* first to Check code Style

* hidden

* hidden

* bug fix and check code style

* delete excess code and add buffers to try testing

* fix

* try to fix problem

* fix function

* code fix

* code fix again

* Update java/fury-core/src/main/java/io/fury/serializer/Serializers.java

commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>

* Update java/fury-core/src/main/java/io/fury/serializer/Serializers.java

commit

Co-authored-by: Shawn <shawn.ck.yang@gmail.com>

---------

Co-authored-by: pankoli <pankoli@tencent.com>
Co-authored-by: Shawn <shawn.ck.yang@gmail.com>

* Bump release versin to 0.1.2 (#924)

* [Doc] add basic type java format doc (#928)

add basic type java format doc

* [Java] speed test codegen speed by avoid duplicate codegen (#929)

* speed test codegen speed by avoid duplicate codegen

* fix cache

* fix cllass gc

* use a standalone lock for every key

* refine gc trigger

* skip cache for furyGC tests

* fix gc tests

* lint code

* add collection serialization java design doc

* update doc

* update doc

* debug ci

* Workaround G1ParScanThreadState::copy_to_survivor_space crash

* add iterate array bench results

* add benchmark suite

* fix jvm g1 workaround

* add CollectionSuite header

* fix crash

* skip unnecessary compress number

---------

Co-authored-by: PAN <46820719+pandalee99@users.noreply.github.com>
Co-authored-by: pankoli <pankoli@tencent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Java] Optimize StringBuilder/StringBuffer serialization
2 participants