Apply recent C optimizations to Java encoder (#725)
* Make benchmark runnable without oj available

* Port convert_UTF8_to_ASCII_only_JSON to Java

This is new specialized logic to reduce overhead when appending
ASCII-only strings to the generated JSON.

Original code by @byroot

See #620

* Align string generate method with generate_json_string

* Port convert_UTF8_to_JSON from C

Also includes updated logic for generate (generate_json_string)
based on current C code.

Original code by @byroot

See #620

* Use external iteration to reduce alloc

Lots of surrounding state so just take the hit of a Set and
Iterator rather than a big visitor object.

* Remove unused imports

* Inline ConvertBytes logic for long to byte[]

This change duplicates some code from JRuby to allow rendering the
fixnum value to a shared byte array rather than allocating new for
each value. Since fixnum dumping is a leaf operation, only one is
needed per session.

* Eliminate * import

* Restructure handlers for easier profiling

Anonymous classes show up as unnamed, numbered classes in profiles
which makes them difficult to read.

* Avoid allocation when writing Array delimiters

Rather than allocating a buffer to hold N copies of arrayNL, just
write it N times. We're buffering into a stream anyway.

This makes array dumping zero-alloc other than buffer growth.
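
The idea can be sketched as follows. This is a minimal illustration, not the generator's actual code: `RepeatWriteSketch` and `writeRepeated` are hypothetical names, and a plain `java.io.OutputStream` stands in for the encoder's output buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class RepeatWriteSketch {
    // Hypothetical helper: emit `unit` `count` times straight into the stream
    // instead of first building a temporary array that holds `count` copies.
    static void writeRepeated(OutputStream out, byte[] unit, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            out.write(unit, 0, unit.length);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] delimiter = "\n".getBytes(StandardCharsets.US_ASCII);
        out.write('[');
        writeRepeated(out, delimiter, 3); // no intermediate buffer is allocated here
        out.write(']');
        System.out.println(out.size() + " bytes written");
    }
}
```

The only allocation left on this path is whatever the destination stream does when it grows.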

* Move away from Handler abstraction

Since there's a fixed number of types we have special dumping logic
for, this abstraction just introduces overhead we don't need. This
patch starts moving away from indirecting all dumps through the
Handler abstraction and directly generating from the type switch.
This also aligns better with the main loop of the C code and should
inline and optimize better.

* Match C version of fbuffer_append_long

* Minor tweaks to reduce complexity

* Reimpl byte[] stream without synchronization

The byte[] output stream used here extended ByteArrayOutputStream
from the JDK, which synchronizes all mutation operations (like
writes). Since this is only going to be used once within a given
call stack, it needs no synchronization.

This change more than triples the performance of a benchmark of
dumping an array of empty arrays and should increase performance
of all dump forms.

* Reduce overhead in repeats

* Return incoming array if only one repeat is needed and array is
  exact size.
* Only retrieve ByteList fields once for repeat writes.

* Use equivalent of rb_sym2str

* Microoptimizations for ByteList stream

* Cast to byte not necessary

* Refactor this for better inlining

* More tiny tweaks to reduce overhead of generateString

* Refactor to avoid repeated boolean checks

* Eliminate memory accesses for digits

The math is much faster here than array access, due to bounds
checking and pointer dereferencing.
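
A rough sketch of computing digits arithmetically, assuming a caller-supplied scratch array of at least 20 bytes. `DigitMathSketch` and `writeDecimal` are hypothetical names and this is not the code from the commit; it only contrasts `'0' + digit` with a lookup such as `DIGITS[digit]`.

```java
import java.nio.charset.StandardCharsets;

final class DigitMathSketch {
    // Hypothetical helper: renders `value` in decimal into the tail of `buf`
    // and returns the index of the first byte written.
    // `buf` must hold at least 20 bytes (sign plus 19 digits).
    static int writeDecimal(long value, byte[] buf) {
        int pos = buf.length;
        boolean negative = value < 0;
        // Work on the non-positive form so Long.MIN_VALUE does not overflow.
        long n = negative ? value : -value;
        do {
            int digit = (int) (-(n % 10));      // always in 0..9
            buf[--pos] = (byte) ('0' + digit);  // arithmetic, no table lookup
            n /= 10;
        } while (n != 0);
        if (negative) {
            buf[--pos] = '-';
        }
        return pos;
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[20]; // could be shared across calls
        int start = writeDecimal(-1205L, scratch);
        System.out.println(new String(scratch, start, scratch.length - start, StandardCharsets.US_ASCII));
    }
}
```

Because the result is written into a reusable array, this also fits the shared-buffer approach described earlier for fixnum dumping.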

* Loosen visibility to avoid accessor methods

Java will generate accessor methods for private fields, burning
some inlining budget.

* Modify parser bench to work without oj or rapidjson
headius authored Feb 6, 2025
1 parent c84daef commit 3232907
Showing 7 changed files with 830 additions and 423 deletions.
66 changes: 62 additions & 4 deletions java/src/json/ext/ByteListDirectOutputStream.java
@@ -3,14 +3,72 @@
 import org.jcodings.Encoding;
 import org.jruby.util.ByteList;

-import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Arrays;
+
+public class ByteListDirectOutputStream extends OutputStream {
+    private byte[] buffer;
+    private int length;

-public class ByteListDirectOutputStream extends ByteArrayOutputStream {
     ByteListDirectOutputStream(int size) {
-        super(size);
+        buffer = new byte[size];
     }

     public ByteList toByteListDirect(Encoding encoding) {
-        return new ByteList(buf, 0, count, encoding, false);
+        return new ByteList(buffer, 0, length, encoding, false);
     }
+
+    @Override
+    public void write(int b) throws IOException {
+        int currentLength = this.length;
+        int newLength = currentLength + 1;
+        byte[] buffer = ensureBuffer(this, newLength);
+        buffer[currentLength] = (byte) b;
+        this.length = newLength;
+    }
+
+    @Override
+    public void write(byte[] bytes, int start, int length) throws IOException {
+        int currentLength = this.length;
+        int newLength = currentLength + length;
+        byte[] buffer = ensureBuffer(this, newLength);
+        System.arraycopy(bytes, start, buffer, currentLength, length);
+        this.length = newLength;
+    }
+
+    @Override
+    public void write(byte[] bytes) throws IOException {
+        int myLength = this.length;
+        int moreLength = bytes.length;
+        int newLength = myLength + moreLength;
+        byte[] buffer = ensureBuffer(this, newLength);
+        System.arraycopy(bytes, 0, buffer, myLength, moreLength);
+        this.length = newLength;
+    }
+
+    private static byte[] ensureBuffer(ByteListDirectOutputStream self, int minimumLength) {
+        byte[] buffer = self.buffer;
+        int myCapacity = buffer.length;
+        int diff = minimumLength - myCapacity;
+        if (diff > 0) {
+            buffer = self.buffer = grow(buffer, myCapacity, diff);
+        }
+
+        return buffer;
+    }
+
+    private static byte[] grow(byte[] oldBuffer, int myCapacity, int diff) {
+        // grow to double current buffer length or capacity + diff, whichever is greater
+        int newLength = myCapacity + Math.max(myCapacity, diff);
+        // check overflow
+        if (newLength < 0) {
+            // try just diff length in case it can fit
+            newLength = myCapacity + diff;
+            if (newLength < 0) {
+                throw new ArrayIndexOutOfBoundsException("cannot allocate array of size " + myCapacity + "+" + diff);
+            }
+        }
+        return Arrays.copyOf(oldBuffer, newLength);
+    }
 }
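
To make the growth policy in `grow` easier to follow, here is the capacity arithmetic isolated as a sketch. `GrowthPolicySketch` and `nextCapacity` are hypothetical names introduced for illustration; the logic mirrors the diff above but omits the actual array copy.

```java
final class GrowthPolicySketch {
    // Hypothetical helper mirroring the arithmetic in grow():
    // returns the new array length for a buffer of `capacity` bytes
    // that needs `diff` more bytes than it currently has.
    static int nextCapacity(int capacity, int diff) {
        // Prefer doubling, unless the shortfall is larger than the current capacity.
        int newLength = capacity + Math.max(capacity, diff);
        if (newLength < 0) {
            // int overflow: fall back to the exact size required.
            newLength = capacity + diff;
            if (newLength < 0) {
                throw new ArrayIndexOutOfBoundsException(
                        "cannot allocate array of size " + capacity + "+" + diff);
            }
        }
        return newLength;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(16, 1));      // doubling case
        System.out.println(nextCapacity(16, 100));    // shortfall exceeds capacity
        System.out.println(nextCapacity(1 << 30, 1)); // doubling overflows, exact fit is used
    }
}
```

With a 16-byte buffer, a one-byte shortfall yields 32 and a 100-byte shortfall yields 116. At a capacity of 2^30, doubling overflows `int`, so the exact size 2^30 + 1 is requested instead.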
4 changes: 3 additions & 1 deletion java/src/json/ext/ByteListTranscoder.java
@@ -143,9 +143,11 @@ protected void quoteStart() {
      * until the character before it.
      */
     protected void quoteStop(int endPos) throws IOException {
+        int quoteStart = this.quoteStart;
         if (quoteStart != -1) {
+            ByteList src = this.src;
             append(src.unsafeBytes(), src.begin() + quoteStart, endPos - quoteStart);
-            quoteStart = -1;
+            this.quoteStart = -1;
         }
     }
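
The switch to `this.quoteStart = -1` matters once the field is cached in a local of the same name. A small illustration, using a hypothetical `ShadowingSketch` class that is not part of the repository:

```java
final class ShadowingSketch {
    private int quoteStart = 5;

    // The local shadows the field, so this reset never reaches the field.
    void stopResettingLocal() {
        int quoteStart = this.quoteStart;
        if (quoteStart != -1) {
            quoteStart = -1; // only the local copy changes
        }
    }

    // Qualifying with `this.` writes the field, as in the updated quoteStop.
    void stopResettingField() {
        int quoteStart = this.quoteStart;
        if (quoteStart != -1) {
            this.quoteStart = -1;
        }
    }

    int currentQuoteStart() {
        return quoteStart;
    }

    public static void main(String[] args) {
        ShadowingSketch s = new ShadowingSketch();
        s.stopResettingLocal();
        System.out.println(s.currentQuoteStart()); // field is untouched
        s.stopResettingField();
        System.out.println(s.currentQuoteStart()); // field has been reset
    }
}
```

Reading the field into a local once avoids repeated field loads inside the method, but every write back has to go through `this.`.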
