No array column but get "Array index out of range: 1048576" #378
Comments
Good find, can you try modifying the compression code from:

to

I think this may be the reason.
Thanks for the suggestion. I've thought about this as well. The problem is, based on the comments of the function:

```java
/**
 * Compresses <code>src[srcOff:srcOff+srcLen]</code> into
 * <code>dest[destOff:destOff+maxDestLen]</code> and returns the compressed
 * length.
 *
 * This method will throw a {@link LZ4Exception} if this compressor is unable
 * to compress the input into less than <code>maxDestLen</code> bytes. To
 * prevent this exception to be thrown, you should make sure that
 * <code>maxDestLen >= maxCompressedLength(srcLen)</code>.
 *
 * @param src the source data
 * @param srcOff the start offset in src
 * @param srcLen the number of bytes to compress
 * @param dest the destination buffer
 * @param destOff the start offset in dest
 * @param maxDestLen the maximum number of bytes to write in dest
 * @throws LZ4Exception if maxDestLen is too small
 * @return the compressed size
 */
public abstract int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff, int maxDestLen);
```

If the exception is caused by the last parameter (aka. `maxDestLen`) … In CompressedBuffedWriter.java, we can find … So I think if …
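To make the sizing rule from that Javadoc concrete, here is a minimal sketch, assuming the lz4-java (net.jpountz.lz4) API the quote comes from; the 1 MiB buffer size and the 9 + 16 header offset mirror snippets later in this thread, and everything else is illustrative rather than the driver's actual code:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

public class MaxDestLenSketch {
    public static void main(String[] args) {
        LZ4Compressor compressor = LZ4Factory.fastestInstance().fastCompressor();

        byte[] src = new byte[1 << 20];   // 1 MiB of source data (illustrative size)
        int headerLen = 9 + 16;           // header + checksum prefix, as used in the issue's snippets

        // maxCompressedLength() returns the worst-case compressed size for srcLen bytes,
        // so sizing dest this way means maxDestLen can never be "too small" and the
        // LZ4Exception described in the Javadoc cannot be triggered.
        byte[] dest = new byte[headerLen + compressor.maxCompressedLength(src.length)];

        int compressedLen = compressor.compress(src, 0, src.length,
                                                dest, headerLen, dest.length - headerLen);
        System.out.println("compressed " + src.length + " bytes into " + compressedLen);
    }
}
```

With the destination sized via maxCompressedLength, a too-small maxDestLen is ruled out, which is why the later comments shift attention to the src side of the call.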
Yes, surely … From the comment: `@param maxDestLen the maximum number of bytes to write in dest`, but what if maxDestLen is calculated from destOff? Now we give a too-large value to …
Good point, let me give it a try.
Oh, I think that's the reason: …
Re-ran the pipeline for the whole night with the newly built jar; the problem seems to persist. In fact, I found that the code of the latest release uses

```java
int res = lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16);
```

Based on LZ4Compressor.java, this is an overload of

```java
public final int compress(byte[] src, int srcOff, int srcLen, byte[] dest, int destOff) {
    return compress(src, srcOff, srcLen, dest, destOff, dest.length - destOff);
}
```

So it is the same as

```java
int res = lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16, compressedBuffer.length - (9 + 16));
```

Since … But still, we didn't find the root cause of "Array index out of range: 1048576".
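For what it's worth, that equivalence claim is easy to check in isolation. A hedged sketch using the lz4-java API, with a made-up buffer and the 9 + 16 offset copied from the snippet above:

```java
import java.util.Arrays;
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

public class OverloadEquivalenceCheck {
    public static void main(String[] args) {
        LZ4Compressor lz4Compressor = LZ4Factory.fastestInstance().fastCompressor();

        byte[] writtenBuf = new byte[1048576];   // illustrative stand-in for the writer's buffer
        Arrays.fill(writtenBuf, (byte) 'x');
        int position = writtenBuf.length;        // pretend the buffer is completely full

        byte[] destA = new byte[9 + 16 + lz4Compressor.maxCompressedLength(position)];
        byte[] destB = new byte[destA.length];

        // 5-argument overload: maxDestLen is implicitly dest.length - destOff.
        int resA = lz4Compressor.compress(writtenBuf, 0, position, destA, 9 + 16);
        // 6-argument overload with the same value passed explicitly.
        int resB = lz4Compressor.compress(writtenBuf, 0, position, destB, 9 + 16,
                                          destB.length - (9 + 16));

        System.out.println(resA == resB && Arrays.equals(destA, destB)
                ? "the two calls behave identically"
                : "results differ");
    }
}
```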
I may have found the reason. In the code realising …:

```java
public final int compress(byte[] src, int srcOff, int srcLen, byte[] dest,
                          int destOff, int maxDestLen) {
    checkRange(src, srcOff, srcLen);
    checkRange(dest, destOff, maxDestLen);
    ...
```

Considering …:

```java
public static void checkRange(byte[] buf, int off) {
    if (off < 0 || off >= buf.length) {
        throw new ArrayIndexOutOfBoundsException(off);
    }
}

public static void checkRange(byte[] buf, int off, int len) {
    checkLength(len);
    if (len > 0) {
        checkRange(buf, off);
        checkRange(buf, off + len - 1);
    }
}
```

So this problem could also be caused by the array … The variable …

```java
public BinarySerializer(BuffedWriter writer, boolean enableCompress) {
    this.enableCompress = enableCompress;
    BuffedWriter compressBuffer = null;
    if (enableCompress) {
        compressBuffer = new CompressedBuffedWriter(ClickHouseDefines.SOCKET_SEND_BUFFER_BYTES, writer);
    }
    either = new Either<>(writer, compressBuffer);
}
```

The … I'll try again with a bigger value for …
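If that checkRange reading is right, an off value equal to the source array length produces exactly the reported message, since ArrayIndexOutOfBoundsException(off) renders as "Array index out of range: " + off. A minimal reproduction sketch, assuming a 1048576-byte writtenBuf (the SOCKET_SEND_BUFFER_BYTES scenario above) and a position one byte past it; both values are assumptions for illustration:

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;

public class CheckRangeRepro {
    public static void main(String[] args) {
        LZ4Compressor lz4Compressor = LZ4Factory.fastestInstance().fastCompressor();

        byte[] writtenBuf = new byte[1048576];   // assumed buffer capacity (SOCKET_SEND_BUFFER_BYTES)
        int position = 1048577;                  // assumed srcLen, one byte past the end
        byte[] compressedBuffer =
                new byte[9 + 16 + lz4Compressor.maxCompressedLength(writtenBuf.length)];

        try {
            // checkRange(src, srcOff + srcLen - 1) checks index 1048576, which equals src.length,
            // so (assuming the checks quoted above run first) this throws
            // ArrayIndexOutOfBoundsException with message "Array index out of range: 1048576".
            lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```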
Seems the position is 1048577, but I could not find any reason why.
You can modify the code to print logs when the position is larger than 1048576.
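A possible shape for that instrumentation, sketched as a standalone helper; the names writtenBuf, position and compressedBuffer mirror the snippets above, but the wrapper itself is hypothetical and not the project's actual code:

```java
import net.jpountz.lz4.LZ4Compressor;

public class PositionGuardSketch {
    /**
     * Hypothetical wrapper around the compress call shown in the thread: log and fail fast
     * when position exceeds the source buffer, instead of letting the compressor's
     * checkRange surface it as "Array index out of range: ...".
     */
    static int compressWithGuard(LZ4Compressor lz4Compressor,
                                 byte[] writtenBuf, int position, byte[] compressedBuffer) {
        if (position > writtenBuf.length) {
            System.err.println("position " + position
                    + " is larger than writtenBuf.length " + writtenBuf.length);
            throw new IllegalStateException("buffer over-filled: position=" + position);
        }
        return lz4Compressor.compress(writtenBuf, 0, position, compressedBuffer, 9 + 16);
    }
}
```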
Environment
Error logs
Steps to reproduce
Using Spark to ingest a large DataFrame (>100M rows) with many columns (>5000) into ClickHouse.
Here is the code I use:
Other descriptions
Based on this issue, this could be caused by inserting an array column. But the DataFrame I inserted contains only StringType, timestamp and LongType columns.
I've also tried to do some investigation on these source files:
Unfortunately, I still couldn't find the cause.