
[FLINK-38138] Fix OOB error in ColumnarArrayData #26826

Open
wants to merge 2 commits into
base: master

voonhous
Member

What is the purpose of the change

An out-of-bounds (OOB) error is thrown when getBinary is invoked and all of the following conditions are met:

  1. The HeapArrayVector contains multiple elements.
  2. The element is read from a non-zero offset of the array.
  3. The array obtained this way is converted to a binary.

Caused by: java.lang.IllegalArgumentException: 72 > 36
	at java.util.Arrays.copyOfRange(Arrays.java:3519)
	at org.apache.flink.table.data.columnar.ColumnarArrayData.getBinary(ColumnarArrayData.java:138)
	at org.apache.hudi.table.format.cow.vector.ColumnarGroupRowData.getBinary(ColumnarGroupRowData.java:121)
	at org.apache.flink.table.data.RowData.lambda$createFieldGetter$245ca7d1$3(RowData.java:228)
	at org.apache.flink.table.runtime.typeutils.RowDataSerializer.toBinaryRow(RowDataSerializer.java:207)
	at org.apache.flink.table.data.writer.AbstractBinaryWriter.writeRow(AbstractBinaryWriter.java:147)
	at org.apache.flink.table.data.writer.BinaryArrayWriter.writeRow(BinaryArrayWriter.java:30)
	at org.apache.flink.table.data.writer.BinaryWriter.write(BinaryWriter.java:155)
	at org.apache.flink.table.runtime.typeutils.MapDataSerializer.toBinaryMap(MapDataSerializer.java:179)
	at org.apache.flink.table.runtime.typeutils.MapDataSerializer.copy(MapDataSerializer.java:113)
	at org.apache.flink.table.runtime.typeutils.MapDataSerializer.copy(MapDataSerializer.java:44)
	at org.apache.flink.table.runtime.typeutils.RowDataSerializer.copyRowData(RowDataSerializer.java:170)
	at org.apache.flink.table.runtime.typeutils.RowDataSerializer.copy(RowDataSerializer.java:131)

The error above is caused by the following code:

@Override
public byte[] getBinary(int pos) {
    BytesColumnVector.Bytes byteArray = getByteArray(pos);
    if (byteArray.len == byteArray.data.length) {
        return byteArray.data;
    } else {
        return Arrays.copyOfRange(byteArray.data, byteArray.offset, byteArray.len);
    }
} 
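
The "72 > 36" message can be reproduced with plain arrays. The following is a minimal sketch, assuming three 36-byte elements packed back to back in one backing buffer (sizes chosen to match the stack trace). The class and method names are hypothetical and are not Flink code. It also suggests that an element whose offset equals its length would come back as an empty array without any exception.

```java
import java.util.Arrays;

// Hypothetical stand-alone reproduction; not Flink code.
class CopyOfRangeRepro {

    // Mirrors the buggy call: the length is passed where an exclusive end index is expected.
    static byte[] buggySlice(byte[] data, int offset, int len) {
        return Arrays.copyOfRange(data, offset, len);
    }

    public static void main(String[] args) {
        int elementLen = 36;
        byte[] data = new byte[3 * elementLen]; // assumed layout: 3 elements, 108 bytes in total
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        // Element 0 (offset 0): from = 0, to = 36, so the call happens to be correct.
        System.out.println(buggySlice(data, 0, elementLen).length);

        // Element 1 (offset 36): from == to, so an empty array is returned silently.
        System.out.println(buggySlice(data, 36, elementLen).length);

        // Element 2 (offset 72): from > to, so IllegalArgumentException is thrown.
        try {
            buggySlice(data, 72, elementLen);
        } catch (IllegalArgumentException e) {
            System.out.println("IllegalArgumentException: " + e.getMessage());
        }
    }
}
```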

The signature of Arrays.copyOfRange is:

public static byte[] copyOfRange(byte[] original, int from, int to) 

The java.util.Arrays.copyOfRange(original, from, to) method copies the specified range of the original array into a new array.

  1. from: The initial index of the range to be copied, inclusive.
  2. to: The final index of the range to be copied, exclusive.

Hence, the correct way to invoke `Arrays.copyOfRange` is:

return Arrays.copyOfRange(
        byteArray.data, byteArray.offset, byteArray.offset + byteArray.len);  
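
As a rough check of the corrected call, here is a minimal sketch that uses a hypothetical Bytes holder (a stand-in with data, offset, and len fields, not the Flink BytesColumnVector.Bytes class) over an assumed 108-byte buffer holding three 36-byte elements:

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; Bytes here is a stand-in, not the Flink class.
class BinarySliceSketch {

    static final class Bytes {
        final byte[] data;
        final int offset;
        final int len;

        Bytes(byte[] data, int offset, int len) {
            this.data = data;
            this.offset = offset;
            this.len = len;
        }
    }

    // Same shape as getBinary in the description, with the corrected end index.
    static byte[] getBinary(Bytes b) {
        if (b.len == b.data.length) {
            return b.data;
        }
        // The third argument is an exclusive end index, so it must be offset + len.
        return Arrays.copyOfRange(b.data, b.offset, b.offset + b.len);
    }

    public static void main(String[] args) {
        byte[] data = new byte[108];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        // Third element: starts at offset 72 and spans 36 bytes.
        byte[] third = getBinary(new Bytes(data, 72, 36));
        System.out.println(third.length + " bytes, first = " + third[0] + ", last = " + third[35]);
    }
}
```

With the end index computed as offset + len, every element slice has exactly len bytes regardless of where it starts in the backing buffer.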

Brief change log

  • Fix the OOB error by ensuring that getBinary is calling Arrays.copyOfRange with the correct arguments.

Verifying this change

  • Added unit tests for ColumnarArrayData to verify the fix. The tests fail before the fix and pass after it.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Collaborator

flinkbot commented Jul 23, 2025

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertArrayEquals;

Contributor

We are trying to move away from JUnit assertions to AssertJ.
Can you please replace them here as well?

Member Author

Sure!
