IncrementalIndex.add() barfs when InputRow.getDimensions() has duplicates #63
I've reproduced this in a test. The issue only occurs if the duplicate dimension has never been seen before.

Each time an InputRow is added, we initialize a 2D array (one cell at the top level for each dimension, each holding an array of the unique values for that dimension) with size = the number of dimensions seen so far. We then see the duplicate dimension twice in the update loop. The second time, we find an index for this dimension (set the first time) and attempt to store the new set of values for this dimension at that index in the 2D array. However, the array was sized from the number of dimensions seen as of the last call to add, which is 1 too small since we've never seen this dimension before, and we hit an index out of bounds.

If the duplicate dimension had been seen before (in a previous call to add), the 2D array is already properly sized. We find an index for the duplicate dimension both times we see it in the update loop, and set it to the same set of values in the 2D array.

It seems to me that duplicate dimensions should be an error, but I would like to clarify what the expected behavior is in this case.

80d8eedcf7422479020fa9388cd66f55dc74230d
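To make the mechanism above concrete, here is a minimal, simplified model of the update loop. It is not the actual IncrementalIndex code: the class name, the `overflow` list for first-seen dimensions, and the method signature are assumptions made for illustration. It only shows why a never-seen duplicate overruns the array while a previously seen duplicate does not.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for the index; not the real IncrementalIndex.
final class DuplicateDimensionSketch {
    // Dimension name -> position in the per-row 2D array.
    private final Map<String, Integer> dimensionOrder = new LinkedHashMap<>();

    String[][] add(List<String> rowDimensions, Map<String, String[]> rowValues) {
        // Sized from the dimensions known BEFORE this row is processed.
        String[][] dims = new String[dimensionOrder.size()][];
        // Assumed detail: values for first-seen dimensions are collected separately.
        List<String[]> overflow = new ArrayList<>();

        for (String dimension : rowDimensions) {
            String[] values = rowValues.get(dimension);
            Integer index = dimensionOrder.get(dimension);
            if (index == null) {
                // First sighting: register the dimension and set its values aside.
                dimensionOrder.put(dimension, dimensionOrder.size());
                overflow.add(values);
            } else {
                // Index found: write in place. For a never-seen duplicate the index
                // was assigned a moment ago and equals dims.length, so this throws.
                dims[index] = values;
            }
        }

        // Append the values of first-seen dimensions after the known ones.
        String[][] result = new String[dims.length + overflow.size()][];
        System.arraycopy(dims, 0, result, 0, dims.length);
        for (int i = 0; i < overflow.size(); i++) {
            result[dims.length + i] = overflow.get(i);
        }
        return result;
    }
}
```

In this model, adding a row whose dimension list is `["c", "c"]` throws ArrayIndexOutOfBoundsException when `c` is new, but succeeds if `c` was registered by an earlier row.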
+1 on having an error for duplicate dimensions.
So, assuming it is agreed that this case is an error, I see two approaches.
The existing lookup (does this dimension have an index?) can be used to detect duplicates on the first occurrence of the dimension (i.e. an index is found but the array is too small), turning what is now an index out of bounds error into something more descriptive. However, it is not sufficient to detect duplicate dimensions if this dimension was seen on a previous row (the index will be found and the array will be sufficiently large).
Essentially, this is alternative 2 without the need for the additional set.
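For reference, a set-based check is one way to catch duplicates regardless of whether the dimension was seen on a previous row. The helper below is a hypothetical sketch, not code from the project or from #2017; the class name, method name, and exception type are all assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names and exception type are illustrative only.
final class DimensionValidation {
    private DimensionValidation() {}

    /** Throws if the row lists the same dimension more than once. */
    static void checkNoDuplicates(List<String> rowDimensions) {
        Set<String> seen = new HashSet<>();
        for (String dimension : rowDimensions) {
            // Set.add returns false when the element is already present.
            if (!seen.add(dimension)) {
                throw new IllegalArgumentException(
                    "Duplicate dimension in row: " + dimension);
            }
        }
    }
}
```

Calling such a check before the update loop would replace the index out of bounds with a descriptive error, at the cost of one extra set per row.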
Is this a hypothetical, or is this occurring somewhere in the wild?
We can probably close this, right? #2017 is merged.
Fixed by #2017
If InputRow.getDimensions() has duplicates, IncrementalIndex.add() fails with
java.lang.ArrayIndexOutOfBoundsException: 4
at com.metamx.druid.index.v1.IncrementalIndex.add(IncrementalIndex.java:148)
at com.metamx.druid.realtime.Sink.add(Sink.java:98)
at com.metamx.druid.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:176)