Implement shortcut for 0.0 and 1.0 percentile calculations, and add two new tests #382

JingHuaMan · 2021-04-23T14:51:18Z

If the percentile is 0 or 1, there is no need to sort all the elements in the container and then pick the result by index; a single traversal finds the minimum or maximum in O(n) time.
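To illustrate the idea, here is a minimal JDK-only sketch, not the jOOL implementation. The class name `PercentileShortcutSketch`, its method names, and the index formula in `bySorting` are assumptions made for this example. The input uses distinct lengths, so ties do not come into play.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class PercentileShortcutSketch {

    // Baseline: copy, sort, then index. O(n log n) time, O(n) extra space.
    // The index formula is an assumption made for this sketch.
    static Optional<String> bySorting(List<String> items, double percentile, Comparator<String> order) {
        if (items.isEmpty())
            return Optional.empty();

        List<String> sorted = new ArrayList<>(items);
        sorted.sort(order);
        int index = Math.max(0, (int) Math.ceil(percentile * sorted.size()) - 1);
        return Optional.of(sorted.get(index));
    }

    // Shortcut for percentile 0.0: one traversal, no sorting.
    static Optional<String> minimumInOnePass(List<String> items, Comparator<String> order) {
        return items.stream().min(order);
    }

    // Shortcut for percentile 1.0: one traversal, no sorting.
    static Optional<String> maximumInOnePass(List<String> items, Comparator<String> order) {
        return items.stream().max(order);
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("ccc", "a", "bb");
        Comparator<String> byLength = Comparator.comparingInt(String::length);

        System.out.println(bySorting(items, 0.0, byLength));      // Optional[a]
        System.out.println(minimumInOnePass(items, byLength));    // Optional[a]
        System.out.println(bySorting(items, 1.0, byLength));      // Optional[ccc]
        System.out.println(maximumInOnePass(items, byLength));    // Optional[ccc]
    }
}
```

With distinct keys the two approaches agree; how ties are resolved is a separate question.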

All the tests passed after the modification. Besides, in the test testPercentileWithStringsAndFunction, since the function is String::length and each string in the stream is of length 1, the sorting result should be the same as the original order of the input elements.

I think the original tests are not sufficient, so I added two new tests for the function percentileBy.


lukaseder · 2021-05-21T08:19:35Z

jOOL-java-8/src/main/java/org/jooq/lambda/Agg.java

+     *
+     * @param function map the items in the streams into values
+     * @param comparator comparator used for sorting the items
+     * @return a collector that calculates the derived <code>PERCENTILE_DISC(percentile)</code> function


It's probably a good idea to document these in general, but I prefer this be done in a separate task, for the entire API, not just this method. I've created a new issue to track this: #388.

lukaseder · 2021-05-21T08:20:53Z

jOOL-java-8/src/main/java/org/jooq/lambda/Agg.java

+        // CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376
+        if (percentile == 0.0)
+            // If percentile is 0, this is the same as taking the item with the minimum value.
+            return minBy(function, comparator);


The comment says the same thing as the method call, so it isn't really necessary. I'll remove it again after merging.

lukaseder · 2021-05-21T08:22:56Z

jOOL-java-8/src/main/java/org/jooq/lambda/Agg.java

     */
    public static <T, U> Collector<T, ?, Optional<T>> percentileBy(double percentile, Function<? super T, ? extends U> function, Comparator<? super U> comparator) {
        if (percentile < 0.0 || percentile > 1.0)
            throw new IllegalArgumentException("Percentile must be between 0.0 and 1.0");

+        // CS304 Issue link: https://github.com/jOOQ/jOOL/issues/376


I don't know what CS304 means. Some reference to some external tracking system? There's no need for this, I will remove it. The convention to track github issues (if necessary) would be to use:

// [#376] Rationale

lukaseder · 2021-05-21T08:24:49Z

jOOL-java-8/src/main/java/org/jooq/lambda/Agg.java

+            // If there are multiple maxima, take the last one.
+            return maxBy(function, (o1, o2) -> {
+                int compareResult = comparator.compare(o1, o2);
+                return compareResult == 0 ? -1 : compareResult;


This hack violates the Comparator contract. We can't implement it like this.

This seems to be the correct way to implement this:

collectingAndThen(maxAllBy(function, comparator), s -> s.findLast())

Emmm, I think my solution is not incorrect, because the function maxBy will not sort the elements but compare them one by one, and the input comparator will not be modified. So even though this implementation violates the design philosophy of Comparator, it works and there seems no potential problem with it.

It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.

It's incorrect. If comparator.compare(o1, o2) == 0, then comparator.compare(o2, o1) == 0, yet you return -1 in both cases. I don't want to spend the time now to find an edge case where this breaks sorting algorithms, but it should be easy to get an intuition about how this hack feels very wrong.
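The violation can be checked directly. The sketch below reproduces the tie-breaking lambda from the diff over a plain `Comparator` and tests the antisymmetry rule from the `Comparator` contract, sgn(compare(x, y)) == -sgn(compare(y, x)). The class name `ComparatorContractCheck` and its method names are hypothetical and exist only for this illustration.

```java
import java.util.Comparator;

class ComparatorContractCheck {

    // The tie-breaking wrapper from the diff: equal elements are reported as "less than".
    static <U> Comparator<U> tieBreakingHack(Comparator<? super U> comparator) {
        return (o1, o2) -> {
            int compareResult = comparator.compare(o1, o2);
            return compareResult == 0 ? -1 : compareResult;
        };
    }

    // Antisymmetry for one pair: sgn(compare(x, y)) == -sgn(compare(y, x)).
    static <U> boolean isAntisymmetricFor(Comparator<? super U> comparator, U x, U y) {
        return Integer.signum(comparator.compare(x, y))
            == -Integer.signum(comparator.compare(y, x));
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> hack = ComparatorContractCheck.<String>tieBreakingHack(byLength);

        // "a" and "b" have the same length, so they are a tie under byLength.
        System.out.println(isAntisymmetricFor(byLength, "a", "b")); // true
        System.out.println(isAntisymmetricFor(hack, "a", "b"));     // false
    }
}
```

For a tied pair the wrapper returns -1 in both directions, so each element claims to be smaller than the other.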

> because the function maxBy will not sort the elements but compare them one by one

You shouldn't rely on such an implementation detail.

> So even though this implementation violates the design philosophy of Comparator, it works and there seems no potential problem with it.

Famous last words :)

> It's perfectly fine to implement this with maxAllBy, but in the worst case the space complexity will be O(n). That's my concern.

I'm open to other suggestions, but correctness always beats performance.

Yes, you are right. Thanks for your reply!
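For reference, one possible way to address the space concern while leaving the comparator untouched is to move the tie-breaking into the reduction step. This is only a sketch built on the JDK's `Collectors.reducing`, not something proposed or merged in this pull request, and the names `LastMaxSketch` and `lastMaxBy` are hypothetical.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LastMaxSketch {

    // Keeps the later element on ties, so the result is the last maximum in
    // encounter order. The comparator itself is used unmodified.
    static <T, U> Collector<T, ?, Optional<T>> lastMaxBy(
        Function<? super T, ? extends U> function,
        Comparator<? super U> comparator
    ) {
        return Collectors.<T>reducing((a, b) ->
            comparator.compare(function.apply(a), function.apply(b)) > 0 ? a : b);
    }

    public static void main(String[] args) {
        Collector<String, ?, Optional<String>> collector =
            LastMaxSketch.<String, Integer>lastMaxBy(String::length, Comparator.<Integer>naturalOrder());

        // "bb" and "cc" tie on length 2; the later one wins.
        System.out.println(Stream.of("a", "bb", "cc", "d").collect(collector)); // Optional[cc]
    }
}
```

The sketch holds a single candidate at a time rather than collecting all maxima, and it assumes the stream has a defined encounter order.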

JingHuaMan and others added 5 commits April 23, 2021 22:41

Implement shortcut for 0.0 and 1.0 percentile calculations, and add t…

934fe92

…wo new tests

Implement shortcut for 0.0 and 1.0 percentile calculations, and add …

8fd0eee

…two new tests

Modify the javadoc for the function PERCENTILEBY

dabfeaf

Modify javadoc for PERCENTILEBY

d3b7bce

Modify javadoc for PERCENTILEBY

7a4a5cd

lukaseder mentioned this pull request May 21, 2021

Feature issue 376 #381

Closed

lukaseder added this to the Version 0.9.15 milestone May 21, 2021

lukaseder added P: Medium T: Enhancement labels May 21, 2021

lukaseder reviewed May 21, 2021

lukaseder merged commit ee8b7e1 into jOOQ:main May 21, 2021
