Support IN predicate in ColumnValue SegmentPruner (#6756) #6776

Merged
merged 1 commit into from
Apr 15, 2021

Contributor

@GSharayu GSharayu commented Apr 12, 2021

Server-side segment pruning is currently supported for the = and RANGE filter operators using min-max value stats (segment metadata). Similarly, the bloom filter is also used for the = filter.

For the IN filter operator, we should add support for min-max value based pruning if the number of values in the IN clause is below a certain threshold.

Adding this support for a large number of values in the IN clause won't be helpful, as the pruning may not happen (since the values are likely to be spread across several segments) and the time to prune may itself negate the benefits. So, let's start with a configurable value, with the default being less than 10.
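The idea described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the class name, method name, and the choice to skip pruning when the value count is strictly greater than the threshold are all assumptions made for the example.

```java
import java.util.List;

public class InPredicatePruneSketch {
  // Hypothetical default, mirroring the value proposed in this PR.
  static final int DEFAULT_IN_PREDICATE_THRESHOLD = 10;

  /**
   * @return true if the segment can be pruned, i.e. no IN value falls within [min, max];
   *         false if the check is skipped (too many values) or some value may match.
   */
  static <T extends Comparable<T>> boolean canPrune(List<T> inValues, T min, T max, int threshold) {
    if (inValues.size() > threshold) {
      // Assumption: above the threshold the check is skipped and the segment is kept.
      return false;
    }
    for (T value : inValues) {
      if (value.compareTo(min) >= 0 && value.compareTo(max) <= 0) {
        // At least one value lies within the segment's min/max range.
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Segment with column range [10, 20].
    System.out.println(canPrune(List.of(1, 2, 3), 10, 20, DEFAULT_IN_PREDICATE_THRESHOLD)); // true
    System.out.println(canPrune(List.of(1, 15), 10, 20, DEFAULT_IN_PREDICATE_THRESHOLD));   // false
    System.out.println(canPrune(List.of(1, 2, 3), 10, 20, 2));                              // false (skipped)
  }
}
```

Note that min-max stats can only rule a segment out; a value inside the range does not guarantee a match in that segment.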

Issue #6756

private int _inPredicateThreshold;
public static final int DEFAULT_VALUE_FOR_IN_PREDICATE = 10;
public static final String CONFIG_MAX_VALUE_FOR_IN_PREDICATE = "pinot.segment.pruner.columnvalue.in.threshold";

Contributor

I think this can be moved to CommonConstants in inner Server class.

We should use pinot.server.query.executor.pruner prefix since that is what is used by QueryExecutorConfig when it creates SegmentPrunerConfig. Please check the QueryExecutorConfig constructor.

Secondly, since this config is applicable to min-max column value pruner on the server, we should use the convention pinot.server.query.executor.pruner.columnvaluesegmentpruner.<your new in predicate config>

This is because we create the config for each type of pruner as seen in the following code in SegmentPrunerConfig

for (String segmentPrunerName : segmentPrunerNames) {
  _segmentPrunerNames.add(segmentPrunerName);
  _segmentPrunerConfigs.add(segmentPrunerConfig.subset(segmentPrunerName));
}

When init() is called on ColumnValueSegmentPruner, it is initialized with all configs under pinot.server.query.executor.pruner.columnvaluesegmentpruner.
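To make the per-pruner config scoping concrete, here is a small sketch that uses plain maps as a stand-in for the real configuration classes. The `subset` and `perPrunerConfigs` helpers are hypothetical and only imitate the prefix-stripping behaviour described in this thread; they are not the Pinot API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PrunerConfigSubsetSketch {
  /** Hypothetical helper: returns entries under the prefix, with "prefix." stripped from each key. */
  static Map<String, String> subset(Map<String, String> config, String prefix) {
    String dotted = prefix + ".";
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (entry.getKey().startsWith(dotted)) {
        result.put(entry.getKey().substring(dotted.length()), entry.getValue());
      }
    }
    return result;
  }

  /** Builds one scoped config per pruner name, in the spirit of the loop quoted above. */
  static Map<String, Map<String, String>> perPrunerConfigs(Map<String, String> prunerConfig, List<String> names) {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    for (String name : names) {
      result.put(name, subset(prunerConfig, name));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> serverConfig = new LinkedHashMap<>();
    serverConfig.put(
        "pinot.server.query.executor.pruner.columnvaluesegmentpruner.inpredicate.threshold", "5");

    Map<String, String> prunerConfig = subset(serverConfig, "pinot.server.query.executor.pruner");
    Map<String, Map<String, String>> perPruner =
        perPrunerConfigs(prunerConfig, List.of("columnvaluesegmentpruner"));

    // The pruner only sees keys relative to its own name.
    System.out.println(perPruner.get("columnvaluesegmentpruner")); // {inpredicate.threshold=5}
  }
}
```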

Contributor Author

done!

* <li>Column min/max value</li>
* </ul>
*/
private boolean pruneInPredicate(IndexSegment segment, InPredicate inPredicate, Map<String, DataSource> dataSourceCache) {
Contributor

@siddharthteotia siddharthteotia Apr 12, 2021

(nit) Please consider adding more detail to the javadoc, highlighting that pruning won't happen if the number of values is greater than the threshold. Also add the return value: true if the segment can be pruned, false otherwise.

Contributor Author

done!

return true;
}

private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata, Comparable value) {
Contributor

(nit) please add brief javadoc

Contributor Author

done!

@codecov-io

Codecov Report

Merging #6776 (bb212ba) into master (e9170aa) will increase coverage by 22.69%.
The diff coverage is 96.29%.

❗ Current head bb212ba differs from pull request most recent head 679dd22. Consider uploading reports for the commit 679dd22 to get more accurate results

@@              Coverage Diff              @@
##             master    #6776       +/-   ##
=============================================
+ Coverage     43.35%   66.04%   +22.69%     
- Complexity        7       12        +5     
=============================================
  Files          1411     1411               
  Lines         68666    68685       +19     
  Branches       9918     9924        +6     
=============================================
+ Hits          29768    45363    +15595     
+ Misses        36373    20106    -16267     
- Partials       2525     3216      +691     
Flag Coverage Δ Complexity Δ
integration ? ?
unittests 66.04% <96.29%> (?) 0.00 <0.00> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
...ot/core/query/pruner/ColumnValueSegmentPruner.java 84.95% <96.29%> (-5.47%) 0.00 <0.00> (ø)
...a/org/apache/pinot/minion/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
.../apache/pinot/minion/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...pache/pinot/common/utils/grpc/GrpcQueryClient.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...t/core/startree/plan/StarTreeDocIdSetPlanNode.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
.../core/startree/plan/StarTreeTransformPlanNode.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...core/startree/plan/StarTreeProjectionPlanNode.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
...t/minion/executor/MinionTaskZkMetadataManager.java 0.00% <0.00%> (-100.00%) 0.00% <0.00%> (ø%)
... and 1059 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e9170aa...679dd22. Read the comment docs.

@@ -36,6 +38,8 @@
import org.apache.pinot.spi.data.FieldSpec.DataType;
import org.apache.pinot.spi.env.PinotConfiguration;

import static org.apache.pinot.common.utils.CommonConstants.Server.CONFIG_THRESHOLD_FOR_IN_PREDICATE;
import static org.apache.pinot.common.utils.CommonConstants.Server.DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD;

Contributor

(nit) style check - please avoid static imports

Contributor Author

done!

@GSharayu GSharayu force-pushed the pinot_6756 branch 2 times, most recently from bca5caf to ff73b5a Compare April 12, 2021 20:58
@GSharayu GSharayu changed the title Support IN predicate in ColumnValue SegmentPruner(#6756) Support IN predicate in ColumnValue SegmentPruner (#6756) Apr 12, 2021
Comment on lines 240 to 241
public static final int DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD = 10;
public static final String CONFIG_THRESHOLD_FOR_IN_PREDICATE= "pinot.server.query.executor.pruner.columnvaluesegmentpruner.in.threshold";
Contributor

Suggested change
public static final int DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD = 10;
public static final String CONFIG_THRESHOLD_FOR_IN_PREDICATE= "pinot.server.query.executor.pruner.columnvaluesegmentpruner.in.threshold";
public static final String CONFIG_OF_VALUE_PRUNER_IN_PREDICATE_THRESHOLD = "pinot.server.query.executor.pruner.columnvaluesegmentpruner.inpredicate.threshold";
public static final int DEFAULT_VALUE_PRUNER_IN_PREDICATE_THRESHOLD = 10;

Contributor Author

done!

@Override
public void init(PinotConfiguration config) {
_inPredicateThreshold = config.getProperty(Server.CONFIG_THRESHOLD_FOR_IN_PREDICATE, Server.DEFAULT_VALUE_FOR_IN_PREDICATE_THRESHOLD);
Contributor

Should we exclude the prefix pinot.server.query.executor.pruner.columnvaluesegmentpruner from the config key?

Contributor

@siddharthteotia siddharthteotia Apr 13, 2021

I think it might make the name look cleaner. However, keeping the dot-notation-based name makes it consistent with all the config keys we have today. Also, the name makes it self-explanatory that it is for the pruner on server query execution.

What do you guys think?

Contributor Author

sounds okay to me to keep the prefix

* <li> false if the value is greater than min value or value is smaller than max value</li>
* </ul>
*/
private boolean checkMinMaxRange(DataSourceMetadata dataSourceMetadata, Comparable value) {
Contributor

I would suggest returning false if the value is not within the range, which is more intuitive.
(Optional) Also, we can slightly improve the performance by first performing the null check on the min/max value and then looping over the values to compare, instead of doing null checks for each value.
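For illustration, both suggestions could look roughly like the sketch below. It is not the code under review: the class and method names are made up, and treating a missing min or max as "cannot rule anything out" is an assumption of this example.

```java
import java.util.List;

public class MinMaxHoistSketch {
  /**
   * @return true if at least one value lies within [min, max], or if min/max is unavailable
   *         (null) so nothing can be ruled out; false if every value is outside the range.
   */
  static <T extends Comparable<T>> boolean anyValueInRange(List<T> values, T min, T max) {
    // Null check performed once, before the loop, rather than once per value.
    if (min == null || max == null) {
      return true;
    }
    for (T value : values) {
      if (value.compareTo(min) >= 0 && value.compareTo(max) <= 0) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(anyValueInRange(List.of(1, 15), 10, 20));  // true
    System.out.println(anyValueInRange(List.of(1, 25), 10, 20));  // false
    System.out.println(anyValueInRange(List.of(1, 25), null, 20)); // true (no stats available)
  }
}
```

With this shape, a caller would prune the segment exactly when the method returns false.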

Contributor Author

@GSharayu GSharayu Apr 13, 2021

done! for part 1. For the optional part, how do you suggest doing that? For a cleaner approach, the function might need to be split into separate functions that check minValue and maxValue independently.

Contributor

Let's keep it this way then. This is cleaner, and should have trivial performance impact.

Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Please fix the config key

@Override
public void init(PinotConfiguration config) {
_inPredicateThreshold = config.getProperty(Server.CONFIG_OF_VALUE_PRUNER_IN_PREDICATE_THRESHOLD, Server.DEFAULT_VALUE_PRUNER_IN_PREDICATE_THRESHOLD);
Contributor

This is incorrect: when the config is passed from the SegmentPrunerProvider, the prefix has already been removed, so here you should use the key inpredicate.threshold to access the config.
The test works only because you pass the top-level config directly into this pruner class, which is not the case in the production code.
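The failure mode described here can be reproduced with a small standalone sketch. Plain maps stand in for the real configuration object, and the `subset` and `getInt` helpers are hypothetical; only the key strings are taken from this thread.

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixedKeyLookupSketch {
  static final String PRUNER_PREFIX = "pinot.server.query.executor.pruner.columnvaluesegmentpruner";
  static final String RELATIVE_KEY = "inpredicate.threshold";
  static final String FULL_KEY = PRUNER_PREFIX + "." + RELATIVE_KEY;
  static final int DEFAULT_THRESHOLD = 10;

  /** Hypothetical helper: keeps entries under the prefix and strips "prefix." from their keys. */
  static Map<String, String> subset(Map<String, String> config, String prefix) {
    String dotted = prefix + ".";
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : config.entrySet()) {
      if (entry.getKey().startsWith(dotted)) {
        result.put(entry.getKey().substring(dotted.length()), entry.getValue());
      }
    }
    return result;
  }

  /** Hypothetical helper: integer lookup that falls back to a default when the key is absent. */
  static int getInt(Map<String, String> config, String key, int defaultValue) {
    String value = config.get(key);
    return value == null ? defaultValue : Integer.parseInt(value);
  }

  public static void main(String[] args) {
    Map<String, String> topLevel = new HashMap<>();
    topLevel.put(FULL_KEY, "3");

    // Production-like path: the pruner receives a config with the prefix already removed.
    Map<String, String> prunerView = subset(topLevel, PRUNER_PREFIX);
    System.out.println(getInt(prunerView, FULL_KEY, DEFAULT_THRESHOLD));     // 10: silently falls back
    System.out.println(getInt(prunerView, RELATIVE_KEY, DEFAULT_THRESHOLD)); // 3: configured value

    // Test-like path: the top-level config is passed in directly, so the full key "works".
    System.out.println(getInt(topLevel, FULL_KEY, DEFAULT_THRESHOLD));       // 3
  }
}
```

Because the lookup falls back to the default rather than failing, a mismatch like this would not surface as an error; the configured threshold would simply be ignored.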

Contributor Author

done!

* <ul>
* <li>Column min/max value</li>
* </ul>
* Returns:
Contributor

@amrishlal amrishlal Apr 13, 2021

The Javadoc appears to be very verbose for this function and takes up too many vertical lines. Can we keep it concise by getting rid of <ul>? Also, I think @return is the standard way of specifying function return values (unless something has changed recently). I would suggest just having the following as javadoc:

/**
 * @return true if size of values is greater than threshold or if value is greater than min and less than max; 
 *  otherwise, false.
 */

Contributor Author

done!

Comment on lines 273 to 272
/**
* Check if the comparable value is within min/max range
* <ul>
* <li>Column min/max value</li>
* </ul>
* Returns:
* <ul>
* <li> true if the value is greater than min value or value is smaller than max value</li>
* <li> false if the value is smaller than min value or value is greater than max value</li>
* </ul>
*/
Contributor

same here. A single @return should be sufficient and will describe what the function is doing well.

Contributor Author

done!

@GSharayu GSharayu force-pushed the pinot_6756 branch 5 times, most recently from 378fad9 to 968a815 Compare April 14, 2021 16:32
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

LGTM otherwise

@@ -237,6 +237,8 @@
"pinot.server.instance.realtime.alloc.offheap.direct";
public static final String PREFIX_OF_CONFIG_OF_PINOT_FS_FACTORY = "pinot.server.storage.factory";
public static final String PREFIX_OF_CONFIG_OF_PINOT_CRYPTER = "pinot.server.crypter";
public static final String CONFIG_OF_VALUE_PRUNER_IN_PREDICATE_THRESHOLD = "inpredicate.threshold";
Contributor

Let's keep the full config key here which can be used to update the server config, and introduce another constant in ColumnValueSegmentPruner for the config key without the prefix.

5 participants