
The default pruning threshold for IN predicate might be too low #14380

Open
UOETianleZhang opened this issue Nov 4, 2024 · 3 comments

@UOETianleZhang

We are trying to use bloom filters to reduce the latency of queries that have a long IN clause (the number of elements in the IN clause is ~50). However, we see that the bloom filters are not taking effect.

After digging into it, we found there is a server config that disables pruning when the number of values in the IN predicate is larger than 10 (the default).

Do we know the reason for setting this default to 10? Applying pruning on a large IN clause will lead to diminishing returns, but even taking that into consideration, 10 looks too conservative to me.
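
For context, here is a minimal sketch of how bloom-filter-based pruning of an IN predicate works and where a threshold like this cuts it off. This is illustrative only, not Pinot's actual code; the class and constant names are made up, and it assumes a per-segment bloom filter such as Guava's:

```java
import com.google.common.hash.BloomFilter;
import java.util.List;

// Illustrative sketch only (not Pinot's implementation): a segment can be skipped
// for an IN predicate when its bloom filter reports that none of the IN values
// can possibly be present. The threshold guard mirrors the behavior described
// above: once the IN list is longer than the limit, pruning is skipped entirely.
public final class InPredicatePruningSketch {

  // Hypothetical constant, matching the default value discussed in this issue.
  private static final int IN_PREDICATE_THRESHOLD = 10;

  /** Returns true if the segment can be safely skipped for this IN predicate. */
  static boolean canPruneSegment(BloomFilter<CharSequence> segmentBloomFilter, List<String> inValues) {
    if (inValues.size() > IN_PREDICATE_THRESHOLD) {
      // Too many values: the per-value probes are assumed not worth it, so don't prune.
      return false;
    }
    for (String value : inValues) {
      if (segmentBloomFilter.mightContain(value)) {
        // At least one value might be in this segment, so it must be scanned.
        return false;
      }
    }
    // No IN value can be present in this segment (bloom filters have no false negatives).
    return true;
  }
}
```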

@jasperjiaguo
Contributor

Yes, in our case we see quite a substantial improvement with bloom filter pruning added to a high-cardinality, dictionary-enabled column (@UOETianleZhang can probably share some anonymized numbers here). This suggests that for a dictionary-enabled column, binary search is slower than hashing (bloom filter). Therefore the gain would probably be more prominent when the number of values in the IN clause is larger, unless we are sure those values exist in every segment we query.
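
For illustration, a minimal sketch of the two membership checks being compared here (these are not Pinot class names; it assumes Guava's BloomFilter):

```java
import com.google.common.hash.BloomFilter;
import java.util.Arrays;

// Illustrative sketch only: for a dictionary-encoded column, checking whether a
// value exists means a binary search over the sorted dictionary (O(log N)
// comparisons, N being the column cardinality), while a bloom filter probe is a
// handful of hashes regardless of cardinality.
public final class MembershipCheckSketch {

  // ~O(log N) string comparisons against the column's sorted dictionary.
  static boolean existsInDictionary(String[] sortedDictionary, String value) {
    return Arrays.binarySearch(sortedDictionary, value) >= 0;
  }

  // Near-constant-time probabilistic check: may return a false positive, never a false negative.
  static boolean mightExistPerBloomFilter(BloomFilter<CharSequence> bloomFilter, String value) {
    return bloomFilter.mightContain(value);
  }
}
```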

@UOETianleZhang
Author

With some benchmarking, we see that increasing the limit greatly improves latency.

  • Column cardinality: 5,573,103
  • Column type: STRING
  • Query pattern: SELECT column FROM table WHERE column IN (var1, var2, ... var41) LIMIT 1

After increasing the pruning limit from 10 to 100, the latency dropped from 20 seconds to 286 milliseconds.

@jasperjiaguo
Contributor

Oh I think the PR is #6776
