OpenSearch Reader Performance evaluation with PIT

Summary
PIT on shard with search_after and PIT on slice with search_after achieve similar query latency, and both are much faster than a single PIT with search_after. However, increasing the slice count beyond the shard count with PIT on slice with search_after leads to higher query latency.
Test
PIT with search_after: In the read path, a single PIT is created for the whole index and assigned to one Spark task, so only 1 Spark task reads data from OpenSearch.
PIT on shard with search_after: In the read path, each OpenSearch index shard is assigned to a Spark task. For example, if the OpenSearch index has 5 shards, then 5 Spark tasks will run in parallel.
PIT on slice with search_after: In the read path, the OpenSearch index is split into slices, and each slice is assigned to a Spark task. For instance, if the index is split into 10 slices, then 10 Spark tasks will run in parallel (a sketch of this read loop follows below).
Note: if the slice count equals the shard count, the slice query is rewritten as a per-shard query (MatchAllDocsQuery on each shard), so query performance is the same as PIT on shard with search_after.
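Below is a minimal sketch of the sliced PIT + search_after read loop against the plain OpenSearch REST API. The endpoint, index name, page size, slice count, and sort field are assumptions for illustration; this is not the actual Spark reader implementation.

```python
import requests

OS_URL = "http://localhost:9200"   # assumed cluster endpoint
INDEX = "logs-181998"              # index under test
SLICE_ID, SLICE_MAX = 0, 10        # each Spark task would own one slice id

# 1. Create a point-in-time (PIT) so every slice reads a consistent snapshot.
resp = requests.post(f"{OS_URL}/{INDEX}/_search/point_in_time?keep_alive=10m")
pit_id = resp.json()["pit_id"]

# 2. Page through this slice with search_after until it is exhausted.
search_after = None
while True:
    body = {
        "size": 1000,
        "query": {"match_all": {}},
        "pit": {"id": pit_id, "keep_alive": "10m"},
        "slice": {"id": SLICE_ID, "max": SLICE_MAX},
        # Any stable sort works for search_after paging; _doc is the cheapest choice.
        "sort": [{"_doc": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    hits = requests.post(f"{OS_URL}/_search", json=body).json()["hits"]["hits"]
    if not hits:
        break
    # ... hand the page of hits to the Spark task here ...
    search_after = hits[-1]["sort"]

# 3. Release the PIT once the read is finished.
requests.delete(f"{OS_URL}/_search/point_in_time", json={"pit_id": [pit_id]})
```

In the slice strategy each Spark task would run this loop with its own slice id; in the shard strategy each task would instead target a single shard. The latencies measured for the three strategies are listed below.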
| Query | PIT with search_after | PIT on shard with search_after | PIT on slice with search_after |
|---|---|---|---|
| `SELECT COUNT(*) FROM dev.default.logs-181998` | 49745.0 | 18256.0 | 18667.0 |
| `SELECT COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0` | 50981.0 | 18247.0 | 19006.0 |
| `SELECT COUNT(*), AVG(size) FROM dev.default.logs-181998` | 49494.0 | 19951.0 | 20171.0 |
| `SELECT AVG(CAST(size AS BIGINT)) FROM dev.default.logs-181998` | 50906.0 | 19719.0 | 20326.0 |
| `SELECT MIN(@timestamp), MAX(@timestamp) FROM dev.default.logs-181998` | 44016.0 | 20459.0 | 19915.0 |
| `SELECT status, COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0 GROUP BY status ORDER BY COUNT(*) DESC` | 50560.0 | 20686.0 | 20550.0 |
Increasing the slice count
Ideally, increasing the slice count should enhance parallelization. However, test results indicate that query latency increases when the slice count is raised.
The root cause is that the slice query is rewritten as a TermsSliceQuery, which uses _id as the slicing field. For instance, such a query is rewritten as a TermsSliceQuery on shard 0 (and a MatchNoDocsQuery on the other shards). The cost of this filter is O(N*M), where N is the number of unique terms in the dictionary and M is the average number of documents per term. In our case the cost of the filter is O(N), since _id is unique and M = 1.
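As an illustration (the index, shard count, slice count, and query below are hypothetical, not the exact request from the test), a slice whose max is twice the shard count ends up as an _id terms filter on a single shard:

```python
# Hypothetical sliced search body for a 5-shard index read with 10 slices.
# With more slices than shards, slice id 0 is routed to a single shard, where
# it is rewritten as a TermsSliceQuery that enumerates the _id terms dictionary
# to decide slice membership; the remaining shards evaluate a MatchNoDocsQuery.
# Enumerating every _id term is what makes the per-shard filter cost O(N).
pit_id = "<pit id returned by the create-PIT call>"  # placeholder
sliced_body = {
    "query": {"match_all": {}},
    "pit": {"id": pit_id, "keep_alive": "10m"},
    "slice": {"id": 0, "max": 10},  # 10 slices over a 5-shard index
    "sort": [{"_doc": "asc"}],
}
```

The latencies measured with the slice count equal to, and double, the shard count are shown below.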
| Query | Slice count == shard count | Slice count = 2 × shard count |
|---|---|---|
| `SELECT COUNT(*) FROM dev.default.logs-181998` | 18667 | 35511 |
| `SELECT COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0` | 19006 | 34497 |
| `SELECT COUNT(*), AVG(size) FROM dev.default.logs-181998` | 20171 | 32213 |
| `SELECT AVG(CAST(size AS BIGINT)) FROM dev.default.logs-181998` | 20326 | 35226 |
| `SELECT MIN(@timestamp), MAX(@timestamp) FROM dev.default.logs-181998` | 19915 | 32996 |
| `SELECT status, COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0 GROUP BY status ORDER BY COUNT(*) DESC` | 20550 | 37419 |