
Performance Test - Evaluate OpenSearch read performance #403

Closed
Tracked by #185
penghuo opened this issue Jun 28, 2024 · 0 comments
penghuo commented Jun 28, 2024

OpenSearch Reader Performance evaluation with PIT

Summary

  • PIT on shard with search_after and PIT on slice with search_after achieve similar results. However, increasing the slice count with PIT on slice with search_after leads to higher query latency.

Test

  • PIT with search_after: In the read path, a single PIT is created for the index and assigned to one Spark task, so only 1 Spark task reads data from OpenSearch (see the request sketch after this list).
  • PIT on shard with search_after: In the read path, each OpenSearch index shard is assigned to a Spark task. For example, if the OpenSearch index has 5 shards, then 5 Spark tasks run in parallel.
  • PIT on slice with search_after: In the read path, the OpenSearch index is split into slices, and each slice is assigned to a Spark task. For instance, if the OpenSearch index is split into 10 slices, then 10 Spark tasks run in parallel.
    Note: if the slice count equals the shard count, the slice query is rewritten as a shard query (MatchAllDocsQuery on each shard), so query performance is the same as PIT on shard with search_after.
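
For reference, a minimal sketch of the PIT with search_after read path, assuming the OpenSearch REST API is called directly. The keep_alive, page size, sort field, and search_after values are illustrative, not taken from the test setup; the shard and slice variants issue the same paged request per task, with shard routing or a slice section added.

```
# 1. Create a PIT for the target index (keep_alive value is illustrative)
POST /logs-181998/_search/point_in_time?keep_alive=5m

# 2. The Spark task pages through results with search_after; the search_after
#    values are the sort values of the last hit of the previous page and are
#    omitted on the first page.
GET /_search
{
  "size": 10000,
  "pit": { "id": "<pit_id from step 1>", "keep_alive": "5m" },
  "sort": [ { "_doc": "asc" } ],
  "search_after": [ 123456 ]
}
```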
| Query | PIT with search_after | PIT on shard with search_after | PIT on slice with search_after |
| --- | --- | --- | --- |
| SELECT COUNT(*) FROM dev.default.logs-181998 | 49745.0 | 18256.0 | 18667.0 |
| SELECT COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0; | 50981.0 | 18247.0 | 19006.0 |
| SELECT COUNT(*), AVG(size) FROM dev.default.logs-181998; | 49494.0 | 19951.0 | 20171.0 |
| SELECT AVG(CAST(size AS BIGINT)) FROM dev.default.logs-181998; | 50906.0 | 19719.0 | 20326.0 |
| SELECT MIN(@timestamp), MAX(@timestamp) FROM dev.default.logs-181998; | 44016.0 | 20459.0 | 19915.0 |
| SELECT status, COUNT() FROM dev.default.logs-181998 WHERE status <> 0 GROUP BY status ORDER BY COUNT() DESC; | 50560.0 | 20686.0 | 20550.0 |

(chart attachment: output)

  • Increase slice count
    Ideally, increasing the slice count should improve parallelization. However, the test results show that query latency increases when the slice count is raised.
    The root cause is that the slice query is rewritten as a TermsSliceQuery, which uses _id as its field. For instance, a sliced request such as the one sketched below is rewritten as a TermsSliceQuery on shard 0 (and a MatchNoDocsQuery on the other shards). The cost of this filter is O(N*M), where N is the number of unique terms in the dictionary and M is the average number of documents per term. In our case the cost is O(N), since _id is unique and therefore M = 1.
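
The original query attachment is not reproduced here; the sketch below is an illustrative sliced PIT request (the slice id/max values and the match_all query are assumptions) of the shape that is rewritten into a TermsSliceQuery when the slice count is larger than the shard count, as in the 2 * shard count test.

```
# Illustrative sliced PIT search. With slice.max larger than the shard count,
# slice 0 executes as a TermsSliceQuery over the _id terms dictionary on
# shard 0, and as a MatchNoDocsQuery on the remaining shards.
GET /_search
{
  "pit": { "id": "<pit_id>" },
  "slice": { "id": 0, "max": 10 },
  "query": { "match_all": {} },
  "sort": [ { "_doc": "asc" } ]
}
```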
| Query | Slice Count == Shard Count | Slice Count = 2 * Shard Count |
| --- | --- | --- |
| SELECT COUNT(*) FROM dev.default.logs-181998 | 18667 | 35511 |
| SELECT COUNT(*) FROM dev.default.logs-181998 WHERE status <> 0; | 19006 | 34497 |
| SELECT COUNT(*), AVG(size) FROM dev.default.logs-181998; | 20171 | 32213 |
| SELECT AVG(CAST(size AS BIGINT)) FROM dev.default.logs-181998; | 20326 | 35226 |
| SELECT MIN(@timestamp), MAX(@timestamp) FROM dev.default.logs-181998; | 19915 | 32996 |
| SELECT status, COUNT() FROM dev.default.logs-181998 WHERE status <> 0 GROUP BY status ORDER BY COUNT() DESC; | 20550 | 37419 |

(chart attachment: output)
