Skip to content

Conversation

@jordan-powers
Copy link
Contributor

@jordan-powers jordan-powers commented Jun 12, 2025

Follow-up to #126492 to use the json parsing optimizations for match_only_text fields.

Relates to #129072.

@jordan-powers jordan-powers requested a review from martijnvg June 12, 2025 21:52
@jordan-powers jordan-powers self-assigned this Jun 12, 2025
@jordan-powers jordan-powers requested a review from a team as a code owner June 12, 2025 21:52
@jordan-powers jordan-powers added >non-issue auto-backport Automatically create backport pull requests when merged :StorageEngine/Mapping The storage related side of mappings v8.19.0 v9.1.0 labels Jun 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

/**
* Reader that decodes UTF-8 formatted bytes into chars.
*/
public class UTF8DecodingReader extends Reader {
Copy link
Contributor Author

@jordan-powers jordan-powers Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically equivalent to:

final var reader = new InputStreamReader(new ByteArrayInputStream(utfBytes.bytes(), utfBytes.offset(), utfBytes.length()), StandardCharsets.UTF_8);

But according to the microbenchmarks, using the InputStreamReader/ByteArrayInputStream is really slow, and was actually slower than the original string-based implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the results from the microbenchmark:

Benchmark                                                  (nDocs)  Mode  Cnt    Score   Error  Units
OptimizedTextBenchmark.indexDocuments (baseline)           1048576  avgt    5  581.242 ± 3.050  ms/op
OptimizedTextBenchmark.indexDocuments (UTF8DecodingReader) 1048576  avgt    5  544.477 ± 4.961  ms/op
OptimizedTextBenchmark.indexDocuments (InputStreamReader)  1048576  avgt    5  852.380 ± 6.238  ms/op

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes, this implementation makes a lot of sense then. Did you also run this PR against elastic/logs (enterprise) benchmark?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't yet, but I plan to today

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this a final class?

@jordan-powers jordan-powers changed the title Text fields optimized text Use optimized text in match_only_text fields Jun 12, 2025
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left a few questions.

/**
* Reader that decodes UTF-8 formatted bytes into chars.
*/
public class UTF8DecodingReader extends Reader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes, this implementation makes a lot of sense then. Did you also run this PR against elastic/logs (enterprise) benchmark?

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

/**
* Reader that decodes UTF-8 formatted bytes into chars.
*/
public class UTF8DecodingReader extends Reader {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make this a final class?

@martijnvg
Copy link
Member

Looks like serverless check failed, because of:

[2025-06-13T22:21:59,803][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [search-TgDYeLu] fatal error in thread [elasticsearch[search-TgDYeLu][esql_worker][T#95]], exiting
java.lang.AssertionError: java.lang.IllegalStateException: can't release already released object [IntArrayBlock[positions=35, mvOrdering=UNORDERED, vector=IntArrayVector[positions=35, values=[5, 3, 0, 1, 5, 4, 5, 5, 5, 5, ...]]]]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:64) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator$AppendBlocksIterator.close(AbstractPageMappingToIteratorOperator.java:335) ~[?:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator$RuntimeTrackingIterator.close(AbstractPageMappingToIteratorOperator.java:161) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:72) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.AbstractPageMappingToIteratorOperator.close(AbstractPageMappingToIteratorOperator.java:136) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.closeEarlyFinishedOperators(Driver.java:317) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:290) ~[?:?]
	at org.elasticsearch.compute.operator.Driver.run(Driver.java:185) ~[?:?]
	at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:406) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.operator.DriverScheduler$1.doRun(DriverScheduler.java:57) ~[?:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:35) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1044) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27) ~[elasticsearch-9.1.0-SNAPSHOT.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1095) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:619) ~[?:?]
	at java.lang.Thread.run(Thread.java:1447) ~[?:?]
Caused by: java.lang.IllegalStateException: can't release already released object [IntArrayBlock[positions=35, mvOrdering=UNORDERED, vector=IntArrayVector[positions=35, values=[5, 3, 0, 1, 5, 4, 5, 5, 5, 5, ...]]]]
	at org.elasticsearch.compute.data.AbstractNonThreadSafeRefCounted.decRef(AbstractNonThreadSafeRefCounted.java:40) ~[?:?]
	at org.elasticsearch.compute.data.AbstractNonThreadSafeRefCounted.close(AbstractNonThreadSafeRefCounted.java:59) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:48) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.compute.data.IntLookup.close(IntLookup.java:96) ~[?:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:39) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.close(Releasables.java:48) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	at org.elasticsearch.core.Releasables.closeExpectNoException(Releasables.java:62) ~[elasticsearch-core-9.1.0-SNAPSHOT.jar:?]
	... 18 more

Which killed the search node and many tests failed because of this. I don't think it is related to this change.

@jordan-powers
Copy link
Contributor Author

Ok, I benchmarked this with the elastic/logs, and it doesn't seem to have sped it up very much. If anything, it slowed down the throughput by 1.5%.

In light of that, I'm not sure if we want to continue forward with this change.

|                                                        Metric |                                   Task |        Baseline |       Contender |        Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------------------------:|----------------:|----------------:|------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                        |   843.881       |   845.464       |     1.58262 |    min |   +0.19% |
|             Min cumulative indexing time across primary shard |                                        |     2.76688     |     2.80793     |     0.04105 |    min |   +1.48% |
|          Median cumulative indexing time across primary shard |                                        |    10.8209      |    10.6022      |    -0.21865 |    min |   -2.02% |
|             Max cumulative indexing time across primary shard |                                        |   155.056       |   148.468       |    -6.58773 |    min |   -4.25% |
|           Cumulative indexing throttle time of primary shards |                                        |     0           |     0           |     0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                        |     0           |     0           |     0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                        |   271.282       |   282.542       |    11.26    |    min |   +4.15% |
|                      Cumulative merge count of primary shards |                                        |   452           |   447           |    -5       |        |   -1.11% |
|                Min cumulative merge time across primary shard |                                        |     0.421517    |     0.507383    |     0.08587 |    min |  +20.37% |
|             Median cumulative merge time across primary shard |                                        |     1.9479      |     2.5712      |     0.6233  |    min |  +32.00% |
|                Max cumulative merge time across primary shard |                                        |    60.6158      |    65.9096      |     5.29378 |    min |   +8.73% |
|              Cumulative merge throttle time of primary shards |                                        |    83.5016      |    85.6329      |     2.13128 |    min |   +2.55% |
|       Min cumulative merge throttle time across primary shard |                                        |     0.0956      |     0.100017    |     0.00442 |    min |   +4.62% |
|    Median cumulative merge throttle time across primary shard |                                        |     0.453217    |     0.591433    |     0.13822 |    min |  +30.50% |
|       Max cumulative merge throttle time across primary shard |                                        |    17.9542      |    20.317       |     2.36283 |    min |  +13.16% |
|                     Cumulative refresh time of primary shards |                                        |    12.0246      |    11.3049      |    -0.71975 |    min |   -5.99% |
|                    Cumulative refresh count of primary shards |                                        |  6773           |  6487           |  -286       |        |   -4.22% |
|              Min cumulative refresh time across primary shard |                                        |     0.0152333   |     0.0136      |    -0.00163 |    min |  -10.72% |
|           Median cumulative refresh time across primary shard |                                        |     0.0656667   |     0.1037      |     0.03803 |    min |  +57.92% |
|              Max cumulative refresh time across primary shard |                                        |     3.19272     |     2.74677     |    -0.44595 |    min |  -13.97% |
|                       Cumulative flush time of primary shards |                                        |   164.641       |   158.741       |    -5.90037 |    min |   -3.58% |
|                      Cumulative flush count of primary shards |                                        |  6375           |  6025           |  -350       |        |   -5.49% |
|                Min cumulative flush time across primary shard |                                        |     0.60075     |     0.605083    |     0.00433 |    min |   +0.72% |
|             Median cumulative flush time across primary shard |                                        |     2.14908     |     2.19497     |     0.04588 |    min |   +2.14% |
|                Max cumulative flush time across primary shard |                                        |    27.8946      |    25.8562      |    -2.03843 |    min |   -7.31% |
|                                       Total Young Gen GC time |                                        |   304.427       |   312.299       |     7.872   |      s |   +2.59% |
|                                      Total Young Gen GC count |                                        |  9128           |  9292           |   164       |        |   +1.80% |
|                                         Total Old Gen GC time |                                        |     0           |     0           |     0       |      s |    0.00% |
|                                        Total Old Gen GC count |                                        |     0           |     0           |     0       |        |    0.00% |
|                                                  Dataset size |                                        |    66.4907      |    66.1832      |    -0.30756 |     GB |   -0.46% |
|                                                    Store size |                                        |    66.4907      |    66.1832      |    -0.30756 |     GB |   -0.46% |
|                                                 Translog size |                                        |     1.85905     |     4.093       |     2.23396 |     GB | +120.17% |
|                                        Heap used for segments |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                      Heap used for doc values |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                           Heap used for terms |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                           Heap used for norms |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                          Heap used for points |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                   Heap used for stored fields |                                        |     0           |     0           |     0       |     MB |    0.00% |
|                                                 Segment count |                                        |  1405           |  1496           |    91       |        |   +6.48% |
|                                   Total Ingest Pipeline count |                                        |     4.88622e+08 |     4.88622e+08 |     0       |        |    0.00% |
|                                    Total Ingest Pipeline time |                                        |     2.86538e+07 |     2.87425e+07 | 88671       |     ms |   +0.31% |
|                                  Total Ingest Pipeline failed |                                        |     0           |     0           |     0       |        |    0.00% |
|                                                Min Throughput |                             bulk-index |  1051.65        |   955.535       |   -96.119   | docs/s |   -9.14% |
|                                               Mean Throughput |                             bulk-index | 33868.7         | 33389.5         |  -479.148   | docs/s |   -1.41% |
|                                             Median Throughput |                             bulk-index | 33915.6         | 33412.8         |  -502.861   | docs/s |   -1.48% |
|                                                Max Throughput |                             bulk-index | 37093.8         | 36478.7         |  -615.09    | docs/s |   -1.66% |
|                                       50th percentile latency |                             bulk-index |  1530.83        |  1536.39        |     5.55938 |     ms |   +0.36% |
|                                       90th percentile latency |                             bulk-index |  2714.8         |  2728.48        |    13.6765  |     ms |   +0.50% |
|                                       99th percentile latency |                             bulk-index |  4725.48        |  4777.42        |    51.9447  |     ms |   +1.10% |
|                                     99.9th percentile latency |                             bulk-index |  8483.2         |  8272.94        |  -210.258   |     ms |   -2.48% |
|                                    99.99th percentile latency |                             bulk-index | 11502.3         | 11182.9         |  -319.377   |     ms |   -2.78% |
|                                      100th percentile latency |                             bulk-index | 16531.9         | 17475.5         |   943.6     |     ms |   +5.71% |
|                                  50th percentile service time |                             bulk-index |  1531.29        |  1537.26        |     5.96548 |     ms |   +0.39% |
|                                  90th percentile service time |                             bulk-index |  2714.72        |  2727.99        |    13.2743  |     ms |   +0.49% |
|                                  99th percentile service time |                             bulk-index |  4694.21        |  4784.81        |    90.6033  |     ms |   +1.93% |
|                                99.9th percentile service time |                             bulk-index |  8482.51        |  8262.62        |  -219.887   |     ms |   -2.59% |
|                               99.99th percentile service time |                             bulk-index | 11505.4         | 11157.4         |  -347.979   |     ms |   -3.02% |
|                                 100th percentile service time |                             bulk-index | 16531.9         | 17475.5         |   943.6     |     ms |   +5.71% |
|                                                    error rate |                             bulk-index |     0           |     0           |     0       |      % |    0.00% |

@martijnvg
Copy link
Member

If anything, it slowed down the throughput by 1.5%.

Given that even median indexing throughput varies, I think that this is within noise. I don't think this change makes things slower. I suspect we don't see a gain here, given that that are many other moving parts here. The micro benchmark suggested a minor improvement. I think not having to create java strings during indexing for match only fields is a beneficial. Additionally block loaders for match only text fields can work with bytes refs instead of strings, which I think is beneficial as well.

In light of that, I'm not sure if we want to continue forward with this change.

So I think we can move forward with the change and merge it in.

@jordan-powers
Copy link
Contributor Author

Ok, I'll merge it and keep an eye on the nightlies. We can always revert it if we see something we don't like.

@jordan-powers jordan-powers merged commit 5d19997 into elastic:main Jun 17, 2025
24 checks passed
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.19

elasticsearchmachine pushed a commit that referenced this pull request Jun 18, 2025
Follow-up to #126492 to use the json parsing optimizations for
match_only_text fields.

Relates to #129072.
@jordan-powers jordan-powers deleted the text-fields-optimized-text branch July 28, 2025 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >non-issue :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.0 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants