
Cache the creation of parsers within DateProcessor #92880

Merged
merged 5 commits into elastic:main on Jan 31, 2023

Conversation

HiDAl
Contributor

@HiDAl HiDAl commented Jan 12, 2023

This PR introduces a caching mechanism within the DateProcessor class. The cache memoizes the creation of expensive objects, which are produced through calls to the DateFormatter.forPattern(...) method and other costly methods.
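
For illustration only, a minimal, hedged sketch of the memoization idea, using java.time's DateTimeFormatter as a stand-in for Elasticsearch's DateFormatter (the class and key names below are hypothetical, not the PR's code):

import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch: memoize the expensive parser construction per (format, locale) key.
public final class DateParserCacheSketch {
    private record Key(String format, Locale locale) {}

    private static final ConcurrentMap<Key, Function<String, ZonedDateTime>> PARSERS = new ConcurrentHashMap<>();

    static Function<String, ZonedDateTime> parserFor(String format, Locale locale) {
        return PARSERS.computeIfAbsent(new Key(format, locale), k -> {
            // In the real processor the expensive call is DateFormatter.forPattern(...); java.time stands in here.
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(k.format(), k.locale());
            return text -> ZonedDateTime.parse(text, formatter);
        });
    }

    public static void main(String[] args) {
        Function<String, ZonedDateTime> parse = parserFor("yyyy-MM-dd'T'HH:mm:ssXXX", Locale.ROOT);
        System.out.println(parse.apply("2023-01-12T10:15:30+01:00")); // parser is built once, then reused
    }
}

Repeated documents with the same format and locale then skip the expensive construction entirely and only pay for a map lookup.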

We have benchmarked the change and obtained a performance improvement of between 10% and 80% on some of the logging benchmarks (~80% for logs-postgresql.log-1.2.0).

Benchmark

  • Benchmarked using esbench (use-case=simple, track.name=elastic/logs)
  • Benchmarked 2 cache sizes (16 & 64). There's a slight difference in processing time between them (200 ns - 1 ms), with the smaller cache coming out ahead.
  • The results below are summarised (Full results)

BASELINE

Ingest Summary:
===============
          count  time_in_millis  time_in_nanos
total  32367760         1532279          47340

Pipeline Summary:
=================
                                          count  time_in_millis  time_in_nanos  percent
logs-nginx.access-1.2.0                 1769000          188822         106740    24.0%
logs-postgresql.log-1.2.0                320000          185256         578924    23.6%
logs-system.syslog-1.6.4                2234000          109214          48888    13.9%
logs-system.auth-1.6.4                   195881           72172         368447     9.2%
logs-redis.log-1.1.0                    1600000           50652          31658     6.4%
.fleet_final_pipeline-1                 9398881           47060           5007     6.0%
logs-kafka.log-0.5.0                     640000           28595          44680     3.6%
logs-apache.access-1.3.0                 320000           27248          85150     3.5%
logs-mysql.error-1.1.0                   560000           24513          43774     3.1%
logs-apache.error-1.3.0                  320000           19929          62278     2.5%
logs-postgresql.log-1.2.0-pipeline-log   320000           14646          45769     1.9%
logs-nginx.error-1.2.0                   320000           12374          38669     1.6%
logs-mysql.slowlog-1.1.0                 160000            5424          33900     0.7%

CACHE OF SIZE 16

Ingest Summary:
===============
          count  time_in_millis  time_in_nanos
total  33376000         1412871          42332

Pipeline Summary:
=================
                                          count  time_in_millis  time_in_nanos  percent
logs-nginx.access-1.2.0                 1920000          188206          98024    29.3%
logs-system.auth-1.6.4                   320000          116025         362577    18.0%
logs-system.syslog-1.6.4                2240000           91281          40751    14.2%
.fleet_final_pipeline-1                 9848000           44558           4525     6.9%
logs-postgresql.log-1.2.0                320000           44092         137788     6.9%
logs-redis.log-1.1.0                    1600000           37506          23442     5.8%
logs-kafka.log-0.5.0                     728000           26900          36951     4.2%
logs-apache.access-1.3.0                 320000           25326          79144     3.9%
logs-mysql.error-1.1.0                   640000           21607          33761     3.4%
logs-apache.error-1.3.0                  320000           18109          56591     2.8%
logs-postgresql.log-1.2.0-pipeline-log   320000           13532          42288     2.1%
logs-nginx.error-1.2.0                   320000           10569          33029     1.6%
logs-mysql.slowlog-1.1.0                 160000            5708          35675     0.9%

CACHE OF SIZE 64

Ingest Summary:
===============
          count  time_in_millis  time_in_nanos
total  33376000         1445614          43313

Pipeline Summary:
=================
                                          count  time_in_millis  time_in_nanos  percent
logs-nginx.access-1.2.0                 1920000          195122         101626    29.3%
logs-system.auth-1.6.4                   320000          123478         385868    18.5%
logs-system.syslog-1.6.4                2240000           94484          42181    14.2%
.fleet_final_pipeline-1                 9848000           46163           4688     6.9%
logs-postgresql.log-1.2.0                320000           43839         136997     6.6%
logs-redis.log-1.1.0                    1600000           37604          23503     5.6%
logs-kafka.log-0.5.0                     728000           27726          38086     4.2%
logs-apache.access-1.3.0                 320000           25662          80194     3.8%
logs-mysql.error-1.1.0                   640000           23132          36144     3.5%
logs-apache.error-1.3.0                  320000           18514          57857     2.8%
logs-postgresql.log-1.2.0-pipeline-log   320000           13572          42413     2.0%
logs-nginx.error-1.2.0                   320000           11415          35672     1.7%
logs-mysql.slowlog-1.1.0                 160000            6134          38338     0.9%

@HiDAl HiDAl added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Data Management Meta label for data/management team v8.7.0 labels Jan 12, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @HiDAl, I've created a changelog YAML for you.

@HiDAl HiDAl changed the title Add a cache in the DateProcessor for the formatters Cache the creation of parsers within DateProcessor Jan 12, 2023
@HiDAl
Contributor Author

HiDAl commented Jan 12, 2023

@elasticsearchmachine generate changelog

@joegallo
Contributor

joegallo commented Jan 13, 2023

@HiDAl can you grab a heap dump during a run of the Logging benchmark? I'm thinking we've got ~20 date processors running there, each with a cache of up to 16 of these things. So that's (up to) 320 of them. I'm guessing they're pretty small, but how small?

That is, clearly we're making a time/space tradeoff here. The stats above give us a clear sense of the time we're buying (tons! awesome!), but I'm curious how much space we're giving up to buy that.

A separate interesting question. What if we wanted to have zero cache evictions? Could you temporarily add some extra logging and run the Logging benchmark again? I'd be very curious what the normal usage pattern is here. For example, are some date processors just using one or two cached date-parsing-functions (embarrassingly cacheable! perfect!) and others are churning their cache constantly as they try to rotate through 512 different functions?

An alternative implementation might be a static cache based around a ConcurrentMap<Key, SoftReference<Function<String, ZonedDateTime>>> -- in that case we could make it bigger than 16, even something like 256 or 512 would probably be reasonable.

I'm super in favor of getting a cache like this in, but I want to be very thoughtful about it.
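
For illustration only, a rough sketch of what such a static, soft-reference-based cache could look like (the Key record and method names are hypothetical, not this PR's code):

import java.lang.ref.SoftReference;
import java.time.ZonedDateTime;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Sketch of a static, JVM-wide cache whose values the GC may reclaim under memory pressure.
final class StaticSoftCacheSketch {
    record Key(String format, String timezone, String locale) {}

    private static final ConcurrentMap<Key, SoftReference<Function<String, ZonedDateTime>>> CACHE =
        new ConcurrentHashMap<>();

    static Function<String, ZonedDateTime> getOrCompute(Key key, Function<Key, Function<String, ZonedDateTime>> build) {
        SoftReference<Function<String, ZonedDateTime>> ref = CACHE.get(key);
        Function<String, ZonedDateTime> fn = (ref == null) ? null : ref.get();
        if (fn == null) {
            // Rebuild if the entry is absent or the GC has cleared the soft reference.
            fn = build.apply(key);
            CACHE.put(key, new SoftReference<>(fn));
        }
        return fn;
    }
}

Because the values are only softly reachable, a larger capacity (256 or 512) stays relatively safe: under memory pressure the GC can drop the parsers, and they are simply rebuilt on the next use.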

@HiDAl
Contributor Author

HiDAl commented Jan 17, 2023

@joegallo well, I've taken the heap dump, and here's what I found out:

  1. DateProcessor is the second largest processor (right after GrokProcessor [1])

Screenshot 2023-01-17 at 10 38 03

  2. The whole set of DateProcessors (18 objects for this pipeline configuration) represents 0.03% of the reserved memory

Screenshot 2023-01-17 at 09 48 58

with the cache obviously accounting for a big portion of this memory (~128 KB):

Screenshot 2023-01-17 at 09 54 42

  3. Even though some of the DateProcessors use values coming from the documents for the locale or the timezone (part of the cache key, with the potential for high dispersion), they're still highly cacheable [2]: there are only 22 elements across all the caches

Screenshot 2023-01-17 at 10 10 05

Cache lifecycle

On the other hand, after adding some basic logging (89b9e0), we found the following:

  1. There are no evictions for this pipeline configuration:
../logs/server$ grep -i 'purging' default.log
../logs/server$
  2. The cache size is small [3]:
../logs/server$ grep -i 'cache size' default.log  | cut -d ']' -f 6 | sort -n | uniq -c
   7389  Cache size: 1
  10660  Cache size: 2
   6425  Cache size: 3
    173  Cache size: 4
   3235  Cache size: 6

About having a common static cache

Running a new benchmark with a big, shared cache (commit 86b35), we obtained the following results:

Ingest & Index Summary:
=======================
          count  time_in_millis  time_in_nanos  ingest_time_%  index_time_in_millis
total  33324250         1409999          42312          25.0%               4220354

Pipeline Summary:
=================
                                          count  time_in_millis  time_in_nanos  percent
logs-nginx.access-1.2.0                 1920000          185653          96695    28.4%
logs-system.auth-1.6.4                   320000          119544         373574    18.3%
logs-system.syslog-1.6.4                2240000           94827          42334    14.5%
logs-postgresql.log-1.2.0                320000           48548         151713     7.4%
.fleet_final_pipeline-1                 9822129           44440           4525     6.8%
logs-redis.log-1.1.0                    1600000           36968          23105     5.7%
logs-kafka.log-0.5.0                     702131           26312          37475     4.0%
logs-apache.access-1.3.0                 320000           25239          78872     3.9%
logs-mysql.error-1.1.0                   640000           22179          34655     3.4%
logs-apache.error-1.3.0                  320000           18090          56532     2.8%
logs-postgresql.log-1.2.0-pipeline-log   320000           15339          47935     2.3%
logs-nginx.error-1.2.0                   320000           11169          34904     1.7%
logs-mysql.slowlog-1.1.0                 160000            5838          36488     0.9%

These results are slightly better than with a small, per-processor cache. Also, when checking the lifecycle of the cache (size and evictions), we get the following:

../logs/server$ grep -i 'cache size' default.log  | cut -d ']' -f 6 | sort -n | uniq -c
    512  Cache size: 1
    558  Cache size: 3
   1530  Cache size: 4
    466  Cache size: 5

../logs/server$ grep -i 'purging' default.log
../logs/server$

As you can see, the cache holds only 5 different values over the course of the benchmark, compared with the 22 values of the non-shared caches (see footnote [3]). So, for this specific benchmark, and probably for other scenarios where the cache key comes from a broader domain, the static cache is the better option, because:

  1. When the configuration uses the default locale & timezone and only a small set of formats, the values are highly cacheable and the same default keys won't be duplicated across the DateProcessors
  2. Since we already calculated the size of a cache entry, a cache with a maximum of 512 entries would use only ~3.5 MB of memory (from screenshot 1: 131264 B / 18 * 512 = 3.56 MB); even if we want to avoid evictions entirely, we could enlarge the cache further
  3. Since it's a shared cache, the maximum size is constant regardless of the number of processors (compared with a static cache of 512, a cluster with more than 32 processors, each with a per-processor cache of 16, would use more memory in the worst case)

[1] This is interesting: why is Grok taking so much space? Let's pull at that thread later
[2] It's important to consider that this is true for this pipeline configuration and data set: from the heap dump, we can see that the domain of the cache key (the combination of format, timezone, and locale) is really small, and therefore highly cacheable
[3] I tried to find a way to relate each pipeline to its cache size, but I couldn't find an easy way to do it without hacking a lot of the code; I think this version is enough to give a global view

@HiDAl
Contributor Author

HiDAl commented Jan 26, 2023

The last push introduces the static cache with a default size of 256, configurable through a setting.

Contributor

@joegallo joegallo left a comment

+1, but I've been involved enough in the creation of this that I think a second reviewer would be a good thing.


element = supplier.get();
if (map.size() >= capacity) {
    map.clear();
Member

This is kind of surprising. Whenever the cache is full, we empty it and start over? Is this to avoid the complexity of a concurrent LRU cache? Or is the idea that we'll rarely see more than 256 formats in practice, so this is just for safety and will rarely be hit?

Contributor Author

Yes! This decision was made mainly because we want to avoid the complexity of an LRU (or any other eviction policy) within this code path. The decision to make the size 256 was based primarily on experiments with different benchmarks.
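
As a rough illustration of that policy (a sketch under those assumptions, not the PR's exact class), the cache simply wipes everything once it reaches capacity instead of tracking recency:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Illustrative "clear everything when full" cache: no LRU bookkeeping on the hot path.
final class ClearOnFullCacheSketch<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final int capacity;

    ClearOnFullCacheSketch(int capacity) {
        this.capacity = capacity;
    }

    V getOrCompute(K key, Supplier<V> supplier) {
        V element = map.get(key);
        if (element == null) {
            element = supplier.get();
            // Guard against unbounded growth; with the expected handful of distinct keys
            // this branch should almost never be taken.
            if (map.size() >= capacity) {
                map.clear();
            }
            map.put(key, element);
        }
        return element;
    }
}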

Contributor

I think it's worth adding a one-line comment in the code above the clear to explain its purpose.

@joegallo
Contributor

Oh, I missed that this version of things doesn't have the SoftReferences around the values. I think we want that, or at least, if we're not going to go that way I'd like to understand why.

@HiDAl
Contributor Author

HiDAl commented Jan 27, 2023

Oh, I missed that this version of things doesn't have the SoftReferences around the values. I think we want that, or at least, if we're not going to go that way I'd like to understand why.

@joegallo you're right, my bad! Just pushed the needed changes using SoftReference<>s and some rudimentary tests.

@HiDAl HiDAl requested a review from masseyke January 27, 2023 10:07
HiDAl and others added 4 commits January 27, 2023 11:29
cache potentially duped values in the DateProcessor, avoid the creation
of disposable objects during the different executions of the processor
private static final Cache CACHE;

static {
    var cacheSizeStr = System.getProperty(CACHE_CAPACITY_SETTING, "256");
Member

What's the reason for doing this as a system property instead of a Settings thing (and related, are we documenting this setting)?

Contributor Author

We don't expect users to need to change this value unless they see a big performance impact on their pipelines. In that case, we wanted to provide an easy way to change the value if we receive an SDH (with the obvious memory-consumption trade-off). I was inspired by this other PR #80902 by @joegallo

Contributor

+1, git grep 'System.getProperty("es' | grep -vi test/ shows a good number of examples of this as the pattern for non-customer-facing internal knobs that we don't expect to need to twiddle (but that we're making possible to twiddle as a favor to our future selves).

Member

OK, just wanted to make sure it was intentional.
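
For reference, a hedged sketch of that internal-knob pattern (the property name and parsing here are illustrative assumptions, not necessarily this PR's exact code):

// Sketch: size an internal cache from a JVM system property, falling back to a default of 256.
final class CacheCapacitySketch {
    // Hypothetical property name, for illustration only.
    static final String CACHE_CAPACITY_SETTING = "es.ingest.date_processor.cache_size";
    static final int CACHE_CAPACITY;

    static {
        String cacheSizeStr = System.getProperty(CACHE_CAPACITY_SETTING, "256");
        CACHE_CAPACITY = Integer.parseInt(cacheSizeStr);
    }
}

An operator would override such a knob with something like -Des.ingest.date_processor.cache_size=512 on the JVM command line (again, the property name here is hypothetical).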

@HiDAl HiDAl requested a review from masseyke January 30, 2023 09:29
@HiDAl HiDAl merged commit 7e0ed53 into elastic:main Jan 31, 2023
@HiDAl HiDAl deleted the palcantar/date-proc-cache branch January 31, 2023 14:08
mark-vieira pushed a commit to mark-vieira/elasticsearch that referenced this pull request Jan 31, 2023
cache potentially duped values in the `DateProcessor`, avoiding the creation of disposable objects during the different executions
@joegallo
Contributor

joegallo commented Mar 1, 2023

Related to #73918
