DRILL-8504: Add Schema Caching to Splunk Plugin #2929

cgivre · 2024-07-24T18:58:29Z

DRILL-8504: Add Schema Caching to Splunk Plugin

Description

Whenever Drill executes a Splunk query, it must retrieve a list of indexes from Splunk. This step can add a considerable amount of time to the planning phase. This PR introduces a simple in-memory cache for the Splunk plugin that stores the list of indexes, so Drill does not have to query Splunk repeatedly for this information.

This PR also makes a few unrelated minor improvements:

Updates the test container to Splunk version 9.3, which at the time of writing is the most current version. I had to update some unit tests as a result.
Adds a new config option for the maximum number of columns returned from Splunk.
Adds the actual SPL sent to Splunk to the query plan. This can be useful for debugging.

Documentation

(Added to README)
For every query that you send to Splunk from Drill, Drill will have to pull schema information from Splunk. If you have a lot of indexes, this process can cause slow planning time. To improve planning time, you can configure Drill to cache the index names so that it does not need to make additional calls to Splunk.

There are two configuration parameters for schema caching: maxCacheSize and cacheExpiration. maxCacheSize defaults to 10k bytes and cacheExpiration defaults to 1024 minutes. To disable schema caching, set the cacheExpiration parameter to a value less than zero.
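To make the expiration behaviour concrete, here is a minimal sketch of a time-based cache for index lists. It is an illustration only, not the code in this PR: the PR uses the Caffeine library, whereas this sketch uses plain `java.util.concurrent` so that it has no dependencies. The class name `IndexListCache`, the `loader` callback, and the injectable clock are all hypothetical, and the size limit (maxCacheSize) is left out for brevity. The one behaviour taken from the text above is that a negative expiration disables caching.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical illustration only; not a class from the Splunk plugin.
class IndexListCache {
  // One cached index list plus the time at which it was loaded.
  private static final class Entry {
    final List<String> indexes;
    final long loadedAtMillis;

    Entry(List<String> indexes, long loadedAtMillis) {
      this.indexes = indexes;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<String, Entry> entries = new ConcurrentHashMap<>();
  private final long expirationMillis;    // negative means caching is disabled
  private final LongSupplier clockMillis; // injectable so expiry can be tested

  IndexListCache(long expirationMinutes, LongSupplier clockMillis) {
    this.expirationMillis =
        expirationMinutes < 0 ? -1 : TimeUnit.MINUTES.toMillis(expirationMinutes);
    this.clockMillis = clockMillis;
  }

  // Returns the cached list for the key; calls the loader only on a miss,
  // after the entry has expired, or when caching is disabled.
  List<String> get(String key, Function<String, List<String>> loader) {
    if (expirationMillis < 0) {
      return loader.apply(key); // caching disabled: always ask the source
    }
    long now = clockMillis.getAsLong();
    Entry cached = entries.get(key);
    if (cached != null && now - cached.loadedAtMillis < expirationMillis) {
      return cached.indexes;
    }
    List<String> fresh = loader.apply(key);
    entries.put(key, new Entry(fresh, now));
    return fresh;
  }
}
```

A caller would construct it with something like `new IndexListCache(1024, System::currentTimeMillis)` and pass a loader that fetches the index names from Splunk. Two threads that miss at the same moment may both invoke the loader; a production cache such as Caffeine handles that case, along with size-based eviction.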

Testing

Ran all unit tests and tested manually.

jnturton · 2024-08-03T10:26:34Z

contrib/storage-splunk/pom.xml

+ <dependency>
+ <groupId>com.github.ben-manes.caffeine</groupId>
+ <artifactId>caffeine</artifactId>
+ <version>2.9.3</version>


Can we achieve the same thing using Guava's caching? The reason I ask is that we already have this insanely big dependency tree and Guava is already in it...

https://www.baeldung.com/guava-cache

jnturton · Aug 4, 2024

But so is Caffeine, now that I look! So I guess we can ignore this suggestion.

cgivre · Aug 5, 2024

@jnturton Is that a +1? I somehow broke the versioning when I rebased on the current master, but I'll fix it before merging.

jnturton · Aug 5, 2024

Sorry, I got pulled away before I could continue but will complete the review today.

cgivre · 2024-08-25T18:54:02Z

@jnturton It looks like the GitHub CI is failing on the Hadoop 2 tests with Hive.

cgivre self-assigned this Jul 24, 2024

cgivre requested a review from jnturton July 25, 2024 16:29

cgivre changed the title ~~Add Index Cache to Splunk Plugin~~ DRILL-8504: Add Schema Caching to Splunk Plugin Jul 29, 2024

cgivre marked this pull request as ready for review July 29, 2024 15:00

cgivre added the enhancement (PRs that add a new functionality to Drill), doc-impacting (PRs that affect the documentation), and dependencies labels Jul 30, 2024

jnturton reviewed Aug 3, 2024


cgivre force-pushed the splunk_schema_cache branch from af3d11b to f372ad6 August 5, 2024 05:22

cgivre force-pushed the splunk_schema_cache branch from 74f6b90 to 0d8364b August 12, 2024 16:47

cgivre and others added 12 commits August 26, 2024 08:59

Initial Experiments

5b7d44a

Downgrade Caffeine to support Java 8

162a2b5

Various fixes

8fcc6d4

Added config options

6b576d2

Updated docs and unit tests

5894671

Added SPL to query plan

99f7a85

Moved SPL to group scan

32d32e5

Formatting and minor bug fix

f35c24e

Bump Splunk test to 9.3

88514d5

Fix unit itest

8bfa90f

Hopefully fixed UT

0457952

Clear swap space

fd2549a

cgivre force-pushed the splunk_schema_cache branch from 2ff2983 to fd2549a August 26, 2024 13:00


jnturton Aug 3, 2024

jnturton Aug 4, 2024

cgivre Aug 5, 2024

jnturton Aug 5, 2024
