[Bugfix] [Tiered Caching] Make ehcache disk cache size setting be ByteSizeValue rather than long #13902


peteralfonsi
Contributor

Description

EhcacheDiskCache's DISK_CACHE_MAX_SIZE_IN_BYTES_SETTING should take a ByteSizeValue, not a long. With this change, users can specify values with units such as "10G", "10GB", or "10737418240B" when passing settings on the command line, rather than having to pass the raw byte count "10737418240". This matches the behavior of similar settings such as OpenSearchOnHeapCacheSettings.MAXIMUM_SIZE_IN_BYTES.
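
To make the unit handling concrete, here is a minimal, self-contained sketch of how a unit-suffixed size string can map to a byte count. It is not OpenSearch's ByteSizeValue implementation: the class name `ByteSizeSketch` and the method `parseBytes` are hypothetical, the sketch assumes binary units (1 KB = 1024 bytes), and it assumes a value without a unit should be rejected.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not OpenSearch's ByteSizeValue.
final class ByteSizeSketch {
    private ByteSizeSketch() {}

    // Parses strings such as "10G", "10gb", "512k" or "10737418240B" into bytes.
    // Assumption: binary units, case-insensitive suffixes, unit required.
    static long parseBytes(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        // Two-letter suffixes come first so "gb" is matched before the bare "b".
        String[] suffixes = {"kb", "mb", "gb", "k", "m", "g", "b"};
        long[] multipliers = {1L << 10, 1L << 20, 1L << 30, 1L << 10, 1L << 20, 1L << 30, 1L};
        for (int i = 0; i < suffixes.length; i++) {
            if (s.endsWith(suffixes[i])) {
                String number = s.substring(0, s.length() - suffixes[i].length()).trim();
                // multiplyExact throws ArithmeticException instead of silently overflowing.
                return Math.multiplyExact(Long.parseLong(number), multipliers[i]);
            }
        }
        throw new IllegalArgumentException("missing unit in size value: " + raw);
    }
}
```

Under these assumptions, `ByteSizeSketch.parseBytes("10G")`, `parseBytes("10GB")`, and `parseBytes("10737418240B")` all yield the same byte count, which is the equivalence the description relies on.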

Added unit test around this change. Also manually tested with the IntelliJ debugger and commands like:
./gradlew run -Dtests.opensearch.opensearch.experimental.feature.pluggable.caching.enabled=true -Dtests.opensearch.indices.requests.cache.store.name=tiered_spillover -Dtests.opensearch.indices.requests.cache.tiered_spillover.onheap.store.name=opensearch_onheap -Dtests.opensearch.indices.requests.cache.tiered_spillover.disk.store.name=ehcache_disk -PinstalledPlugins="['cache-ehcache']" -Dtests.opensearch.indices.requests.cache.ehcache_disk.storage.path="/Volumes/workplace/opensearch/OpenSearch/build/testclusters/runTask-0/data/nodes/0/indices/request_cache" -Dtests.opensearch.indices.requests.cache.ehcache_disk.max_size_in_bytes=10M

to ensure the actual sizes received by ehcache are correct in all cases.

Related Issues

Part of tiered caching.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • [N/A] API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • [N/A] Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • [N/A] Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Contributor

❕ Gradle check result for 7ce5180: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.action.admin.cluster.node.tasks.ResourceAwareTasksTests.testTaskResourceTrackingDuringTaskCancellation

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented May 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.77%. Comparing base (a2cef8f) to head (4510966).

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13902      +/-   ##
============================================
- Coverage     71.85%   71.77%   -0.09%     
+ Complexity    62490    62435      -55     
============================================
  Files          5145     5145              
  Lines        293372   293373       +1     
  Branches      42410    42410              
============================================
- Hits         210812   210567     -245     
- Misses        65274    65503     +229     
- Partials      17286    17303      +17     


Collaborator

@jainankitk jainankitk left a comment

I think this change should be backward compatible since the new nodes should be able to understand both. Approved!

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot added and then removed the "stalled" label (Jul 9, 2024)
@jainankitk
Collaborator

@peteralfonsi @sgup432 - Are we planning to merge this change? If so, let's address the failing checks and get approval from one of the maintainers.

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Contributor

✅ Gradle check result for 294cdac: SUCCESS

Contributor

✅ Gradle check result for 4510966: SUCCESS

EhcacheDiskCache.EhcacheDiskCacheFactory.EHCACHE_DISK_CACHE_NAME + ".max_size_in_bytes",
- (key) -> Setting.longSetting(key, DEFAULT_CACHE_SIZE_IN_BYTES, NodeScope)
+ (key) -> Setting.memorySizeSetting(key, new ByteSizeValue(DEFAULT_CACHE_SIZE_IN_BYTES), NodeScope)
Collaborator

Will this cause any issue on the upgrade path, where older nodes have the setting as a long while new nodes have it as a ByteSizeValue? Probably not, given that it is NodeScope, but I wanted to double-check.

Collaborator

@jainankitk jainankitk Jul 22, 2024

Will this cause any issue on the upgrade path, where older nodes have the setting as a long while new nodes have it as a ByteSizeValue? Probably not, given that it is NodeScope, but I wanted to double-check.

@sohami - Thinking about this again, it will break old nodes. Once the setting is stored as a memorySizeSetting in the cluster state, old nodes will not be able to apply the latest cluster state, and the cluster manager will kick those nodes out of the cluster.

Contributor Author

I've pushed a new change that adds a new ByteSizeValue setting, "max_size", and deprecates the existing "max_size_in_bytes" setting. If the new setting is present, its value is used. If not, the code looks for the deprecated setting and uses that. If neither is present, it uses the default value.

I'm not totally sure this complexity is worth it; it might be better to just keep the long setting as it was before. Let me know what you think @jainankitk @sohami.
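
For readers following the thread, the lookup order described in this comment (new setting, then deprecated setting, then default) can be sketched roughly as below. This is not the code from the PR: the class `CacheSizeResolver`, the plain `Map` standing in for node settings, the shortened keys, the 1 GiB default, and the small `parseSize` helper are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the fallback order; not the PR's implementation.
final class CacheSizeResolver {
    static final String NEW_KEY = "max_size";                 // new unit-aware setting (key prefix omitted)
    static final String DEPRECATED_KEY = "max_size_in_bytes"; // deprecated raw-long setting
    static final long DEFAULT_BYTES = 1L << 30;               // assumed default, chosen for the example

    private CacheSizeResolver() {}

    static long resolveMaxSizeBytes(Map<String, String> settings) {
        String current = settings.get(NEW_KEY);
        if (current != null) {
            return parseSize(current);                // 1. the new setting wins when present
        }
        String deprecated = settings.get(DEPRECATED_KEY);
        if (deprecated != null) {
            return Long.parseLong(deprecated.trim()); // 2. otherwise fall back to the raw byte count
        }
        return DEFAULT_BYTES;                         // 3. neither is set
    }

    // Simplified parser: optional trailing "b", optional k/m/g binary unit before it.
    static long parseSize(String raw) {
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("b")) {
            s = s.substring(0, s.length() - 1);
        }
        int shift = 0;
        if (!s.isEmpty()) {
            char unit = s.charAt(s.length() - 1);
            if (unit == 'k') shift = 10;
            else if (unit == 'm') shift = 20;
            else if (unit == 'g') shift = 30;
            if (shift != 0) {
                s = s.substring(0, s.length() - 1);
            }
        }
        return Math.multiplyExact(Long.parseLong(s.trim()), 1L << shift);
    }
}
```

Because the deprecated key keeps its long type in this sketch, a node that only understands "max_size_in_bytes" would never see a value it cannot parse, which is the compatibility concern raised earlier in the thread.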

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
@peteralfonsi peteralfonsi force-pushed the ehcache-size-setting-bugfix branch from 588c00e to 934eefd Compare July 26, 2024 16:54
Contributor

❌ Gradle check result for 588c00e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 934eefd: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@peteralfonsi
Contributor Author

Flaky tests: #14294

…g-bugfix

Signed-off-by: Peter Alfonsi <petealft@amazon.com>
Contributor

❌ Gradle check result for 27c7b22: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Labels: skip-changelog, stalled
3 participants