@wortmanb

Here's the first releasable version of deepfreeze for curator, tested as thoroughly as I can manage. Unit tests all pass. Integration tests are still a work in progress: they take so long to run that iterating on them is difficult, and parallelizing them hasn't worked very well either.

Bret Wortman added 30 commits January 27, 2025 16:15
Verified and fixed code for removing old repositories.
For oneup, at least. Need to ensure this works for date-based rotation
too.
Removed commented-out code now that I know it's safe
Finally got black configured and disabled Flake. Much happier now.
templated these, which we'll use to track repos and thawsets inside of
the status index in elasticsearch
Unit tests for utility classes used by DeepFreeze.
These tests cover all remaining utility (module-level) functions. They
could perhaps be collected into a single file.
I plan to do this wherever possible, and anywhere it doesn't cause more
problems than it solves.
This is almost certainly incomplete, but I'll add to it as we go along.
This completely breaks a number of things, but I wanted to capture it
mid-stream so as not to lose it. Flaky network at BAH.
Set defaults for this code formatter, which is faster than black but can
format just as well and to the same standard.
Switched to Ruff. It really wants " instead of '.
Added s3client.py to encapsulate S3 client code for various providers
under a consistent interface. Includes classes S3Client and its
implementation classes, plus a factory method to return a client object
for a particular provider.
Also made some updates to deepfreeze.py to comply with testing better.
Allows us to persist more details about the repo.
This doesn't need to be part of Curator, but I need it to figure out how
to do this. Might ask Aaron how to use Curator to do the same thing at
some point.
Added functions to support getting first and last timestamps from indices
within a given repository.
I'm ditching the test_deepfreeze.py file in favor of
test_deepfreeze_<action>.py files. Here are the first two (though I
haven't written the actual integration tests yet)
It's served its purpose.
Templated thaw and refreeze methods in the (perhaps silly) hope that we
can programmatically thaw and re-freeze buckets or paths on the user's
behalf. The asynchronicity of this is a question for later...
Bret Wortman and others added 29 commits October 29, 2025 05:47
Added descriptions of all actions in markdown.
Due to issues in rotate, not all repos were being marked 'frozen'. This
necessitated adding repair_metadata, which can be used should this ever
occur again and serves as a foundation for other potential repair work
in the future.

Updated integration tests and fixes revealed by testing.
1. Parallelized AWS S3 API Calls (10-15x speedup on S3 checks)

  File: curator/actions/deepfreeze/utilities.py

  - Modified check_restore_status() to use ThreadPoolExecutor with 15
    concurrent workers
  - Instead of checking objects sequentially (one by one), now checks up
    to 15 objects in parallel
  - This is the biggest win: 10,000 sequential API calls drop from
    16+ minutes to ~1 minute

  Technical details:
  - boto3 client is thread-safe, making this safe to implement
  - Separates instant-access objects (no check needed) from Glacier
    objects (need parallel checking)
  - Uses concurrent.futures.as_completed() to process results as they
    arrive
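
  A minimal sketch of the parallel check described above, not the code in
  utilities.py. The function name, the `head_object` callable, and the count
  keys are assumptions made for illustration; the "Restore" string format
  follows the S3 HeadObject response.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Storage classes assumed (for this sketch) to need a restore before reading.
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def check_restore_status_sketch(objects, head_object, max_workers=15):
    """Count restored / in-progress / not-restored objects.

    `objects` is a list of dicts with "Key" and "StorageClass".
    `head_object` is any callable that takes a key and returns a dict which
    may carry a "Restore" string, as an S3 HeadObject response does.
    """
    counts = {"total": len(objects), "restored": 0, "in_progress": 0, "not_restored": 0}

    # Instant-access objects are counted as available without any API call.
    glacier = [o for o in objects if o.get("StorageClass") in GLACIER_CLASSES]
    counts["restored"] += len(objects) - len(glacier)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(head_object, o["Key"]): o["Key"] for o in glacier}
        # Handle each response as soon as it arrives, in completion order.
        for future in as_completed(futures):
            restore = future.result().get("Restore")
            if restore is None:
                counts["not_restored"] += 1
            elif 'ongoing-request="true"' in restore:
                counts["in_progress"] += 1
            else:
                counts["restored"] += 1
    return counts
```

  With boto3, `head_object` could be something like
  `lambda key: s3.head_object(Bucket=bucket, Key=key)`, where `s3` and
  `bucket` are whatever client and bucket name the caller already holds.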

  2. Eliminated Redundant Status Checks (2x speedup on overall flow)

  Files: curator/actions/deepfreeze/thaw.py

  - Added status caching in both do_check_status() and
do_check_all_status()
  - Modified _display_thaw_status() to accept optional cache parameter
  - Previously called check_restore_status() twice per repository (once
    for logic, once for display)
  - Now caches results from first check and reuses for display
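
  The caching idea can be sketched roughly as follows. The function names,
  the shape of the status dict, and the returned values are hypothetical and
  only mirror the description above: check once, then pass the results to
  the display step through an optional cache.

```python
def display_thaw_status_sketch(repos, check_restore_status, cache=None):
    """Build one display line per repo, reusing cached results when given."""
    cache = cache or {}
    lines = []
    for repo in repos:
        status = cache.get(repo)
        if status is None:
            # No cached result: fall back to a fresh check.
            status = check_restore_status(repo)
        lines.append(f"{repo}: {status['restored']}/{status['total']} restored")
    return lines


def check_all_status_sketch(repos, check_restore_status):
    """Check each repo exactly once, then display from the cached results."""
    cache = {repo: check_restore_status(repo) for repo in repos}
    ready = [r for r in repos if cache[r]["restored"] == cache[r]["total"]]
    lines = display_thaw_status_sketch(repos, check_restore_status, cache=cache)
    return ready, lines
```

  Keeping the cache optional means callers that only want the display can
  still call it without one, which fits the "no API changes" note below.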

  3. Added Progress Indicators (UX improvement)

  Files: curator/actions/deepfreeze/thaw.py

  - Shows "Checking repository X of Y..." as each repository is
    processed
  - Gives users real-time feedback instead of appearing frozen
  - Uses existing rich library for clean terminal output

  4. Code Quality

  - All changes pass black formatting
  - All changes pass ruff linting
  - Backward compatible - no API changes

  Expected Performance Improvement

  Before: ~11 minutes (660 seconds)
  After: ~1-2 minutes (60-120 seconds)

  Overall speedup: 5-10x faster!

  Breakdown:

  - S3 API calls: 16 minutes → ~1 minute (15x faster)
  - Redundant checks eliminated: Cut remaining time in half
  - Total: 11 minutes → 1-2 minutes

  The exact improvement depends on:
  - Number of thaw requests
  - Number of repositories per request
  - Number of objects per repository
  - Network latency to AWS S3
Summary of Changes

  1. CLI Command (curator/cli_singletons/deepfreeze.py:344-370)

  Added the -f/--refrozen-retention-days option to the cleanup command:
  - Short flag: -f (mnemonic for "refrozen")
  - Long flag: --refrozen-retention-days
  - Type: integer
  - Default: None (uses config setting, typically 35 days)

  2. Cleanup Action (curator/actions/deepfreeze/cleanup.py)

  - Updated __init__ to accept refrozen_retention_days parameter
  - Modified _cleanup_old_thaw_requests() to use CLI override if
    provided, otherwise fall back to settings value
  - Applied same logic to do_dry_run() method for consistent behavior
  - Updated class docstring to document the new parameter
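
  The override-then-fallback rule is small enough to sketch directly. The
  helper name is hypothetical; the point is that sharing one helper between
  the real run and do_dry_run() is one way to keep the two consistent.

```python
def resolve_retention_days(cli_override, settings_value):
    """Prefer the CLI value when one was passed, else the stored setting.

    Compares against None rather than truthiness so that an explicit
    CLI value is never silently replaced by the setting.
    """
    return cli_override if cli_override is not None else settings_value
```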

  3. Schema Validation

  Added validation in two places:
  - option_defaults.py: Created refrozen_retention_days() function with
    validation (1-365 days range, None allowed)
  - validators/options.py: Added the option to cleanup's validation
    schema
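
  For illustration, the stated rule (an integer from 1 to 365, or None) can
  be written as a plain-Python check. This is a sketch under that reading
  only; the function name is made up and the real option_defaults.py entry
  is not reproduced here.

```python
def validate_refrozen_retention_days(value):
    """Accept None or an int in the range 1..365; raise ValueError otherwise."""
    if value is None:
        return None  # None means "use the configured setting"
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"refrozen_retention_days must be an integer, got {value!r}")
    if not 1 <= value <= 365:
        raise ValueError(f"refrozen_retention_days must be between 1 and 365, got {value}")
    return value
```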
  1. Added NotFoundError import (line 7) - imported the specific exception
     type from elasticsearch8 to handle repository not found errors
  2. Added specific exception handling (lines 210-223) - added a new
     exception handler that:
    - Specifically catches NotFoundError before the generic exception
      handler
    - Detects when the error is a repository_missing_exception
      (indicating the repository has already been unmounted)
    - Logs an INFO level message instead of ERROR: "Repository {name}
      has already been unmounted, no indices to delete"
    - Returns gracefully with no indices deleted
    - For other NotFoundError cases, logs a WARNING instead of ERROR
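
  A rough sketch of that handler ordering follows. To stay runnable without
  the Elasticsearch client it defines a stand-in NotFoundError; in curator
  the real class comes from elasticsearch8. The function name, the two
  callables, and what the generic handler returns are all assumptions.

```python
import logging

logger = logging.getLogger("deepfreeze.sketch")


class NotFoundError(Exception):
    """Stand-in for elasticsearch8.NotFoundError (assumption for this sketch)."""


def delete_repo_indices_sketch(repo_name, list_indices, delete_index):
    """Delete the indices mounted from a repo; return the names deleted."""
    try:
        indices = list_indices(repo_name)
    except NotFoundError as err:
        # The specific handler must come before the generic one below.
        if "repository_missing_exception" in str(err):
            logger.info(
                "Repository %s has already been unmounted, no indices to delete",
                repo_name,
            )
        else:
            logger.warning("Not found while listing indices for %s: %s", repo_name, err)
        return []
    except Exception as err:
        logger.error("Failed to list indices for %s: %s", repo_name, err)
        return []

    for name in indices:
        delete_index(name)
    return list(indices)
```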
Show counts in thaw list output
Detect and fix situation where a thaw request is submitted, acted upon
by AWS, but ignored by the requestor. If check-status is run after the
data is refrozen by AWS, this detects that and fixes the metadata to
show the request as being refrozen so it doesn't languish as a pending
request.
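
One way such a check could work is sketched below. Everything in it is an assumption: the status names, the count keys, and the rule itself (a request still marked as in progress whose objects show no restore at all is treated as having lapsed back to Glacier). The actual detection logic may differ.

```python
def reconcile_thaw_request(request, object_counts):
    """Return the request, marked 'refrozen' if its restore appears to have lapsed.

    `request` is a dict with a "status" key; `object_counts` has "total" and
    "not_restored" keys. Both shapes are hypothetical.
    """
    total = object_counts["total"]
    lapsed = total > 0 and object_counts["not_restored"] == total
    if request["status"] == "in_progress" and lapsed:
        # Return a copy so the caller decides when to persist the change.
        return {**request, "status": "refrozen"}
    return request
```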
Updated test description to reflect the integration tests' unreliable
nature.
…er guide

- Add detailed overview and architecture documentation
- Document all actions: setup, rotate, status, thaw, refreeze, cleanup, repair-metadata
- Include quick start guide and common workflows
- Add cost optimization and scheduling recommendations
- Document ILM integration and troubleshooting

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ation to setup

Setup now requires --ilm_policy_name and --index_template_name options to ensure
deepfreeze is fully configured for data flow from the start.

Changes:
- Add required --ilm_policy_name option: creates new policy with tiering strategy
  (7d hot, 30d cold, 365d frozen, then delete) or updates existing policy to use
  the deepfreeze repository
- Add required --index_template_name option: attaches ILM policy to the template
  so new indices automatically use the deepfreeze configuration
- Add utility functions: get_ilm_policy, create_or_update_ilm_policy,
  update_index_template_ilm_policy (supports both composable and legacy templates)
- Remove deprecated --create_sample_ilm_policy flag
- Ensure delete phase always has delete_searchable_snapshot=false
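
As a sketch only, a policy body of that shape might look like the following. It assumes "7d hot, 30d cold, 365d frozen" means the cold phase starts at 7 days, frozen at 30 days, and delete at 365 days; the commit message does not spell out whether these are phase start ages or phase durations. The function name and the hot-phase action are illustrative.

```python
def build_deepfreeze_ilm_policy(repo_name):
    """Return an ILM policy body whose snapshots live in `repo_name`."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "min_age": "0ms",
                    "actions": {"set_priority": {"priority": 100}},
                },
                "cold": {
                    "min_age": "7d",  # assumed start of the cold phase
                    "actions": {
                        "searchable_snapshot": {"snapshot_repository": repo_name}
                    },
                },
                "frozen": {
                    "min_age": "30d",  # assumed start of the frozen phase
                    "actions": {
                        "searchable_snapshot": {"snapshot_repository": repo_name}
                    },
                },
                "delete": {
                    "min_age": "365d",  # assumed start of the delete phase
                    # Keep the snapshot so the data stays in the repository.
                    "actions": {"delete": {"delete_searchable_snapshot": False}},
                },
            }
        }
    }
```

Setting delete_searchable_snapshot to false matters because the ILM delete action otherwise removes the backing snapshot along with the index, which would defeat the purpose of keeping data in the deepfreeze repository.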

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add ilm_policy_name and index_template_name fields to Settings class
- Setup now saves configured policy/template names to status index
- Rotate discovers policies/templates and updates stored names after each rotation
- Add precondition check: index template must exist before setup can proceed

This enables tracking of which ILM policies and index templates are
configured for deepfreeze, updated on each rotation to reflect the
current cluster state.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Extract deepfreeze functionality into independent package that can run
without curator dependency. Includes:

- S3 client abstraction (AwsS3Client) for Glacier operations
- All 7 action classes: setup, status, rotate, thaw, refreeze, cleanup, repair-metadata
- Click-based CLI mirroring curator_cli deepfreeze interface
- Lightweight ES client wrapper supporting all auth methods
- YAML configuration with default location (~/.deepfreeze/config.yml)
- Voluptuous schema validation for all options
- 262 tests verifying independence from curator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Move the standalone deepfreeze CLI package to github.com/elastic/deepfreeze.
This package is now maintained independently and installed via git dependency.

Removed:
- deepfreeze/ directory with CLI, config, validators, defaults
- All associated tests

The deepfreeze CLI functionality continues to be available through the
es-deepfreeze package from the new repository.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…freeze-core

Remove the deepfreeze action implementation files from curator.
These are now provided by the deepfreeze-core package which is
installed as a dependency.

Removed files:
- cleanup.py, constants.py, exceptions.py, helpers.py
- refreeze.py, repair_metadata.py, rotate.py, setup.py
- status.py, thaw.py, utilities.py

The curator/actions/deepfreeze/__init__.py continues to re-export
all functionality from deepfreeze_core for backward compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…eze-core

Update curator to import deepfreeze functionality from the external
deepfreeze-core package instead of local implementation files.

Changes:
- curator/actions/deepfreeze/__init__.py: re-export from deepfreeze_core
- curator/s3client.py: re-export S3 client from deepfreeze_core
- curator/cli_singletons/deepfreeze.py: add porcelain flags to CLI
- pyproject.toml: declare deepfreeze-core>=1.0.0 dependency

The deepfreeze-core package is installed from:
github.com/elastic/deepfreeze (packages/deepfreeze-core)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>