Support custom configuration schema, and fault injection testing (#398)
* Merge acto-dev commits

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Upload scripts

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix diff ignore field

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* add config for redis

* fix bugs in acto

* Patch Cass operator's CRD

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Delete jvm related config from cass-operator for now

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Ported elasticsearch operator cloud-on-k8s

* upload the config generation scripts.

* update the script for config values

* Change dir name because python module cannot have dot in name

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Support UnderSpecified schema for configuration testing

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix import issue

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix partial func name

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix get value by path

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix set value by path callsite

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix null value in toml

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Add cass config test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Add cass config test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Add mongodb config test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mongodb config schema

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix cass-operator config crd

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix string schema for loading unknown properties

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* updated ES operator config

* Workaround the cass-operator's config CRD

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix import path

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix merge error

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix cassandra config schema

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix tidb config mapping

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix tidb config mapping

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix boolean schema for configuration tests

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mongodb config name

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* added steady state fault injection impl

* updated elastic search acto config

* added steady state fault injection

* Add mariadb config test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Complete MongoDB configuration

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix configparser for MariaDB

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Retrieve all pod log when it is unhealthy

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Retry wait for pod to be ready

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix Cassandra configuration

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix result is_error check for deletion tests

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix deletion test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Revert tidb config change

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Separate config test and func test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix config name

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mariadb config

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Put operator port into versioned dir

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mariadb ini file parsing

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mongodb configuration

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix configparser value set

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix post diff test and run health oracle for rejected inputs

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Update scripts

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix mariadb config

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Raise exception if precondition is not satisfied

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix cr path bug in deletion tests

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Use updated health oracle to check convergence

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Format

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix collecting steady_system_state

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Enlarge tidb operator wait time

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Correct Cassandra configuration schema

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Use semantic replicas tests for cass operator and mongodb operator

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Cleanup fault injection code

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix Cassandra configuration

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix: skip oracle if cli indicates invalid input

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* update bad values for configuration test

* Support custom oracle

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* add custom oracle for mongodb

* fix bugs in mongodb oracle

* add tidb oracle

* remove commented-out tidb code

* update tidb oracle

* add oracle for mariadb and fix some bugs in acto

* update mariadb oracle and fix bugs in mongodb oracle

* fix bugs in oracle in mongodb

* fix bugs in acto

* fix mongodb oracle

* fix mariadb oracle

* fix bugs in mariadb oracle

* fix mongodb config

* make acto compatible with oracle

* fix bugs in acto

* fix cass-operator oracle

* add missing properties for objects in cass-config

* run cass-op with custom oracle

* add logs to tidb oracle

* fix bugs in tidb oracle

* Update the run scripts

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Delete unused scripts

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix failed unittests

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

* Fix cass test

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

---------

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
Co-authored-by: TZ-zzz <tangzhen1027@gmail.com>
Co-authored-by: yimingsu <yimingsu@node0.fault-injection.sieve-acto-pg0.wisc.cloudlab.us>
3 people authored Dec 15, 2024
1 parent 2662b50 commit 572fc01
Showing 220 changed files with 326,776 additions and 1,073 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
args: [--extra=dev, --output-file=requirements-dev.txt]
files: ^pyproject.toml$
- repo: https://github.com/psf/black
rev: 23.12.0
rev: 24.10.0
hooks:
- id: black
name: black
24 changes: 14 additions & 10 deletions acto/checker/checker_set.py
@@ -2,6 +2,8 @@

from typing import Optional

from acto.checker.checker import CheckerInterface

from acto.checker.impl.consistency import ConsistencyChecker
from acto.checker.impl.crash import CrashChecker
from acto.checker.impl.health import HealthChecker
@@ -23,10 +25,8 @@ def __init__(
trial_dir: str,
input_model: InputModel,
oracle_handle: OracleHandle,
checker_generators: Optional[list] = None,
custom_checker: Optional[type[CheckerInterface]] = None,
):
if checker_generators:
checker_generators.extend(checker_generators)
self.context = context
self.input_model = input_model
self.trial_dir = trial_dir
@@ -39,7 +39,12 @@ def __init__(
context=self.context,
input_model=self.input_model,
)
_ = oracle_handle

# Custom checker
self._oracle_handle = oracle_handle
self._custom_checker: Optional[CheckerInterface] = (
custom_checker(self._oracle_handle) if custom_checker else None
)

def check(
self,
@@ -68,12 +73,6 @@ def check(
num_delta,
)

# generation_result_path = os.path.join(
# self.trial_dir, f"generation-{generation:03d}-runtime.json"
# )
# with open(generation_result_path, "w", encoding="utf-8") as f:
# json.dump(run_result.to_dict(), f, cls=ActoEncoder, indent=4)

return OracleResults(
crash=self._crash_checker.check(
generation, snapshot, prev_snapshot
@@ -87,6 +86,11 @@ def check(
consistency=self._consistency_checker.check(
generation, snapshot, prev_snapshot
),
custom=(
self._custom_checker.check(generation, snapshot, prev_snapshot)
if self._custom_checker
else None
),
)

def count_num_fields(self, snapshot: Snapshot, prev_snapshot: Snapshot):
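The hunks above swap the `checker_generators` list for a single optional `custom_checker` class that is instantiated with the oracle handle and consulted alongside the built-in checkers. A minimal sketch of that wiring follows. `CheckerInterface`, `OracleResult`, and `CheckerSet` here are simplified stand-ins, and `ReplicaChecker` and the `ready_replicas` key are hypothetical; none of this is Acto's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Optional, Type


@dataclass
class OracleResult:
    """Simplified stand-in for an oracle result (assumption, not Acto's type)."""

    message: str


class CheckerInterface:
    """Simplified stand-in for acto.checker.checker.CheckerInterface."""

    def __init__(self, oracle_handle: Any) -> None:
        self.oracle_handle = oracle_handle

    def check(
        self, generation: int, snapshot: Any, prev_snapshot: Any
    ) -> Optional[OracleResult]:
        raise NotImplementedError


class ReplicaChecker(CheckerInterface):
    """Hypothetical custom checker: reports a snapshot with no ready replicas."""

    def check(self, generation, snapshot, prev_snapshot):
        if snapshot.get("ready_replicas", 0) == 0:
            return OracleResult(f"generation {generation}: no ready replicas")
        return None


class CheckerSet:
    """Holds at most one custom checker, built from the class passed in."""

    def __init__(
        self,
        oracle_handle: Any,
        custom_checker: Optional[Type[CheckerInterface]] = None,
    ) -> None:
        # Instantiate the class only if one was supplied.
        self._custom_checker = (
            custom_checker(oracle_handle) if custom_checker else None
        )

    def check(self, generation, snapshot, prev_snapshot) -> dict:
        # The custom result is None when no custom checker is configured.
        custom = (
            self._custom_checker.check(generation, snapshot, prev_snapshot)
            if self._custom_checker
            else None
        )
        return {"custom": custom}
```

Passing the class rather than an instance lets the checker set decide when to construct it, which is useful here because the oracle handle only exists per trial.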
7 changes: 6 additions & 1 deletion acto/checker/impl/health.py
@@ -10,9 +10,14 @@ class HealthChecker(CheckerInterface):
"""System health oracle"""

def check(
self, _: int, snapshot: Snapshot, __: Snapshot
self,
_: int = 0,
snapshot: Optional[Snapshot] = None,
__: Optional[Snapshot] = None,
) -> Optional[OracleResult]:
"""System health oracle"""
if snapshot is None:
return None
logger = get_thread_logger(with_prefix=True)

system_state = snapshot.system_state
33 changes: 33 additions & 0 deletions acto/checker/impl/state_compare.py
@@ -4,6 +4,7 @@
from deepdiff.helper import NotPresent

from acto.k8s_util.k8sutil import canonicalize_quantity
from acto.common import flatten_dict


def is_none_or_not_present(value: Any) -> bool:
@@ -83,6 +84,18 @@ def input_config_is_subset_of_output_config(input_config: Any, output_config: An
return False
return False

def compare_application_config(input_config: Any, output_config: Any) -> bool:
if isinstance(input_config, dict) and isinstance(output_config, dict):
try:
set_input_config = flatten_dict(input_config, ["root"])
set_output_config = flatten_dict(output_config, ["root"])
for item in set_input_config:
if item not in set_output_config:
return False
return True
except configparser.Error:
return False
return False

class CompareMethods:
def __init__(self, enable_k8s_value_canonicalization=True):
@@ -143,3 +156,23 @@ def transform_field_value(self, in_prev, in_curr, out_prev, out_curr):

# return original values
return in_prev, in_curr, out_prev, out_curr

class CustomCompareMethods():
def __init__(self):
self.custom_equality_checkers = []
self.custom_equality_checkers.extend([compare_application_config])

def equals(self, left: Any, right: Any) -> bool:
"""
Compare two values. If the values are not equal, then try to use custom_equality_checkers to see if they are
@param left:
@param right:
@return:
"""
if left == right:
return True
else:
for equals in self.custom_equality_checkers:
if equals(left, right):
return True
return False
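`compare_application_config` treats two configurations as matching when every flattened entry of the input also appears in the output, and `CustomCompareMethods.equals` falls back to such checkers only after plain equality fails. The sketch below illustrates the idea under stated assumptions: `flatten` is a hypothetical helper written for this example (the signature and return type of Acto's `flatten_dict` are not shown in the diff), and the function names are illustrative.

```python
from typing import Any, Callable, Iterable


def flatten(value: Any, path: tuple = ()) -> set:
    """Flatten nested dicts/lists into a set of (path, repr(leaf)) pairs.

    Hypothetical helper. Empty containers contribute no entries.
    """
    if isinstance(value, dict):
        entries = set()
        for key, child in value.items():
            entries |= flatten(child, path + (str(key),))
        return entries
    if isinstance(value, list):
        entries = set()
        for index, child in enumerate(value):
            entries |= flatten(child, path + (index,))
        return entries
    # repr keeps the entry hashable and distinguishes 1 from True.
    return {(path, repr(value))}


def config_is_subset(input_config: Any, output_config: Any) -> bool:
    """True when every flattened input entry also appears in the output."""
    if not (isinstance(input_config, dict) and isinstance(output_config, dict)):
        return False
    return flatten(input_config) <= flatten(output_config)


def equals(
    left: Any,
    right: Any,
    custom_checkers: Iterable[Callable[[Any, Any], bool]] = (config_is_subset,),
) -> bool:
    """Plain equality first, then any custom checker that accepts the pair."""
    return left == right or any(check(left, right) for check in custom_checkers)
```

Note that this comparison is asymmetric by design: extra keys in the output (for example defaults filled in by the application) do not count as a mismatch, while a missing or changed input key does.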
2 changes: 1 addition & 1 deletion acto/checker/impl/tests/test_state.py
@@ -24,7 +24,7 @@ def input_model_and_context_mapping() -> (
Dict[str, Tuple[Dict, DeterministicInputModel]]
):
"""Returns a mapping from apiVersion to (context, input_model)"""
configs = glob.glob("./data/**/config.json")
configs = glob.glob("./data/**/config.json", recursive=True)
ret = {}
for config_path in configs:
with open(config_path, "r", encoding="utf-8") as f:
7 changes: 6 additions & 1 deletion acto/deploy.py
@@ -20,7 +20,12 @@ def wait_for_pod_ready(kubectl_client: KubectlClient) -> bool:
"""Wait for all pods to be ready"""
now = time.time()
try:
p = kubectl_client.wait_for_all_pods(timeout=600)
i = 0
while i < 3:
p = kubectl_client.wait_for_all_pods(timeout=600)
if p.returncode == 0:
break
i += 1
except subprocess.TimeoutExpired:
logging.error("Timeout waiting for all pods to be ready")
return False
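The change above retries the pod wait up to three times and stops at the first zero return code. A bounded retry of that kind can be sketched as below. The `wait_once` callable is a hypothetical stand-in for `kubectl_client.wait_for_all_pods(timeout=600)`, and returning `False` once the attempts are exhausted is an assumption of this sketch, since the rest of `wait_for_pod_ready` is not shown in the hunk.

```python
import subprocess
from typing import Callable


def wait_with_retries(
    wait_once: Callable[[], subprocess.CompletedProcess],
    max_attempts: int = 3,
) -> bool:
    """Call wait_once until it succeeds or max_attempts is reached."""
    for _ in range(max_attempts):
        try:
            result = wait_once()
        except subprocess.TimeoutExpired:
            # Mirrors the diff: a timeout ends the wait immediately.
            return False
        if result.returncode == 0:
            return True
    # Every attempt returned a non-zero code.
    return False
```

Using `for _ in range(max_attempts)` instead of a manual counter keeps the bound visible in one place and makes the exhausted case explicit.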
63 changes: 35 additions & 28 deletions acto/engine.py
@@ -16,6 +16,7 @@
import jsonpatch
import yaml

from acto.checker.checker import CheckerInterface
from acto.checker.checker_set import CheckerSet
from acto.checker.impl.health import HealthChecker
from acto.common import (
@@ -81,6 +82,11 @@ def apply_testcase(
testcase.mutator(field_curr_value), list(path)
)
curr = value_with_schema.raw_value()
else:
raise RuntimeError(
"Running test case while precondition fails"
f" {path} {field_curr_value}"
)

# Satisfy constraints
assumptions: list[tuple[PropertyPath, bool]] = []
@@ -141,16 +147,16 @@ def check_state_equality(
# remove pods that belong to jobs from both states to avoid observability problem
curr_pods = curr_system_state["pod"]
prev_pods = prev_system_state["pod"]
curr_system_state["pod"] = {
k: v
for k, v in curr_pods.items()
if v["metadata"]["owner_references"][0]["kind"] != "Job"
}
prev_system_state["pod"] = {
k: v
for k, v in prev_pods.items()
if v["metadata"]["owner_references"][0]["kind"] != "Job"
}

for k, v in curr_pods.items():
if "owner_reference" in v["metadata"] and v["metadata"]["owner_reference"] is not None and ["owner_references"][0]["kind"] == "Job":
continue
curr_system_state[k] = v

for k, v in prev_pods.items():
if "owner_reference" in v["metadata"] and v["metadata"]["owner_reference"] is not None and ["owner_references"][0]["kind"] == "Job":
continue
prev_system_state[k] = v

for obj in prev_system_state["secret"].values():
if "data" in obj and obj["data"] is not None:
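The hunk above is about excluding pods that belong to Jobs before two system states are compared, while tolerating pods that have no owner at all. One way to express that filter is sketched below. It assumes pods are plain dicts keyed by name and that owners are listed under `metadata.owner_references`, as in the removed lines; the helper names are hypothetical and this is not the code from the commit.

```python
from typing import Any, Dict


def is_owned_by_job(pod: Dict[str, Any]) -> bool:
    """True if any owner reference of the pod has kind "Job"."""
    # A missing key or an explicit None both mean "no owners".
    owners = pod.get("metadata", {}).get("owner_references") or []
    return any(owner.get("kind") == "Job" for owner in owners)


def drop_job_pods(system_state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of the state whose "pod" map excludes Job pods."""
    filtered = dict(system_state)
    filtered["pod"] = {
        name: pod
        for name, pod in system_state.get("pod", {}).items()
        if not is_owned_by_job(pod)
    }
    return filtered
```

The removed comprehension inspected only the first owner reference; this sketch checks all of them, which is a design choice of the example rather than something the commit states.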
@@ -249,8 +255,8 @@ def __init__(
runner_t: type,
checker_t: type,
wait_time: int,
custom_on_init: list[Callable],
custom_oracle: list[Callable],
custom_on_init: Optional[Callable],
custom_checker: Optional[type[CheckerInterface]],
workdir: str,
cluster: base.KubernetesEngine,
worker_id: int,
@@ -288,7 +294,7 @@ def __init__(
)

self.custom_on_init = custom_on_init
self.custom_oracle = custom_oracle
self.custom_checker = custom_checker
self.dryrun = dryrun
self.is_reproduce = is_reproduce

@@ -407,8 +413,8 @@ def run_trial(
)
# first run the on_init callbacks if any
if self.custom_on_init is not None:
for on_init in self.custom_on_init:
on_init(oracle_handle)
for callback in self.custom_on_init:
callback(oracle_handle)

runner: Runner = self.runner_t(
self.context,
@@ -423,7 +429,7 @@ def run_trial(
trial_dir,
self.input_model,
oracle_handle,
self.custom_oracle,
self.custom_checker,
)

curr_input = self.input_model.get_seed_input()
@@ -555,7 +561,6 @@ def run_trial(
run_result.oracle_result.differential = self.run_recovery( # pylint: disable=assigning-non-slot
runner
)
generation += 1
trial_err = run_result.oracle_result
setup_fail = True
break
@@ -586,7 +591,6 @@ def run_trial(
run_result.oracle_result.differential = self.run_recovery(
runner
)
generation += 1
trial_err = run_result.oracle_result
break

@@ -596,10 +600,10 @@ def run_trial(
break

if trial_err is not None:
trial_err.deletion = self.run_delete(runner, generation=generation)
trial_err.deletion = self.run_delete(runner, generation=0)
else:
trial_err = OracleResults()
trial_err.deletion = self.run_delete(runner, generation=generation)
trial_err.deletion = self.run_delete(runner, generation=0)

return TrialResult(
trial_id=trial_id,
@@ -767,9 +771,9 @@ def run_delete(
logger = get_thread_logger(with_prefix=True)

logger.debug("Running delete")
success = runner.delete(generation=generation)
deletion_failed = runner.delete(generation=generation)

if not success:
if deletion_failed:
return DeletionOracleResult(message="Deletion test case")
else:
return None
@@ -884,13 +888,16 @@ def __init__(

self.sequence_base = 0

self.custom_oracle: Optional[type[CheckerInterface]] = None
self.custom_on_init: Optional[Callable] = None
if operator_config.custom_oracle is not None:
module = importlib.import_module(operator_config.custom_oracle)
self.custom_oracle = module.CUSTOM_CHECKER
self.custom_on_init = module.ON_INIT
else:
self.custom_oracle = None
self.custom_on_init = None
if hasattr(module, "CUSTOM_CHECKER") and issubclass(
module.CUSTOM_CHECKER, CheckerInterface
):
self.custom_checker = module.CUSTOM_CHECKER
if hasattr(module, "ON_INIT"):
self.custom_on_init = module.ON_INIT

# Generate test cases
self.test_plan = self.input_model.generate_test_plan(
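The hunk above makes both hooks optional: the module named by `operator_config.custom_oracle` may define `CUSTOM_CHECKER`, `ON_INIT`, both, or neither. A minimal sketch of that loading pattern follows. The function name `load_custom_hooks` and its `checker_base` parameter are hypothetical, and defaulting both results to `None` when no module is configured is an assumption of this sketch.

```python
import importlib
from typing import Any, Callable, Optional, Tuple


def load_custom_hooks(
    module_name: Optional[str], checker_base: type
) -> Tuple[Optional[type], Optional[Callable[..., Any]]]:
    """Import module_name and return (CUSTOM_CHECKER, ON_INIT), each optional."""
    # Both hooks default to None so callers can always read them.
    custom_checker: Optional[type] = None
    custom_on_init: Optional[Callable[..., Any]] = None

    if module_name is not None:
        module = importlib.import_module(module_name)
        candidate = getattr(module, "CUSTOM_CHECKER", None)
        # issubclass raises TypeError on non-classes, so check the type first.
        if isinstance(candidate, type) and issubclass(candidate, checker_base):
            custom_checker = candidate
        custom_on_init = getattr(module, "ON_INIT", None)

    return custom_checker, custom_on_init
```

Initializing both values before the conditional means a later read such as `self.custom_checker` cannot fail with `AttributeError` when no custom oracle module is configured.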
@@ -1125,7 +1132,7 @@ def run(self) -> list[OracleResults]:
self.checker_type,
self.operator_config.wait_time,
self.custom_on_init,
self.custom_oracle,
self.custom_checker,
self.workdir_path,
self.cluster,
i,
2 changes: 0 additions & 2 deletions acto/input/constraint.py
@@ -1,5 +1,3 @@
""""""

from typing import Literal, Optional

import pydantic