Support custom configuration schema and fault injection testing (#398)

* Merge acto-dev commits
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Upload scripts
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix diff ignore field
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* add config for redis
* fix bugs in acto
* Patch Cass operator's CRD
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Delete jvm related config from cass-operator for now
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Ported elasticsearch operator cloud-on-k8s
* upload the config generation scripts.
* update the script for config values
* Change dir name because python module cannot have dot in name
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Support UnderSpecified schema for configuration testing
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix import issue
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix partial func name
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix get value by path
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix set value by path callsite
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix null value in toml
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Add cass config test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Add cass config test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Add mongodb config test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mongodb config schema
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix cass-operator config crd
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix string schema for loading unknown properties
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* updated ES operator config
* Work around the cass-operator's config CRD
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix import path
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix merge error
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix cassandra config schema
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix tidb config mapping
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix tidb config mapping
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix boolean schema for configuration tests
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mongodb config name
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* added steady state fault injection impl
* updated elastic search acto config
* added steady state fault injection
* Add mariadb config test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Complete MongoDB configuration
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix configparser for MariaDB
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Retrieve all pod logs when it is unhealthy
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Retry wait for pod to be ready
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix Cassandra configuration
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix result is_error check for deletion tests
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix deletion test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Revert tidb config change
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Separate config test and func test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix config name
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mariadb config
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Put operator port into versioned dir
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mariadb ini file parsing
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mongodb configuration
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix configparser value set
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix post diff test and run health oracle for rejected inputs
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Update scripts
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix mariadb config
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Raise exception if precondition is not satisfied
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix cr path bug in deletion tests
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Use updated health oracle to check convergence
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Format
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix collecting steady_system_state
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Enlarge tidb operator wait time
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Correct Cassandra configuration schema
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Use semantic replicas tests for cass operator and mongodb operator
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Clean up fault injection code
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix Cassandra configuration
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix: skip oracle if cli indicates invalid input
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* update bad values for configuration test
* Support custom oracle
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* add custom oracle for mongodb
* fix bugs in mongodb oracle
* add tidb oracle
* remove tidb commented-out code
* update tidb oracle
* add oracle for mariadb and fix some bugs in acto
* update mariadb oracle and fix bugs in mongodb oracle
* fix bugs in oracle in mongodb
* fix bugs in acto
* fix mongodb oracle
* fix mariadb oracle
* fix bugs in mariadb oracle
* fix mongodb config
* make acto compatible with oracle
* fix bugs in acto
* fix cass-operator oracle
* add missing properties for objects in cass-config
* run cass-op with custom oracle
* add logs to tidb oracle
* fix bugs in tidb oracle
* Update the run scripts
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Delete unused scripts
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix failed unittests
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
* Fix cass test
  Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>

---------

Signed-off-by: Tyler Gu <jiaweig3@illinois.edu>
Co-authored-by: TZ-zzz <tangzhen1027@gmail.com>
Co-authored-by: yimingsu <yimingsu@node0.fault-injection.sieve-acto-pg0.wisc.cloudlab.us>
xlab-uiuc · Dec 15, 2024 · 572fc01
1 parent 2662b50

Showing 220 changed files with 326,776 additions and 1,073 deletions.
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -16,7 +16,7 @@ repos:
         args: [--extra=dev, --output-file=requirements-dev.txt]
         files: ^pyproject.toml$
   - repo: https://github.com/psf/black
-    rev: 23.12.0
+    rev: 24.10.0
     hooks:
       - id: black
         name: black

diff --git a/acto/checker/checker_set.py b/acto/checker/checker_set.py
@@ -2,6 +2,8 @@
 
 from typing import Optional
 
+from acto.checker.checker import CheckerInterface
+
 from acto.checker.impl.consistency import ConsistencyChecker
 from acto.checker.impl.crash import CrashChecker
 from acto.checker.impl.health import HealthChecker
@@ -23,10 +25,8 @@ def __init__(
         trial_dir: str,
         input_model: InputModel,
         oracle_handle: OracleHandle,
-        checker_generators: Optional[list] = None,
+        custom_checker: Optional[type[CheckerInterface]] = None,
     ):
-        if checker_generators:
-            checker_generators.extend(checker_generators)
         self.context = context
         self.input_model = input_model
         self.trial_dir = trial_dir
@@ -39,7 +39,12 @@ def __init__(
             context=self.context,
             input_model=self.input_model,
         )
-        _ = oracle_handle
+
+        # Custom checker
+        self._oracle_handle = oracle_handle
+        self._custom_checker: Optional[CheckerInterface] = (
+            custom_checker(self._oracle_handle) if custom_checker else None
+        )
 
     def check(
         self,
@@ -68,12 +73,6 @@ def check(
                 num_delta,
             )
 
-        # generation_result_path = os.path.join(
-        #     self.trial_dir, f"generation-{generation:03d}-runtime.json"
-        # )
-        # with open(generation_result_path, "w", encoding="utf-8") as f:
-        #     json.dump(run_result.to_dict(), f, cls=ActoEncoder, indent=4)
-
         return OracleResults(
             crash=self._crash_checker.check(
                 generation, snapshot, prev_snapshot
@@ -87,6 +86,11 @@ def check(
             consistency=self._consistency_checker.check(
                 generation, snapshot, prev_snapshot
             ),
+            custom=(
+                self._custom_checker.check(generation, snapshot, prev_snapshot)
+                if self._custom_checker
+                else None
+            ),
         )
 
     def count_num_fields(self, snapshot: Snapshot, prev_snapshot: Snapshot):

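The `CheckerSet` hunk replaces the unused `checker_generators` list with a single optional checker class. That class is instantiated with the oracle handle, and its result lands in `OracleResults.custom`. A minimal, self-contained sketch of that plug-in pattern follows. `CheckerInterface`, `OracleResult`, `MiniCheckerSet`, and `ReplicaCountChecker` are hypothetical stand-ins, and snapshots are simplified to plain dicts, so this is not acto's actual API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class OracleResult:
    """Hypothetical stand-in for acto's oracle result type."""

    message: str


class CheckerInterface(ABC):
    """Hypothetical stand-in mirroring check(generation, snapshot, prev_snapshot)."""

    @abstractmethod
    def check(
        self, generation: int, snapshot: dict, prev_snapshot: dict
    ) -> Optional[OracleResult]: ...


class ReplicaCountChecker(CheckerInterface):
    """Hypothetical custom checker: flags a snapshot whose replica count shrank."""

    def __init__(self, oracle_handle: object):
        # The checker set hands the per-trial oracle handle to the constructor.
        self.oracle_handle = oracle_handle

    def check(self, generation, snapshot, prev_snapshot):
        if snapshot["replicas"] < prev_snapshot["replicas"]:
            return OracleResult(f"generation {generation}: replicas dropped")
        return None


class MiniCheckerSet:
    """Sketch of how a checker set can instantiate an optional checker class."""

    def __init__(
        self,
        oracle_handle: object,
        custom_checker: Optional[type[CheckerInterface]] = None,
    ):
        # A class (not an instance) is passed in; build it only if one was given.
        self._custom_checker = (
            custom_checker(oracle_handle) if custom_checker else None
        )

    def check(self, generation, snapshot, prev_snapshot) -> dict:
        return {
            "custom": (
                self._custom_checker.check(generation, snapshot, prev_snapshot)
                if self._custom_checker
                else None
            )
        }
```

Passing the class rather than an instance lets the checker set construct it with the oracle handle of the current trial.
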
diff --git a/acto/checker/impl/health.py b/acto/checker/impl/health.py
@@ -10,9 +10,14 @@ class HealthChecker(CheckerInterface):
     """System health oracle"""
 
     def check(
-        self, _: int, snapshot: Snapshot, __: Snapshot
+        self,
+        _: int = 0,
+        snapshot: Optional[Snapshot] = None,
+        __: Optional[Snapshot] = None,
     ) -> Optional[OracleResult]:
         """System health oracle"""
+        if snapshot is None:
+            return None
         logger = get_thread_logger(with_prefix=True)
 
         system_state = snapshot.system_state

diff --git a/acto/checker/impl/state_compare.py b/acto/checker/impl/state_compare.py
@@ -4,6 +4,7 @@
 from deepdiff.helper import NotPresent
 
 from acto.k8s_util.k8sutil import canonicalize_quantity
+from acto.common import flatten_dict
 
 
 def is_none_or_not_present(value: Any) -> bool:
@@ -83,6 +84,18 @@ def input_config_is_subset_of_output_config(input_config: Any, output_config: An
             return False
     return False
 
+def compare_application_config(input_config: Any, output_config: Any) -> bool:
+    if isinstance(input_config, dict) and isinstance(output_config, dict):
+        try:
+            set_input_config = flatten_dict(input_config, ["root"])
+            set_output_config = flatten_dict(output_config, ["root"])
+            for item in set_input_config:
+                if item not in set_output_config:
+                    return False
+            return True
+        except configparser.Error:
+            return False
+    return False
 
 class CompareMethods:
     def __init__(self, enable_k8s_value_canonicalization=True):
@@ -143,3 +156,23 @@ def transform_field_value(self, in_prev, in_curr, out_prev, out_curr):
 
         # return original values
         return in_prev, in_curr, out_prev, out_curr
+
+class CustomCompareMethods():
+    def __init__(self):
+        self.custom_equality_checkers = []
+        self.custom_equality_checkers.extend([compare_application_config])
+
+    def equals(self, left: Any, right: Any) -> bool:
+        """
+        Compare two values. If the values are not equal, then try to use custom_equality_checkers to see if they are
+        @param left:
+        @param right:
+        @return:
+        """
+        if left == right:
+            return True
+        else:
+            for equals in self.custom_equality_checkers:
+                if equals(left, right):
+                    return True
+            return False
diff --git a/acto/checker/impl/tests/test_state.py b/acto/checker/impl/tests/test_state.py
@@ -24,7 +24,7 @@ def input_model_and_context_mapping() -> (
     Dict[str, Tuple[Dict, DeterministicInputModel]]
 ):
     """Returns a mapping from apiVersion to (context, input_model)"""
-    configs = glob.glob("./data/**/config.json")
+    configs = glob.glob("./data/**/config.json", recursive=True)
     ret = {}
     for config_path in configs:
         with open(config_path, "r", encoding="utf-8") as f:

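Without `recursive=True`, `glob` treats `**` like `*`, so `./data/**/config.json` only matches files exactly one directory below `data/`. The sketch below builds a throwaway directory tree with a hypothetical layout to show the difference.

```python
import glob
import os
import tempfile


def count_configs(root: str, recursive: bool) -> int:
    """Count config.json files matched by data/**/config.json under root."""
    pattern = os.path.join(root, "data", "**", "config.json")
    return len(glob.glob(pattern, recursive=recursive))


def demo() -> tuple[int, int]:
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: config.json at depths 0, 1 and 2 below data/.
        for sub in ("", "a", os.path.join("a", "b")):
            d = os.path.join(root, "data", sub)
            os.makedirs(d, exist_ok=True)
            with open(os.path.join(d, "config.json"), "w", encoding="utf-8") as f:
                f.write("{}")
        return count_configs(root, recursive=False), count_configs(root, recursive=True)
```

`demo()` returns `(1, 3)`: without the flag only `data/a/config.json` matches, while the recursive form also picks up `data/config.json` and `data/a/b/config.json`.
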
diff --git a/acto/deploy.py b/acto/deploy.py
@@ -20,7 +20,12 @@ def wait_for_pod_ready(kubectl_client: KubectlClient) -> bool:
     """Wait for all pods to be ready"""
     now = time.time()
     try:
-        p = kubectl_client.wait_for_all_pods(timeout=600)
+        i = 0
+        while i < 3:
+            p = kubectl_client.wait_for_all_pods(timeout=600)
+            if p.returncode == 0:
+                break
+            i += 1
     except subprocess.TimeoutExpired:
         logging.error("Timeout waiting for all pods to be ready")
         return False

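The `deploy.py` hunk retries `kubectl_client.wait_for_all_pods` up to three times when it returns a non-zero exit code. As written, a `subprocess.TimeoutExpired` still leaves the loop on the first attempt, and the hunk does not show what happens after three non-zero exit codes. Below is a minimal sketch of a bounded retry that reports exhaustion explicitly, assuming a hypothetical `wait_once` callable that returns a `subprocess.CompletedProcess`.

```python
import subprocess
from typing import Callable


def wait_with_retries(
    wait_once: Callable[[], subprocess.CompletedProcess], max_attempts: int = 3
) -> bool:
    """Call wait_once up to max_attempts times; True on the first zero exit code."""
    for _ in range(max_attempts):
        try:
            result = wait_once()
        except subprocess.TimeoutExpired:
            # Count a timeout as one failed attempt instead of aborting outright.
            continue
        if result.returncode == 0:
            return True
    # Every attempt failed: report it rather than falling through silently.
    return False
```

Whether a timeout should consume a retry or fail immediately is a design choice; the code in the hunk fails immediately.
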
diff --git a/acto/engine.py b/acto/engine.py
@@ -16,6 +16,7 @@
 import jsonpatch
 import yaml
 
+from acto.checker.checker import CheckerInterface
 from acto.checker.checker_set import CheckerSet
 from acto.checker.impl.health import HealthChecker
 from acto.common import (
@@ -81,6 +82,11 @@ def apply_testcase(
                 testcase.mutator(field_curr_value), list(path)
             )
             curr = value_with_schema.raw_value()
+        else:
+            raise RuntimeError(
+                "Running test case while precondition fails"
+                f" {path} {field_curr_value}"
+            )
 
     # Satisfy constraints
     assumptions: list[tuple[PropertyPath, bool]] = []
@@ -141,16 +147,16 @@ def check_state_equality(
     # remove pods that belong to jobs from both states to avoid observability problem
     curr_pods = curr_system_state["pod"]
     prev_pods = prev_system_state["pod"]
-    curr_system_state["pod"] = {
-        k: v
-        for k, v in curr_pods.items()
-        if v["metadata"]["owner_references"][0]["kind"] != "Job"
-    }
-    prev_system_state["pod"] = {
-        k: v
-        for k, v in prev_pods.items()
-        if v["metadata"]["owner_references"][0]["kind"] != "Job"
-    }
+
+    for k, v in curr_pods.items():
+        if "owner_reference" in v["metadata"] and v["metadata"]["owner_reference"] is not None and ["owner_references"][0]["kind"] == "Job":
+            continue
+        curr_system_state[k] = v
+
+    for k, v in prev_pods.items():
+        if "owner_reference" in v["metadata"] and v["metadata"]["owner_reference"] is not None and ["owner_references"][0]["kind"] == "Job":
+            continue
+        prev_system_state[k] = v
 
     for obj in prev_system_state["secret"].values():
         if "data" in obj and obj["data"] is not None:
@@ -249,8 +255,8 @@ def __init__(
         runner_t: type,
         checker_t: type,
         wait_time: int,
-        custom_on_init: list[Callable],
-        custom_oracle: list[Callable],
+        custom_on_init: Optional[Callable],
+        custom_checker: Optional[type[CheckerInterface]],
         workdir: str,
         cluster: base.KubernetesEngine,
         worker_id: int,
@@ -288,7 +294,7 @@ def __init__(
         )
 
         self.custom_on_init = custom_on_init
-        self.custom_oracle = custom_oracle
+        self.custom_checker = custom_checker
         self.dryrun = dryrun
         self.is_reproduce = is_reproduce
 
@@ -407,8 +413,8 @@ def run_trial(
         )
         # first run the on_init callbacks if any
         if self.custom_on_init is not None:
-            for on_init in self.custom_on_init:
-                on_init(oracle_handle)
+            for callback in self.custom_on_init:
+                callback(oracle_handle)
 
         runner: Runner = self.runner_t(
             self.context,
@@ -423,7 +429,7 @@ def run_trial(
             trial_dir,
             self.input_model,
             oracle_handle,
-            self.custom_oracle,
+            self.custom_checker,
         )
 
         curr_input = self.input_model.get_seed_input()
@@ -555,7 +561,6 @@ def run_trial(
                             run_result.oracle_result.differential = self.run_recovery(  # pylint: disable=assigning-non-slot
                                 runner
                             )
-                            generation += 1
                             trial_err = run_result.oracle_result
                             setup_fail = True
                             break
@@ -586,7 +591,6 @@ def run_trial(
                 run_result.oracle_result.differential = self.run_recovery(
                     runner
                 )
-                generation += 1
                 trial_err = run_result.oracle_result
                 break
 
@@ -596,10 +600,10 @@ def run_trial(
                 break
 
         if trial_err is not None:
-            trial_err.deletion = self.run_delete(runner, generation=generation)
+            trial_err.deletion = self.run_delete(runner, generation=0)
         else:
             trial_err = OracleResults()
-            trial_err.deletion = self.run_delete(runner, generation=generation)
+            trial_err.deletion = self.run_delete(runner, generation=0)
 
         return TrialResult(
             trial_id=trial_id,
@@ -767,9 +771,9 @@ def run_delete(
         logger = get_thread_logger(with_prefix=True)
 
         logger.debug("Running delete")
-        success = runner.delete(generation=generation)
+        deletion_failed = runner.delete(generation=generation)
 
-        if not success:
+        if deletion_failed:
             return DeletionOracleResult(message="Deletion test case")
         else:
             return None
@@ -884,13 +888,16 @@ def __init__(
 
         self.sequence_base = 0
 
+        self.custom_oracle: Optional[type[CheckerInterface]] = None
+        self.custom_on_init: Optional[Callable] = None
         if operator_config.custom_oracle is not None:
             module = importlib.import_module(operator_config.custom_oracle)
-            self.custom_oracle = module.CUSTOM_CHECKER
-            self.custom_on_init = module.ON_INIT
-        else:
-            self.custom_oracle = None
-            self.custom_on_init = None
+            if hasattr(module, "CUSTOM_CHECKER") and issubclass(
+                module.CUSTOM_CHECKER, CheckerInterface
+            ):
+                self.custom_checker = module.CUSTOM_CHECKER
+            if hasattr(module, "ON_INIT"):
+                self.custom_on_init = module.ON_INIT
 
         # Generate test cases
         self.test_plan = self.input_model.generate_test_plan(
@@ -1125,7 +1132,7 @@ def run(self) -> list[OracleResults]:
                 self.checker_type,
                 self.operator_config.wait_time,
                 self.custom_on_init,
-                self.custom_oracle,
+                self.custom_checker,
                 self.workdir_path,
                 self.cluster,
                 i,

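Two `engine.py` changes are easier to check in isolation. First, the rewritten Job-pod filter in `check_state_equality` tests for a key named `owner_reference` while the removed code read `owner_references`. It also indexes the list literal `["owner_references"]` rather than the pod metadata, and it stores kept pods under `curr_system_state[k]` instead of back into `curr_system_state["pod"]`. Second, the custom-oracle loader initializes `self.custom_oracle` but assigns `self.custom_checker`, which `run()` later reads. Unless it is set outside the shown hunk, `self.custom_checker` is therefore undefined when no custom oracle is configured. Also, `run_trial` iterates over `custom_on_init` even though it is annotated as a single `Optional[Callable]`. The sketch below shows one way to express both steps. `drop_job_pods`, `resolve_custom_oracle`, and the `CheckerInterface` stub are hypothetical, and the module object would come from `importlib.import_module(operator_config.custom_oracle)` as in the hunk.

```python
import inspect
import types
from typing import Callable, Optional


class CheckerInterface:
    """Hypothetical stand-in for acto.checker.checker.CheckerInterface."""


def drop_job_pods(pods: dict[str, dict]) -> dict[str, dict]:
    """Keep pods whose first owner is not a Job; pods without owners are kept."""
    kept = {}
    for name, pod in pods.items():
        # owner_references may be missing or None for bare pods.
        owners = pod.get("metadata", {}).get("owner_references") or []
        if owners and owners[0].get("kind") == "Job":
            continue
        kept[name] = pod
    return kept


def resolve_custom_oracle(
    module: types.ModuleType,
) -> tuple[Optional[type], list[Callable]]:
    """Return (CUSTOM_CHECKER, ON_INIT callbacks), defaulting to (None, [])."""
    checker = getattr(module, "CUSTOM_CHECKER", None)
    # issubclass() raises TypeError on non-classes, so check isclass first.
    if not (inspect.isclass(checker) and issubclass(checker, CheckerInterface)):
        checker = None
    on_init = getattr(module, "ON_INIT", None)
    if on_init is None:
        callbacks = []
    elif callable(on_init):
        # Wrap a single callback so callers can always iterate.
        callbacks = [on_init]
    else:
        callbacks = list(on_init)
    return checker, callbacks
```

In `check_state_equality`, the filtered result would be written back with `curr_system_state["pod"] = drop_job_pods(curr_pods)`, and likewise for the previous state.
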
diff --git a/acto/input/constraint.py b/acto/input/constraint.py
@@ -1,5 +1,3 @@
-""""""
-
 from typing import Literal, Optional
 
 import pydantic