-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat)(cr) define the chaos custom resource specifications #3
Conversation
Signed-off-by: ksatchit <karthik.s@openebs.io>
…eed upon Signed-off-by: ksatchit <karthik.s@openebs.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ksatchit I have provided some comments here.
--- | ||
## This is the chaos engine profile requested by dev for the nginx | ||
## app, i.e., the user facing custom resource. Mapped to a dedicated | ||
## contrller which triggers the actual chaos experiments as per |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spellcheck
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will correct
@@ -1,7 +1,101 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on below folder structure?
// deploy/nginx/chaosengine.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that chaos engine is the app-mapping chaos resource. However, the general spec of the engine doesn't contain any app-specific elements. One potential reference to nginx/specific-app may come if we include experiments belonging to an "nginx" chart - like below - but that is an optional inclusion. By having this folder structure - the repo will contain several "similar" yamls w/ minimum changes.
Instead should we leave it to developer to just take the "reference" engine we have in deploy/crds/litmuschaos_v1alpha1_chaosengine_cr.yaml
and make his changes.
Would like to change name to chaosengine.yaml over the current name, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok... we can avoid the org & version value in the name at-least
@@ -1,7 +1,43 @@ | |||
apiVersion: litmuschaos.io/v1alpha1 | |||
--- | |||
## An experiment is the definition of a chaos test and is listed as an item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to explain via charts. In other words can the folder structure be as follows?
// deploy/nginx/charts/0.1.0/chaosexperiment.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on this above discussion (on path/naming conventions) - thinking more of // deploy/charts/nginx/0.1.0/chaosexperiment.yaml
(so that there can be deploy/charts/kubernetes/0.2.0/chaosexperiment.yaml
and others as well) ... However, if we are thinking of a separate chart hub etc,. maybe this structure should go there and this repo should contain the standard template only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok..
In future we have to think of validating the yamls pushed into hub via some means, e.g travis or ci etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shall keep this structuring in the chart hub - when it is defined.
## resource. Consists of default values for chaos-specific params | ||
## which are expected to be overridden from the chaosExperiment | ||
|
||
apiVersion: litmus.io/v1alpha1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to call this as ChaosGraph
or ChaosType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok with both @AmitKumarDas .. I thought graph conveys some low-level descriptor (actual action). But it was also chosen as nothing better was striking my head :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also have a reference there to litmus.io over litmuschaos.io - will change
size: 3 | ||
|
||
chart: | ||
- name: kubernetes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused here.
What are we trying to convey by putting chart as kubernetes & its chart version?
Does this relate to ChaosTemplate/ChaosType/ChaosGraph?
Do we want to convey below?
kind: ChaosExperiment
metadata:
name: disappearing-pods
namespace: nginx
labels:
chart/type: nginx
chart/version: 0.1.0
spec:
chaosGraph:
type: kubernetes
name: pod-delete
kind: ChaosGraph
metadata:
name: pod-delete
namespace: nginx
labels:
chart/type: kubernetes
chart/version: 2.0
spec:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, above fits my thought process as well. Experiments, if belonging/pulled by a chart -> the chaosGraph/Type also gets pulled as part of it in an implict way - because of the 1-1 mapping between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be OK for the experiment to reuse an "older" or "newer" template, i.e., w/ a different chart version - as a backward/forward compatibility feature. This is feasible as the executor is embedded into the chaosTemplate itself!
size: 3 | ||
definition: | ||
labels: | ||
name: simple-pod-failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO name has to be very specific.
Pod failure can be done via various ways, e.g.:
- pod delete
- pod oom
- pod's image not found
- etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. will update
env: | ||
- name: ANSIBLE_STDOUT_CALLBACK | ||
value: default | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need not have extra spaces unless & until its needed
name: example-chaosexperiment | ||
|
||
## Eventually launched chaos litmusbook/job will bear <name>-<hash> | ||
name: disappearing-pods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this should convey what it will try to do. In other words, it should convey the exact chaos experiment that will be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ChaosExperiment
& ChaosTemplate/ChaosGraph
have same name?
If above is a good thing to do, then we can remove the entire mapping to ChaosTemplate/ChaosGraph
from within ChaosExperiment
entirely.
AFAIK ChaosExperiment will do only one specific task. ChaosGraph has a direct relation with underlying infrastructure to execute this specific task. e.g. spawn a k8s job that deletes a pod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to rethink if infrastructure should be abstracted as well. e.g. killing a pod via a k8s job might be different from killing a pod via some api service or other chaos tooling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Agree on name changes, will do that.
-
Very nice! Will think on addition to chaosGraph spec to package-in the executor nature (via traditional k8s litmus job, some other api - maybe directly reuse chaostoolkit api for example etc.,)
-
The experiment-graph/template was originally conceived in order to keep the spec of a "test"/"chaos experiment" homogeneous and common irrespective of what it does underneath. This interface would then reference low-level specs which can change greatly on a case-by-case basis. Now there are two developments which make us rethink this approach:
a/ Emergence on chaosEngine which is now the app/user-facing resource - which will be homogeneous.
b/ Introduction of
spec.definition
in the chaosGraph which packages all heterogeneous chaos parameters as ENV - which satisfies in a way the strict-definition/uniformity requirements of chaosGraph.Now, even w/ (a) & (b), are there are some situations which necessitates two separate CRs ? I can think of this future case: An experiment might be the result of a (chaosjob-1) + (chaosjob-2) + (chaosjob-3) executed in that order to simulate a "multi-component" failure or an "issue-build-up" ? It can be a developer discretion also, to define it that way (This requirement varies slightly from the engine listing <exp1, exp2, exp3 by rank> - this is more of a batch run of chaos like diagnostics... the case we are discussing is the experiment-itself which is a case of more than 1 chaos event/action.) This thinking violates the current approach of 1-1 mapping, and may need a corresponding change to the spec - but is an important consideration nonetheless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current decision on the 3rd point based on discussions with @AmitKumarDas :
- Multi-chaos experiments is definitely a possibility - but we currently feel these are best handled by the executors. Exposing programming constructs into the YAMLs may not be wise at this point.
- However, we will still go ahead with both the CRs to cover for cases where a user-defined chaosGraph is added and experiment can refer to it while maintaining the components abstraction. Future ones might come up. Moreover, there is no controller mapped against the resource anyway - it is only to hold data.
- We will have same names for chaosExperiment and chaosGraph to enable simple identification. User can specify different on a need basis.
- It will be of great value to add a "description" attribute to the experiment spec to hold information about what it does. This will be very useful for developers and help abstract-away lower-level chaos details (also another reason to maintain a separate CR).
Signed-off-by: ksatchit <karthik.s@openebs.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Minor comments @ksatchit on naming conventions
@@ -1,7 +1,101 @@ | |||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok... we can avoid the org & version value in the name at-least
@@ -1,7 +1,43 @@ | |||
apiVersion: litmuschaos.io/v1alpha1 | |||
--- | |||
## An experiment is the definition of a chaos test and is listed as an item |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok..
In future we have to think of validating the yamls pushed into hub via some means, e.g travis or ci etc.
deploy/crds/chaosengine.yaml
Outdated
|
||
schedule: | ||
interval: "" | ||
excluded-times: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should all the property names follow a certain naming standard?
e.g. why not excludedTimes
& excludedDays
deploy/crds/chaosengine.yaml
Outdated
schedule: | ||
# quarter-hourly, half-hourly, hourly, bi-hourly, trihoral, daily | ||
interval: "half-hourly" | ||
excluded-times: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment
deploy/crds/chaosexperiment.yaml
Outdated
chart/version: 0.9 | ||
|
||
description: | ||
data: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename data
to message
Signed-off-by: ksatchit <karthik.s@openebs.io>
(refactor)exporter: support non-default ns & add engine name label to metrics
Signed-off-by: ksatchit karthik.s@openebs.io