HBASE-27574 Implement ClusterManager interface for Kubernetes #4979
base: master
Conversation
Transitive hull of the new dependencies. Should we be shading these?
To work better with K8s, I think we need to discuss more about how to define the actions... Simply killing a region server is not enough to reduce the number of region servers, as the Deployment will soon start a new one... And simply setting the number of pods is not applicable here, as we cannot control which region server it will kill...
I agree that there is a longer arc of a discussion re: HBase and a container runtime platform like Kubernetes. I believe that Kubernetes implements its own form of chaos, and I have not yet explored an implementation based on that tooling. However, just like the CM-based and coprocessor-based implementations before it, this one has allowed me to use our existing ITBLL + Chaos tools in an environment that is convenient to what I have available in my organization. It's convenient to be able to run the same processes in the new deployment environment and have everything basically function. I'd like to share it with the community, especially if there's a path to us using similar tools as part of our project's resource budget.
Then let's not use the generic "KubernetesClusterManager" as the name? Maybe later we will have other types of K8s cluster manager...
hbase-it/src/test/java/org/apache/hadoop/hbase/KubernetesClusterManager.java (two outdated review comments, resolved)
A basic implementation that supports taking destructive actions. Assume that services are running behind a resilient `Deployment` of some kind, and that the cluster will handle starting up replacement processes. Requires specification of a scoping namespace.
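To make the destructive path concrete, here is a minimal sketch of the pod-delete approach described above. It is illustrative only: the class name, method name, and the choice to shell out to `kubectl` (rather than calling the Kubernetes API directly, e.g. via `deleteNamespacedPod`, discussed further down) are assumptions, not the PR's actual code.

```java
import java.io.IOException;

/**
 * Illustrative sketch only: kill a region server by deleting its pod, relying on the
 * surrounding Deployment (or similar controller) to start a replacement. Shells out to
 * kubectl to stay self-contained; the PR itself talks to the Kubernetes API directly.
 */
public class PodKillSketch {
  private final String namespace; // scoping namespace, as required by the implementation above

  public PodKillSketch(String namespace) {
    this.namespace = namespace;
  }

  /** Delete the pod hosting the given region server as abruptly as kubectl allows. */
  public void killRegionServerPod(String podName) throws IOException, InterruptedException {
    // --grace-period=0 --force approximates a hard kill rather than a graceful shutdown.
    Process p = new ProcessBuilder(
        "kubectl", "--namespace", namespace, "delete", "pod", podName,
        "--grace-period=0", "--force")
        .inheritIO()
        .start();
    if (p.waitFor() != 0) {
      throw new IOException("kubectl delete pod " + podName + " failed");
    }
  }
}
```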
Force-pushed from 340aaaf to 47af1fa
Rebased and addressed PR feedback. These changes are untested.
I sort of wonder if we should be using the Exec API to try to send normal kill signals to the processes within the pods, rather than use the pod delete API.
The problem with using the pod API is that killing a pod is likely to be graceful -- people tend to use stop hooks, pod disruption budgets, and finalizers as ways to control how pods die in the normal case. In my environment, for example, deleting a pod is a common and safe thing to do for most applications. So it's not really creating as much chaos as a ChaosMonkey should. I fear someone would run this and feel like they did a good chaos test, but then see weirder failure modes down the line when pods fail less gracefully.
What do you think?
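A rough sketch of that alternative, assuming the official Kubernetes Java client (`io.kubernetes:client-java`, roughly the 10.x-era API) and a container image that ships a shell and `pgrep`; exact packages and signatures vary across client versions, so treat this as a sketch of the idea rather than a drop-in implementation.

```java
import io.kubernetes.client.Exec;
import io.kubernetes.client.openapi.ApiClient;
import io.kubernetes.client.util.Config;

/**
 * Sketch: use the exec subresource to signal the region server process inside the pod,
 * mirroring what the ssh-based ClusterManager does with `kill` on a plain host.
 */
public class ExecKillSketch {
  /** signal is e.g. "KILL" for a hard crash or "STOP" to suspend the process. */
  public static void signalRegionServer(String namespace, String podName, String signal)
      throws Exception {
    ApiClient client = Config.defaultClient();
    Exec exec = new Exec(client);
    // Assumes the container image provides sh and pgrep, and that the region server
    // process is identifiable by its main class name.
    String[] command = {"sh", "-c", "kill -" + signal + " $(pgrep -f HRegionServer)"};
    Process proc = exec.exec(namespace, podName, command, /* stdin= */ false, /* tty= */ false);
    if (proc.waitFor() != 0) {
      throw new RuntimeException("exec kill -" + signal + " failed in pod " + podName);
    }
  }
}
```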
That's a good point @bbeaudreault. I wonder if the current ssh-based ClusterManager can be coerced into running …
Checked the API description. The deleteNamespacedPod method has a gracePeriodSeconds parameter, where 0 means delete immediately, so I think it could achieve what we want. But what I'm more concerned about is how to correctly support stop and kill: in K8s, if you do not change the replica count, the framework will launch a new pod right after you delete one... I think this is exactly what we want, but it seems this is still not fully implemented yet...
And we also need to change some semantics for the cluster manager. For example, on K8s it is useless to specify a hostname when starting a new region server, so maybe we could change the API to "startNewRegionServer"; even in a non-K8s environment, I do not think we must start a region server on a given host, we just need to start a new one, right?
And for stop, kill, and restart, maybe we could also change the semantics so they fit both K8s and non-K8s environments. For example, we remove stop and kill and only leave restart, but provide a flag to indicate how to stop the region server, i.e., a graceful shutdown or a force kill. And we provide another API called reduceRegionServerNumber: in a K8s environment it is just an API call, and in a non-K8s environment we can randomly select a region server to stop. This is not perfect, but I think it could fit most of our test scenarios. What do you guys think? Thanks.
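To make the proposed semantic change easier to discuss, here is one possible shape for it. Every name below is hypothetical and simply restates the suggestion in the comment above; it is not an agreed design, nor anything that exists in hbase-it today.

```java
import java.io.IOException;

/** Hypothetical sketch of the restart/reduce semantics proposed above. */
public interface RegionServerLifecycleManager {

  /** Start one additional region server; the implementation decides where it runs. */
  void startNewRegionServer() throws IOException;

  /**
   * Restart a region server.
   * @param serverName the server to restart
   * @param graceful true for a clean shutdown, false for a hard kill (e.g. SIGKILL, or a
   *                 pod delete with gracePeriodSeconds=0 on Kubernetes)
   */
  void restartRegionServer(String serverName, boolean graceful) throws IOException;

  /**
   * Remove region servers from the cluster. On Kubernetes this maps to lowering the
   * replica count; elsewhere the implementation stops randomly chosen region servers.
   */
  void reduceRegionServerNumber(int delta) throws IOException;
}
```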
And I'm a bit interested in how you guys manage DataNodes or NameNodes on K8s? They have local storage, so if you delete the pod and launch a new one somewhere else, the data will be lost... Do you use a StatefulSet? Thanks.
Personally I prefer to use the Exec API for this. It seems somewhat artificial to try reducing the pod count just for the sake of it. IMO chaos monkey is for testing both HBase handling and deployment automation. Outside K8s, if you stop a regionserver process you'd better have monit or systemd to start it back up. In Kubernetes, this is handled for you. So if chaos sends a kill -9, it's doing a good job of testing how both systems handle a regionserver dying. Maybe in Kubernetes you have an init container which gets in the way of the pod gracefully handling a regionserver container dying; chaos would expose that. Otherwise, I think kill -STOP is an important feature and I wouldn't want to bury it in an option. So that's another reason why just replacing ssh with the Exec API would be nice.
We currently don't run DataNodes in K8s. For NameNodes we use a StatefulSet backed by EBS. For DataNodes we don't want to use EBS; it's too expensive. When we eventually get to it, we plan to use FlexVolumes to basically provision space on particular SSD-backed kube nodes. So if a pod restarts, it would go to the same node if it's available; if not, it would go elsewhere and lose its data, but this is how things work outside K8s and is handled by HDFS replication. Sadly I can't give more details than this right now because it's been on hold for a while so we can work on other things.