Hubble on the management cluster #1594
@weatherhog imo this is getting important. More and more providers are being switched to cilium, and this would give us much more visibility into the network. Do you think Cabbage is a good fit for this?
@teemow I think it could be a good fit. The main question is how far along we are with cilium. Last time we wanted to test, for example, linkerd on cilium-based clusters, the release got stopped and reverted. Do we have stable cilium installations at the moment?
Afaik CAPO and CAPG are on cilium and in production with customers already. CAPA is currently being tested, and CAPVCD is also being worked on. Vintage AWS has been reverted, but the MCs are on cilium already afaik. So it should be possible to test this already. Please confirm @cornelius-keller @alex-dabija
CAPG is not yet in production. We are currently deploying the first GA management cluster for our customer. Yes, you can test Hubble on CAPG and CAPA clusters. Both are using Cilium.
CAPO is in production using cilium as well; the only difference is that we didn't replace kube-proxy with cilium yet.
I've been looking into the current status, and from what I can see:
Now, the question is what exactly we want to enable and how we would like to expose it to our customers.
@giantswarm/team-cabbage please look into hubble and into the questions provided by @mcharriere. If there are more questions, please add them to this issue.
Would it be a lot of work to learn with the customer what is useful and what is not? I'd suggest sharing it from the beginning. @weatherhog let's talk to Atlas then. Maybe we have some low-hanging fruit, e.g. creating two prometheus instances per cluster or similar, and then using a separate prometheus just for the network metrics. Not sure if this is possible, but somehow Atlas needs to enable teams to add more metrics soon.
@TheoBrigitte any chance to have a second prometheus instance for this Hubble use case? Let's have a chat about this.
@TheoBrigitte just a ping to get a reply on our question.
@paurosello can you provide feedback on your Hubble usage during the Stability Sprint?
I do not understand why we would need another Prometheus instance dedicated to Hubble when we already have Prometheus servers in place.
@TheoBrigitte the question was whether our prometheus setup can handle a lot more networking metrics from the workload clusters. Maybe @paurosello or @whites11 can say a bit more, as they have also looked into this now. Do we enable all the metrics with cilium already? Is that independent of using the hubble UI? Did we test whether the metrics increased with v19, and if so, by how much? Will our prometheus setup still work on the big clusters if we upgrade to cilium?
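On the question of whether the metrics are independent of the UI: in the upstream Cilium Helm chart they are separate toggles, so metrics can be scraped with the UI switched off. A sketch of the relevant values (upstream key names; the giantswarm cilium-app wraps this chart, so the exact paths there may differ):

```yaml
hubble:
  enabled: true
  metrics:
    # Metric families exported to Prometheus; each one adds series,
    # so enable selectively when cardinality is a concern.
    enabled:
      - dns
      - drop
      - tcp
      - flow
  relay:
    enabled: true  # needed by the UI, not by the metrics endpoint
  ui:
    enabled: true  # can stay false while metrics are still scraped
prometheus:
  enabled: true    # cilium-agent metrics, separate from Hubble's
```

The Hubble metric families are what drive the series count per workload cluster, so the cardinality question for Prometheus mostly depends on which entries are enabled in `hubble.metrics.enabled`.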
We have been playing with cilium and hubble during this stability sprint, and I would say enabling hubble should not be a problem (we already have a PR to change the values in the app, which should be released shortly) for the following reasons:
@teemow as for your questions:
For now the plan is to use port-forward to access it and, if we find it useful, improve the setup with ingress and oauth. This will be enabled on MCs and WCs.
Ok, so if you have already been shipping metrics into Prometheus for months, I guess we are fine, right? Or is there anything you need from Atlas?
PR to enable hubble/monitoring for cilium: giantswarm/cilium-app#56
@TheoBrigitte cilium isn't released for vintage yet; it is coming with v19. I think @paurosello is talking about the management clusters only. @T-Kukawka we need to make sure that prometheus doesn't break if we add cilium to the big clusters.
Sure @teemow, let me add an issue so we can test it with Atlas when the release is ready.
general Cilium v1.13 work for Phoenix: #2131 |
this is running on gaia and gremlin now |
@T-Kukawka can you please schedule a session so we can work out together how Prometheus behaves with Cilium?
We will move forward with testing this week, so when we are ready we will involve Atlas for WC testing; we can sync here: https://github.com/giantswarm/giantswarm/issues/26139. On MCs it is already running on Gaia and Gremlin, so you can check this async.
removing this from cabbage as this is in phoenix
hubble is running in all MCs now. |
I tried exposing the hubble ui on the MCs through ingress; there are a bunch of problems:
Probably because following external name services is disabled by default in https://github.com/giantswarm/nginx-ingress-controller-app/blob/main/helm/nginx-ingress-controller-app/values.yaml#L192 |
Oh thanks, I thought it was something along those lines but couldn't track it down |
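For context, the kind of Ingress that was attempted for the UI looks roughly like the sketch below. The hostname and auth annotation are hypothetical placeholders (oauth in front of the UI was the stated plan); `hubble-ui` on port 80 in `kube-system` is the upstream chart's default service:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hubble-ui
  namespace: kube-system
  annotations:
    # Hypothetical: an oauth2-proxy endpoint guarding the UI, as discussed.
    nginx.ingress.kubernetes.io/auth-url: "https://oauth.example.com/oauth2/auth"
spec:
  ingressClassName: nginx
  rules:
    - host: hubble.example.com   # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hubble-ui  # upstream Hubble UI service
                port:
                  number: 80
```

Note this only works once the controller is allowed to route to the backing services; the ExternalName restriction mentioned above was what broke it here.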
hubble+auth available for vintage in giantnetes-terraform 14.13.0. Deployed in gaia, gremlin, giraffe for now (will be enabled everywhere with the next round of MC updates).
@teemow something I don't get from this ticket: how would it be possible to see hubble data from the workload clusters in the MC? |
@whites11 if hubble doesn't support multi-cluster, then we probably have to think about either adding a hubble per WC on the MC or having hubble on each WC. FYI @weatherhog (connectivity) @TheoBrigitte (monitoring), what do you think about it? Imo it would be helpful not only for platform teams but also for development teams to get access and see the connectivity between different services.
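If cross-cluster visibility from one place turns out to be a requirement, one upstream direction worth evaluating is Cilium ClusterMesh, where each cluster gets a distinct name and ID and Hubble flows carry the cluster identity. A values sketch using upstream Cilium chart keys (untested in our setup; the cluster name is a hypothetical example):

```yaml
cluster:
  name: wc-example   # hypothetical; must be unique across the mesh
  id: 2              # numeric ID, also unique per cluster
clustermesh:
  useAPIServer: true # deploys the clustermesh-apiserver for cross-cluster state
```

Whether this actually gives the MC-side single pane of glass we want would need a spike; the alternative from the comment above (hubble per WC, or one relay per WC on the MC) stays on the table.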
Afaik @teemow Hubble is enabled by default in our Cilium already: https://github.com/giantswarm/cilium-app/blob/main/CHANGELOG.md#080---2023-03-08. This means there will be Hubble on each WC and MC (which should already be the case).
@T-Kukawka but we need to decide how we want to enable it for customers. I am not against having it on the WCs but I'd like to discuss this a bit. We could also have the UI on the MC. And we're constantly moving more functionality into the MC to have more of a single pane of glass. |
From what I see here, I do not understand the implications of this solution for our observability stack. I think having a session to explain what this is about would be best for me to understand it better and see how we move forward.
I think we can have a joint session with Phoenix/Cabbage/Atlas, but then the topic should be continued by the 2 other teams and should not block the release. This could come next as an improvement, as in theory it is available already.
@T-Kukawka do you organize a session then? |
Sent out an invite for 11.05, 13:30.
Meeting notes:
Next steps:
Removing from Phoenix board - adding to Turtles as the 'responsible' |
As hubble is rolled out, we will close this issue for now. If any further steps with hubble are needed, we will create a follow-up epic on the Turtles board.
As we transition to cilium on all providers it would be great to add Hubble on the management cluster to get better visibility of the network on each workload cluster.
Value
See https://github.com/cilium/hubble for more details.
Please also allow customers to access Hubble.