insights: Insights-gateway #447
/assign @mfojtik @smarterclayton
From a business perspective it makes sense to me. To be absolutely clear, the components/operators using the c.r.c Insights gateway will be responsible for payload size and data privacy; the Insights gateway won't process the data in any way. Correct @iNecas ?
Correct @radekvokal
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: iNecas. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 096e077 to 355b418.
The drawbacks are all things I consider significant - I would probably lean towards a library approach (and expected it to be discussed here if there are gaps).
> Given insights-operator is already part of the OCP cluster and has proper credentials
> configured already as part of the installation, we're proposing for it to expose
> a proxy for other components to be able to send the payloads on their behalf.
Hrm - I would probably prefer these components remain decoupled architecturally. Insights is providing support data about the cluster - it really is not a generic operator (generic operators have a high bar).
There must be an extremely strong reason to expose a new generic API - it would probably be a better start to expose a library those operators can use rather than trying to provide a service. The justification for a service over a library would have to be very compelling.
Looping in @jhjaggars and especially @chambridge to perhaps fill in some gaps around requirements/justification and more thoughts on the library alternative.
A few things regarding the library approach:
- some targeted operators are implemented with ansible (https://github.com/project-koku/korekuta-operator/blob/5fe7938ee87ca08dec6181cc0e37083979def62c/roles/collect/tasks/main.yml#L339): the library doesn't seem to help much with that
- with the library approach, the operator would need access to secrets in the `openshift-config` namespace: would we want to allow this, or minimize access to that resource? Or would it mean the credentials need to be handled differently? The gateway approach limits the access to the upload only: nothing else is exposed.
- the library still doesn't provide a single place for an overview of what data is flowing to c.r.c.
- additional requirements around networking that we've already seen and solved in insights-operator (such as the service-specific proxy implemented in openshift/insights-operator@65e2183) would need to be implemented again or left as gaps.
As for extending the operator into a generic one, we could argue the scope is still narrowed to providing the c.r.c-related services, not being a single place for all outgoing traffic beyond the c.r.c ingress.
As mentioned at the bottom of this doc (https://github.com/openshift/enhancements/pull/447/files#diff-74d208115f9ccd14e14cb6d90d466a9eR196), long term we would like to see a generic solution, one that even insights-operator could use, provided by OCP core.
For the API, would adding `v1alpha` in the endpoint help with the concerns in any way? (A sketch of what a versioned upload could look like follows.)
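For illustration only, a minimal sketch of what a versioned upload from a client operator might look like; the gateway service name, the `/v1alpha/upload` path, and the content type are assumptions, not a final API:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

func main() {
	// Hypothetical versioned endpoint exposed by the gateway inside the
	// cluster; bumping the version segment would make breaking changes
	// explicit to client operators.
	const uploadURL = "https://insights-gateway.openshift-insights.svc/v1alpha/upload"

	payload := bytes.NewReader([]byte("...archive bytes...")) // e.g. a tar.gz produced by the operator
	resp, err := http.Post(uploadURL, "application/vnd.redhat.openshift.periodic+tar", payload)
	if err != nil {
		fmt.Println("upload failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("gateway responded with", resp.Status)
}
```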
The goal here is to create a secure channel into cloud.redhat.com for data that doesn't fit telemetry yet provides value to customers through applications built on c.r.c. Here's a discussion of why Cost Management can't meet the budget and cardinality requirements of the monitoring team, and also why the current dataflow into c.r.c is not optimal: https://docs.google.com/document/d/1gdxlc37-CMniwptdccob4C3fwrAZNgyOQFDNL5msCsk/edit#heading=h.v7h8j8wxgjo2
Taking a library approach solves a couple of the issues we want to address:
- abstracting the mechanics of talking to cloudDot (HTTP client configuration and URI handling)
- centralizing the proxy configuration
It doesn't address gaining access to the credentials necessary for authentication with cloudDot.
Additionally, a library approach grants us the ability to just do it, which I really like.
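To make the comparison concrete, here is a minimal sketch of what such a library's surface could look like; the import path, type, and function names are all hypothetical, since no such library exists yet:

```go
package main

import (
	"bytes"
	"context"
	"fmt"

	// Hypothetical import path for the proposed library.
	insights "github.com/openshift/insights-upload-client"
)

func main() {
	// The library would hide the mechanics discussed above: resolving the
	// cloud.redhat.com ingress URL, configuring the HTTP client (including
	// proxy settings), and attaching credentials on each request.
	client, err := insights.NewClientFromCluster(context.TODO())
	if err != nil {
		panic(err)
	}

	payload := bytes.NewReader([]byte("...archive bytes..."))
	if err := client.Upload(context.TODO(), "cost-management", payload); err != nil {
		fmt.Println("upload failed:", err)
	}
}
```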
I'm waiting for more feedback on this proposal from the OLM operators interested in this feature.
If we did go the library approach, where should the proxy configuration live?
Can someone describe how the library would work with an Ansible-based operator?
Would this be something we'd have to put into an Ansible collection? I'd also like to understand the impact on EXD: how would the library be built and made available for a certified operator release? I think the documentation for the library would need some information on how an operator would gather the pull secret config, plus any necessary RBAC role/cluster role needs.
But if all that is easy then I'm okay with a library approach, as it might have the side effect of not needing all the backporting into z-streams.
I assume the library would also take into account the cluster global proxy configuration?
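On the global proxy point: the cluster-wide proxy settings live in the cluster-scoped `Proxy` object of the `config.openshift.io/v1` API, which the library could read. A minimal sketch, assuming in-cluster credentials with permission to read that object (the surrounding wiring is illustrative):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"

	configclient "github.com/openshift/client-go/config/clientset/versioned"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The cluster-wide proxy configuration is a single cluster-scoped
	// object named "cluster".
	proxy, err := client.ConfigV1().Proxies().Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("httpsProxy:", proxy.Status.HTTPSProxy, "noProxy:", proxy.Status.NoProxy)
}
```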
For the ansible-operator, I can think of two ways (based more on how I think ansible-operator works, without much experience):
- exposing the library via an executable that would be available in the operator image and used instead of the `curl` it uses today. You would need to figure out how to add the binary to your image. The disadvantage would be that it would need to read the configuration on every upload, or deal with caching.
- running as a sidecar container, exposing a local endpoint to provide the gathering functionality
Regarding the sidecar: what do you think about this being the generic approach, regardless of whether it's an ansible operator or not, @smarterclayton @jhjaggars? This way we could also standardize on metrics, and it would actually feel like the k8s way. A rough sketch of the idea is below.
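A minimal sketch of the sidecar idea: the sidecar listens on localhost inside the pod, and the main container (the operator's `curl` call today) posts its payload there instead of talking to cloud.redhat.com directly. The upstream URL, port, and header handling here are illustrative assumptions, not a spec:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// Assumed c.r.c ingress endpoint; illustrative only.
const ingressURL = "https://cloud.redhat.com/api/ingress/v1/upload"

// forward relays a payload received on the pod-local endpoint to c.r.c,
// attaching the credentials so the main container never needs them.
func forward(w http.ResponseWriter, r *http.Request) {
	req, err := http.NewRequest(http.MethodPost, ingressURL, r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	req.Header.Set("Content-Type", r.Header.Get("Content-Type"))
	// The sidecar, not the operator, attaches the token read from the
	// pull secret (see the RBAC discussion below).
	req.Header.Set("Authorization", "Bearer "+loadTokenFromPullSecret())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.WriteHeader(resp.StatusCode)
	io.Copy(w, resp.Body)
}

// Placeholder for the pull-secret handling sketched later in this thread.
func loadTokenFromPullSecret() string { return "..." }

func main() {
	http.HandleFunc("/upload", forward)
	fmt.Println(http.ListenAndServe("127.0.0.1:8080", nil))
}
```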
What would be the advantages of using the library over the solution we have today, coded by ourselves? It seems that the long-term solution of an upload service will better address the air-gapped use cases, and would simplify credential management and RBAC configuration.
At least two advantages that I can think of:
- standardization of how to make the connection
- clear instructions for users to follow
The sidecar container approach (basically a service wrapper around the library) should address the ansible operator case. Would it be a big lift to include another image in the existing certification process for each operator?
Another question: could the OLM-managed operators create these bindings as part of the deployment process (or ask the administrator to do so)?
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: some-olm-operator
  namespace: openshift-config
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  resourceNames:
  - pull-secret
  - support
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: some-olm-operator
  namespace: openshift-config
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: some-olm-operator
subjects:
- kind: ServiceAccount
  name: operator
  namespace: some-olm-operator
```
If so, the sidecar could also pull the right token from the pull secret and watch for the changes.
The sidecar is probably my favorite one also, as it could standardize on things like tracked metrics. A sketch of the token handling is below.
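A minimal sketch of the token handling, assuming the Role above is in place; reading the `cloud.openshift.com` entry from the pull secret is how insights-operator obtains its token today, while the client wiring here is illustrative:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// dockerConfigJSON mirrors the structure of the .dockerconfigjson payload.
type dockerConfigJSON struct {
	Auths map[string]struct {
		Auth string `json:"auth"`
	} `json:"auths"`
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Requires `get` on the pull-secret in openshift-config, as granted by
	// the Role above. A real sidecar would use a watch/informer to pick up
	// secret rotations instead of a one-shot Get.
	secret, err := client.CoreV1().Secrets("openshift-config").Get(
		context.TODO(), "pull-secret", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	var dockerCfg dockerConfigJSON
	if err := json.Unmarshal(secret.Data[corev1.DockerConfigJsonKey], &dockerCfg); err != nil {
		panic(err)
	}
	fmt.Println("token for cloud.openshift.com:", dockerCfg.Auths["cloud.openshift.com"].Auth)
}
```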
> each needing additional data to be sent from the OCP clusters, such as:
>
> - [cost management](https://github.com/project-koku/korekuta-operator)
> - [subscription watch](https://github.com/chambridge/subscription-watch-operator)
We are generally moving towards all of this data being sent through telemetry in the future.
> - [cost management](https://github.com/project-koku/korekuta-operator)
> - [subscription watch](https://github.com/chambridge/subscription-watch-operator)
> - [marketplace](https://github.com/redhat-marketplace/redhat-marketplace-operator)
Over time, I expect all of this data to flow through telemetry once we have completed the necessary data interlock.
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle stale`. If this issue is safe to close now please do so with `/close`. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. If this issue is safe to close now please do so with `/close`. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting `/reopen`. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.