Add enhancement proposal for result forwarding #69

Conversation

@rhmdnd commented Jul 15, 2022

This commit writes down the proposal for implementing a results
forwarding mechanism in the compliance operator.

Co-Authored-By: Juan Antonio Osorio juan.osoriorobles@eu.equinix.com

@rhmdnd (Author) commented Jul 25, 2022

/retest

Required: True
Type: string

Reference to a server certificate/keypair secret for mutual TLS.
Confused as to why we need the server cert; I thought the aggregator was a client only. Or is this secret meant for the gRPC service?

@rhmdnd (Author):

I thought this was for the gRPC server, but I'll let @mrogers950 and @JAORMX clarify.
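
For illustration, a minimal Go sketch of what the server side of that keypair secret could look like, assuming the secret is mounted into the gRPC server pod (the file paths, port, and service registration are hypothetical, not the operator's actual layout):

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// Server certificate and key from the mounted keypair secret
	// (paths are assumptions).
	cert, err := tls.LoadX509KeyPair("/etc/result-server/tls.crt", "/etc/result-server/tls.key")
	if err != nil {
		log.Fatalf("loading keypair: %v", err)
	}

	// CA bundle used to verify client certificates (the "mutual" part of mTLS).
	caPEM, err := os.ReadFile("/etc/result-server/ca.crt")
	if err != nil {
		log.Fatalf("reading CA bundle: %v", err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	creds := credentials.NewTLS(&tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    caPool,
		ClientAuth:   tls.RequireAndVerifyClientCert,
	})

	lis, err := net.Listen("tcp", ":8443")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	srv := grpc.NewServer(grpc.Creds(creds))
	// The result-forwarding service would be registered here.
	if err := srv.Serve(lis); err != nil {
		log.Fatalf("serve: %v", err)
	}
}
```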


#### `resultForwarding.provider`

Required: True

Possibly a dumb question, but what does 'Required' mean here? "Does not have a default and must be set"?
Shouldn't gRPC to the local CRD-creating service be the default?


I was even wondering if we should have a grpc-local provider as the default that would nominally talk gRPC, but over a UNIX socket in some shared volume rather than over TCP. This would allow us to get away with not requiring mTLS for the communication between the components.
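
A minimal Go sketch of that grpc-local idea, assuming a socket path on a shared volume (the path and the plaintext credentials are illustrative only):

```go
package localgrpc

import (
	"log"
	"net"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// socketPath is an assumed location on a volume shared by the components.
const socketPath = "/var/run/compliance/results.sock"

// listenLocal serves the gRPC API on the UNIX socket instead of TCP.
func listenLocal() (*grpc.Server, net.Listener) {
	lis, err := net.Listen("unix", socketPath)
	if err != nil {
		log.Fatalf("listen on %s: %v", socketPath, err)
	}
	return grpc.NewServer(), lis
}

// dialLocal connects over the same socket; transport security is skipped
// because the traffic never leaves the pod, which is the point of the idea.
func dialLocal() *grpc.ClientConn {
	conn, err := grpc.Dial("unix://"+socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial %s: %v", socketPath, err)
	}
	return conn
}
```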

@rhmdnd (Author):

Possibly a dumb question, but what does 'Required' mean here? "Does not have a default and must be set"?
Shouldn't gRPC to the local CRD-creating service be the default?

Yeah - I think required means it must be specified by the user.

Maybe we need to go through the various options and pick the best default for backwards compatibility. A gRPC provider makes a lot of sense for the external storage case, but it requires the end user to know where they want to send results, something they don't necessarily have to care about today since results just go into etcd, right?


So of course the downside of having the gRPC service exposed over local sockets only is that we wouldn't test the network forwarding and would have more configurations to test.

With the grpc-over-network approach, what would the default CO deployment look like? I was imagining we'd have one more deployment with the gRPC server, a service, and then the gRPC client would use http://service as the endpoint (and since we'd know the service name, everything could just use defaults).
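
A sketch of that client side under those assumptions (the Service name, namespace, and port are made up; creds would come from the mTLS secret discussed above):

```go
package networkgrpc

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// dialDefaultEndpoint connects to the hypothetical in-cluster Service by
// its DNS name; because the name is known ahead of time, the endpoint can
// simply default to it.
func dialDefaultEndpoint(creds credentials.TransportCredentials) (*grpc.ClientConn, error) {
	return grpc.Dial("result-server.openshift-compliance.svc.cluster.local:8443",
		grpc.WithTransportCredentials(creds))
}
```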


@mrogers950 do you have an opinion?


If it's true that we only need to forward the objects that the aggregator is responsible for (results and remediations, see my comment above), my thought is that our "local" mode would just be the existing CR creation, always on by default, and if resultForwarding.provider is set, there is also an attempt to forward. So resultForwarding.Provider would be optional, and we provide another option, resultForwarding.DisableLocal, that is false by default and only takes effect when set to true while a provider is also set.
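
A sketch of those semantics in Go (field names follow the comment; everything here is illustrative):

```go
package forwarding

// ResultForwarding mirrors the options discussed above.
type ResultForwarding struct {
	Provider     string // optional; empty means "do not forward"
	DisableLocal bool   // defaults to false
}

// shouldForward: forwarding is attempted only when a provider is set.
func (rf ResultForwarding) shouldForward() bool {
	return rf.Provider != ""
}

// shouldCreateLocalCRs: local CR creation stays on by default and can only
// be disabled when a provider is also set, so results are never silently
// dropped by a misconfiguration.
func (rf ResultForwarding) shouldCreateLocalCRs() bool {
	return !(rf.DisableLocal && rf.Provider != "")
}
```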


On the topic of the local gRPC traffic, since this is only results, I think that making a local provider and all would be overkill (requires the aggregator to break up into a client/server). For the evidence proposal that type of pattern makes more sense for what we want to accomplish there, since that's where our existing component client+servers are involved.


Hmm, I guess your proposal would also be safer (more stable) in the sense that the aggregator wouldn't change that drastically from a self-contained binary to a client/server.

Just FWIW, the reason I proposed the always-forwarder was that it felt architecturally (even if local) much cleaner, where the aggregator would be just a forwarder to a handler, because we'd have to implement the forwarding either way.


I'm not opposed to moving towards that if it makes sense for the rest of the forwarding features. We could still build the server implementation but keep it as a test component to run the aggregator client against.


That sounds like a good compromise. We need a testbed anyway, and having a test server that produces the CRDs would also make it easier to do e2e testing.

Good idea!

rhmdnd force-pushed the result-forwarding-enhancement branch from 67e4594 to d586a0f on August 2, 2022
to a configured implementation. In this case, the **aggregator** will act as a
gRPC client. The **resultserver** then becomes a gRPC server, subject to the
evidence forwarding provider; it will be renamed **evidence-persistor** and
will be responsible for writing results.
@rhmdnd (Author):

In this case, what does the aggregator currently use to pass results to the resultserver?

Or are we just saying that the aggregator will be a gRPC client, and an example server implementation would be resultserver or rhmdnd/compserv?

Collaborator:

The former.

Currently, the aggregator parses all the results coming as ConfigMaps from the nodes.

It "aggregates" them together, forming groups from the different machine pools. e.g. one result is applicable to a set of nodes, not just one node. This is done also to detect inconsistencies in pools: "One node deviated from the benchmark configuration, it should be fixed."

Finally, it takes that aggregated result and creates the CRDs based on this.

The idea is to change the last step of that model and turn the aggregator into a gRPC client that always forwards. This way the code becomes more general, and we can then focus on writing server implementations: be it the default one that creates CRDs, or compserv, which centralizes the results.
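
A rough Go sketch of that shape, purely illustrative (the type and field names are made up; client.Client is controller-runtime's client):

```go
package forwarding

import (
	"context"

	"google.golang.org/grpc"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// AggregatedResult stands in for the aggregator's per-pool output.
type AggregatedResult struct {
	CheckID string
	Status  string
	Nodes   []string
}

// ResultForwarder is the aggregator's new final step: it always forwards.
type ResultForwarder interface {
	Forward(ctx context.Context, results []AggregatedResult) error
}

// crForwarder keeps today's behaviour by "forwarding" into the cluster,
// i.e. creating the result/remediation CRs.
type crForwarder struct{ c client.Client }

// grpcForwarder streams the same payload to an external collector such as
// compserv.
type grpcForwarder struct{ conn *grpc.ClientConn }
```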

@rhmdnd (Author):

Thanks for the additional details. I attempted to clarify that in the wording.

rhmdnd force-pushed the result-forwarding-enhancement branch from d586a0f to 0e08eac on August 29, 2022

#### `resultForwarding.provider`

Required: True
Collaborator:

I'd say always requiring the gRPC traffic to go through the network would help in keeping things consistent, e.g. you wouldn't need a code path for creating a UNIX socket and a separate goroutine in the aggregator to handle it.


#### `resultForwarding.provider`

Required: True
Collaborator:

I also agree that we should choose a sane default that's backwards compatible; that is what users expect, and it would help with migration.



#### `resultForwarding.provider`

Required: True
@rhmdnd (Author):

It sounds like this should still be required, then?

Should we also have a field for marking a default value?

rhmdnd force-pushed the result-forwarding-enhancement branch from 0e08eac to 6f6c110 on August 30, 2022
rhmdnd force-pushed the result-forwarding-enhancement branch from 6f6c110 to b0822ea on August 30, 2022
rhmdnd force-pushed the result-forwarding-enhancement branch from b0822ea to 06ab0c6 on September 1, 2022
@jhrozek commented Sep 2, 2022 via email

@JAORMX (Collaborator) commented Sep 2, 2022

  • (unsure about this part) if the CM is not set and the autodetection is not possible, what then? A random string (that would have to be stable across re-runs)? Or just a hard fail?

I like the idea of having a random string. It would be ideal to have a way to identify a unique deployment.

@rhmdnd (Author) commented Sep 6, 2022

I like the idea of having a random string. It would be ideal to have a way to identify a unique deployment.

Ok, let me summarize to make sure I understand where we are:

The Compliance Operator will check for a ConfigMap within the operator namespace (e.g., openshift-compliance) that is specific to the Compliance Operator's needs (e.g., data.compliance_cluster_id = "b876fd70-b6fb-4c9d-9bb6-b53e87d10ac7"). If this ConfigMap doesn't exist when the Compliance Operator's gRPC forwarding implementation needs it, it will create it using a random string for the compliance_cluster_id.

Administrators can modify this value if they choose to (e.g., data.compliance_cluster_id = "us-east-pci-dss"). The Compliance Operator only cares that the value exists so that it can use it.

We could recommend that users populate this ConfigMap prior to using scan settings with gRPC forwarding if they want consistency from the beginning of the forwarding history.

Is that correct?
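
To make the summary concrete, a client-go sketch of that get-or-create behaviour (the ConfigMap name and the use of a UUID as the random string are assumptions; the key follows the comment above):

```go
package forwarding

import (
	"context"

	"github.com/google/uuid"
	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// complianceClusterID returns the cluster ID from the ConfigMap, creating
// the ConfigMap with a random value if it does not exist yet.
func complianceClusterID(ctx context.Context, c kubernetes.Interface) (string, error) {
	cms := c.CoreV1().ConfigMaps("openshift-compliance")
	cm, err := cms.Get(ctx, "compliance-cluster-id", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		cm = &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{Name: "compliance-cluster-id"},
			Data:       map[string]string{"compliance_cluster_id": uuid.NewString()},
		}
		cm, err = cms.Create(ctx, cm, metav1.CreateOptions{})
	}
	if err != nil {
		return "", err
	}
	// Administrators may overwrite the value (e.g. "us-east-pci-dss");
	// the operator only cares that it exists.
	return cm.Data["compliance_cluster_id"], nil
}
```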

@jhrozek left a comment

/lgtm
imo we should just continue iterating on the doc as we implement the API

@sheriff-rh (Collaborator) left a comment

Looks great - one small educational question. I like this proposal, seems like it would be very helpful.

Requires a network connection to forward results after each scan. The
Compliance Operator should validate the endpoint URL and fail early if it is
malformed. If the Compliance Operator cannot connect to the gRPC endpoint, it
should retry and issue an alert.
Collaborator:

This seems to answer the question posed on line 436, right?

@rhmdnd (Author):

Yeah - I think we could supply a configuration, through documentation, that tells users how to configure the compliance-operator to forward results and how to store them locally.

The safety net in that case would be that the results are saved somewhere in a persistent volume if the gRPC forwarding fails.
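
A sketch of that validate-early, retry-then-alert flow (the retry budget, backoff, and alerting hook are all placeholders):

```go
package forwarding

import (
	"context"
	"fmt"
	"net/url"
	"time"
)

// forwardWithRetry validates the endpoint up front, then retries the send
// with exponential backoff before giving up and alerting.
func forwardWithRetry(ctx context.Context, endpoint string, send func(context.Context) error) error {
	// Fail early on a malformed endpoint, before any scan work is wasted.
	if _, err := url.Parse(endpoint); err != nil {
		return fmt.Errorf("invalid endpoint %q: %w", endpoint, err)
	}

	backoff := time.Second
	var lastErr error
	for attempt := 0; attempt < 5; attempt++ { // retry budget is a placeholder
		if lastErr = send(ctx); lastErr == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
			backoff *= 2
		}
	}
	// At this point an alert would be issued (metric, Kubernetes Event, ...);
	// the results still sit in the persistent volume as the safety net.
	return fmt.Errorf("forwarding to %s failed after retries: %w", endpoint, lastErr)
}
```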

openshift-ci bot commented Sep 20, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jhrozek, rhmdnd, sheriff-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rhmdnd (Author) commented Sep 27, 2022

/lgtm
imo we should just continue iterating on the doc as we implement the API

I'm on board with updating the document as we implement the API and configuration.

@JAORMX @mrogers950 @Vincent056 thoughts?

@sheriff-rh sheriff-rh removed their assignment Nov 3, 2022
@xiaojiey (Collaborator) commented Feb 7, 2023

/label qe-approved

openshift-merge-robot merged commit 0cf1555 into ComplianceAsCode:master on Feb 7, 2023