
refactor: remove worker pool #628


Merged: 14 commits merged into kubewarden:main on Jan 18, 2024

Conversation

@fabriziosestito (Contributor) commented Jan 10, 2024

Description

Fixes: #611

This PR refactors the policy server by removing the worker thread pool.
It also changes the web framework from warp to axum.

Removing the worker pool is possible because we can now share the EvaluationEnvironment between handlers using the axum State extractor.
A semaphore is used to limit the number of simultaneous evaluations instead of relying on a pool of workers.
This removes the complexity of the worker pool bootstrap and of the bridge between the sync and async worlds, and it simplifies the evaluation flow, since handlers no longer have to rely on channels and async communication to start an evaluation.
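
For illustration, a minimal sketch of the pattern, not the actual policy-server code: the AppState shape, the stand-in EvaluationEnvironment, and the /validate route below are assumptions; only the "State extractor plus semaphore" idea comes from this PR.

use std::sync::Arc;

use axum::{extract::State, http::StatusCode, routing::post, Router};
use tokio::sync::Semaphore;

// Hypothetical stand-in for the real EvaluationEnvironment.
struct EvaluationEnvironment;

impl EvaluationEnvironment {
    fn validate(&self, payload: &str) -> bool {
        // ... real policy evaluation happens here ...
        !payload.is_empty()
    }
}

// Shared across all handlers through the axum `State` extractor.
#[derive(Clone)]
struct AppState {
    evaluation_environment: Arc<EvaluationEnvironment>,
    // Caps the number of concurrent evaluations, replacing the worker pool.
    semaphore: Arc<Semaphore>,
}

async fn validate(
    State(state): State<AppState>,
    payload: String,
) -> Result<String, StatusCode> {
    // Waiting for a permit is the async counterpart of "wait for a free
    // worker"; the permit is released when `_permit` is dropped.
    let _permit = state
        .semaphore
        .acquire()
        .await
        .map_err(|_| StatusCode::SERVICE_UNAVAILABLE)?;

    let allowed = state.evaluation_environment.validate(&payload);
    Ok(format!("{{\"allowed\": {allowed}}}"))
}

fn app(max_parallel_evaluations: usize) -> Router {
    let state = AppState {
        evaluation_environment: Arc::new(EvaluationEnvironment),
        semaphore: Arc::new(Semaphore::new(max_parallel_evaluations)),
    };
    Router::new().route("/validate", post(validate)).with_state(state)
}

The permit-on-drop semantics give the same back-pressure as a bounded worker pool, without channels or a sync/async bridge.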

This PR also takes care of the following:

  • clean-up and improvements in the layering of the application. The main change is wrapping the policy server code in a PolicyServer type that handles bootstrapping and running the API server.
  • adding JSON responses in case of errors (before, we returned plain text plus a status code); see the sketch after this list.
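
As an illustration of the JSON error responses under axum (the ApiError type and its variants below are hypothetical, not the real policy-server error enum):

use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use serde_json::json;

// Hypothetical error type; the real policy-server error enum differs.
enum ApiError {
    PolicyNotFound(String),
    EvaluationFailed(String),
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message) = match self {
            ApiError::PolicyNotFound(id) => {
                (StatusCode::NOT_FOUND, format!("policy not found: {id}"))
            }
            ApiError::EvaluationFailed(msg) => (StatusCode::INTERNAL_SERVER_ERROR, msg),
        };
        // A structured JSON body instead of the old plain-text one.
        (status, Json(json!({ "message": message }))).into_response()
    }
}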

Test

Existing tests were moved/updated to comply with the new code.
This PR also takes care of the following:

  • adding back the timeout protection integration tests that were removed along with the venom e2e tests.
  • refactoring the existing tests to use axum's testing utilities instead of starting a global HTTP server and sending requests against it; see the sketch after this list.
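
For example, a sketch of the in-process testing style (reusing the hypothetical app() constructor from the earlier sketch; the route, payload, and assertion are illustrative):

use axum::{
    body::Body,
    http::{header, Request, StatusCode},
};
use tower::ServiceExt; // provides `oneshot`

#[tokio::test]
async fn validate_returns_ok() {
    // Hypothetical Router constructor from the earlier sketch.
    let app = app(16);

    // No TCP listener, no global server: the request is handled in-process.
    let response = app
        .oneshot(
            Request::builder()
                .method("POST")
                .uri("/validate")
                .header(header::CONTENT_TYPE, "application/json")
                .body(Body::from("{}"))
                .unwrap(),
        )
        .await
        .unwrap();

    assert_eq!(response.status(), StatusCode::OK);
}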

As expected, load testing doesn't show major performance improvements; the old and new implementations show broadly the same results.

Manual verification of metrics/tracing was performed.
TODO:

  • test against a real cluster → done: ran the e2e tests, passing

Additional Information

Since the sigstore crate depends on the blocking reqwest feature through an old version of tough, I needed to wrap the fulcio and rekor initialization in a spawn_blocking task.
This should go away once we update sigstore-rs; see:
sigstore/sigstore-rs#320

and

// TODO: remove the spawn blocking once the Sigstore client is async
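
A minimal sketch of that workaround (the initializer below is a hypothetical stand-in for the real fulcio/rekor setup, which lives in the sigstore crate):

use tokio::task::spawn_blocking;

// Hypothetical stand-in for the real fulcio/rekor initialization.
fn build_fulcio_and_rekor_data() -> (String, String) {
    // Internally this ends up using reqwest's *blocking* client via tough,
    // so it must not run on the async runtime's worker threads.
    ("fulcio-cert-pool".to_string(), "rekor-public-key".to_string())
}

async fn init_sigstore() -> (String, String) {
    // TODO: remove the spawn_blocking once the Sigstore client is async.
    spawn_blocking(build_fulcio_and_rekor_data)
        .await
        .expect("sigstore initialization task panicked")
}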

@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from 82e1390 to ffc3d86 on January 12, 2024 12:36
@fabriziosestito fabriziosestito marked this pull request as ready for review January 12, 2024 12:57
@fabriziosestito fabriziosestito requested a review from a team as a code owner January 12, 2024 12:57
@fabriziosestito fabriziosestito self-assigned this Jan 12, 2024
Comment on lines +119 to +130
// Record the admission request's metadata on the current tracing span.
populate_span_with_admission_request_data(&admission_review.request);

// Acquire a semaphore permit (bounding concurrent evaluations) and run the
// policy against the admission request.
let response = acquire_semaphore_and_evaluate(
    state,
    policy_id,
    ValidateRequest::AdmissionRequest(admission_review.request),
    RequestOrigin::Validate,
)
.await
.map_err(handle_evaluation_error)?;

// Record the policy evaluation outcome on the span.
populate_span_with_policy_evaluation_results(&response);
Member
As far as I can see, the tracing for policies in monitor mode will always report the result as "allowed", which is not desired. The metrics should show the original evaluation result; otherwise, users will not get the value of monitor mode, which is to see the original results before moving the policy to protect mode.

@viccuad (Member) commented Jan 17, 2024

I agree with this. Still, this isn't a regression from this PR, so I'm ok opening an issue and tackling it later. I'm wondering whether it is better to:
a. Add a mode field and set allowed to the real evaluation output, which will be false if the policy is in monitor mode and rejected the request.
b. Add mode and monitor_result fields; if mode == monitor then allowed is always true and monitor_result equals the result from the policy.

Member

I think we should create a new issue for that and tackle it outside of this PR.

I also prefer to keep the current behavior, but extend the trace to have:

  • a new field mode that states whether the policy is operating in monitor or protect mode
  • a new field raw_result (or something else) that contains the boolean value of the evaluation result before monitor mode overrides it. We could add this field to all policies, regardless of their operating mode, or only to the traces emitted by policies operating in monitor mode

At the end of the day, I want an operator to be able to run a Jaeger query like: select * from traces where operation_mode = "monitor" and raw_result = false
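
A minimal sketch of how such fields could be recorded with the tracing crate (the span name and call sites are assumptions; only the mode/raw_result field names come from this thread):

use tracing::{field::Empty, info_span};

fn record_monitor_mode_result(raw_result: bool) {
    // Fields must be declared up front; `tracing` ignores records for
    // fields the span does not know about.
    let span = info_span!("policy_evaluation", mode = Empty, raw_result = Empty);

    // Assumed call sites: record the operating mode and the evaluation
    // result *before* monitor mode rewrites it to "allowed".
    span.record("mode", "monitor");
    span.record("raw_result", raw_result);
}

With those fields on the span, filtering traces on mode and raw_result in Jaeger becomes a plain tag query like the one above.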

@viccuad (Member) left a comment
LGTM! Just some quibbles, looks good.

This is great, many thanks! Great refactor, expanded tests & better logging. Good to make use of the maturity of the Tokio ecosystem of frameworks (plus I learned about idiomatic Axum usage).

Played with it locally too.

Since there are known users of the bare policy-server, we should document in the GH release changelog that we now expect HTTP requests with the application/json Content-Type header set, and that we return better errors.

@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from e191fc7 to 736b28b on January 17, 2024 12:30
12 commits, each signed off by Fabrizio Sestito <fabrizio.sestito@suse.com>; commit titles truncated in this view ("…ecific module", "…ests")
@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from 736b28b to f4b4d5a on January 17, 2024 13:05
@flavio (Member) commented Jan 17, 2024

Fantastic job @fabriziosestito 👏

I left some comments; I think we're pretty close to merging this PR.

2 more commits, each signed off by Fabrizio Sestito <fabrizio.sestito@suse.com>
@flavio (Member) commented Jan 18, 2024

Merging, all the concerns/issues have been addressed! 🥳

@flavio flavio merged commit bdda825 into kubewarden:main Jan 18, 2024
Development

Successfully merging this pull request may close these issues.

Consider removal of the worker pool
4 participants