
refactor: remove worker pool #628


Merged: 14 commits merged into kubewarden:main on Jan 18, 2024

Conversation

@fabriziosestito (Contributor) commented Jan 10, 2024

Description

Fixes: #611

This PR refactors the policy server by removing the worker thread pool.
It also changes the web framework from warp to axum.

Removing the worker pool is possible because we can now share the EvaluationEnvironment between handlers using the axum State extractor.
A semaphore is used to limit the number of simultaneous evaluations instead of relying on a pool of workers.
This removes the complexity of the worker pool bootstrap and of the bridge between the sync and async worlds, and it simplifies the evaluation flow, since handlers no longer have to rely on channels and async communication to start an evaluation.
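
For illustration, a minimal sketch of the pattern, not the actual policy-server code: the AppState shape, the stand-in EvaluationEnvironment, and the /validate route below are assumptions; only the "State extractor plus semaphore" idea comes from this PR.

use std::sync::Arc;

use axum::{extract::State, http::StatusCode, routing::post, Router};
use tokio::sync::Semaphore;

// Hypothetical stand-in for the real EvaluationEnvironment.
struct EvaluationEnvironment;

impl EvaluationEnvironment {
    fn validate(&self, payload: &str) -> bool {
        // ... real policy evaluation happens here ...
        !payload.is_empty()
    }
}

// Shared across all handlers through the axum `State` extractor.
#[derive(Clone)]
struct AppState {
    evaluation_environment: Arc<EvaluationEnvironment>,
    // Caps the number of concurrent evaluations, replacing the worker pool.
    semaphore: Arc<Semaphore>,
}

async fn validate(
    State(state): State<AppState>,
    payload: String,
) -> Result<String, StatusCode> {
    // Waiting for a permit is the async counterpart of "wait for a free
    // worker"; the permit is released when `_permit` is dropped.
    let _permit = state
        .semaphore
        .acquire()
        .await
        .map_err(|_| StatusCode::SERVICE_UNAVAILABLE)?;

    let allowed = state.evaluation_environment.validate(&payload);
    Ok(format!("{{\"allowed\": {allowed}}}"))
}

fn app(max_parallel_evaluations: usize) -> Router {
    let state = AppState {
        evaluation_environment: Arc::new(EvaluationEnvironment),
        semaphore: Arc::new(Semaphore::new(max_parallel_evaluations)),
    };
    Router::new().route("/validate", post(validate)).with_state(state)
}

The permit-on-drop semantics give the same back-pressure as a bounded worker pool, without channels or a sync/async bridge.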

This PR also takes care of the following:

  • clean-up and improvements in the layering of the application. The main change is wrapping the policy server code in a PolicyServer type that handles bootstrapping and running the API server.
  • adding JSON responses in case of errors (before, we returned plain text plus a status code); see the sketch after this list.
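
As an illustration of the JSON error responses under axum (the ApiError type and its variants below are hypothetical, not the real policy-server error enum):

use axum::{
    http::StatusCode,
    response::{IntoResponse, Response},
    Json,
};
use serde_json::json;

// Hypothetical error type; the real policy-server error enum differs.
enum ApiError {
    PolicyNotFound(String),
    EvaluationFailed(String),
}

impl IntoResponse for ApiError {
    fn into_response(self) -> Response {
        let (status, message) = match self {
            ApiError::PolicyNotFound(id) => {
                (StatusCode::NOT_FOUND, format!("policy not found: {id}"))
            }
            ApiError::EvaluationFailed(msg) => (StatusCode::INTERNAL_SERVER_ERROR, msg),
        };
        // A structured JSON body instead of the old plain-text one.
        (status, Json(json!({ "message": message }))).into_response()
    }
}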

Test

Existing tests were moved/updated to comply with the new code.
This PR also takes care of the following:

  • adding back the timeout protection integration tests that were removed along with the venom e2e tests.
  • refactoring the existing tests to use axum's testing utilities instead of starting a global HTTP server and sending requests against it; see the sketch after this list.
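
For example, a sketch of the in-process testing style (reusing the hypothetical app() constructor from the earlier sketch; the route, payload, and assertion are illustrative):

use axum::{
    body::Body,
    http::{header, Request, StatusCode},
};
use tower::ServiceExt; // provides `oneshot`

#[tokio::test]
async fn validate_returns_ok() {
    // Hypothetical Router constructor from the earlier sketch.
    let app = app(16);

    // No TCP listener, no global server: the request is handled in-process.
    let response = app
        .oneshot(
            Request::builder()
                .method("POST")
                .uri("/validate")
                .header(header::CONTENT_TYPE, "application/json")
                .body(Body::from("{}"))
                .unwrap(),
        )
        .await
        .unwrap();

    assert_eq!(response.status(), StatusCode::OK);
}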

As expected, load testing doesn't show major performance improvements; the old and new implementations show broadly the same results.

Manual verification of metrics/tracing was performed.
TODO:

  • test against a real cluster → done: ran the e2e tests, passing

Additional Information

Since the sigstore crate depends on the blocking reqwest feature through an old version of tough, I needed to wrap the fulcio and rekor initialization in a spawn_blocking task.
This should go away once we update sigstore-rs; see:
sigstore/sigstore-rs#320

and

// TODO: remove the spawn blocking once the Sigstore client is async
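
A minimal sketch of that workaround (the initializer below is a hypothetical stand-in for the real fulcio/rekor setup, which lives in the sigstore crate):

use tokio::task::spawn_blocking;

// Hypothetical stand-in for the real fulcio/rekor initialization.
fn build_fulcio_and_rekor_data() -> (String, String) {
    // Internally this ends up using reqwest's *blocking* client via tough,
    // so it must not run on the async runtime's worker threads.
    ("fulcio-cert-pool".to_string(), "rekor-public-key".to_string())
}

async fn init_sigstore() -> (String, String) {
    // TODO: remove the spawn_blocking once the Sigstore client is async.
    spawn_blocking(build_fulcio_and_rekor_data)
        .await
        .expect("sigstore initialization task panicked")
}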

@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from 82e1390 to ffc3d86 on January 12, 2024 12:36
@fabriziosestito fabriziosestito marked this pull request as ready for review January 12, 2024 12:57
@fabriziosestito fabriziosestito requested a review from a team as a code owner January 12, 2024 12:57
@fabriziosestito fabriziosestito self-assigned this Jan 12, 2024
Comment on lines +119 to +130
// Record the admission request's metadata on the current tracing span.
populate_span_with_admission_request_data(&admission_review.request);

// Acquire a semaphore permit (bounding concurrent evaluations) and run the
// policy against the admission request.
let response = acquire_semaphore_and_evaluate(
    state,
    policy_id,
    ValidateRequest::AdmissionRequest(admission_review.request),
    RequestOrigin::Validate,
)
.await
.map_err(handle_evaluation_error)?;

// Record the policy evaluation outcome on the span.
populate_span_with_policy_evaluation_results(&response);
Member
As far as I can see, the tracing for policies in monitor mode will always report the result as "allowed", which is not desired. The metrics should show the original evaluation result; otherwise, users will not get the value of monitor mode, which is to see the original results before moving the policy to protect mode.

@viccuad (Member) commented Jan 17, 2024

I agree with this. Still, this isn't a regression from this PR, so I'm ok opening an issue and tackling it later. I'm wondering whether it is better to:
a. Add a mode field and set allowed to the real evaluation output, which will be false if the policy is in monitor mode and rejected the request.
b. Add mode and monitor_result fields; if mode == monitor then allowed is always true and monitor_result equals the result from the policy.

Member

I think we should create a new issue for that and tackle it outside of this PR.

I also prefer to keep the current behavior, but extend the trace to have:

  • a new field mode that states whether the policy is operating in monitor or protect mode
  • a new field raw_result (or something else) that contains the boolean value of the evaluation result before monitor mode overrides it. We could add this field to all policies, regardless of their operating mode, or only to the traces emitted by policies operating in monitor mode

At the end of the day, I want an operator to be able to run a Jaeger query like: select * from traces where operation_mode = "monitor" and raw_result = false
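
A minimal sketch of how such fields could be recorded with the tracing crate (the span name and call sites are assumptions; only the mode/raw_result field names come from this thread):

use tracing::{field::Empty, info_span};

fn record_monitor_mode_result(raw_result: bool) {
    // Fields must be declared up front; `tracing` ignores records for
    // fields the span does not know about.
    let span = info_span!("policy_evaluation", mode = Empty, raw_result = Empty);

    // Assumed call sites: record the operating mode and the evaluation
    // result *before* monitor mode rewrites it to "allowed".
    span.record("mode", "monitor");
    span.record("raw_result", raw_result);
}

With those fields on the span, filtering traces on mode and raw_result in Jaeger becomes a plain tag query like the one above.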

@viccuad (Member) left a comment
LGTM! Just some quibbles, looks good.

This is great, many thanks! Great refactor, expanded tests & better logging. Good to make use of the maturity of the Tokio ecosystem of frameworks (plus I learned about idiomatic Axum usage).

Played with it locally too.

Since there are known users of the bare policy-server, we should document in the GH release changelog that we now expect HTTP requests with the application/json Content-Type header set, and that we return better errors.

@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from e191fc7 to 736b28b on January 17, 2024 12:30
12 commits, each signed off by Fabrizio Sestito <fabrizio.sestito@suse.com>; commit titles truncated in this view ("…ecific module", "…ests")
@fabriziosestito fabriziosestito force-pushed the refactor/remove-workers branch from 736b28b to f4b4d5a on January 17, 2024 13:05
@flavio (Member) commented Jan 17, 2024

Fantastic job @fabriziosestito 👏

I left some comments; I think we're pretty close to merging this PR.

2 more commits, each signed off by Fabrizio Sestito <fabrizio.sestito@suse.com>
@flavio (Member) commented Jan 18, 2024

Merging, all the concerns/issues have been addressed! 🥳

@flavio flavio merged commit bdda825 into kubewarden:main Jan 18, 2024
Development

Successfully merging this pull request may close these issues.

Consider removal of the worker pool
4 participants