Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

examples: fixed fault-injection delay demo #11715

Merged
merged 2 commits into from
Jun 25, 2020

Conversation

cpakulski
Copy link
Contributor

Description:
fault injection example did not work for delay option. After gdb debugging it was discovered that fault injection config requires delay part to be present. Without delay part Envoy assumes that Delay is disabled and does not scan RTDS for delay config updates.

Risk Level:
Low:

Testing:
Did manual testing as described in https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/fault_injection

Docs Changes:
No

Release Notes:
No

Fixes: #11095

Without that part Envoy assumes that delay is disabled.

Changed Envoy docker file to run as root: required for access to stdout.

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
snowp
snowp previously requested changes Jun 23, 2020
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just one comment

@@ -13,6 +13,8 @@ services:
ports:
- 9211:9211
- 9901:9901
environment:
ENVOY_UID: 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW #11523

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was necessary to give Envoy permission to access /dev/stdout. Without that setting Envoy container exists complaining that it cannot write logs to /dev/stdout (specified in envoy.yaml as sink for logs).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, bonus points for commenting that - I ran into that trying to repro and it'd be lovely if a local grep for stdout turned up the fix :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment to explain why Envoy must run as root.

@@ -36,6 +36,11 @@ static_resources:
percentage:
numerator: 0
denominator: HUNDRED
delay:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great find, thanks for tracking it down!

I do wonder if we should tweak the fault filter to delay / abort if there's a runtime override even if the proto doesn't include the config. Even though it's pretty clearly documented, I can imagine it confusing folks.
cc @mattklein123

either way it shouldn't block this PR, more musing if we should file a tracking issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't recall the history of why this is implemented the way it is. I agree it's potentially confusing. I think the idea is we want the user to opt-in explicitly to the config they want but I don't feel super strongly about it. @Augustyniak any thoughts here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think explicitness here is good since we support injecting faults via different methods including HTTP headers and we want these faults to be injected only if our Envoy is explicitly configured to accept them.

  • Without Envoy requiring people to opt-in with this config, it may be confusing as to what kind of faults a given Envoy may potentially inject - the ones controlled via runtime or via HTTP headers. With config in, it becomes clear what’s supported.
  • If we decide that we don’t want to require a config for faults controlled via runtime we should probably do the same for faults controlled with HTTP headers which is risky because of security implications.

For all of the aforementioned reasons, I would leave the presence of the config as the required step of FaultFilter’s setup.

For reference, an example of a config that supports delay faults via HTTP headers:

delay:
  header_delay: {}
  percentage:
  numerator: 100

Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski cpakulski requested a review from snowp June 23, 2020 18:56
@cpakulski
Copy link
Contributor Author

@alyssawilk I think this is ready for merge.

@alyssawilk alyssawilk assigned alyssawilk and unassigned snowp Jun 25, 2020
@alyssawilk alyssawilk merged commit d0b4b4f into envoyproxy:master Jun 25, 2020
chaoqin-li1123 pushed a commit to chaoqin-li1123/envoy that referenced this pull request Jun 25, 2020
fault injection example did not work for delay option. After gdb debugging it was discovered that fault injection config requires delay part to be present. Without delay part Envoy assumes that Delay is disabled and does not scan RTDS for delay config updates.

Risk Level: Low:
Testing: Did manual testing as described in https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/fault_injection
Docs Changes: No
Release Notes: No
Fixes: envoyproxy#11095
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: chaoqinli <chaoqinli@google.com>
songhu pushed a commit to songhu/envoy that referenced this pull request Jun 25, 2020
fault injection example did not work for delay option. After gdb debugging it was discovered that fault injection config requires delay part to be present. Without delay part Envoy assumes that Delay is disabled and does not scan RTDS for delay config updates.

Risk Level: Low:
Testing: Did manual testing as described in https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/fault_injection
Docs Changes: No
Release Notes: No
Fixes: envoyproxy#11095
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
@cpakulski cpakulski deleted the issue/11095 branch July 2, 2020 18:45
yashwant121 pushed a commit to yashwant121/envoy that referenced this pull request Jul 24, 2020
fault injection example did not work for delay option. After gdb debugging it was discovered that fault injection config requires delay part to be present. Without delay part Envoy assumes that Delay is disabled and does not scan RTDS for delay config updates.

Risk Level: Low:
Testing: Did manual testing as described in https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/fault_injection
Docs Changes: No
Release Notes: No
Fixes: envoyproxy#11095
Signed-off-by: Christoph Pakulski <christoph@tetrate.io>
Signed-off-by: yashwant121 <yadavyashwant36@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fixed_duration_ms is not working in Fault Injection.
6 participants