Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics: Add server for Egress IP/firewall #358

Merged

Conversation

martinkennelly
Copy link
Contributor

@martinkennelly martinkennelly commented Oct 9, 2021

Add an insecure HTTP metrics server that serves
metrics on loopback port 29100.

Add metrics to count egress firewalls,
egress firewall rules and egress IPs count.

In order to validate the metrics:

Metric num_egress_ips:

  1. Configure EgressIP: https://docs.openshift.com/container-platform/4.9/networking/openshift_sdn/assigning-egress-ips.html
  2. Exec into SDN pod that is the current LE leader. You can see whos the leader in a config map in the SDN namespace.
  3. curl localhost:29100/metrics | grep num_egress_ips
  4. You should see an integer from step 3 which represents the total number of egress IPs. This should be the same as the amount you created in step 1.

Metric num_egress_firewall_rules:

  1. Configure EgressFirewall: https://docs.openshift.com/container-platform/4.9/networking/openshift_sdn/configuring-egress-firewall.html
  2. Exec into SDN pod that is the current LE leader. You can see whos the leader in a config map in the SDN namespace.
  3. curl localhost:29100/metrics | grep num_egress_firewall_rules
  4. You should see an integer from step 3 which represents the total number of EgressFirewall egress rules. This should be the same as the amount you created in step 1.

Metric num_egress_firewalls:

    1. Configure X number EgressFirewall: https://docs.openshift.com/container-platform/4.9/networking/openshift_sdn/configuring-egress-firewall.html
  1. Exec into SDN pod that is the current LE leader. You can see whos the leader in a config map in the SDN namespace.
  2. curl localhost:29100/metrics | grep num_egress_firewalls
  3. You should see an integer from step 3 which represents the total number of EgressFirewalls. This should be the same as the amount you created in step 1.

Signed-off-by: Martin Kennelly mkennell@redhat.com

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 9, 2021
@martinkennelly martinkennelly changed the title [WIP] Metrics: Add server for Egress IP/firewall Metrics: Add server for Egress IP/firewall Oct 11, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2021
@martinkennelly
Copy link
Contributor Author

Forced pushed to group cache + mutex together in the same struct.

const (
shutdownTimeout = time.Millisecond * 50
endpoint = "/metrics"
defaultBindAddress = "127.0.0.1:29102"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was weighing up whether to bind to loopback or not. I went with loopback because it's an insecure HTTP server and I want to protect this server against exposure by mistake. This server will be exposed via RBAC proxy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, loopback+rbac proxy is correct.
When this merges you also need to update dev-guide/host-port-registry.md in openshift/enhancements to reflect the fact that this port is now used by openshift-sdn-controller.

@martinkennelly
Copy link
Contributor Author

/assign @danwinship
Feel free to reassign/unassign. I notice you are the most active on here..
No rush on review.

const (
shutdownTimeout = time.Millisecond * 50
endpoint = "/metrics"
defaultBindAddress = "127.0.0.1:29102"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, loopback+rbac proxy is correct.
When this merges you also need to update dev-guide/host-port-registry.md in openshift/enhancements to reflect the fact that this port is now used by openshift-sdn-controller.

@martinkennelly martinkennelly force-pushed the add_metrics_server branch 2 times, most recently from 6f1cd9d to cfa0235 Compare October 13, 2021 10:29
@martinkennelly
Copy link
Contributor Author

@danwinship The first iteration was poor. It wasn't a good idea to have a metric for egress firewall per namespace. I got rid of that and just keep a total count for egress firewall rules.

If this merges, I will update the port number reservation in dev-guide/host-port-registry.md in repo openshift/enhancements.
Thank you for the informer education.

@martinkennelly
Copy link
Contributor Author

Forced pushed to change default port from 29102 -> 29100 so as not to possibly conflict with ovn-k port.

@martinkennelly
Copy link
Contributor Author

/retest

1 similar comment
@martinkennelly
Copy link
Contributor Author

/retest

@martinkennelly
Copy link
Contributor Author

/retest
CI flaking. Submitted BZ for flake: https://bugzilla.redhat.com/show_bug.cgi?id=2019001

Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
general idea is good, needs a few tweaks, but then anyone can /lgtm it after that

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 1, 2021
@martinkennelly martinkennelly changed the title Metrics: Add server for Egress IP/firewall WIP: Metrics: Add server for Egress IP/firewall Nov 2, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 2, 2021
@martinkennelly martinkennelly force-pushed the add_metrics_server branch 2 times, most recently from 6ae2f3b to 1bc06d7 Compare November 3, 2021 11:25
Add an insecure HTTP metrics server that serves
metrics on loopback port 29100.

Add metrics to count egress firewalls,
egress firewall rules and egress IPs count.

Signed-off-by: Martin Kennelly <mkennell@redhat.com>
@martinkennelly
Copy link
Contributor Author

@danwinship I made the changes you requested - ENP metric, renamed file, and unit tests.
During my testing, I decided to change the definition of num_egress_ips to count egress IPs assigned to nodes [1]. This is better than counting all because it lets us know how many egress IPs are currently active on a cluster. I updated the HELP field to inform consumers of this distinction.

I also noticed an error I made in the original code you reviewed. I made the mistake of checking if interface{} was nil for func parameter old in handleAddUpdate. I now check the event type to find out if its an add or update operation [2].

[1] https://github.com/openshift/sdn/pull/358/files#diff-cb8e1be9489f2925e33943b9051dded85e22f36cb5b9905ebeb807f820cb9627R700

[2] https://github.com/openshift/sdn/pull/358/files#diff-2153a20dc3cd4885ce75eac5c9e4a51f6343d6be8b0d154be9973c1e66bad0a0R30

@martinkennelly martinkennelly changed the title WIP: Metrics: Add server for Egress IP/firewall Metrics: Add server for Egress IP/firewall Nov 3, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 3, 2021
@martinkennelly
Copy link
Contributor Author

/retest
CI flakes. Added BZ for one of them.

Copy link
Contributor

@alexanderConstantinescu alexanderConstantinescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really an lgtm, I just had a small nit that I wanted to give you time to decide what to do about (and if I put a real lgtm, the bot will merge the PR). Let me know

var metricEgressIPCount = prometheus.NewGauge(prometheus.GaugeOpts{
Namespace: metricSDNNamespace,
Subsystem: metricSDNSubsystemController,
Name: "num_egress_ips",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small nit (you can discard it and slightest hesitation): call it num_assigned_egress_ips as to not confuse it with the number of egress IP objects

Copy link
Contributor Author

@martinkennelly martinkennelly Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Id love to change it to what you suggested but I want to keep this name consistent with OVN-K and that PR is merged. Its not worth it for me to rename it now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood then :)

@alexanderConstantinescu
Copy link
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 10, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 10, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexanderConstantinescu, danwinship, martinkennelly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@martinkennelly martinkennelly changed the title Metrics: Add server for Egress IP/firewall WIP: Metrics: Add server for Egress IP/firewall Nov 10, 2021
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2021
@openshift-merge-robot openshift-merge-robot merged commit a22d08c into openshift:master Nov 10, 2021
@martinkennelly martinkennelly changed the title WIP: Metrics: Add server for Egress IP/firewall Metrics: Add server for Egress IP/firewall Nov 10, 2021
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants