This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

NETOBSERV-122: Health stats reporter #19

Merged: 1 commit merged on Jan 31, 2022

@mariomac (Contributor) commented Jan 12, 2022

Status report based on Eclipse MicroProfile 2.1.

Implements the /health/ready and /health/live endpoints, which return a response like:

{
    "checks": [
        {
            "data": {
                "host": "goflow-kube-54485b9d94-s695x"
            },
            "name": "flows",
            "status": "UP"
        }
    ],
    "status": "UP"
}

It also implements a /metrics endpoint that reports detailed counters about processed flows, received templates, and so on:

# HELP flow_process_nf_count NetFlows processed.
# TYPE flow_process_nf_count counter
flow_process_nf_count{router="100.64.0.2",version="10"} 229
flow_process_nf_count{router="100.64.0.3",version="10"} 22
flow_process_nf_count{router="100.64.0.4",version="10"} 39
flow_process_nf_count{router="100.64.0.5",version="10"} 103
# HELP flow_process_nf_delay_summary_seconds NetFlows time difference between time of flow and processing.
# TYPE flow_process_nf_delay_summary_seconds summary
flow_process_nf_delay_summary_seconds{router="100.64.0.2",version="10",quantile="0.5"} 59
flow_process_nf_delay_summary_seconds{router="100.64.0.2",version="10",quantile="0.9"} 60
flow_process_nf_delay_summary_seconds{router="100.64.0.2",version="10",quantile="0.99"} 1.8446744073709552e+19
flow_process_nf_delay_summary_seconds_sum{router="100.64.0.2",version="10"} 2.9514790517935283e+20
(...)

@mariomac changed the title from "WIP: Health stats reporter" to "Health stats reporter" on Jan 14, 2022
@mariomac changed the title from "Health stats reporter" to "NETOBSERV-122: Health stats reporter" on Jan 14, 2022
Comment on lines 106 to 110
if len(sw.boxes) == 0 ||
	clock().Sub(sw.boxes[len(sw.boxes)-1].start) >= boxLength {
	sw.pushBox()
}
return sw.boxes[len(sw.boxes)-1]
Contributor:

That's smart!

But does that mean that if no flows arrive for a while (typically while blocked in r.format.Next()), no new boxes are pushed?

Contributor Author:

You are right, but it should not matter, since the stats aggregation only counts the boxes whose start time is less than 5 minutes ago.

type MetricSet struct {
	// ReceivedFlows is the total number of received flows, including both those
	// that have been successfully processed and those that haven't
	ReceivedFlows uint64 `json:"received_flows"`
Member:

We said dromedary case (i.e. receivedFlows) :)

Contributor Author:

Ouch! Actually, this struct is no longer serialized to JSON, so I'm removing the json tags.

@jotak (Member) commented Jan 14, 2022

I haven't checked in depth, but does it capture flows that are discarded because their definitions (templates) have not been received? This might require a change directly in goflow, as I'm not sure we are notified about that at our level.

@mariomac (Contributor Author) commented

@jotak Hmm... good point. I was assuming that the goflow library returns an error in that situation. I'll double-check.

@jotak (Member) commented Jan 20, 2022

@mariomac we also need a PR in the operator so that the goflow-kube service exposes these endpoints, right?
Never mind, I've just seen it :)

@jotak (Member) commented Jan 24, 2022

FYI, I created https://issues.redhat.com/browse/NETOBSERV-168 as a follow-up to provide the full status as originally intended.

@jotak (Member) commented Jan 31, 2022

/lgtm

@openshift-ci (bot) added the lgtm label Jan 31, 2022
@jotak added the no-qe label (This PR doesn't necessitate QE approval) Jan 31, 2022
@openshift-ci (bot) commented Jan 31, 2022

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by:


@openshift-merge-robot merged commit 52e7caf into netobserv:main on Jan 31, 2022
@mariomac deleted the health branch on January 31, 2022 at 11:09
Labels: approved, lgtm, no-qe (This PR doesn't necessitate QE approval)

4 participants