Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BR: metrics rework #3176

Merged
merged 13 commits into from
Oct 1, 2019
Merged

BR: metrics rework #3176

merged 13 commits into from
Oct 1, 2019

Conversation

sgmonroy
Copy link
Contributor

@sgmonroy sgmonroy commented Sep 23, 2019

This change is Reviewable

@sgmonroy sgmonroy requested review from kormat, oncilla and karampok and removed request for oncilla September 23, 2019 07:51
Copy link
Contributor

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 25 of 25 files at r1.
Reviewable status: all files reviewed, 25 unresolved discussions (waiting on @kormat and @sgmonroy)


go/border/io.go, line 126 at r1 (raw file):

			rp.Raw = rp.Raw[:msg.N] // Set the length of the slice
			rp.Ingress.Dst = dst
			rp.Ingress.IfLabel = s.Label

where do you we use that?


go/border/router.go, line 188 at r1 (raw file):

		l.Result = metrics.ErrRoute
		metrics.Process.PktsWith(l).Inc()
	}

Do we need also here to increase metric with success?


go/border/ifstate/ifstate.go, line 62 at r1 (raw file):

type state struct {
	// info is a pointer to an Info object.
	// XXX(sgmonroy) why is this an unsafe pointer?

Is that left on purpose? Follow up story?


go/border/ifstate/ifstate.go, line 84 at r1 (raw file):

		SRevInfo:     srev,
		RawSRev:      rawSRev,
		ActiveMetric: metrics.Control.IFStateWith(label),

I cannot find where we use the ActiveMetric (apart from some lines below).


go/border/ifstate/ifstate.go, line 100 at r1 (raw file):

		var rawSRev common.RawBytes
		ifid := common.IFIDType(info.IfID)
		// XXX Why accept unsigned ifstate infos?

Left on purpose?


go/border/internal/metrics/ctrl.go, line 31 at r1 (raw file):

)

type ControlLabels struct {

To tested as in
https://github.com/scionproto/scion/blob/99f78f029d55c277b1551052a13136ecda8367e0/go/path_srv/internal/metrics/registration_test.go


go/border/internal/metrics/ctrl.go, line 35 at r1 (raw file):

	Type   string
	Src    string
	Dst    string

Result,Type,Src,Dst string is slightly more golang way


go/border/internal/metrics/ctrl.go, line 38 at r1 (raw file):

}

func (l *ControlLabels) Labels() []string {

no pointer


go/border/internal/metrics/ctrl.go, line 63 at r1 (raw file):

}

func (c *control) PktsWith(l ControlLabels) prometheus.Counter {

comments as usual


go/border/internal/metrics/input.go, line 39 at r1 (raw file):

	sub := "input"
	intf := IntfLabels{}
	l := intf.Labels()

one line?


go/border/internal/metrics/input.go, line 62 at r1 (raw file):

}

func (in *input) PktsWith(l IntfLabels) prometheus.Counter {

comments as usual


go/border/internal/metrics/metrics.go, line 30 at r1 (raw file):

// Result values.
const (

Best practice would be to document/comment all the exported vars


go/border/internal/metrics/metrics.go, line 53 at r1 (raw file):

)

type IntfLabels struct {

this can be tested as in https://github.com/scionproto/scion/blob/99f78f029d55c277b1551052a13136ecda8367e0/go/path_srv/internal/metrics/registration_test.go


go/border/internal/metrics/metrics.go, line 56 at r1 (raw file):

	Intf string
}

Comment is missing , see https://github.com/scionproto/scion/blob/master/doc/Metrics.md for the generic description


go/border/internal/metrics/metrics.go, line 57 at r1 (raw file):

}

func (l *IntfLabels) Labels() []string {

no pointer needed


go/border/internal/metrics/metrics.go, line 61 at r1 (raw file):

}

func (l *IntfLabels) Values() []string {

no pointer needed


go/border/internal/metrics/metrics.go, line 70 at r1 (raw file):

}

func IntfToLabel(ifid common.IFIDType) string {

If we use IFIDType.String() and do not have the default loc, would that be no desirable?
What loc means?


go/border/internal/metrics/output.go, line 38 at r1 (raw file):

	sub := "output"
	intf := IntfLabels{}
	l := intf.Labels()

IntfLabels{}.Labels() , with Labels a method on a value receiver?


go/border/internal/metrics/output.go, line 58 at r1 (raw file):

}

func (o *output) PktsWith(l IntfLabels) prometheus.Counter {

Generic comments


go/border/internal/metrics/process.go, line 35 at r1 (raw file):

	Out    string
}

Generic comments so linter will complain less,


go/border/rctrl/ctrl.go, line 90 at r1 (raw file):

			cl.Result = metrics.ErrRead
			metrics.Control.PktsWith(cl).Inc()
			fatal.Fatal(common.NewBasicError("Reading packet", err))

Confuses me the behavior here, why not just break? (irrelevant with your changes)


go/border/rctrl/ctrl.go, line 93 at r1 (raw file):

		}
		cl.Type = metrics.IFStateInfo
		cl.Result = metrics.Success

this can be initialized out side the for loop


go/border/rctrl/ifstate.go, line 44 at r1 (raw file):

// startup.
func ifStateUpdate() {
	if err := genIFStateReq(); err != nil {

Do we need to run once manually? the for range is not enough?


go/border/rctx/io.go, line 71 at r1 (raw file):

		Dir:    dir,
		Ifid:   ifid,
		Label:  metrics.IntfToLabel(ifid),

if you have already the ifid, do we still need to have the Label?
We could always jhave this value?


go/border/rpkt/route.go, line 65 at r1 (raw file):

	for _, epair := range rp.Egress {
		epair.S.Ring.Write(ringbuf.EntryList{&EgressRtrPkt{rp, epair.Dst}}, true)
		l.Out = epair.S.Label

Here, you could use the S.Ifid.String(), maybe :)

Copy link
Contributor Author

@sgmonroy sgmonroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 25 unresolved discussions (waiting on @karampok and @kormat)


go/border/io.go, line 126 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

where do you we use that?

go/border/rpkt/route.go#L59


go/border/router.go, line 188 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Do we need also here to increase metric with success?

go/border/rpkt/route.go#L66


go/border/ifstate/ifstate.go, line 62 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Is that left on purpose? Follow up story?

Done.


go/border/ifstate/ifstate.go, line 84 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

I cannot find where we use the ActiveMetric (apart from some lines below).

Done.


go/border/ifstate/ifstate.go, line 100 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Left on purpose?

Done.


go/border/internal/metrics/ctrl.go, line 31 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

To tested as in
https://github.com/scionproto/scion/blob/99f78f029d55c277b1551052a13136ecda8367e0/go/path_srv/internal/metrics/registration_test.go

Done.


go/border/internal/metrics/ctrl.go, line 35 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Result,Type,Src,Dst string is slightly more golang way

I added comments, so left on separated lines.


go/border/internal/metrics/ctrl.go, line 38 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

no pointer

Done.


go/border/internal/metrics/ctrl.go, line 63 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

comments as usual

Done.


go/border/internal/metrics/input.go, line 39 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

one line?

Done.


go/border/internal/metrics/input.go, line 62 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

comments as usual

Done.


go/border/internal/metrics/metrics.go, line 30 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Best practice would be to document/comment all the exported vars

Done.


go/border/internal/metrics/metrics.go, line 53 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

this can be tested as in https://github.com/scionproto/scion/blob/99f78f029d55c277b1551052a13136ecda8367e0/go/path_srv/internal/metrics/registration_test.go

Done.


go/border/internal/metrics/metrics.go, line 56 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Comment is missing , see https://github.com/scionproto/scion/blob/master/doc/Metrics.md for the generic description

Done


go/border/internal/metrics/metrics.go, line 57 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

no pointer needed

Done.


go/border/internal/metrics/metrics.go, line 61 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

no pointer needed

Done.


go/border/internal/metrics/metrics.go, line 70 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

If we use IFIDType.String() and do not have the default loc, would that be no desirable?
What loc means?

You need to ask ops team about that. Existing code tagged that interface with the 'loc' label.


go/border/internal/metrics/output.go, line 38 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

IntfLabels{}.Labels() , with Labels a method on a value receiver?

Done.


go/border/internal/metrics/output.go, line 58 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Generic comments

Done.


go/border/internal/metrics/process.go, line 35 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Generic comments so linter will complain less,

Done.


go/border/rctrl/ctrl.go, line 90 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Confuses me the behavior here, why not just break? (irrelevant with your changes)

Logic here is something went wrong and we are likely not able to recover, to quit and restart process.
This just preserves the current behavior. I suggest opening an issue about it if you think it should be changed.


go/border/rctrl/ctrl.go, line 93 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

this can be initialized out side the for loop

Not the result, it can be set to ErrParse in the below branch.


go/border/rctrl/ifstate.go, line 44 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Do we need to run once manually? the for range is not enough?

That was the original code, the tick doesn't trigger straight away.


go/border/rctx/io.go, line 71 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

if you have already the ifid, do we still need to have the Label?
We could always jhave this value?

you need to do fmt.Sprintf per packet just for the counter


go/border/rpkt/route.go, line 65 at r1 (raw file):

Previously, karampok (Konstantinos) wrote…

Here, you could use the S.Ifid.String(), maybe :)

see above

Copy link
Contributor

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 19 of 21 files at r2.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @kormat)


go/border/rctrl/ctrl.go, line 93 at r1 (raw file):

Previously, sgmonroy (Sergio Gonzalez Monroy) wrote…

Not the result, it can be set to ErrParse in the below branch.

OK

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 25 files at r1, 1 of 21 files at r2.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @sgmonroy)

a discussion (no related file):

border_control_pkts_total{dst="1-ff00:0:110,[BS M (0x8000)]:0 (UDP)",result="success",src="self",type="ifstate_request"} 3

Don't put host addresses into label values


a discussion (no related file):

Total number of output bytes received.

That should probably be "sent"


Copy link
Contributor Author

@sgmonroy sgmonroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @kormat)

a discussion (no related file):

Previously, kormat (Stephen Shirley) wrote…

Total number of output bytes received.

That should probably be "sent"

Done.


a discussion (no related file):

Previously, kormat (Stephen Shirley) wrote…
border_control_pkts_total{dst="1-ff00:0:110,[BS M (0x8000)]:0 (UDP)",result="success",src="self",type="ifstate_request"} 3

Don't put host addresses into label values

Done.


Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 21 files at r2.
Reviewable status: 18 of 26 files reviewed, 8 unresolved discussions (waiting on @karampok, @kormat, and @sgmonroy)

a discussion (no related file):
It occurs to me that it would be really useful to add the remote IA as a label to all input/output metrics. neigh_ia is the label name that we're using in #3167. For packets sent/received on the local socket, the label can be the empty string.



go/border/internal/metrics/metrics.go, line 48 at r2 (raw file):

)

const Self = "self"

I'd group this with Namespace above.


go/border/internal/metrics/metrics.go, line 59 at r2 (raw file):

type IntfLabels struct {
	// Itnf in the interface ID

"Intf is the interface ID"


go/border/internal/metrics/metrics.go, line 77 at r2 (raw file):

		return "loc"
	}
	return fmt.Sprintf("%d", ifid)

You can use ifid.String()


go/border/internal/metrics/output.go, line 40 at r2 (raw file):

	return output{
		pkts: prom.NewCounterVec(Namespace, sub,
			"pkts_total", "Total number of output packets received.", l),

Looking at this again, the wording can be simplified and improved: "Total number of packets sent" (and "Total number of packets received" for the input equivelent).


go/border/internal/metrics/output.go, line 51 at r2 (raw file):

		writeErrors: prom.NewCounterVec(Namespace, sub,
			"write_errors_total", "Total number of output socket write errors.", l),
		latency: prom.NewCounterVec(Namespace, sub,

I think this is a copy/paste error. Input has "latency" which measures how long packets wait in the kernel buffers before the BR reads them. Output has write duration; how long it takes to write packets back to the kernel. These are fundamentally different measurements.

(I do see that the old code calls the var OutputWriteLatency, which was most likely a copy/paste error on my side)

So:

  • Rename the var to duration
  • the metric to output write_duration_seconds_total
  • and the description to "Total time spent writing packets to the kernel, in seconds."

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r3.
Reviewable status: 19 of 26 files reviewed, 7 unresolved discussions (waiting on @karampok, @kormat, and @sgmonroy)

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r3.
Reviewable status: 20 of 26 files reviewed, 6 unresolved discussions (waiting on @karampok, @kormat, and @sgmonroy)

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 21 files at r2.
Reviewable status: 20 of 26 files reviewed, 6 unresolved discussions (waiting on @karampok, @kormat, and @sgmonroy)

Copy link
Contributor

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 8 files at r3.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @kormat and @sgmonroy)

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 21 files at r2, 1 of 8 files at r3.
Reviewable status: all files reviewed, 8 unresolved discussions (waiting on @sgmonroy)


go/border/internal/metrics/ctrl.go, line 49 at r3 (raw file):

type control struct {
	pkts    *prometheus.CounterVec

There should be separate metrics for messages sent and received. E.g.

sent_msgs_total{type="ifstate_req", result="success"}
received_msgs_total{type="ifstate_info", result="err_invalid"}

go/border/internal/metrics/process.go, line 23 at r3 (raw file):

)

const (

These 2 seem to be unused.


go/border/internal/metrics/process.go, line 60 at r3 (raw file):

			"pkts_total", "Total number of processed packets.", ProcessLabels{}.Labels()),
		time: prom.NewCounterVec(Namespace, sub,
			"time_seconds_total", "Total packet processing time.", IntfLabels{}.Labels()),

I think this should be duration instead of time.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 25 files at r1, 1 of 21 files at r2, 5 of 8 files at r3.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @sgmonroy)


go/border/rctrl/ctrl.go, line 91 at r3 (raw file):

			fatal.Fatal(common.NewBasicError("Reading packet", err))
		}
		cl.Result = metrics.Success

I think it would be more natural to put this at the top of the loop.


go/border/rctrl/ifstate.go, line 81 at r3 (raw file):

	bsAddrs, err := rctx.Get().ResolveSVCMulti(addr.SvcBS)
	if err != nil {
		metrics.Control.Pkts(cl).Inc()

I think svc resolution failures should be a separate result type.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 8 of 21 files at r2.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @sgmonroy)


go/border/io.go, line 74 at r3 (raw file):

	// Called when the packet's reference count hits 0.
	free := func(rp *rpkt.RtrPkt) {
		metrics.Process.Time(l).Add(time.Since(rp.TimeIn).Seconds())

Why did this change from a pre-computed metric?


go/border/ifstate/ifstate.go, line 59 at r3 (raw file):

type state struct {
	// info is a pointer to an Info object. Processing gorouting can update this value.

"goroutine"


go/border/rctrl/ifstate.go, line 55 at r3 (raw file):

// genIFStateReq generates an Interface State request packet to the local beacon service.
func genIFStateReq() error {

There should be a metric that increments every time this function runs. That way it can be used to compute error ratios etc.

Copy link
Contributor Author

@sgmonroy sgmonroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @kormat)

a discussion (no related file):

Previously, kormat (Stephen Shirley) wrote…

It occurs to me that it would be really useful to add the remote IA as a label to all input/output metrics. neigh_ia is the label name that we're using in #3167. For packets sent/received on the local socket, the label can be the empty string.

I don't understand how that is really useful when you already know the interface ID, neither in this case nor the case you are pointing me to in #3167.
Anyway, done.



go/border/io.go, line 74 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Why did this change from a pre-computed metric?

Done.


go/border/ifstate/ifstate.go, line 59 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

"goroutine"

Done.


go/border/internal/metrics/ctrl.go, line 49 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

There should be separate metrics for messages sent and received. E.g.

sent_msgs_total{type="ifstate_req", result="success"}
received_msgs_total{type="ifstate_info", result="err_invalid"}

Done.


go/border/internal/metrics/metrics.go, line 59 at r2 (raw file):

Previously, kormat (Stephen Shirley) wrote…

"Intf is the interface ID"

Done.


go/border/internal/metrics/metrics.go, line 77 at r2 (raw file):

Previously, kormat (Stephen Shirley) wrote…

You can use ifid.String()

Done.


go/border/internal/metrics/output.go, line 40 at r2 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Looking at this again, the wording can be simplified and improved: "Total number of packets sent" (and "Total number of packets received" for the input equivelent).

Done.


go/border/internal/metrics/output.go, line 51 at r2 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I think this is a copy/paste error. Input has "latency" which measures how long packets wait in the kernel buffers before the BR reads them. Output has write duration; how long it takes to write packets back to the kernel. These are fundamentally different measurements.

(I do see that the old code calls the var OutputWriteLatency, which was most likely a copy/paste error on my side)

So:

  • Rename the var to duration
  • the metric to output write_duration_seconds_total
  • and the description to "Total time spent writing packets to the kernel, in seconds."

Done.


go/border/internal/metrics/process.go, line 23 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

These 2 seem to be unused.

Removed Ctrl, and now using Drop.


go/border/internal/metrics/process.go, line 60 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I think this should be duration instead of time.

Done.


go/border/rctrl/ctrl.go, line 91 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I think it would be more natural to put this at the top of the loop.

Done.


go/border/rctrl/ifstate.go, line 55 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

There should be a metric that increments every time this function runs. That way it can be used to compute error ratios etc.

Done.


go/border/rctrl/ifstate.go, line 81 at r3 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I think svc resolution failures should be a separate result type.

Done.

Copy link
Contributor

@karampok karampok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r4.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @kormat)

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r4.
Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @sgmonroy)

a discussion (no related file):

Previously, sgmonroy (Sergio Gonzalez Monroy) wrote…

I don't understand how that is really useful when you already know the interface ID, neither in this case nor the case you are pointing me to in #3167.
Anyway, done.

If you're looking at some dashboards for a random AS, the interface IDs will be meaningless to you. Having the neighbouring AS in the metric labels will make it much easier to get an overview of what's going on.

It looks like the ctrl part isn't setting the neigh_ia label btw:

# HELP border_control_interface_active Interface is active.
# TYPE border_control_interface_active gauge
border_control_interface_active{intf="1",neigh_ia=""} 1
border_control_interface_active{intf="2",neigh_ia=""} 1


go/border/internal/metrics/ctrl.go, line 55 at r4 (raw file):

func newControl() control {
	sub := "control"

Suggestion: ctrl


go/border/internal/metrics/metrics.go, line 59 at r2 (raw file):

Previously, sgmonroy (Sergio Gonzalez Monroy) wrote…

Done.

"Itnf" -> "Intf"


go/border/internal/metrics/metrics.go, line 24 at r4 (raw file):

)

const Namespace = "border"

Let's change to br, to fit with the other services.


go/border/rctrl/ctrl.go, line 1 at r4 (raw file):

// Copyright 2018 ETH Zurich, Anapaya Systems

Ctrl Metrics:

  • low-level:
    • br_ctrl_reads_total{result="ok_success|err_read"}
    • br_ctrl_process_errors{type="err_parse|err_invalid_request"} (err_invalid_request for both "Unsupported control payload" and "Unsupported PathMgmt payload")
  • ifstate:
    • br_ctrl_received_ifstateinfo_total{result="ok_success|err_process"}
    • br_ctrl_sent_ifstatereqs_total{result="ok_success|err_process|err_resolve_svc"}
  • rev forwarding:
    • br_ctrl_read_revinfos_total{result="ok_success|err_parse"}
    • br_ctrl_sent_revinfos_total{result="ok_success|err_process|err_resolve_svc", svc="bs|ps"}

go/border/rctrl/ifstate.go, line 47 at r4 (raw file):

		logger.Error(err.Error(), nil)
	}
	metrics.Control.IFStateTick().Inc()

It would be simpler to put this at the top of genIFStateReq


go/border/rctrl/revinfo.go, line 76 at r4 (raw file):

	dst.NextHop, err = ctx.ResolveSVCAny(dstHost)
	if err != nil {
		metrics.Control.SentMsgs(cl).Inc()

Missing a cl.Result = metrics.ErrResolveSVC

Copy link
Contributor Author

@sgmonroy sgmonroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @kormat)

a discussion (no related file):

Previously, kormat (Stephen Shirley) wrote…

If you're looking at some dashboards for a random AS, the interface IDs will be meaningless to you. Having the neighbouring AS in the metric labels will make it much easier to get an overview of what's going on.

It looks like the ctrl part isn't setting the neigh_ia label btw:

# HELP border_control_interface_active Interface is active.
# TYPE border_control_interface_active gauge
border_control_interface_active{intf="1",neigh_ia=""} 1
border_control_interface_active{intf="2",neigh_ia=""} 1

Done.



go/border/internal/metrics/ctrl.go, line 55 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Suggestion: ctrl

Done.


go/border/internal/metrics/metrics.go, line 24 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Let's change to br, to fit with the other services.

Done.


go/border/rctrl/ctrl.go, line 1 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Ctrl Metrics:

  • low-level:
    • br_ctrl_reads_total{result="ok_success|err_read"}
    • br_ctrl_process_errors{type="err_parse|err_invalid_request"} (err_invalid_request for both "Unsupported control payload" and "Unsupported PathMgmt payload")
  • ifstate:
    • br_ctrl_received_ifstateinfo_total{result="ok_success|err_process"}
    • br_ctrl_sent_ifstatereqs_total{result="ok_success|err_process|err_resolve_svc"}
  • rev forwarding:
    • br_ctrl_read_revinfos_total{result="ok_success|err_parse"}
    • br_ctrl_sent_revinfos_total{result="ok_success|err_process|err_resolve_svc", svc="bs|ps"}

Done.


go/border/rctrl/ifstate.go, line 47 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

It would be simpler to put this at the top of genIFStateReq

Done.


go/border/rctrl/revinfo.go, line 76 at r4 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Missing a cl.Result = metrics.ErrResolveSVC

Done.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 10 files at r5, 1 of 1 files at r6.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @sgmonroy)


go/border/internal/metrics/metrics.go, line 37 at r6 (raw file):

	ErrWrite = "err_write"
	// ErrValidate is an error validating the packet.
	ErrValidate = "err_validate"

Some new values got added to go/lib/prom since you created this pr. ErrValidate can be reused from there


go/border/internal/metrics/metrics.go, line 40 at r6 (raw file):

	// ErrValidate is an error routing the packet.
	ErrRoute = "err_route"
	// ErrProcessLocal is an error on local processing the packet, ie. SVC resolution.

Hmm. It looks like this is only used for the result of NeedsLocalProcessing, which can't actually return an error at the moment. I'm inclined to say drop this value, and make NeedsLocalProcessing not return anything.


go/border/internal/metrics/metrics.go, line 47 at r6 (raw file):

	ErrResolveSVC = "err_resolve_svc"
	// ErrInvalidRequest is an error for unsupported control message.
	ErrInvalidRequest = "err_invalid_request"

prom.ErrInvalidReq


go/border/rpkt/path.go, line 126 at r6 (raw file):

		// If the BR does not have a revocation for the current epoch, it considers
		// the interface as active until it receives a new revocation.
		intf, ok := rp.Ctx.Conf.BR.IFs[*ifid]

I think this is already covered by the if _, ok := rp.Ctx.Conf.Topo.IFInfoMap[*ifid]; !ok { check at the top of this method.

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @sgmonroy)


go/border/rctrl/revinfo.go, line 55 at r6 (raw file):

	cl := metrics.SentRevInfoLabels{
		Result: metrics.ErrProcess,
		SVC:    dstHost.String(),

dstHost.BaseString() would give a better metric value (you get plain BS, PS, rather than the full BS A (0x0000))

Copy link
Contributor Author

@sgmonroy sgmonroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 21 of 28 files reviewed, 5 unresolved discussions (waiting on @karampok, @kormat, and @sgmonroy)


go/border/internal/metrics/metrics.go, line 37 at r6 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Some new values got added to go/lib/prom since you created this pr. ErrValidate can be reused from there

Done.


go/border/internal/metrics/metrics.go, line 40 at r6 (raw file):

Previously, kormat (Stephen Shirley) wrote…

Hmm. It looks like this is only used for the result of NeedsLocalProcessing, which can't actually return an error at the moment. I'm inclined to say drop this value, and make NeedsLocalProcessing not return anything.

Done.


go/border/internal/metrics/metrics.go, line 47 at r6 (raw file):

Previously, kormat (Stephen Shirley) wrote…

prom.ErrInvalidReq

Done.


go/border/rctrl/revinfo.go, line 55 at r6 (raw file):

Previously, kormat (Stephen Shirley) wrote…

dstHost.BaseString() would give a better metric value (you get plain BS, PS, rather than the full BS A (0x0000))

Done.


go/border/rpkt/path.go, line 126 at r6 (raw file):

Previously, kormat (Stephen Shirley) wrote…

I think this is already covered by the if _, ok := rp.Ctx.Conf.Topo.IFInfoMap[*ifid]; !ok { check at the top of this method.

Done

Copy link
Contributor

@kormat kormat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r7.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

@sgmonroy sgmonroy merged commit e5e37f0 into scionproto:master Oct 1, 2019
@sgmonroy sgmonroy deleted the pub-br-metrics branch October 1, 2019 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants