-
Notifications
You must be signed in to change notification settings - Fork 29
feat: Use modern metrics #948
Conversation
Codecov Report
@@ Coverage Diff @@
## master #948 +/- ##
======================================
Coverage 100% 100%
======================================
Files 57 57
Lines 9605 9761 +156
======================================
+ Hits 9605 9761 +156
Continue to review full report at Codecov.
|
autopush/websocket.py
Outdated
@@ -837,9 +820,10 @@ def _verify_user_record(self): | |||
self.log.debug(format="Dropping User", code=104, | |||
uaid_hash=self.ps.uaid_hash, | |||
uaid_record=dump_uaid(record)) | |||
tags = list('code:104') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str's iterable: you definitely want ['code:104'] instead
autopush/web/base.py
Outdated
@@ -283,12 +284,26 @@ def _router_fail_err(self, fail): | |||
self.log.debug(format="Success", status_code=exc.status_code, | |||
logged_status=exc.logged_status or 0, | |||
client_info=self._client_info) | |||
success = True | |||
self.metrics.increment('notification.message.success', | |||
tags=dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datadog expects tags as a list of strs, not dicts
..looks like we already had this mistake on 4 drop_user metrics -- causing them to never register into datadog AFAICT (yet kibana received the associated "Dropping User" logs)
web/webpush.py
119- uaid_record=dump_uaid(result))
120: metrics.increment("updates.drop_user", tags={"errno": 102})
--
129- uaid_record=dump_uaid(result))
130: metrics.increment("updates.drop_user", tags={"errno": 103})
websocket.py
841- self.metrics.increment("client.drop_user",
842: tags={"errno": 104})
--
855- self.metrics.increment("client.drop_user",
856: tags={"errno": 105})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought these were being normalized. Will correct the others as well.
autopush/web/base.py
Outdated
tags=dict( | ||
destination="Direct", | ||
router=router_type, | ||
vapid=(vapid is True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vapid's a dict so this is always False
autopush/websocket.py
Outdated
tags=self.base_tags) | ||
self.metrics.gauge('ua.message_data', len(msg.get('data', '')), | ||
tags={"source": msg.pop('source', 'Direct')}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you want notif.source vs msg.pop('source'). this could use a test
autopush/tests/test_web_base.py
Outdated
@@ -57,8 +58,11 @@ def setUp(self): | |||
headers={"ttl": "0"}, | |||
host='example.com:8080') | |||
|
|||
mock_db = Mock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a test_db() in support you can also use, but it's not important here/this is fine
autopush/tests/test_websocket.py
Outdated
yield self._wait_for(lambda: len(self.metrics.mock_calls) > 0) | ||
eq_(self.metrics.timing.call_args[0][0], 'ua.connection.lifespan') | ||
# Wait for final cleanup (no error or metric produced) | ||
yield sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these sleeps really necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, yes. Otherwise we get coverage issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you keep the older _wait_for on more metrics emitted + the client.notify_uaid_failure check that should do it instead (and a little quicker than a full sleep(1) -- coverage is ultimately waiting on that client.notify_uaid_failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, this might be another bit that got messed up in the patch I had to do to catch up. And looks like client.notify_uaid_failure
isn't listed in the table. The metric had previously been removed.
@@ -61,9 +61,15 @@ def amend_endpoint_response(self, response, router_data): | |||
"""Stubbed out for this router""" | |||
|
|||
def stored_response(self, notification): | |||
self.metrics.gauge("notification.message_data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these would be simplepush only as webpush's router overrides these methods.
Would this stat be interesting on external bridges too? If so it might make sense for the WebHandlers to be emitting these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I wanted to get these calls about as close to the point that the action occurs as possible. Everything goes through route_notification
, and that's really the only time we can say with authority what the disposition is & have the data size.
I'll add the calls to the other methods as well. (Although bridge would pretty much only be "Direct")
... and just realized there's a bunch of metrics we missed for the audit.
8d10be0
to
46c3489
Compare
autopush/web/base.py
Outdated
tags=dict( | ||
code=exc.status_code, | ||
router=router_type, | ||
vapid=(vapid is True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot this one and one in webpush
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handled in subsequent update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still here
autopush/tests/test_websocket.py
Outdated
yield self._wait_for(lambda: len(self.metrics.mock_calls) > 0) | ||
eq_(self.metrics.timing.call_args[0][0], 'ua.connection.lifespan') | ||
# Wait for final cleanup (no error or metric produced) | ||
yield sleep(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you keep the older _wait_for on more metrics emitted + the client.notify_uaid_failure check that should do it instead (and a little quicker than a full sleep(1) -- coverage is ultimately waiting on that client.notify_uaid_failure
autopush/websocket.py
Outdated
@@ -852,8 +836,9 @@ def _verify_user_record(self): | |||
uaid_record=dump_uaid(record)) | |||
self.force_retry(self.db.router.drop_user, | |||
self.ps.uaid) | |||
self.metrics.increment("client.drop_user", | |||
tags={"errno": 105}) | |||
tags = list('code:105') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
forgot ['code:105'] here
autopush/metrics.py
Outdated
self._client.increment(self._prefix_name(name), count, host=self._host, | ||
**kwargs) | ||
|
||
def gauge(self, name, count, **kwargs): | ||
if 'tags' in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitnitpick: tags=None explicitly in the func prototype is a lil cleaner
Reduced the sleeps to .25s. Granted, it's guesswork as to the optimal time required, but I'll admit that 1s is overkill. |
autopush/router/apnsrouter.py
Outdated
@@ -144,8 +144,10 @@ def _route(self, notification, router_data): | |||
apns_client.send(router_token=router_token, payload=payload, | |||
apns_id=apns_id) | |||
except (ConnectionError, AttributeError) as ex: | |||
self.metrics.increment("updates.client.bridge.apns.connection_err", | |||
self._base_tags) | |||
self._base_tags.extend(['application:{}'.format(rel_channel), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't want to append to _base_tags, there's only one router instance per lifetime of the app (we could change routers' _base_tags to be class variables like cors_methods/response_handlers are in the Handlers, to make this explicit)
autopush/web/base.py
Outdated
tags=dict( | ||
code=exc.status_code, | ||
router=router_type, | ||
vapid=(vapid is True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
web.webpush is also still emitting updates.drop_user which should be killed
autopush/metrics.py
Outdated
@@ -89,19 +89,32 @@ def __init__(self, api_key, app_key, hostname, flush_interval=10, | |||
def _prefix_name(self, name): | |||
return "%s.%s" % (self._namespace, name) | |||
|
|||
def _normalize_tags(self, tags): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kill all this now that we'll be consistent w/ always passing lists of strs
autopush/router/apnsrouter.py
Outdated
self.metrics.increment( | ||
"updates.client.bridge.apns.%s.sent" % | ||
router_data["rel_channel"], | ||
self._base_tags | ||
) | ||
self.metrics.gauge("notification.message_data", | ||
len(notification.data or ""), | ||
tags={'destination': 'Direct'}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tags=['destination:Direct'] -- there's about 8 of these to fix in this diff
autopush/web/webpush.py
Outdated
self.metrics.timing("notification.request_time", | ||
duration=time_diff) | ||
self.metrics.increment('notification.message.success', | ||
tags=dict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convert to list too
autopush/web/base.py
Outdated
tags=[ | ||
"code:{}".format(exc.status_code), | ||
"router:{}".format(router_type), | ||
"vapid:{}".format(vapid is True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's still 2 bogus "vapid is True" in this patch: here and in webpush _router_completed. actually, both of these methods don't even need the full vapid dict. they just need a bool flag
webpush.py can be repsonsible for "vapid = jwt is not None" and pass that along to them
dest = 'Stored' | ||
self.metrics.timing("notification.request_time", | ||
duration=time_diff) | ||
self.metrics.increment('notification.message.success', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just occurred to me that web.simplepush's missing this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
Went ahead and created the which means closing #950 |
autopush/metrics.py
Outdated
@@ -74,6 +74,14 @@ def timing(self, name, duration, **kwargs): | |||
self._metric.timing(name, duration) | |||
|
|||
|
|||
def make_tags(base=[], **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although this is a safe use of a list as a default kwarg, might just wanna use base=None instead (kinda surprised flake8 didn't complain, I think PyCharm will)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, neither did. But, yeah, I keep forgetting about not doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use make_tags in a couple more spots I suppose but this'll work
@@ -678,7 +667,6 @@ def get_uaid(self, uaid): | |||
# We unfortunately have to catch this here, as track_provisioned | |||
# will not see this, since JSONResponseError is a subclass and | |||
# will capture it | |||
self.metrics.increment("error.provisioned.get_uaid") | |||
raise | |||
except JSONResponseError: # pragma: nocover | |||
# We trap JSONResponseError because Moto returns text instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should stop using moto in the future, then we can remove this entirely since it only really applies to testing.
autopush/router/simple.py
Outdated
return RouterResponse(202, "Notification Stored") | ||
|
||
def delivered_response(self, notification): | ||
self.metrics.gauge("notification.message_data", | ||
len(notification.data or ""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of wonder if maybe notification should have a property data_length
that does this.
autopush/web/base.py
Outdated
success = True | ||
self.metrics.increment('notification.message.success', | ||
tags=[ | ||
'destination:"Direct"', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Value has double quotes around it but none of the others do?
I think only the last comment really needs to be addressed before merge, as we should be consistent in value quoting for tags. |
Closes #943
(Metric updates from https://docs.google.com/spreadsheets/d/18RL-Brz-Q-jR-iClNPCwDMVPYS0P2CvkHSJ2WZK-ceA/edit#gid=0)