-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
feat: Initial monitors implementation #11602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dcramer
commented
Jan 18, 2019
- Add support for schedule-based push monitors
- And endpoint for creating and updating checkins (DSN-based auth supported)
|
If anyone has thoughts on the current implementation its almost in a usable state. The main missing components are:
|
a5539bb to
4cb4d59
Compare
|
|
||
| def post(self, request, project, monitor): | ||
| """ | ||
| Create a new check-in for a monitor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you thinking that users would make an API request directly, or would monitor creation be part of the SDKs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will happen via the UI or API
| organization_id=project.organization_id, | ||
| monitor_id=monitor.id, | ||
| duration=result.get('duration'), | ||
| status=getattr(CheckInStatus, result['status']), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do monitors need to have release or environment context? Having releases/environments would allow users to track when jobs broke around deploys and whether or not they are just missing in staging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release won't work for this kind of thing. We could probably allow environment.
| default=MonitorType.UNKNOWN, | ||
| choices=MonitorType.as_choices(), | ||
| ) | ||
| config = EncryptedJsonField(default=dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What goes in here? Could any of those fields be part of the official schema?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schedule is the only one, but we'd have to agree to always store it the same for any kind of monitor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also there is a spec for this if you want to read (its in notion)
|
|
||
|
|
||
| def generate_secret(): | ||
| return uuid4().hex + uuid4().hex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can get this function from sentry.models.apitoken:generate_token
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way django 1.8+ generates migration the default has to be module-scoped, which is the only reason im not using a lambda here. I generally frown upon using locally scoped functions across multiple modules (e.g. APIToken has nothing to do with Monitor), and given that its for a specific field default I think its fine to duplicate the code
9097e9f to
3f6d674
Compare
|
This might be functional enough that we could merge in and prototype |
3f6d674 to
f8d4354
Compare
317582e to
826bd4d
Compare
|
Two schema things we should agree on:
|
3d01b38 to
f271a9b
Compare
f271a9b to
36caea9
Compare
|
So one more decision: do we just use guid as pk? |
36caea9 to
50ff25f
Compare
|
After convo in Slack - let's keep guid and id as there are pros/cons of replacing id with guid that we'd prefer to not deal w/ today. |
f1745f8 to
b84402c
Compare
src/sentry/tasks/check_monitors.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely need an index for this query. Probably fully on (type, status, next_checkin). Can get away without having status in there if we think that case is pretty rare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can ignore status and just NULL the field if you delete/disable the monitor.
b84402c to
f9e4b8e
Compare
I would prefer to have as many real columns as we can. Having data in json blobs makes future schema changes and record updates harder as we have to decode the json blob.
I think storing both would be ideal. Having duration allows the creation of alert rules based on the job runtime without having to infer based on the schedule + completion time. Duration is also useful for being able to know when we have missed a check-in with more confidence. |
|
@markstory the json decoding isn’t an issue but we aren’t certain on how scheduling should be defined for other monitors. For example it’s much more complex to parse ceontqb schedules that it is “run_every_minutes=1. I also wouldn’t want columns if they weren’t uniform types and used by every subschema. Not sure I understand your point on the duration bit. I do think we might need a date_updated to understand the last communication on an individual check in. |
|
If anyone has anything else they think is hard to take back before prototyping this with our own stuff let me know and we can resolve before merge, otherwise my goal is to get this into prod and wire up some of our jobs // start talking to some customers more about what this might look like down the road. |
src/sentry/models/monitor.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also note: @alex-hofsteede @bretthoerner @tkaemming this is what i was talking about where its really hard to internally generate a real event and insert it into the pipeline. In this case I wanted to bypass rate limits and standard tracking since we wouldn't want to enforce them with this. I realize we could refactor it, but I also didn't want to be responsible for that. Something we should consider going forward within our internal APIs.
f9e4b8e to
c90d220
Compare
|
I added date_updated to checkins which will let us manage a long-running task more effectively. It could enable the kind of reporting that says "this is still running" vs "it just hasn't checked in as finished". |
c90d220 to
7864186
Compare
|
Also added the 'monitor' context type so it will automatically register the tag. |
|
Need to fix UUIDField for sqlite yet |
7864186 to
a9c05ce
Compare
- Add support for schedule-based push monitors - And endpoint for creating and updating checkins (DSN-based auth supported) - Generate a simplistic message event upon monitor failure
a9c05ce to
2debd39
Compare
|
Updated UUIDField to work in sqlite - a few changes from pgfields:
|
|
@mitsuhiko can you confirm no issues w/ the way im doing this context type in here? thats the only thing we cant easily take away afaik |
|
@dcramer yap, this is fine. It will also roundtrip unprocessed through the new normalizer untouched but if we want to keep it we will want to add it to the protocol in semaphore so it's validated correctly. You don't have to do that though, but we should not forget to do that (reference: https://github.com/getsentry/semaphore/blob/96ebf37c3fd2a205fcf60569514f8b68aab277c2/general/src/protocol/contexts.rs#L138-L169) @untitaker cc for the above context addition |
|
maybe ill take a stab at adding it to the rust binding since i havent touched rust before |
|
The current suite of datetime/duration fields you have cover all the useful timing bits I think we'll need 👍 |
|
chatted in Slack - gonna merge and then take next steps to wire this up to our celery scheduled tasks |