Skip to content

Conversation

@dcramer
Copy link
Member

@dcramer dcramer commented Jan 18, 2019

  • Add support for schedule-based push monitors
  • And endpoint for creating and updating checkins (DSN-based auth supported)

@dcramer dcramer added the WIP label Jan 18, 2019
@dcramer
Copy link
Member Author

dcramer commented Jan 18, 2019

If anyone has thoughts on the current implementation its almost in a usable state. The main missing components are:

  1. we need to generate an event (this is actually complicated to do right now)
  2. we'll likely add a new event type for monitors ("monitor failure")
  3. we need to make sure rules expose event type so we can define one specifically for monitors
  4. theres absolutely no UI, but thats ok for now

@dcramer dcramer force-pushed the feat/monitors branch 4 times, most recently from a5539bb to 4cb4d59 Compare January 18, 2019 21:57

def post(self, request, project, monitor):
"""
Create a new check-in for a monitor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking that users would make an API request directly, or would monitor creation be part of the SDKs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will happen via the UI or API

organization_id=project.organization_id,
monitor_id=monitor.id,
duration=result.get('duration'),
status=getattr(CheckInStatus, result['status']),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do monitors need to have release or environment context? Having releases/environments would allow users to track when jobs broke around deploys and whether or not they are just missing in staging.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release won't work for this kind of thing. We could probably allow environment.

default=MonitorType.UNKNOWN,
choices=MonitorType.as_choices(),
)
config = EncryptedJsonField(default=dict)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What goes in here? Could any of those fields be part of the official schema?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schedule is the only one, but we'd have to agree to always store it the same for any kind of monitor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also there is a spec for this if you want to read (its in notion)



def generate_secret():
return uuid4().hex + uuid4().hex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get this function from sentry.models.apitoken:generate_token

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way django 1.8+ generates migration the default has to be module-scoped, which is the only reason im not using a lambda here. I generally frown upon using locally scoped functions across multiple modules (e.g. APIToken has nothing to do with Monitor), and given that its for a specific field default I think its fine to duplicate the code

@dcramer dcramer force-pushed the feat/monitors branch 5 times, most recently from 9097e9f to 3f6d674 Compare January 23, 2019 00:50
@dcramer
Copy link
Member Author

dcramer commented Jan 23, 2019

This might be functional enough that we could merge in and prototype

@dcramer dcramer removed the WIP label Jan 23, 2019
@dcramer dcramer force-pushed the feat/monitors branch 2 times, most recently from 317582e to 826bd4d Compare January 23, 2019 17:23
@dcramer
Copy link
Member Author

dcramer commented Jan 23, 2019

Two schema things we should agree on:

  1. should schedule be a column (i think its ok to keep it uniform in syntax)
  2. should we use 'duration' or simply 'date_finished' for checkin's

@dcramer dcramer force-pushed the feat/monitors branch 2 times, most recently from 3d01b38 to f271a9b Compare January 23, 2019 18:56
@dcramer
Copy link
Member Author

dcramer commented Jan 23, 2019

So one more decision: do we just use guid as pk?

@dcramer
Copy link
Member Author

dcramer commented Jan 23, 2019

After convo in Slack - let's keep guid and id as there are pros/cons of replacing id with guid that we'd prefer to not deal w/ today.

@dcramer dcramer force-pushed the feat/monitors branch 2 times, most recently from f1745f8 to b84402c Compare January 23, 2019 22:47
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely need an index for this query. Probably fully on (type, status, next_checkin). Can get away without having status in there if we think that case is pretty rare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can ignore status and just NULL the field if you delete/disable the monitor.

@markstory
Copy link
Member

  1. should schedule be a column (i think its ok to keep it uniform in syntax)

I would prefer to have as many real columns as we can. Having data in json blobs makes future schema changes and record updates harder as we have to decode the json blob.

  1. should we use 'duration' or simply 'date_finished' for checkin's

I think storing both would be ideal. Having duration allows the creation of alert rules based on the job runtime without having to infer based on the schedule + completion time. Duration is also useful for being able to know when we have missed a check-in with more confidence.

@dcramer
Copy link
Member Author

dcramer commented Jan 23, 2019

@markstory the json decoding isn’t an issue but we aren’t certain on how scheduling should be defined for other monitors. For example it’s much more complex to parse ceontqb schedules that it is “run_every_minutes=1. I also wouldn’t want columns if they weren’t uniform types and used by every subschema.

Not sure I understand your point on the duration bit. I do think we might need a date_updated to understand the last communication on an individual check in.

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

If anyone has anything else they think is hard to take back before prototyping this with our own stuff let me know and we can resolve before merge, otherwise my goal is to get this into prod and wire up some of our jobs // start talking to some customers more about what this might look like down the road.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also note: @alex-hofsteede @bretthoerner @tkaemming this is what i was talking about where its really hard to internally generate a real event and insert it into the pipeline. In this case I wanted to bypass rate limits and standard tracking since we wouldn't want to enforce them with this. I realize we could refactor it, but I also didn't want to be responsible for that. Something we should consider going forward within our internal APIs.

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

I added date_updated to checkins which will let us manage a long-running task more effectively. It could enable the kind of reporting that says "this is still running" vs "it just hasn't checked in as finished".

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

Also added the 'monitor' context type so it will automatically register the tag.

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

Need to fix UUIDField for sqlite yet

- Add support for schedule-based push monitors
- And endpoint for creating and updating checkins (DSN-based auth supported)
- Generate a simplistic message event upon monitor failure
@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

Updated UUIDField to work in sqlite - a few changes from pgfields:

  • not unique by default (doesnt make sense to me why we'd do that, since a UUID might be an fkey to another UUID)
  • swaps out with CHAR(32) in non-postgres dbs, and stores as UUID.hex
  • register uuid adapter at runtime, once

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

@mitsuhiko can you confirm no issues w/ the way im doing this context type in here? thats the only thing we cant easily take away afaik

@mitsuhiko
Copy link
Contributor

@dcramer yap, this is fine. It will also roundtrip unprocessed through the new normalizer untouched but if we want to keep it we will want to add it to the protocol in semaphore so it's validated correctly. You don't have to do that though, but we should not forget to do that (reference: https://github.com/getsentry/semaphore/blob/96ebf37c3fd2a205fcf60569514f8b68aab277c2/general/src/protocol/contexts.rs#L138-L169)

@untitaker cc for the above context addition

@dcramer
Copy link
Member Author

dcramer commented Jan 24, 2019

maybe ill take a stab at adding it to the rust binding since i havent touched rust before

@markstory
Copy link
Member

The current suite of datetime/duration fields you have cover all the useful timing bits I think we'll need 👍

@dcramer
Copy link
Member Author

dcramer commented Jan 25, 2019

chatted in Slack - gonna merge and then take next steps to wire this up to our celery scheduled tasks

@dcramer dcramer merged commit d27ecf0 into master Jan 25, 2019
@dcramer dcramer deleted the feat/monitors branch January 25, 2019 19:32
@github-actions github-actions bot locked and limited conversation to collaborators Dec 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants