Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: migrate ScheduledJob from rethinkdb to pg #9490

Merged
merged 4 commits into from
Mar 14, 2024

Conversation

jordanh
Copy link
Contributor

@jordanh jordanh commented Mar 1, 2024

Description

Resolves: #9489

I had a bit of extra time last night and decided to try and tackle this small migration. I wanted to see firsthand what kind of effort these migrations (even the simple ones) take.

Testing scenarios

  • Set a meeting timer / let it elapse

  • Check scheduled team limits (not checked, as this is not currently in use)

Final checklist

  • I checked the code review guidelines
  • I have added Metrics Representative as reviewer(s) if my PR invovles metrics/data/analytics related changes
  • I have performed a self-review of my code, the same way I'd do it for any other team member
  • I have tested all cases I listed in the testing scenarios and I haven't found any issues or regressions
  • Whenever I took a non-obvious choice I added a comment explaining why I did it this way
  • I added the label Skip Maintainer Review Indicating the PR only requires reviewer review and can be merged right after it's approved if the PR introduces only minor changes, does not contain any architectural changes or does not introduce any new patterns and I think one review is sufficient'
  • PR title is human readable and could be used in changelog

@github-actions github-actions bot requested a review from tianrunhe March 1, 2024 21:47
@jordanh jordanh requested review from Dschoordsch and removed request for tianrunhe March 1, 2024 21:47
@github-actions github-actions bot added the size/m label Mar 1, 2024
export type ScheduledJobType =
| 'MEETING_STAGE_TIME_LIMIT_END'
| 'LOCK_ORGANIZATION'
| 'WARN_ORGANIZATION'

export default abstract class ScheduledJob {
id = generateUID()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Postgres now manages the id

if (filter) {
Object.keys(filter).forEach((key) => {
const value = filter[key as keyof FilterType]
if (value) query = query.where(key as keyof FilterType, '=', value)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will "AND ${field} = ${value}" for anything specified in the filter object

const removeScheduledJobs = async (runAt: Date, filter: {[key: string]: any}) => {
const r = await getRethink()
return r.table('ScheduledJob').getAll(runAt, {index: 'runAt'}).filter(filter).delete().run()
type FilterType = Omit<Updateable<DB['ScheduledJob']>, 'runAt'>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Updatable makes all the fields optional

@jordanh jordanh force-pushed the feat/9489/migrate-scheduledjob branch from 1f3abb7 to 774e867 Compare March 1, 2024 21:55
}
}

const getNextData = async (leftBoundCursor: Date | undefined) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: see

const getNextData = async (leftBoundCursor: Date | undefined) => {
for where I pilfered this logic

@jordanh jordanh force-pushed the feat/9489/migrate-scheduledjob branch from 774e867 to 4cd113e Compare March 1, 2024 21:59
Copy link
Member

@mattkrick mattkrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code looks good! haven't run a test on it, but can't see any flaws in it from here

import {r} from 'rethinkdb-ts'
import {RValue} from '../../database/stricterR'
import {DataLoaderWorker} from '../../graphql/graphql'
import updateNotification from '../../graphql/public/mutations/helpers/updateNotification'

const removeTeamsLimitObjects = async (orgId: string, dataLoader: DataLoaderWorker) => {
const removeJobTypes = ['LOCK_ORGANIZATION', 'WARN_ORGANIZATION']
const removeJobTypes = ['LOCK_ORGANIZATION', 'WARN_ORGANIZATION'] as DB['ScheduledJob']['type'][]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 as const is a little narrower here, same can be applied to the line below it or any literal tuple

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep forgetting this!

@jordanh
Copy link
Contributor Author

jordanh commented Mar 2, 2024

@Dschoordsch, @mattkrick had a look – I yield to you if you want to test and merge

@jordanh jordanh force-pushed the feat/9489/migrate-scheduledjob branch 3 times, most recently from d58e5ea to df6024b Compare March 3, 2024 17:53
Copy link
Contributor

@Dschoordsch Dschoordsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did not test yet

'batchSize is smaller than the number of items that share the same cursor. Increase batchSize'
)
}
return nextBatch.slice(0, lastMatchingRunAt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 we should include lastMatchingRunAt

    return nextBatch.slice(0, lastMatchingRunAt + 1)

otherwise we might lose data. Assume a sequence [..., 4, 5, 5, 6], we would return [..., 4, 5] and start with the open (5, ...) next time, omitting this one element.

This sparks the idea that the migration should read the count first and verify it matches in the count in pg

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in future migrations I think I'll probably create a compound index that includes the timestamp + ID so we're guaranteed a unique ID that is also serial so items can still get updated while the migration is running

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dschoordsch this is not correct, see: #9502 (comment)

const upcomingJobs = await pg
.selectFrom('ScheduledJob')
.selectAll()
.where('runAt', '>=', new Date(0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think this line could just be removed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's probably true. I was being a bit too direct in my translation ;)


CREATE INDEX IF NOT EXISTS "idx_ScheduledJob_orgId" ON "ScheduledJob"("orgId");
CREATE INDEX IF NOT EXISTS "idx_ScheduledJob_runAt" ON "ScheduledJob"("runAt");
CREATE INDEX IF NOT EXISTS "idx_ScheduledJob_type" ON "ScheduledJob"("type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Not sure we need an index here, it's only used after selecting for orgId, so only a few handful of rows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these indexes existed on the rethinkdb table...maybe we keep them for that reason?

  - also: removed unecessary transform from earlier revision
  - also: fixed off-by-1 error in move migration
@jordanh jordanh force-pushed the feat/9489/migrate-scheduledjob branch from df6024b to 6ea106a Compare March 8, 2024 20:01
@jordanh
Copy link
Contributor Author

jordanh commented Mar 8, 2024

@Dschoordsch branch has been updated to address one of your code review comments & rebased to master

#9500 should be merged first in order for the migration order to be correct

also:
   * rebased to master and updated migration order
@jordanh jordanh force-pushed the feat/9489/migrate-scheduledjob branch from 6ea106a to a1fc4e9 Compare March 8, 2024 20:17
@Dschoordsch Dschoordsch self-requested a review March 14, 2024 17:29
@jordanh jordanh merged commit 5c39fde into master Mar 14, 2024
5 checks passed
@jordanh jordanh deleted the feat/9489/migrate-scheduledjob branch March 14, 2024 20:14
@github-actions github-actions bot mentioned this pull request Mar 14, 2024
24 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chore: migrate ScheduledJob to pg (1/1)
3 participants