Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(jobsdb): latest job status query optimization #2693

Merged
merged 3 commits into from
Nov 21, 2022

Conversation

atzoum
Copy link
Contributor

@atzoum atzoum commented Nov 10, 2022

Description

When jobsdb queries the database for jobs, it needs to get the latest job status for each job. The current way of achieving this is through a SELECT MAX(id) GROUP BY job_id query in the job status table.

However, a specialized index (job_id asc, id desc) along with using SELECT DISTINCT ON (job_id) yields significant performance improvements over the current strategy (25% - 100% faster), see some example query plans: old vs new. See also the relevant thread.

To simplify queries even further, a new database view is introduced for every job status table (e.g. table rt_job_status_1 has view v_last_rt_job_status_1) abstracting away the actual query that provides this information.

Note: This pull request shall be reviewed along with its counterpart in rudder-migrator: #32

Bonus!

Added an easy-to-use function unionjobsdb(prefix text, num int) for quickly querying multiple jobsdb datasets for jobs along with their latest status. You no longer have to keep track of constantly changing jobsdb indexes...

E.g. to query the first 5 rt tables for Amplitude jobs, all you need to do now is:

SELECT * FROM unionjobsdb('rt',5) WHERE custom_val = 'AM'

Notion Ticket

Link

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@codecov
Copy link

codecov bot commented Nov 10, 2022

Codecov Report

Base: 45.43% // Head: 45.39% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (329e635) compared to base (bca84b7).
Patch coverage: 63.26% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2693      +/-   ##
==========================================
- Coverage   45.43%   45.39%   -0.04%     
==========================================
  Files         290      290              
  Lines       48101    48066      -35     
==========================================
- Hits        21853    21821      -32     
+ Misses      24861    24857       -4     
- Partials     1387     1388       +1     
Impacted Files Coverage Δ
jobsdb/readonly_jobsdb.go 34.79% <40.00%> (+0.31%) ⬆️
jobsdb/jobsdb.go 72.97% <68.42%> (-0.36%) ⬇️
jobsdb/unionQuery.go 84.13% <100.00%> (-0.62%) ⬇️
jobsdb/unionQueryLegacy.go 70.49% <0.00%> (-4.92%) ⬇️
config/backend-config/namespace_config.go 67.00% <0.00%> (-3.00%) ⬇️
processor/processor.go 85.99% <0.00%> (ø)
warehouse/schema.go 51.42% <0.00%> (+1.14%) ⬆️
services/db/recovery.go 53.01% <0.00%> (+2.40%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@atzoum atzoum force-pushed the chore.queryOptimisation branch 4 times, most recently from dab1167 to 4d0241a Compare November 11, 2022 14:59
@atzoum atzoum changed the title [WIP] chore(jobsdb): latest job status query optimization chore(jobsdb): latest job status query optimization Nov 14, 2022
@atzoum atzoum force-pushed the chore.queryOptimisation branch 2 times, most recently from a6aba26 to aba00db Compare November 14, 2022 13:11
@atzoum atzoum marked this pull request as ready for review November 14, 2022 13:33
@atzoum atzoum force-pushed the chore.queryOptimisation branch 2 times, most recently from de53cee to d414a37 Compare November 16, 2022 06:44
jd.dsListLock.WithLock(func(l lock.LockToken) {
switch jd.ownerType {
case Write, ReadWrite:
jd.setupDatabaseTables(l, jd.clearAll)
Copy link
Contributor Author

@atzoum atzoum Nov 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: all gateways will need to be upgraded to the latest version so that the necessary schema migration will be performed on gw datasets.

@atzoum atzoum force-pushed the chore.queryOptimisation branch 3 times, most recently from 3af7c40 to 78368c5 Compare November 17, 2022 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants