Avoid locking common records and remove transaction annotation #2036

collado-mike · 2022-07-14T20:46:42Z

Problem

The OpenLineage API implementation is unnecessarily wrapped in a @Transaction annotation, which means that under heavy write load, there's a lot of unnecessary contention. For example, we lock the Namespace record in the upsertNamespaceRow method, even though we don't really update the namespace. Since the whole request is wrapped in a transaction, the namespace record is locked for the entire time we write new jobs, runs, and datasets. If many jobs for the same namespace run simultaneously, each OpenLineage request will wait until the namespace record is unlocked, forming a queue of requests, rather than handling each in parallel. This also happens for JobContexts and RunArgs, as oftentimes, many jobs will run with no arguments, no source code location, and no SQL facet. Many completely unrelated jobs will block while awaiting a lock on an empty JobContext and/or empty RunArgs.

I updated each of these upsert queries to do nothing on conflict (they're really immutable records) to avoid locking the impacted record.

Moreover, I don't see a reason for this API to be wrapped in a transaction at all. These changes don't need to be made atomically. OpenLineage requests are mostly idempotent, so if only partial database updates are applied and the same request made twice, there's no duplication in the data. Jobs, Runs, Datasets, JobContexts, and RunArgs created in the first request will be found and used in the second. Foreign key constraints require us to always write a new record before updating an existing one to reference it, so there's no opportunity to leave the database in an inconsistent state.

This screenshot shows the database under heavy write load - note that waits are dominated by transactionid (Waiting for a transaction to finish) and the primary contender in this case is the NamespaceDao.

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

You've signed-off your work
Your changes are accompanied by tests (if relevant)
Your change contains a small diff and is self-contained
You've updated any relevant documentation (if relevant)
You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
You've included a header in any source code files (if relevant)

Signed-off-by: Michael Collado <collado.mike@gmail.com>

codecov · 2022-07-14T20:50:55Z

Codecov Report

Merging #2036 (0d1f82d) into main (cdee331) will increase coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 0d1f82d differs from pull request most recent head dfb0aee. Consider uploading reports for the commit dfb0aee to get more accurate results

@@             Coverage Diff              @@
##               main    #2036      +/-   ##
============================================
+ Coverage     78.89%   78.92%   +0.03%     
- Complexity     1014     1018       +4     
============================================
  Files           197      199       +2     
  Lines          5548     5557       +9     
  Branches        421      421              
============================================
+ Hits           4377     4386       +9     
  Misses          724      724              
  Partials        447      447

Impacted Files	Coverage Δ
api/src/main/java/marquez/db/OpenLineageDao.java	`95.78% <ø> (ø)`
api/src/main/java/marquez/db/JobContextDao.java	`100.00% <100.00%> (ø)`
api/src/main/java/marquez/db/JobVersionDao.java	`89.85% <100.00%> (+0.46%)`	⬆️
api/src/main/java/marquez/db/NamespaceDao.java	`91.30% <100.00%> (+0.82%)`	⬆️
api/src/main/java/marquez/db/RunArgsDao.java	`100.00% <100.00%> (ø)`

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

wslulciuc · 2022-07-15T23:09:08Z

api/src/main/java/marquez/db/NamespaceDao.java

+  default NamespaceRow upsertNamespaceRow(
+      UUID uuid, Instant now, String name, String currentOwnerName) {
+    doUpsertNamespaceRow(uuid, now, name, currentOwnerName);
+    return findNamespaceByName(name).orElseThrow();


Minor: The RETURNING * was to avoid inserting, then querying for the namespace in two separate queries. But, also understand the cost of always updating updated_at to trigger the modified row to be returned. At some point, I'd love to flush out the usage of updated_at for all of our tables.

Unfortunately, the RETURNING * only returns if the record was modified - if we want to DO NOTHING to avoid locking the record, we have to execute a separate query 🤷🏽‍♂️

wslulciuc

Great analysis and write up @collado-mike on the DB performance! Who knew such "minor" changes could improve query performance so much. As for transactions, I agree. Originally, we wanted to keep the write path for Marquez in sync with metadata collected via OL. Now, I do think we'll have to wrap some DB calls in a transaction, but let's add them as we encounter data inconsistencies.

collado-mike added 5 commits July 14, 2022 12:54

Fix NamespaceDao to stop updating namespace record on conflict

b605794

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Update JobContextDao to stop locking job context on conflict

a3a7e69

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Update RunArgsDao to stop locking run args on conflict

d22fa10

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Remove transaction annotations

b882e50

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Update changelog

0d1f82d

Signed-off-by: Michael Collado <collado.mike@gmail.com>

collado-mike requested a review from wslulciuc July 14, 2022 20:46

wslulciuc reviewed Jul 15, 2022

View reviewed changes

wslulciuc approved these changes Jul 15, 2022

View reviewed changes

wslulciuc enabled auto-merge (squash) July 15, 2022 23:16

Merge branch 'main' into namespace_no_conflict

dfb0aee

wslulciuc merged commit 78eeae0 into main Jul 15, 2022

wslulciuc deleted the namespace_no_conflict branch July 15, 2022 23:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid locking common records and remove transaction annotation #2036

Avoid locking common records and remove transaction annotation #2036

collado-mike commented Jul 14, 2022 •

edited

Loading

codecov bot commented Jul 14, 2022 •

edited

Loading

wslulciuc Jul 15, 2022

collado-mike Jul 18, 2022

wslulciuc left a comment

Avoid locking common records and remove transaction annotation #2036

Avoid locking common records and remove transaction annotation #2036

Conversation

collado-mike commented Jul 14, 2022 • edited Loading

Problem

Checklist

codecov bot commented Jul 14, 2022 • edited Loading

Codecov Report

wslulciuc Jul 15, 2022

Choose a reason for hiding this comment

collado-mike Jul 18, 2022

Choose a reason for hiding this comment

wslulciuc left a comment

Choose a reason for hiding this comment

collado-mike commented Jul 14, 2022 •

edited

Loading

codecov bot commented Jul 14, 2022 •

edited

Loading