Populate and read from SyncStats table #16476

Merged Sep 9, 2022 (21 commits)



@alovew alovew commented Sep 8, 2022

What

This PR does a few things around the SyncStats table:

  1. Adds a migration that changes the attempt_id foreign key from an Int to a BigInt, since the id on the attempts table is a BigInt.
  2. Updates the writeOutput method in DefaultJobPersistence.java. Because we are continuing to store sync stats in the JSON blob, I put the write to sync_stats inside the same transaction that writes the sync output to the attempts table. Once we move SyncStats completely out of that JSON blob, we should have a separate method for writing to the sync stats table.
  3. Adds a new method, getSyncStats, for reading records from the SyncStats table.
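
To illustrate why the foreign key type should match the attempts id, here is a minimal Java sketch of what can go wrong when a 64-bit id is squeezed into a 32-bit value. The class and method names are hypothetical and are not part of this PR; the sketch only shows the type mismatch on the Java side, not the migration itself or how the database handles out-of-range values.

```java
final class AttemptIdWidthSketch {

  // Unchecked narrowing cast: keeps only the low 32 bits of the id.
  static int narrowUnchecked(final long attemptId) {
    return (int) attemptId;
  }

  // Checked narrowing: throws ArithmeticException when the id does not fit in an int.
  static int narrowChecked(final long attemptId) {
    return Math.toIntExact(attemptId);
  }

  public static void main(final String[] args) {
    final long smallId = 42L;
    final long largeId = Integer.MAX_VALUE + 1L; // first id a 32-bit value cannot hold

    System.out.println(narrowUnchecked(smallId)); // 42
    System.out.println(narrowUnchecked(largeId)); // -2147483648 (wrapped around)

    try {
      narrowChecked(largeId);
    } catch (final ArithmeticException e) {
      System.out.println("attempt id does not fit in an int: " + largeId);
    }
  }
}
```

Small ids behave identically under both types, so a mismatch like this can go unnoticed until ids grow past the 32-bit range.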

Recommended reading order

  1. New migration: ChangeSyncStatsForeignKey
  2. DefaultJobPersistence.java: updated write output method
  3. DefaultJobPersistence.java: new getSyncStats method
  4. Tests

@github-actions bot added the area/platform, area/scheduler, and area/worker labels Sep 8, 2022

@jdpgrailsdev jdpgrailsdev left a comment

:shipit:

gosusnp previously requested changes Sep 8, 2022

@gosusnp gosusnp left a comment

Looks good overall.

.set(SYNC_STATS.ATTEMPT_ID, attemptId)
.set(SYNC_STATS.BYTES_EMITTED, syncStats.getBytesEmitted())
.set(SYNC_STATS.RECORDS_EMITTED, syncStats.getRecordsEmitted())
.set(SYNC_STATS.RECORDS_COMMITTED, syncStats.getRecordsEmitted())

getRecordsCommitted?

@@ -249,14 +252,34 @@ void testWriteOutput() throws IOException {
final long jobId = jobPersistence.enqueueJob(SCOPE, SPEC_JOB_CONFIG).orElseThrow();
final int attemptNumber = jobPersistence.createAttempt(jobId, LOG_PATH);
final Job created = jobPersistence.getJob(jobId);
final JobOutput jobOutput = new JobOutput().withOutputType(JobOutput.OutputType.DISCOVER_CATALOG);
final SyncStats syncStats =
new SyncStats().withBytesEmitted(100L).withRecordsEmitted(10L).withRecordsCommitted(10L).withDestinationStateMessagesEmitted(1L)

Different values for recordsEmitted and recordsCommitted would have caught the typo.
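
A minimal, self-contained sketch of the point made in the two comments above. The `Stats` class and the map-based "row" are hypothetical stand-ins for SyncStats and the jOOQ insert, used only so the example runs on its own; they are not the code in this PR. It shows that a fixture with equal values for recordsEmitted and recordsCommitted cannot tell the flagged mapping from the intended one, while distinct values can.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SyncStatsMappingSketch {

  // Hypothetical, trimmed-down stand-in for the SyncStats model (three fields only).
  static final class Stats {
    final long bytesEmitted;
    final long recordsEmitted;
    final long recordsCommitted;

    Stats(final long bytesEmitted, final long recordsEmitted, final long recordsCommitted) {
      this.bytesEmitted = bytesEmitted;
      this.recordsEmitted = recordsEmitted;
      this.recordsCommitted = recordsCommitted;
    }
  }

  // Reproduces the flagged line: records_committed is filled from recordsEmitted.
  static Map<String, Long> toRowWithTypo(final Stats stats) {
    final Map<String, Long> row = new LinkedHashMap<>();
    row.put("bytes_emitted", stats.bytesEmitted);
    row.put("records_emitted", stats.recordsEmitted);
    row.put("records_committed", stats.recordsEmitted);
    return row;
  }

  // Intended mapping: each column is filled from the field of the same name.
  static Map<String, Long> toRow(final Stats stats) {
    final Map<String, Long> row = new LinkedHashMap<>();
    row.put("bytes_emitted", stats.bytesEmitted);
    row.put("records_emitted", stats.recordsEmitted);
    row.put("records_committed", stats.recordsCommitted);
    return row;
  }

  // The check a test would make: does the stored row match the stats that were written?
  static boolean rowMatches(final Map<String, Long> row, final Stats stats) {
    return row.get("bytes_emitted") == stats.bytesEmitted
        && row.get("records_emitted") == stats.recordsEmitted
        && row.get("records_committed") == stats.recordsCommitted;
  }

  public static void main(final String[] args) {
    final Stats sameValues = new Stats(100L, 10L, 10L);
    final Stats distinctValues = new Stats(100L, 10L, 9L);

    System.out.println(rowMatches(toRowWithTypo(sameValues), sameValues));         // true: typo goes unnoticed
    System.out.println(rowMatches(toRowWithTypo(distinctValues), distinctValues)); // false: typo is caught
    System.out.println(rowMatches(toRow(distinctValues), distinctValues));         // true: intended mapping passes
  }
}
```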

@alovew alovew dismissed gosusnp’s stale review September 9, 2022 05:32

implemented changes

@alovew alovew merged commit 3fc6730 into master Sep 9, 2022
@alovew alovew deleted the anne/populate-sync-stats branch September 9, 2022 05:33
robbinhan pushed a commit to robbinhan/airbyte that referenced this pull request Sep 29, 2022
- Populate sync stats table when job is complete
- Method to read from sync stats table