Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refresh before syncs when feature flag is on #19888

Merged
merged 24 commits into from
Dec 6, 2022

Conversation

alovew
Copy link
Contributor

@alovew alovew commented Nov 29, 2022

Refresh schemas before each sync using feature flag

The main logic here is in SyncWorkflowImpl.java

@octavia-squidington-iv octavia-squidington-iv added area/platform issues related to the platform area/worker Related to worker labels Nov 29, 2022
@alovew alovew temporarily deployed to more-secrets November 29, 2022 20:01 Inactive
@alovew alovew temporarily deployed to more-secrets November 29, 2022 20:01 Inactive
@alovew alovew temporarily deployed to more-secrets November 29, 2022 22:46 Inactive
@alovew alovew temporarily deployed to more-secrets November 29, 2022 22:46 Inactive
@alovew alovew force-pushed the anne/feature-flag-refresh-schema branch from 0a83a93 to 25cc417 Compare November 30, 2022 18:19
@alovew alovew temporarily deployed to more-secrets November 30, 2022 18:22 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 18:22 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 18:59 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 18:59 Inactive
@alovew alovew force-pushed the anne/feature-flag-refresh-schema branch from 9326cc7 to 527d2fc Compare November 30, 2022 20:09
@alovew alovew temporarily deployed to more-secrets November 30, 2022 20:12 Inactive
try {
airbyteApiClient.getSourceApi().discoverSchemaForSource(requestBody);
} catch (final Exception e) {
log.info("Attempted schema refresh, but failed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be an error log with the exception itself logged as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops yep, meant to update that! I also have a question about logging in general. These logs (and the ones in SyncWorkflowImpl) are not showing up in the UI logs and I'm not sure why.

@alovew alovew temporarily deployed to more-secrets November 30, 2022 21:55 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 22:02 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 22:05 Inactive
@alovew alovew temporarily deployed to more-secrets November 30, 2022 22:07 Inactive
@alovew alovew requested a review from jdpgrailsdev December 1, 2022 19:41
@alovew
Copy link
Contributor Author

alovew commented Dec 1, 2022

Two questions on this. I'm seeing this CI error:

___________ test_read_stream_returns_error_if_stream_does_not_exist ____________

    def test_read_stream_returns_error_if_stream_does_not_exist():
        expected_status_code = 400
        expected_detail = "Could not perform read with with error: \"The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])\""
    
        api = DefaultApiImpl()
        loop = asyncio.get_event_loop()
        with pytest.raises(HTTPException) as actual_exception:
            loop.run_until_complete(
                api.read_stream(StreamReadRequestBody(manifest=MANIFEST, config=***, stream="not_in_manifest"))
            )
    
        assert actual_exception.value.status_code == expected_status_code
>       assert expected_detail in actual_exception.value.detail
E       assert 'Could not perform read with with error: "The requested stream not_in_manifest was not found in the source. Available streams: dict_keys([\'hashiras\', \'breathing-techniques\'])"' in "Could not perform read with with error: The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])"
E        +  where "Could not perform read with with error: The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])" = HTTPException(status_code=400, detail="Could not perform read with with error: The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])").detail
E        +    where HTTPException(status_code=400, detail="Could not perform read with with error: The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])") = <ExceptionInfo HTTPException(status_code=400, detail="Could not perform read with with error: The requested stream not_in_manifest was not found in the source. Available streams: dict_keys(['hashiras', 'breathing-techniques'])") tblen=3>.value

and I'm not seeing how that's related.

Also - the logs I added in SyncWorkflowImpl are not showing up in the UI. Does anyone have insight into which logs end up in the UI?

@@ -72,6 +86,23 @@ public StandardSyncOutput run(final JobRunConfig jobRunConfig,

final int version = Workflow.getVersion(VERSION_LABEL, Workflow.DEFAULT_VERSION, CURRENT_VERSION);
final String taskQueue = Workflow.getInfo().getTaskQueue();

if (version > Workflow.DEFAULT_VERSION) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a dedicated version for that. Re-using the existing version won't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I actually had this locally but didn't push up the change - doing that now


@Trace(operationName = WORKFLOW_TRACE_OPERATION_NAME)
@Override
public StandardSyncOutput run(final JobRunConfig jobRunConfig,
final IntegrationLauncherConfig sourceLauncherConfig,
final IntegrationLauncherConfig destinationLauncherConfig,
final StandardSyncInput syncInput,
final UUID connectionId) {
final UUID connectionId)
throws JsonValidationException, ConfigNotFoundException, IOException, ApiException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that those exception shouldn't be in the method signature, we should catch them and set the failure reason in the return StandardSyncOutput

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just updated this -- I actually think we shouldn't fail the sync just because a schema refresh fails. I logged the error, but I'm not sure if that would be confusing to users. What do you think?

@alovew alovew force-pushed the anne/feature-flag-refresh-schema branch from a69ad8c to 7db8ba9 Compare December 1, 2022 21:00
@alovew alovew temporarily deployed to more-secrets December 1, 2022 21:03 Inactive
@alovew alovew requested a review from benmoriceau December 1, 2022 21:17
@alovew alovew force-pushed the anne/feature-flag-refresh-schema branch from da9aeb6 to 825e42a Compare December 2, 2022 22:24
@alovew alovew temporarily deployed to more-secrets December 2, 2022 22:27 Inactive

public RefreshSchemaActivityImpl(Optional<ConfigRepository> configRepository, SourceApi sourceApi) {
public RefreshSchemaActivityImpl(Optional<ConfigRepository> configRepository,
AirbyteApiClient airbyteApiClient,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we inject the specific ApiClient that are needed rather than the top level one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but was getting a micronaut bean error, so I followed how you did it in the PersistStateActivity

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time PersistStateActivity was added, I don't think we had the independent clients yet.
The way to solve the micronaut bean error should be to add the client to the API factory:

@benmoriceau, any specific reasons we do not have those yet other than did not have time to get to it?

Copy link
Contributor

@benmoriceau benmoriceau Dec 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know of. We should add them

@@ -423,4 +450,13 @@ private static void verifyDbtTransform(final DbtTransformationActivity dbtTransf
operatorDbtInput);
}

private static void verifyShouldRefreshSchema(final RefreshSchemaActivity refreshSchemaActivity) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@alovew alovew requested a review from gosusnp December 2, 2022 23:31
@alovew alovew temporarily deployed to more-secrets December 3, 2022 00:31 Inactive
@alovew alovew temporarily deployed to more-secrets December 3, 2022 00:56 Inactive
@alovew alovew temporarily deployed to more-secrets December 3, 2022 01:19 Inactive
@alovew alovew temporarily deployed to more-secrets December 5, 2022 23:08 Inactive
@alovew alovew requested a review from gosusnp December 5, 2022 23:13
@alovew alovew temporarily deployed to more-secrets December 5, 2022 23:51 Inactive
try {
sourceApi.discoverSchemaForSource(requestBody);
} catch (final Exception e) {
log.error("Attempted schema refresh, but failed with error: ", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could be worth adding a comment explaining that we swallow the exception here to avoid blocking the replication if we fail to refresh.

@alovew alovew temporarily deployed to more-secrets December 6, 2022 01:02 Inactive
@alovew alovew merged commit 2b045a9 into master Dec 6, 2022
@alovew alovew deleted the anne/feature-flag-refresh-schema branch December 6, 2022 02:46
malikdiarra added a commit that referenced this pull request Dec 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/worker Related to worker
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants