Handling of Terminal Errors for Connectors During Discover & Publication #1007

williamhbaker · 2023-04-17T19:00:56Z

williamhbaker
Apr 17, 2023
Maintainer

When doing a discover or publication, connectors may fail with an error if the configuration does not allow for successful completion of the operation: wrong credentials, incorrect endpoint, database misconfiguration, etc. For these well-understood cases, the connector can return a specific human-readable error message. Work has been done to accomplish this for SQL capture connectors in estuary/connectors#564 and SQL materialization connectors in estuary/connectors#612. We'd like to be able to thread these specific error messages back through to the user as clearly as possible.

The proposed mechanism for this is to add the human-readable error message to the draft_errors table for the failing draft, while keeping the developer-oriented log lines in the log output.

This already more or less happens for publications, but the message contains additional wrapping from the error bubbling up through the connector and into the connector boilerplate, into connector-init via the connector's stderr, and to the agent via gRPC. Errors from discovers are not added to the draft_errors table, but they do have drafts and could in principle work the same way as errors from publications.

For example, the error message in draft_errors that is shown to the user in the UI for an incorrect user/password on a postgres materialization publication ends up like this:

connector error while validating materialization bobCo/pgtest

Caused by:
    invocation failed: connector failed (exit status: 1) with stderr:
    the materialization cannot run due to the following error(s):
     - incorrect username or password

The essential message that the connector intends to communicate is only the last part, which, stripped of the additional wrapping, would ideally look like this:

the materialization cannot run due to the following error(s):
 - incorrect username or password

Backend Work

The initial step is to start populating the draft_errors table with only the error text from the connector error itself, omitting the included "wrapping". Once the mechanism for threading this basic text message through from the connector to the draft_errors table is in place, it could be enhanced to support more elaborate presentations like markdown.

To achieve this initial step for publications:

- The connector will output a specifically structured line over stderr to connector-init for errors containing human-readable content.
- connector-init will parse this text and include it in the gRPC error message to the agent.
- The agent will add the text to the draft_errors table. If no human-readable content is present for a connector error, draft_errors will be populated with "Connector failed. See logs for details" or something similar.
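
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `TERMINAL_ERROR: ` prefix, the quoting scheme, and the function names `FormatTerminalError` and `ExtractTerminalError` are hypothetical, since the actual line format has not been decided here. The fallback text is the one suggested above.

```go
package main

import (
	"strconv"
	"strings"
)

// terminalPrefix is a hypothetical marker for the structured stderr line.
const terminalPrefix = "TERMINAL_ERROR: "

// fallbackMessage is used when no human-readable content was emitted.
const fallbackMessage = "Connector failed. See logs for details"

// FormatTerminalError builds the single stderr line a connector would write.
// strconv.Quote escapes newlines, so a multi-line message stays on one line.
func FormatTerminalError(msg string) string {
	return terminalPrefix + strconv.Quote(msg)
}

// ExtractTerminalError scans captured connector stderr and returns the
// message from the last marked line, or the fallback if there is none.
func ExtractTerminalError(stderr string) string {
	found := fallbackMessage
	for _, line := range strings.Split(stderr, "\n") {
		if !strings.HasPrefix(line, terminalPrefix) {
			continue // ordinary developer-oriented output
		}
		msg, err := strconv.Unquote(strings.TrimPrefix(line, terminalPrefix))
		if err == nil {
			found = msg
		}
	}
	return found
}
```

In this sketch the developer-oriented output is left untouched, and only the marked line is unwrapped into the text that would be stored in draft_errors.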

As mentioned before, the process for discovers will need to change so that it works like publications and populates the draft_errors table with the terminal error for the operation. The agent currently runs flowctl-go directly and streams all log output from the connector into the log_lines table. It will instead need to run the connector itself and interact with connector-init so that the human-readable error message content can be extracted.

Frontend Work

The scope described so far is limited to simple text errors. Presentation of formatted (markdown, etc.) errors is definitely something we'll want to do as soon as practical, but for now the first step of optimizing the handling of text errors will provide immediate benefits.

There are at least a couple of things we could consider on the frontend:

- Newlines currently aren't handled for text from draft_errors when shown in the UI under Configuration Test Failed. Ideally these newlines would be applied at the "top level" error message shown to the user without needing to expand the logs. See below for an example of the current presentation:
  [Screenshot: current presentation of the error in the UI]
- We'll need to fetch the details for a failed discover from draft_errors.
  - One possible complication here is that a (successful) discover and the publication that follows it work off the same draft_id, so the challenge is telling an error during discovery apart from an error during publication for that same draft_id. In practice this should still work with our current UI discover -> publish flow: every time a new discover is attempted, a new draft (and draft_id) is created. If that discover has an error, any errors in draft_errors for that draft_id must have been a result of the discover. If the discover is successful, it is safe to assume during publication that any errors in draft_errors for that draft_id were a result of the publication. We may want to add more structure to these errors to make this distinction more robust, though.
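
One way to picture the "more structure" idea is to record which operation produced each error. The sketch below is purely illustrative: the `Operation` field, the `DraftError` shape, and `ErrorsFor` are hypothetical and do not reflect the actual draft_errors schema.

```go
package main

// Operation is a hypothetical tag naming the operation that failed.
type Operation string

const (
	OpDiscover    Operation = "discover"
	OpPublication Operation = "publication"
)

// DraftError is an illustrative row shape, not the real draft_errors schema.
type DraftError struct {
	DraftID   string
	Operation Operation
	Detail    string
}

// ErrorsFor returns the errors recorded for one draft and one operation,
// so the UI would not need to infer the source from the order of events.
func ErrorsFor(rows []DraftError, draftID string, op Operation) []DraftError {
	var out []DraftError
	for _, row := range rows {
		if row.DraftID == draftID && row.Operation == op {
			out = append(out, row)
		}
	}
	return out
}
```

With a tag like this, a discover error and a publication error could share a draft_id without being confused with each other.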

williamhbaker · 2023-04-17T19:03:53Z

williamhbaker
Apr 17, 2023
Maintainer Author

From the above, there could be 3 separate lines of work for the backend changes:

- Invoke connectors directly (from Rust) during discovery
- Teach connector-init & agent to handle terminal connector errors
- Update connectors to emit terminal connector errors

jgraettinger · 2023-04-17T19:56:40Z

jgraettinger
Apr 17, 2023
Maintainer

One tactical thought is to use the fatal or panic log levels as the terminal error. Then, one would call logrus.Fatal to both crash the connector and write out its terminal error.
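
As a rough sketch of the consuming side of this idea: logrus.Fatal logs the entry at fatal level and then exits with status 1, so whoever reads the connector's logs could treat the last fatal- or panic-level line as the terminal error. The code below assumes the connector logs JSON lines using the default `level` and `msg` keys of logrus's JSONFormatter. The name `TerminalFromLogs` is hypothetical, and only the standard library is used so the example stays self-contained.

```go
package main

import "encoding/json"

// logLine mirrors the default field keys of logrus's JSONFormatter.
type logLine struct {
	Level string `json:"level"`
	Msg   string `json:"msg"`
}

// TerminalFromLogs returns the message of the last fatal- or panic-level
// log line, and false if there was none. Lines that are not JSON are skipped.
func TerminalFromLogs(lines []string) (string, bool) {
	var msg string
	var found bool
	for _, raw := range lines {
		var line logLine
		if err := json.Unmarshal([]byte(raw), &line); err != nil {
			continue // not a structured log line
		}
		if line.Level == "fatal" || line.Level == "panic" {
			msg, found = line.Msg, true
		}
	}
	return msg, found
}
```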

One place I have some uncertainty: we have a separate conversation going where we'll use failures of materialization validation to identify when a collection may need to be re-versioned. Currently, we don't have a means to extract which materialization binding is not able to validate.

Is this something that should be plumbed through this final error? That's a bit weird.

1 reply

williamhbaker Apr 17, 2023
Maintainer Author

I think the final error will need some structure anyway. For example, we'll eventually want to support markdown errors, and I think we'd want to continue to write an error message to the logs journal that isn't markdown (it should be readable as plain text), which would need separate fields such as logMessage and errorMessage. Maybe binding makes sense there too? Otherwise I don't see how we'd communicate that kind of information back out from the connector, other than through a separate log message with a similar structure.
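
A structured final error along these lines might look like the sketch below. The `logMessage` and `errorMessage` field names come from the comment above; the `TerminalError` type, the optional `binding` field's shape, and the helper names are assumptions made for illustration only.

```go
package main

import "encoding/json"

// TerminalError is a hypothetical shape for a structured final error.
type TerminalError struct {
	// LogMessage is plain text, suitable for the logs journal.
	LogMessage string `json:"logMessage"`
	// ErrorMessage is user-facing and could later carry markdown.
	ErrorMessage string `json:"errorMessage"`
	// Binding optionally names the binding that failed to validate.
	Binding string `json:"binding,omitempty"`
}

// Encode renders the error as a single JSON line; json.Marshal escapes
// newlines, so multi-line messages do not break line-oriented parsing.
func (e TerminalError) Encode() (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

// DecodeTerminalError parses a line produced by Encode.
func DecodeTerminalError(line string) (TerminalError, error) {
	var e TerminalError
	err := json.Unmarshal([]byte(line), &e)
	return e, err
}
```

Keeping the plain-text and user-facing messages in separate fields means the logs journal stays readable even if the user-facing message later becomes markdown.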
