-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report Normalization failures to Sentry #15695
Changes from 9 commits
145bae5
68285d9
70a4141
eec2bd3
681d976
3564c92
1e7f1ec
518c99a
d63189c
dea2aef
fe49c46
ec590da
a169a3b
fd7654b
7d4c0e7
23d5471
76fae3c
e21dc1d
f5ca8d7
de453be
0fa918e
e38e896
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,6 +32,9 @@ public Optional<List<SentryException>> buildSentryExceptions(final String stackt | |
if (stacktrace.contains("\tat ") && stacktrace.contains(".java")) { | ||
return buildJavaSentryExceptions(stacktrace); | ||
} | ||
if (stacktrace.startsWith("AirbyteDbtError: ")) { | ||
return buildNormalizationDbtSentryExceptions(stacktrace); | ||
} | ||
|
||
return Optional.empty(); | ||
}, Optional.empty()); | ||
|
@@ -166,4 +169,43 @@ private static Optional<List<SentryException>> buildJavaSentryExceptions(final S | |
return Optional.of(sentryExceptions); | ||
} | ||
|
||
private static Optional<List<SentryException>> buildNormalizationDbtSentryExceptions(final String stacktrace) { | ||
final List<SentryException> sentryExceptions = new ArrayList<>(); | ||
|
||
// focus on Database Errors (these are the common errors seen in Airbyte) | ||
final String databaseErrorIdentifier = "Database Error in model"; | ||
// for other dbt error types, we'll default to non-parsed stacktrace grouping | ||
if (!stacktrace.contains(databaseErrorIdentifier)) { | ||
return Optional.empty(); | ||
} | ||
|
||
final String[] separatedStackTrace = stacktrace.split("\n"); | ||
|
||
// first let's get our useful error line for grouping | ||
String usefulError = ""; | ||
boolean nextLine = false; | ||
for (String errorLine : separatedStackTrace) { | ||
// previous line was "Database Error..." so this is our useful message line | ||
if (nextLine) { | ||
usefulError = errorLine; | ||
break; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From reading this I think we're only building one SentryException from the stack trace, though I see the dbt stacks could have multiple errors (e.g. error 1 of 2 counts) - is there any value in building a SentryException for each of those? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I played around with this but decided to scrap it because:
side note: I will be pushing up some changes soon to cover grabbing a useful message from a wider array of the error structures but for now going to just identify events based on the first error message for simplicity. Will create an issue for incremental improvement on this to handle multiple exceptions. |
||
} | ||
if (errorLine.contains("Database Error in model")) { | ||
nextLine = true; | ||
} | ||
} | ||
|
||
if (!"".equals(usefulError)) { | ||
final SentryException usefulException = new SentryException(); | ||
usefulException.setValue(usefulError); | ||
usefulException.setType("DbtDatabaseError"); | ||
pedroslopez marked this conversation as resolved.
Show resolved
Hide resolved
|
||
sentryExceptions.add(usefulException); | ||
} | ||
|
||
if (sentryExceptions.isEmpty()) | ||
return Optional.empty(); | ||
|
||
return Optional.of(sentryExceptions); | ||
} | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -52,6 +52,7 @@ public class DefaultNormalizationRunner implements NormalizationRunner { | |
private final String normalizationImageName; | ||
private final NormalizationAirbyteStreamFactory streamFactory = new NormalizationAirbyteStreamFactory(CONTAINER_LOG_MDC_BUILDER); | ||
private Map<Type, List<AirbyteMessage>> airbyteMessagesByType; | ||
private String dbtErrorStack; | ||
|
||
private Process process = null; | ||
|
||
|
@@ -154,7 +155,7 @@ private boolean runProcess(final String jobId, | |
.collect(Collectors.groupingBy(AirbyteMessage::getType)); | ||
|
||
// picks up error logs from dbt | ||
String dbtErrorStack = String.join("\n\t", streamFactory.getDbtErrors()); | ||
dbtErrorStack = String.join("\n", streamFactory.getDbtErrors()); | ||
|
||
if (!"".equals(dbtErrorStack)) { | ||
AirbyteMessage dbtTraceMessage = new AirbyteMessage() | ||
|
@@ -165,8 +166,11 @@ private boolean runProcess(final String jobId, | |
.withError(new AirbyteErrorTraceMessage() | ||
.withFailureType(FailureType.SYSTEM_ERROR) // TODO: decide on best FailureType for this | ||
.withMessage("Normalization failed during the dbt run. This may indicate a problem with the data itself.") | ||
.withInternalMessage(dbtErrorStack) | ||
.withStackTrace(dbtErrorStack))); | ||
.withInternalMessage(buildInternalErrorMessageFromDbtStackTrace()) | ||
// due to the lack of consistent defining features in dbt errors we're injecting a breadcrumb to the | ||
// stacktrace so we can confidently identify all dbt errors when parsing and sending to Sentry | ||
// see dbt error examples: https://docs.getdbt.com/guides/legacy/debugging-errors for more context | ||
.withStackTrace("AirbyteDbtError: \n".concat(dbtErrorStack)))); | ||
|
||
airbyteMessagesByType.putIfAbsent(Type.TRACE, List.of(dbtTraceMessage)); | ||
} | ||
|
@@ -206,6 +210,24 @@ public Stream<AirbyteTraceMessage> getTraceMessages() { | |
return Stream.empty(); | ||
} | ||
|
||
private String buildInternalErrorMessageFromDbtStackTrace() { | ||
// Most dbt errors we see in Airbyte are `Database Errors` | ||
// The relevant error message is often the line following the "Database Error..." line | ||
// e.g. "Column 10 in UNION ALL has incompatible types: DATETIME, TIMESTAMP" | ||
boolean nextLine = false; | ||
for (String errorLine : streamFactory.getDbtErrors()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is there any way to DRY this with the method in |
||
// previous line was "Database Error..." so this is our useful message line | ||
if (nextLine) { | ||
return errorLine; | ||
} | ||
if (errorLine.contains("Database Error in model")) { | ||
nextLine = true; | ||
} | ||
} | ||
// Not all errors are Database Errors, for other types, we just return the stacktrace | ||
return dbtErrorStack; | ||
} | ||
|
||
@VisibleForTesting | ||
DestinationType getDestinationType() { | ||
return destinationType; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -93,10 +93,9 @@ private Stream<AirbyteMessage> filterOutAndHandleNonAirbyteMessageLines(JsonNode | |
try (final var mdcScope = containerLogMdcBuilder.build()) { | ||
switch (logLevel) { | ||
case "debug" -> logger.debug(logMsg); | ||
case "info" -> logger.info(logMsg); | ||
case "warn" -> logger.warn(logMsg); | ||
case "error" -> logAndCollectErrorMessage(logMsg); | ||
default -> logger.info(jsonLine.asText()); // this shouldn't happen but logging it to avoid hiding unexpected lines. | ||
default -> logger.info(logMsg); // this shouldn't happen but logging it to avoid hiding unexpected lines. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this seems like it would render jsonLines differently than the current impl -- what's the upside to doing it this way? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did have some wider changes in this part and missed changing this bit back, reverted. |
||
} | ||
} | ||
} catch (final Exception e) { | ||
|
@@ -106,9 +105,9 @@ private Stream<AirbyteMessage> filterOutAndHandleNonAirbyteMessageLines(JsonNode | |
return m.stream(); | ||
} | ||
|
||
private void logAndCollectErrorMessage(String logMessage) { | ||
logger.error(logMessage); | ||
dbtErrors.add(logMessage); | ||
private void logAndCollectErrorMessage(String logMsg) { | ||
logger.error(logMsg); | ||
dbtErrors.add(logMsg); | ||
} | ||
|
||
public List<String> getDbtErrors() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how common is it that the line immediately following
Database error
is the "interesting" one? e.g: is it 100% of the time based on a reasonable empirical sample size?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was going for a classic 80-20 coverage move here, using the following data:
all normalization-system-errors (with the new failure reasons)
1,110 rows
Filtering for 'database errors'
736 rows
Nominal 736/1110 (66%)
Unique 368/442 (83%)
694/736 (~94%) follow the pattern of line immediately following "Database Error in model" is the single useful part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some futher investigation today looking deeper at the complete set of failures we've encountered (coming from those metabase links above) and decided it's not significant extra work to cover a lot of the possible error structures based on what we've seen so far:
The other 42 errors follow 1 of 3 patterns:
Currently implementing specific logic for these three extra edge cases so we cover all currently known database errors. (No doubt there are other edge cases that exist and we haven't encountered but we can build upon this parsing code over time.)
Also implementing logic for the rest of the error types (the other ~20% of unique errors we've seen):
Basing the logic from our dataset in Metabase (in previous comment links), now covering >95% of every dbt error we've seen since implementing normalization failure reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes will also solve this point