-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add datadog tracking to record schema validation errors #13393
Conversation
9188dc8
to
c6a8f55
Compare
df29a52
to
1e0bd00
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, main one about refactoring how the datadog-specific inputs are passed into this class to be more explicit, encapsulated, and less prone to NullPointerExceptions. Happy to discuss more over zoom/slack if you want!
NUM_RECORD_SCHEMA_VALIDATION_ERRORS(MetricEmittingApps.WORKER, | ||
"record_schema_validation_error", | ||
"number of record schema validation errors"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't actually be tracking the number of errors for a given stream, right? Because we are just emitting 1 for each stream that had any errors, and we are capping the number of validations we perform for a stream once we reach a certain number?
Maybe this should be called something like SOURCE_STREAMS_WITH_RECORD_SCHEMA_VALIDATION_ERRORS
? Open to other suggestions on naming here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I was trying to use NUM because it is a count metric, but maybe NUM_SOURCE_STREAMS_WITH_RECORD_SCHEMA_VALIDATION_ERRORS
public DefaultReplicationWorker(final String jobId, | ||
final int attempt, | ||
final AirbyteSource source, | ||
final AirbyteMapper mapper, | ||
final AirbyteDestination destination, | ||
final MessageTracker messageTracker, | ||
final RecordSchemaValidator recordSchemaValidator) { | ||
this(jobId, attempt, source, mapper, destination, messageTracker, recordSchemaValidator, null, null); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor means just means that the non-container-orchestrator deployments will be setting those values to null
, right? I think that is okay, since we care more about tracking issues in cloud which only uses container orchestrators. But just wanted to confirm my understanding
@@ -354,8 +377,18 @@ private static Runnable getReplicationRunnable(final AirbyteSource source, | |||
} | |||
LOGGER.info("Total records read: {} ({})", recordsRead, FileUtils.byteCountToDisplaySize(messageTracker.getTotalBytesEmitted())); | |||
if (!validationErrors.isEmpty()) { | |||
DogStatsDMetricSingleton.initialize(MetricEmittingApps.WORKER, new DatadogClientConfiguration(configs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of passing the entire Configs
object into this class just to be used here to create the DatadogClientConfiguration
object, I think it would be better to just pass that DatadogClientConfiguration
object into this class directly. That way we aren't passing in more than we need.
In fact, I think it would be better if we created a new class to contain all of the logic that you added here. Something like a DatadogSchemaValidationCounter
class, with a constructor that takes in the DatadogClientConfiguration
object and the sourceDockerImage
string (and maybe the constructor can split out the repo and version from the image string into separate class variables), and has a method that called something like track()
which contains all of the logic to initialize the DogStatsDMetricSingleton, create the validationErrorMetadata, and call the DogStatsDMetricSingleton.count() method.
Then, I think this class should be wrapped in an Optional<>
when passed into this DefaultReplicationWorker
class, to make it explicit that it is not always set. It should only be set to something in the ReplicationJobOrchestrator class if the DD_AGENT_HOST
and DD_DOGSTATSD_PORT
env vars are not empty, so that none of this is attempted for Kube OSS users that have not configured datadog. And the track()
method on that class (or whatever you choose to call it) should only be called if the Optional is not empty.
The reason for all of this is that it encapsulates the logic related to datadog tracking into a separate class rather than further complicating the logic in this getReplicationRunnable()
method. Also, without an approach like the above I think you would need to add more null checks to the logic you've added here, otherwise I believe this will break when someone hasn't set those DD_*
variables, or if someone deploys this on a non-container-orchestrator deployment.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes that makes sense! when you say the class should be wrapped in Optional<> when it's passed in - is that instead of having multiple constructors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think just having a single constructor would be cleaner
@lmossman I updated this (obvs not a priority to look at right now). I made a separate class for datadog metrics, but made it more generic than you were suggesting - I was thinking we might want to track other kinds of metrics in the replication worker and could use that class for other metrics as well. let me know what you think. |
79d4fc7
to
1e08c84
Compare
f7a5e50
to
b1d70a5
Compare
51f8fea
to
bf0a059
Compare
bf0a059
to
164928f
Compare
164928f
to
0d9e30b
Compare
0d9e30b
to
55b8de7
Compare
5abb899
to
915fc4a
Compare
@@ -55,6 +57,7 @@ | |||
@Slf4j | |||
public class ContainerOrchestratorApp { | |||
|
|||
private static final Logger LOGGER = LoggerFactory.getLogger(ContainerOrchestratorApp.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need that. There is a logger available inthe log
variable
System.setProperty(envVar, envMap.get(envVar)); | ||
final String getEnvResult = System.getenv(envVar); | ||
LOGGER.info("getting env " + envVar); | ||
LOGGER.info(getEnvResult); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it for debug only? This will need to be removed, we may have some credential in the env variables.
|
||
@Slf4j | ||
public class ReplicationJobOrchestrator implements JobOrchestrator<StandardSyncInput> { | ||
|
||
private static final Logger LOGGER = LoggerFactory.getLogger(ReplicationJobOrchestrator.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also have a logger present in the log
var
LOGGER.info("metric client in async orchestrator pod process"); | ||
LOGGER.info(metricClient); | ||
|
||
envVars.add(new EnvVar(EnvConfigs.METRIC_CLIENT, metricClient, null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need the Datadog related env vars here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so but apparently not since this is working without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'working' as in, the metric client is showing up as a properly initialized datadog metric client
Tested this on dev and it's working now, env variables are all correctly being passed to the worker. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dismissing bc comments were addressed
Add Datadog tracking for record schema validation errors