Add actionable logging to patching runs #2009

Open
melange396 opened this issue Jul 30, 2024 · 4 comments
Assignees
Labels
data quality Missing data, weird data, broken data devops enhancement future-solution Solutions to problems we don't have yet but still dread

Comments

@melange396
Contributor

Patch runs are very similar to regular indicator runs, but they have different purposes and they're not run on a schedule. We should include information in our logs to signal when these runs are happening. Monitoring and alerting systems can then use this additional info to distinguish normal activity from patching activity, which will let us see when aberrations are due to patching.

The format of the logging additions is yet to be determined (new, additional log messages? New parameters on existing log messages? Both? Something else?), but it should be done in a way that integrates easily with elastic and similar tools.
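
As one possible shape for the "new parameter on existing log messages" option, here is a minimal sketch. It uses the standard library `logging` module rather than our own logger setup, and the `patch_run` field name is purely hypothetical; nothing here is a decided format.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),
            # Hypothetical field name; records logged without it
            # are treated as regular (non-patch) runs.
            "patch_run": getattr(record, "patch_run", False),
        }
        return json.dumps(payload)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
logger = make_logger("indicator_demo", stream)

# A patch run tags its messages; a regular run logs as before.
logger.info("run started", extra={"patch_run": True})
logger.info("run started")

lines = [json.loads(line) for line in stream.getvalue().splitlines()]
```

A single boolean field like this would be easy to filter on in a dashboard, but a separate log message per patch run would work just as well; this is only meant to make the options concrete.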

@melange396 melange396 added data quality Missing data, weird data, broken data enhancement future-solution Solutions to problems we don't have yet but still dread devops labels Jul 30, 2024
@melange396
Contributor Author

The "acquisition" step of patching runs can potentially be detected with the log message:
logger.info(event='processing csv files from issue'...
found at https://github.com/cmu-delphi/delphi-epidata/blob/8746ff2ef7a936bb93628bc1358471d7c6c4f5f8/src/acquisition/covidcast/csv_importer.py#L128
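
A rough sketch of that detection, assuming the logs arrive as one JSON object per line with an `event` key, and that the event text is exactly the string quoted above (the sample lines and the `issue` value are made up for illustration):

```python
import json

# Event text taken from the log call quoted above.
PATCH_EVENT = "processing csv files from issue"


def is_patch_acquisition(line):
    """Return True if a JSON log line carries the patch-acquisition event."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        # Non-JSON lines cannot be classified, so treat them as not patching.
        return False
    return isinstance(record, dict) and record.get("event") == PATCH_EVENT


sample = [
    '{"event": "processing csv files from issue", "issue": "hypothetical"}',
    '{"event": "some other event"}',
    "not json at all",
]
flags = [is_patch_acquisition(line) for line in sample]
```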

This works because patching runs need to put CSV files in a specific directory structure to specify the "issue" date for import (instead of the default "today"), and that is where that log message is emitted. This will likely only work until the following ticket is addressed, after which all indicators will supply acquisition with specific "issue" dates:

@minhkhul minhkhul self-assigned this Aug 12, 2024
@minhkhul
Contributor

minhkhul commented Aug 13, 2024

Here's the plan for getting just the patch acquisition logs into elastic:
Currently, our normal indicator acquisition jobs write their logs to /var/log/epidata/csv_upload_{acq_ind_name}.log
Filebeat then picks up that log content, as configured here, and makes it available in elastic through this ingest pipeline.
Right now, patch acquisition writes its logs to:
/var/log/filebeat-pickup/epidata.acquisition.covidcast.csv_to_database_batch-issue-upload-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log
So all that has to be done to get patch acquisition logs into elastic is to change the Acquisition cronicle job so that patch acquisition logs to /var/log/epidata/csv_upload_patch.log, and rely on the existing processes to pick up the logs as usual.

To test this (and potentially other later stuff), I'm gonna set up patch acquisition log pickup to elastic on staging:

  • Uncomment this.
  • Adjust current dashboards that care only about prod data so they ignore staging.
    Then check how things go on staging with some fake patch data and these jobs.

@melange396
Contributor Author

If you want to be sure to keep things out of other dashboards for testing purposes, don't just uncomment the pipeline in the staging filebeat config. Instead, change its name to filebeat-epidata-pipeline-staging and create a matching ingest pipeline with a new target_field, like "epidata_data__test".

@minhkhul
Contributor

minhkhul commented Sep 5, 2024

Switching patch logging to output to /var/log/epidata/batch_issue_upload.log instead of /var/log/filebeat-pickup/epidata.acquisition.covidcast.csv_to_database_batch-issue-upload-$(date -u +"%Y-%m-%dT%H_%M_%SZ").log. This way patch acquisition logs are processed by the same elastic pipeline as normal acquisition logs, which makes patch acquisition info easier to show on dashboards.
Tested the change on staging and the logs showed up in elastic as expected, so I'm applying this to prod.

next steps: address #1907
