Refactor Airflow ETL Pipeline DAG to incorporate feedback changes #4576
Comments
The next step is to replicate the CG infrastructure in Staging.
Results of load test. Date: 02.13.24 Logs:
The load test was performed, and Airflow handled the job as expected. As our conversation about how we use the tool has evolved, the team has decided to pivot away from Airflow, at least in the interim. The cost of supporting it, both in infrastructure and in time to learn the platform, outweighs the advantages it was expected to bring. In short, our use case (high throughput, minimal analysis) does not overlap with Airflow's strengths as well as we'd expected.
User Story
To incorporate updates to the datagov-harvesting-logic API and feedback from the most recent design sessions, the Airflow ETL pipeline DAG needs to be changed so that a DCAT-US record can be fully tested end-to-end.
Related:
Acceptance Criteria
[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]
AND I have supplied a test JSON harvest source of N DCAT-US records that has generated a dynamic etl_pipeline DAG
WHEN I trigger a run of that DAG
THEN I expect the new ETL pipeline to process the source through the pipeline tasks and at completion to compile metrics from the tasks.
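The expected behavior could be sketched roughly as below. This is a plain-Python illustration only, not the actual DAG and not the datagov-harvesting-logic API: the task names (`extract`, `validate`, `load`), the `run_pipeline` helper, and the metric keys are all hypothetical, and it assumes the harvest source is a DCAT-US catalog whose records sit under a top-level `dataset` array.

```python
from collections import Counter

# Hypothetical task functions. Each takes a record dict and returns it,
# or raises ValueError to mark the record as failed at that stage.
def extract(record):
    return dict(record)

def validate(record):
    # Assumption: only "identifier" and "title" are checked here;
    # real DCAT-US validation covers more fields.
    for field in ("identifier", "title"):
        if not record.get(field):
            raise ValueError(f"missing {field}")
    return record

def load(record):
    return record

# Ordered pipeline stages (names are illustrative, not from the real DAG).
TASKS = [("extract", extract), ("validate", validate), ("load", load)]

def run_pipeline(source):
    """Run every record through each task and compile per-task metrics."""
    metrics = Counter()
    for record in source.get("dataset", []):
        metrics["records_total"] += 1
        try:
            for name, task in TASKS:
                record = task(record)
                metrics[f"{name}_ok"] += 1
        except ValueError:
            # Count the failure against the stage that raised it.
            metrics[f"{name}_failed"] += 1
            continue
        metrics["records_completed"] += 1
    return metrics

if __name__ == "__main__":
    test_source = {
        "dataset": [
            {"identifier": "a-1", "title": "First record"},
            {"identifier": "a-2"},  # missing title, fails validation
        ]
    }
    print(dict(run_pipeline(test_source)))
```

In an Airflow DAG each stage would be its own task and the metrics would be gathered in a final task; the sketch only shows the shape of the result the THEN clause asks for.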
Background
[Any helpful contextual notes or links to artifacts/evidence, if needed]
Security Considerations (required)
[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]
Sketch
Reference