[4/4] Retrofit the pg / s3 / macros helpers etc. #173
Conversation
Force-pushed from 5e2ff5a to faede54
👍
@@ -292,7 +226,7 @@ def _load(
extract = python.ExternalPythonOperator(
To avoid unnecessary head-scratching, can we keep these in sync:
- the name of the python.ExternalPythonOperator task
- the name of the function passed to python_callable
- the task id
Yes and "no"... I tried, but the problem is that I cannot keep them EXACTLY in sync (otherwise the method name and the task name clash).
So I have to do the same as you and add "at least one underscore".
Given that, the choice comes down to:
1. `load` for the task name/id and `_load` for the callable
2. `load` for the task name/id and `load_s3_to_datawarehouse` for the callable
3. `load_s3_to_datawarehouse` for the task name/id and `_load_s3_to_datawarehouse` for the callable
... option 2 remains the nicest / most concise in the Airflow UI and in the DAG code, while being more explicit about the callable name than option 1 (and it starts with the same identifier).
Option 3 seems to pollute the Airflow task names and bloat the DAG code for no good reason; we are already explicit somewhere. A sketch of option 2 is shown below.
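For illustration, a minimal sketch of option 2, assuming the DAG module imports `python` from `airflow.operators` as in the diff above; the interpreter path and the callable body are placeholders, not the actual implementation:

```python
from airflow.operators import python


def load_s3_to_datawarehouse():
    # Placeholder body: the real callable reads the raw files from S3
    # and loads them into the datawarehouse.
    ...


# Option 2: the task variable / task_id stay short in the Airflow UI,
# while the callable keeps the explicit name, so the function and the
# task variable never clash.
load = python.ExternalPythonOperator(
    task_id="load",
    python="/path/to/venv/bin/python",  # hypothetical interpreter path
    python_callable=load_s3_to_datawarehouse,
)
```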
Force-pushed from faede54 to ce0b06a
@vmttn if that works for you, I'm OK with merging this; I've implemented the requested fixes (except one, and I explained why).
'pass' cannot be used; it has to be 'password'. I don't entirely understand how this could have worked so far.
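For context, a minimal sketch of the kind of connection call where this matters, assuming a psycopg2 / libpq-style helper; the connection values are purely hypothetical:

```python
import psycopg2

# libpq / psycopg2 only understand the keyword "password"; an unknown
# "pass" key is rejected, hence the surprise that it worked until now.
conn = psycopg2.connect(
    host="localhost",        # hypothetical values
    port=5432,
    dbname="datawarehouse",
    user="di",
    password="s3cret",
)
```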
TODOs:
- rewrite settings.py around a DataSource class defining HTTP extractors and loaders, streams, schedule intervals, etc. (a sketch follows below)
- split the DAG
- move mediation numerique somewhere else

About the tests:
- split the tests that should run on CI from the others
- if we want to test the DAGs we need:
  * a dedicated test database that exists only for the duration of the tests
  * some cleanup before and after
  * etc.

==> "just running" pytest without orchestration has zero chance of working, so we should split the tests run on CI from the others.

Don't forget to re-run the DAGs before and after the changes and check whether the sources or the datawarehouse have changed...
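To make the settings.py TODO more concrete, a minimal sketch of what such a DataSource class could look like; every field name here (extractor, loader, streams, schedule_interval) is an assumption, not the final design:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataSource:
    """Declarative description of one source, used to generate its DAG."""

    name: str
    extractor: Callable[[], bytes]        # e.g. an HTTP extractor
    loader: Callable[[bytes], None]       # e.g. an S3-to-datawarehouse loader
    streams: list[str] = field(default_factory=list)
    schedule_interval: str = "@daily"


# Hypothetical usage in settings.py (stub callables, for illustration only).
SOURCES = [
    DataSource(
        name="mediation-numerique",
        extractor=lambda: b"{}",
        loader=lambda data: None,
        streams=["structures", "services"],
        schedule_interval="@daily",
    ),
]
```

Each DAG could then be generated by iterating over SOURCES rather than being written by hand.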
Some of the DAGs (for instance all the INSEE ones) could maybe be combined? I am a little surprised that for those we don't have the datalake layer. I agree they are mostly "sources" in the DI sense, but technically it would be nice to be able to reproduce everything from a single S3 dump. Maybe that's too much? I still think I would sleep better at night if the Extract + Load was always the same. In that case we could still separate the subfolders (see the sketch after this comment), e.g.:
- di-sources
- seeds
- you name it
In the name of DRY.
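To illustrate the "same Extract + Load everywhere, separate subfolders" idea, a possible key-building helper; only the di-sources / seeds prefixes come from the comment above, the function name and layout are assumptions:

```python
from datetime import date
from typing import Optional


def datalake_key(kind: str, source: str, filename: str, day: Optional[date] = None) -> str:
    """Build the datalake key so every source, INSEE ones included, lands
    in the same layout and can be replayed from a single S3 dump.

    `kind` is the subfolder family mentioned above, e.g. "di-sources" or "seeds".
    """
    day = day or date.today()
    return f"{kind}/{source}/{day.isoformat()}/{filename}"


# e.g. "seeds/insee/2024-01-01/communes.csv" (illustrative values)
print(datalake_key("seeds", "insee", "communes.csv", date(2024, 1, 1)))
```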
Force-pushed from ce0b06a to b394b01
You stick your little finger in, and it grabs your whole arm.
I'm retrofitting the helpers introduced in 2/4 into the other DAGs and models.
Along the way I noticed two or three other things, but it's just code rewritten differently to be more DRY.