Merge dataemon and a part of cht-pipeline with cht-sync #74

Closed
witash opened this issue Feb 23, 2024 · 5 comments
Labels: Type: Improvement (Make something better)

Comments

@witash
Contributor

witash commented Feb 23, 2024

There are 3 repositories that currently depend on each other
cht-sync->cht-pipeline
cht-sync->dataemon
dataemon->cht-pipeline

I wonder whether, for deployment, development, and maintenance, it would be easier to maintain one medium-sized repository instead of three small ones:

  1. Things like environment variables and dbt profiles don't have a single place where they belong, and are split across the three projects (related issues: "feat(#72): Replace hardcoded values with env variables" #73 and "Add profile.yml for production" #39).
  2. For deployment and release, having three projects that depend on each other means three different sets of versions and branches that all have to be able to talk to each other. A change in one project can break the other two.
  3. The cht-sync end-to-end tests are not really end to end: they only check that there is data in the supplied database, which may have gotten there some other way (for example, from running a dev version of the project). Ideally we would be able to run a true end-to-end test, where data is pulled from CouchDB and goes through the whole pipeline to the correct Postgres tables.
  4. The README for cht-pipeline does not have instructions on how to run it standalone, and instead refers to running cht-sync.
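
To illustrate point 3, here is a minimal sketch of the difference between "there is data in the database" and a true end-to-end check. Everything in it is hypothetical: `run_pipeline`, `broken_pipeline`, and `end_to_end_check` are illustrative names, and a plain list and dict stand in for CouchDB and Postgres. It is not how the cht-sync tests are written, only the idea of writing a uniquely identified document at the source and checking that this exact document arrives at the destination.

```python
import uuid


def run_pipeline(source_docs, target_rows):
    """Hypothetical stand-in for the whole pipeline (sync plus dbt run).

    Copies every source document into the target "table", keyed by _id.
    """
    for doc in source_docs:
        target_rows[doc["_id"]] = {"id": doc["_id"], "type": doc["type"]}


def broken_pipeline(source_docs, target_rows):
    """Hypothetical pipeline that silently does nothing."""
    return None


def end_to_end_check(source_docs, target_rows, pipeline):
    """Return True only if a freshly written source doc reaches the target."""
    # A unique id guarantees the row cannot already exist in the target,
    # so pre-existing data (e.g. from a dev run) cannot make the check pass.
    marker_id = f"e2e-{uuid.uuid4()}"
    assert marker_id not in target_rows

    source_docs.append({"_id": marker_id, "type": "data_record"})
    pipeline(source_docs, target_rows)
    return marker_id in target_rows


# The target already holds data that got there "some other way".
stale_target = {"old-row": {"id": "old-row", "type": "data_record"}}

naive_passes = len(stale_target) > 0  # passes even if the pipeline is broken
broken_result = end_to_end_check([], dict(stale_target), broken_pipeline)
working_result = end_to_end_check([], dict(stale_target), run_pipeline)
```

With the stale row present, the naive "is there data?" check passes regardless of the pipeline, while the marker-based check only passes when the document actually travels from source to target.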

If the intention is to have a repo where different instances put their different dbt models in different branches, could this repository be limited to just the models, with the configuration and running of dbt defined in a single repository?

@witash witash added the Question Further information is requested label Feb 23, 2024
@mrjones-plip

@garethbowen - any thoughts here? I suspect @njuguna-n and @witash know best given how close they are to the code, but I know that you may have had a hand in setting all these repos up when we first dreamed up CHT Sync, so I wanted to check in.

@witash
Contributor Author

witash commented Feb 26, 2024

draft MR here medic/cht-sync#74

@garethbowen
Contributor

I definitely agree with merging dataemon and cht-pipeline.

cht-sync could remain separate because, as far as I'm aware, there is no real dependency between cht-sync and cht-pipeline. It's possible, for example, that you might run cht-sync and cht-couch2pg. However, even if the two were in the same repo, it would still be possible to deploy them separately, so it's not a blocker.

If this will make development, documentation, communication, and/or usage easier then I say go for it.

@witash
Contributor Author

witash commented Mar 18, 2024

After discussion with the team, we will go ahead with merging part of cht-pipeline and all of dataemon into cht-sync.

dbt models that are expected to differ between partners will remain in this repository as different branches, but the Dockerfiles, scripts, and root models that are needed to run dbt, and which should not change across partners, will be moved to cht-sync.

dbt will still be able to run independently of the other parts of cht-sync, in case it's necessary to run dbt with couch2pg.

@witash assigned witash and unassigned njuguna-n Mar 19, 2024
@witash changed the title from "merge with cht-sync?" to "merge with cht-sync" Mar 19, 2024
@andrablaj added the "Type: Improvement" label and removed the "Question" label Mar 19, 2024
@andrablaj changed the title from "merge with cht-sync" to "Merge dataemon and a part of cht-pipeline with cht-sync" Mar 19, 2024
@witash moved this from In Progress to Done in Product Team Activities Apr 2, 2024
@andrablaj
Member

Closing this ticket as done.


5 participants