Import sample data through API #2690

iamleeg · 2022-04-29T12:40:58Z

This means that the sample data is correctly age-bucketed.
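
As a rough illustration of what "correctly age-bucketed" could mean, the sketch below maps an inclusive age range onto every bucket it overlaps. The bucket boundaries (a bucket for age 0, then five-year buckets up to 120) and the name `buckets_for_age_range` are assumptions for illustration only; they are not taken from this PR or from the data service.

```python
from typing import List, Tuple

# Hypothetical bucket boundaries: a single bucket for age 0, then five-year
# buckets 1-5, 6-10, ..., 116-120. The real service may define them differently.
AGE_BUCKETS: List[Tuple[int, int]] = [(0, 0)] + [
    (start, start + 4) for start in range(1, 117, 5)
]


def buckets_for_age_range(start: int, end: int) -> List[Tuple[int, int]]:
    """Return every bucket that overlaps the inclusive age range [start, end]."""
    if start > end:
        raise ValueError("start must not exceed end")
    return [
        (lo, hi) for (lo, hi) in AGE_BUCKETS
        if lo <= end and hi >= start
    ]
```

Under these assumed boundaries, a case with an age range of 4 to 7 would be stored against the two buckets 1-5 and 6-10 rather than against its exact ages.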

data-serving/scripts/setup-db/import-sample-data.py

codecov-commenter · 2022-05-03T09:25:05Z

Codecov Report

Merging #2690 (079268a) into main (d06706d) will decrease coverage by 23.55%.
The diff coverage is 75.00%.

@@             Coverage Diff             @@
##             main    #2690       +/-   ##
===========================================
- Coverage   89.28%   65.72%   -23.56%     
===========================================
  Files          35      146      +111     
  Lines        1241     5628     +4387     
  Branches      289     1501     +1212     
===========================================
+ Hits         1108     3699     +2591     
- Misses        133     1929     +1796

Impacted Files	Coverage Δ
data-serving/data-service/src/model/age-bucket.ts	`100.00% <ø> (ø)`
...c/components/new-case-form-fields/Demographics.tsx	`72.22% <ø> (ø)`
...ion/curator-service/api/src/controllers/geocode.ts	`77.08% <50.00%> (ø)`
data-serving/data-service/src/controllers/case.ts	`82.32% <58.33%> (-0.81%)`	⬇️
...cation/curator-service/api/src/controllers/auth.ts	`46.37% <100.00%> (ø)`
...urator-service/api/src/clients/aws-batch-client.ts	`95.83% <0.00%> (ø)`
...tion/curator-service/ui/src/components/Profile.tsx	`88.29% <0.00%> (ø)`
...ation/curator-service/ui/src/hooks/useInterval.tsx	`70.00% <0.00%> (ø)`
...ication/curator-service/ui/src/redux/auth/thunk.ts	`50.00% <0.00%> (ø)`
... and 107 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

…2670

… migration #2670

iamleeg · 2022-05-05T10:25:47Z

There is one remaining Cypress test failure. I have spent over a day diagnosing it and am no closer, but it is in behaviour we don't use. I am therefore arguing for changing the test expectation to match the new behaviour, merging this PR, and then opening a new issue to remove the unused functionality.

Failure analysis

The test does a manual CSV upload, then a manual CSV upload of the same cases again but with the gender changed in one of the two cases. It checks that only one case has been updated (the one with the changed gender), but on this branch the test fails because it reports that both cases have been updated.

In fact, they have. The unmodified case has the same values that it did before, but gets attached to the new upload and gets new revision metadata. So there's no visible change, except that the case looks like it got updated (and the caseRevisions collection gets bigger).

As far as I can tell, the decision on whether a document needs updating is internal to MongoDB's bulkWrite function. The commit that introduced this failure is d1a192d12f038f9fdfcf70510bbc39814c43d246, which doesn't change the upsert logic; it introduces the use of bucketed age ranges when saving cases.
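
To make the test's expectation concrete: it assumes a re-uploaded case only counts as updated when its content differs from the stored copy. A minimal sketch of that comparison follows. The field names (`revisionMetadata`, `uploadIds`) and the helper `needs_update` are hypothetical, and this is not how the data service or MongoDB makes the decision; it only illustrates the behaviour the Cypress test expects.

```python
from typing import Any, Dict

# Hypothetical bookkeeping fields that change on every upload even when the
# case content itself is identical.
BOOKKEEPING_FIELDS = {"revisionMetadata", "uploadIds"}


def content_of(case: Dict[str, Any]) -> Dict[str, Any]:
    """Return the case without its bookkeeping fields."""
    return {k: v for k, v in case.items() if k not in BOOKKEEPING_FIELDS}


def needs_update(stored: Dict[str, Any], incoming: Dict[str, Any]) -> bool:
    """True only when the incoming case differs in content from the stored one."""
    return content_of(stored) != content_of(incoming)


stored = {"gender": "Female", "age": "21-25", "uploadIds": ["upload-1"]}
same_content = {"gender": "Female", "age": "21-25", "uploadIds": ["upload-2"]}
changed = {"gender": "Male", "age": "21-25", "uploadIds": ["upload-2"]}

print(needs_update(stored, same_content))  # False
print(needs_update(stored, changed))       # True
```

On this branch the observed behaviour corresponds to both re-uploaded cases being treated as updated, including the one whose content is unchanged.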

Why I think we should ignore this failure

Firstly, we don't use manual CSV uploads any more, so we won't really observe any issues with incorrect revision metadata. Secondly, we don't use the revision metadata at all, and even in our plans to have frozen API requests we will not use it, so there is no incorrect behaviour even where revisions are recorded for unchanged cases.

So I'm actually suggesting not only ignoring this failure for now and merging the PR, but making another issue to completely remove the document revision pattern at a future date. This will greatly increase write performance (i.e. ingestion) without compromising the existing or planned snapshot behaviour (i.e. frozen requests, and daily downloads).

What do you think?

jim-sheldon · 2022-05-05T14:20:23Z

Regarding your comment above, if we don't use manual CSV uploads or revision metadata, then removing the document revision pattern makes sense.

abhidg · 2022-05-05T15:19:08Z

Agreed with @jim-sheldon, we should remove unused functionality. Document revisions could be useful for partner instances, but that’s an if, and the timeline for getting there is unknown, so we shouldn’t keep this around just in case.

iamleeg marked this pull request as ready for review April 29, 2022 14:45

iamleeg requested review from abhidg and jim-sheldon April 29, 2022 14:45