Live waveform data in Emap #66

jeremyestein · 2024-10-08T14:36:44Z

I think it's time to create this PR now. I would be particularly interested to see @stefpiatek 's feedback.

I appreciate it's quite large, but @skeating has been reviewing it in chunks, going into the sk/waveform-dev branch from which we're now merging. So this is not the first time it's been seen by anyone except me!

waveform_hf_data.md, the main design document, is a good place to start reading.

Project board is at https://github.com/orgs/UCLH-DHCT/projects/3/views/1


github-actions · 2024-10-08T14:37:00Z

PR checklist

Default guide for a PR (if multiple PRs for the work, only keep one version of it and link to it on the other PRs)

- From the UCLH data science desktop, a validation run has been set off
- load times in UCL teams has been populated with the run information
- During the run, glowroot has been checked for any queries which are taking a substantial proportion of the total processing time. This can be useful to identify indexes that are required.
- After the run, look for any unexpected errors in the etl_per_message_logging table. The error_search.sql file on the shared drive can be used for this: \\sharefs6\UCLH6\EMAP\Shared\EmapSqlScripts\devops\error_search.sql. Create an issue if you find an unexpected exception that is not related to the changes you've made, otherwise fix them!
- After the run, populate the end time in load times
- Let Aasiyah know about the completed validation and give her information on the changes and where to start with the validation
- Check the validation report and give feedback to Aasiyah if any changes are needed on her side; iterate on getting the validation to match at least 99% (validation and emap code).


stefpiatek

Certainly a lot there, hopefully I made sense of most of it. Some comments and questions, but nothing blocking.

docs/dev/features/waveform_hf_data.md

emap-setup/emap_runner/runner.py

core/docker-compose.yml

...in/java/uk/ac/ucl/rits/inform/datasinks/emapstar/controllers/VisitObservationController.java

emap-setup/emap_runner/validation/validation_runner.py

emap-setup/global-configuration-EXAMPLE.yaml

emap-star/emap-star/src/main/java/uk/ac/ucl/rits/inform/informdb/visit_recordings/Waveform.java

stefpiatek · 2024-10-23T10:01:05Z

waveform-reader/src/main/java/uk/ac/ucl/rits/inform/datasources/waveform/LocationMapping.java

+            // derived from real data
+            1, List.of(11, 12, 14, 15, 16),
+            2, List.of(17, 18, 19, 20, 21),
+            3, List.of(22, 23, 24, 25, 26),
+            4, List.of(27, 28, 29, 30, 31),
+            5, List.of(33, 34, 35, 36));


doesn't need to be fixed now but would be nice to have this as an input file rather than compiled into the code

I thought about it but couldn't really see the use case.

If you want it as completely external data (ie. can change it with no docker rebuild, an env var points to an external file or something), then I'd argue this data is still important enough that you'd need some form of version control for it (eg. another git repo), which adds a dependency and thus complexity and moves away from our monorepo benefits.

If you just mean moving it to a CSV but keeping it in the repo, then it's still going to need a docker rebuild when it gets changed, but by using CSV you've lost the benefits of type/syntax checking that the java compiler provides. Perhaps if we had a dedicated config directory containing CSVs with this sort of data, we could exclude this from the docker build but mount it into the container instead. But it would only save us a rebuild as often as this data changes, which isn't very often.
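To make the trade-off above concrete, here is a minimal sketch of what a CSV-based loader might look like. It is not code from this PR: the class name `LocationMappingCsvSketch`, the `parse` method, and the two-column `key,value` layout are all assumptions made for illustration, and the meaning of the keys and values in the real mapping is not stated in the diff. The point is that the checks the Java compiler currently gives for free (integer literals, well-formed entries) would have to be written by hand and would only fail at runtime.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: parse "key,value" CSV lines into a key-to-values map. */
final class LocationMappingCsvSketch {
    private LocationMappingCsvSketch() { }

    /**
     * Each non-blank line must hold exactly two integers separated by a comma.
     * Values sharing a key are collected in file order.
     */
    public static Map<Integer, List<Integer>> parse(List<String> csvLines) {
        Map<Integer, List<Integer>> valuesByKey = new LinkedHashMap<>();
        int lineNumber = 0;
        for (String line : csvLines) {
            lineNumber++;
            if (line.isBlank()) {
                continue; // tolerate empty lines
            }
            String[] parts = line.split(",");
            if (parts.length != 2) {
                throw new IllegalArgumentException(
                        "line " + lineNumber + ": expected 2 columns");
            }
            try {
                int key = Integer.parseInt(parts[0].trim());
                int value = Integer.parseInt(parts[1].trim());
                valuesByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            } catch (NumberFormatException e) {
                // The compiler would have caught this in the Java version
                throw new IllegalArgumentException(
                        "line " + lineNumber + ": not an integer", e);
            }
        }
        return valuesByKey;
    }
}
```

With this approach a malformed row is only detected when the file is loaded, for example at service start-up or in a unit test that loads the shipped CSV, rather than at build time.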

stefpiatek · 2024-10-23T10:04:51Z

...orm-reader/src/main/java/uk/ac/ucl/rits/inform/datasources/waveform/hl7parse/Hl7Message.java

+ * it seems to copy data to structures that aren't what I want as final output (Varies[]),
+ * whereas this parser doesn't attempt to process the contents of any fields, allowing
+ * the calling code to do as it wishes.
+ * It's about 100-1000x faster.


ha fun, did you try any other HL7 parsers btw?

None other than HAPI
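For readers unfamiliar with the approach described in the Javadoc above, here is a minimal sketch of a parser that only splits a message into segments and fields and leaves the field contents as raw strings for the caller to interpret. It is not the `Hl7Message` class from this PR; the class name `RawHl7Sketch` and its `field` method are hypothetical. It relies only on the HL7 v2 convention that segments are separated by carriage returns and fields by `|`, and it makes no claim about performance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a split-only HL7 v2 reader; not the PR's Hl7Message. */
final class RawHl7Sketch {
    private final List<String[]> segments = new ArrayList<>();

    public RawHl7Sketch(String message) {
        // HL7 v2 separates segments with carriage returns and fields with '|'
        for (String segment : message.split("\r")) {
            if (!segment.isEmpty()) {
                // limit -1 keeps trailing empty fields instead of dropping them
                segments.add(segment.split("\\|", -1));
            }
        }
    }

    /**
     * Returns the raw text at the given array position (0 is the segment name)
     * in the first segment with that name, or null if the segment or position
     * is absent. The position is a plain array index (0 or greater), which
     * does not line up with HL7 field numbering for MSH, where the separator
     * itself counts as a field.
     */
    public String field(String segmentName, int index) {
        for (String[] fields : segments) {
            if (fields[0].equals(segmentName)) {
                return index < fields.length ? fields[index] : null;
            }
        }
        return null;
    }
}
```

The caller then decides how to treat each field, for instance splitting a value field on `^` only when it is known to hold a list of samples.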


jeremyestein added 30 commits June 13, 2024 18:59

First pass at adding waveform data to Emap

be8944d

Write test which checks some of the data and passes

7911d23

checkstyle fixes and make test quicker

50d34f8

Merge branch 'develop' into jeremy/hf-data

36f7a84

Clarify validation run docs

4aff3a9

In new version of checkstyle, the encoding tag is no longer valid, and is not needed as long as project.build.sourceEncoding property is set.

5407881

Remove duplicated file set_mvn_proxy.sh

a6e11d9

Bare bones waveform-reader service. Docker image builds, but is untested.

39a1fa7

Initialise the fake UDS properly; add fake UDS and waveform reader to emap setup; fix docs

cf27fe5

Temporarily break the core proc RabbitMQ listener by only listening to the waveform queue. Fixing this a proper way was too awkward.

ff7db6f

Make data more realistic in terms of volume. Had to add some layers to get around Spring circular dependency problems.

d1c1c9c

Make message IDs more descriptive to enable better performance tracking

b7f10f2

Ten seconds of data per message

a3db3fc

call saveAll rather than save many times

a16adbf

Batch single-row INSERTs

a209f90

Couple of other tweaks, not sure what it'll do, but need a baseline before enabling multi-row INSERTs

9615106

Add reWriteBatchedInserts option to db connection string so that batched inserts are merged into fewer multi-row inserts

e9bff1f

Does this help?

3193369

Explicitly specify sequence Id generation so we don't have to query the sequence for every single row.

46ff9a4

Can't use existing sequence as it has a different increment value

5361af2

Use SQL arrays (of double precision DB type)

e2635c8

Fix copy constructor

7b9b01a

Copy constructor checker didn't know about Double[]

182d2c2

Put data from multiple patients to test it doesn't get mixed up. Shuffle order of messages.

5558e48

Make observation time in synthetic data evenly spread out, as real data would be

49d93d9

Make number of patients, start date, and warp factor configurable to allow more data to be put in during validation.

cd5567f

Add indexes to Waveform table

55e0c60

Cassandra is still running out of memory

73619d4

Make configurable the rabbitmq queues that core will listen on, to allow for configurations where not all data sources are present.

1c3cc09

Make the synthetic data values use a variable amount of digits to test out the numeric SQL data type a bit better.

52dbc3a

jeremyestein added 19 commits October 3, 2024 12:31

Merge pull request #59 from UCLH-DHCT/jeremy/hf-data-gaps

Detect overlapping data

a7f769f

Need to map the directory containing test data

dfed36c

Python setup script - allow independent switching of hoover, hl7 adt and hl7 waveform when running validation. Some refactoring to make testing better.

360fe56

BooleanOptionalAction doesn't exist on python 3.8, do it ourselves

ecba32f

synth vs reader confusion

9902814

Don't parse values as numeric unless they are in fact numeric

e178b6d

Log, skip and continue if HL7 messages cause parsing errors

387f9d4

Logic for checking empty queues seems to be inverted

c9f49fd

microsecond/second mixup meant that timeout was never reached

68b8ad1

Handle missing segments without crashing

21c4259

String.substring needs to be in bounds

c5a2465

Translate more assorted parsing errors to Hl7ParseException for better handling.

10474e0

Fix ends of files

1501016

Merge pull request #61 from UCLH-DHCT/jeremy/hf-data-misc

Read HL7 dump from file and other misc fixes

4ed7c1a

Don't use waveform generator in validation

4b96678

Reproduce issue #25

2d5872d

Fix issue #25 - don't delete existing repo without warning when running `setup -i`

e0d9c99

Bring design doc more up to date. Still needs more work though.

fd9d987

Address review feedback

b7819da

jeremyestein requested a review from stefpiatek October 8, 2024 14:38

jeremyestein added 3 commits October 8, 2024 17:28

Add tests for recently changed code

498a495

Update docs and rename property for clarity

d2b0c8f

Merge pull request #64 from UCLH-DHCT/jeremy/hf-data-setup

Setup script changes and misc bug fixes

4d87fac

jeremyestein marked this pull request as ready for review October 9, 2024 15:29

stefpiatek approved these changes Oct 23, 2024


jeremyestein added 4 commits October 23, 2024 19:29

Address some review comments

c635859

Remove JPA options relating to insert/update batching, since waveform processor no longer uses batching.

fa0936b

Make cassandra settings configurable with unchanged defaults suitable for GAE

e4e6460

Make timeout configurable as per review feedback

7c4d9e3
