-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parquet input to OS #186
Parquet input to OS #186
Conversation
clearer. Tests will still fail due to missing private parquet files.
To do
|
* Refactor message serialization and deserialization Addiing `Message` and `SerialisedMessage` classes in attempt to improve information hiding and decoupling. * Rename `utils.py` -> `message.py` * Add `decode()` method for `SerialisedMessage` * Update docstring * Use new classes in message testing * Refactor message processing in the CLI * Refactor `process_message` to use `SerialisedMessage` class in EHR API * Refactor `process_message` to use `SerialisedMessage` class in imaging API * Fix `ImagingStudy` initalisation in `ImagingStudy.from_message()` * Fix imports * Fix test: access serialised message bodies * Turn `Message` into a `dataclass` * Fix failing tests * Use `jsonpickle` for (de)serializing messages This also removes the need for the `SerialisedMessage` class * Fix `test_deserialise_datetime()` so it uses the `Message` class to assert the `study_datetime` * Add `study_datetime` property for `Message` * No need to test deserialising individual fields, already covered by `test_deserialise()` which deserialises the entire object * Remove `study_date_from_serialised()`, use the class attribute `study_datetime` instead * Revert "Add `study_datetime` property for `Message`" This reverts commit 8719153. * Remove `Messages` class, use `list[Message]` instead * Add type checking for messages parsed from parquet input * Update `test_messages_from_parquet()` to use JSON strings instead of bytes * Update `PixlProducer.publish()` to use a list of Message objects and handle serialisation * Convert JSON string to bytes when serialising * Revert "Update `test_messages_from_parquet()` to use JSON strings instead of bytes" This reverts commit 0e4fce4. * `PixlProducer.publish()` should take a `list[Message]` as input in tests * Update EHR API to use new `Message` design * Update imaging API to use new `Message` design * Update deserialise function to accept bytes-encoded JSON string * Assert messages against list of `Message`s * Print dataclass in logs * `jsonpickle.decode()` can handle bytes so no need to decode first Also add a note about why we ignore ruff rule S301 * Make `deserialisable` a keyword only argument * Copilot forgot to convert dates to datetimes 🥲 * Refactor PixlConsumer run method to accept Message object as callback parameter and deserialise * Update consumer in `test_subscriber` to accept Message object instead of bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 Looking good. Thanks for your work on this @ruaridhg @jeremyestein and @milanmlft
could probably slim down test files to only be the ones required? Might also be worth removing more of the public files in the test resources that I added too
Closes #159
Added:
csv_file
True/False flag withinpopulate
ofcli/src/pixl_cli/main.py
(tests pass for original functionality withcsv_file = True
incli/tests/test_queue_start_and_stop.py
)messages_from_parquet
function withincli/src/pixl_cli/main.py
function which follows code example from OMOP ES files as inputs to PIXL CLI #159 and works locally with directory structurepublic_dir = ~/resources/public
andprivate_dir = ~/resources/private
cli/tests/test_queue_start_and_stop_parquet.py
which follows similar structure as csv test, but fails with the following errors which could be because I can't call my local files within the testing (the resources within the test is missing the private directory) or some otherPath
error I'm not sure about