-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Viv3ckj/refactor code (part 1) #86
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks much clearer now and I think is a great start to making this code more efficient, easier to read and reausable!
I left a couple ideas that might help you find better names for some of the constants and functions.
analysis/pf_variables_library.py
Outdated
def get_events_between(event_frame, start_date, end_date): | ||
selected_events = event_frame.where( | ||
event_frame.date.is_on_or_between(start_date, end_date) | ||
) | ||
return selected_events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it not be easier to just use event_frame.date.is_on_or_between(start_date, end_date)
? Looks like the function is just replicating that behaviour or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to use this function to make variables like this a bit shorter:
pharmacy_first_events = get_events_between(clinical_events, INTERVAL.start_date, INTERVAL.end_date).where(
clinical_events.snomedct_code.is_in(pharmacy_first_consultation_codelist)
)
selected_medications = get_events_between(medications, INTERVAL.start_date, INTERVAL.end_date).where(
medications.consultation_id.is_in(pharmacy_first_ids)
)
What do you think? I should remove this still?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe that explicitly using .date.is_on_or_between()
to select clinical_events
and medications
useful, primariliarly because we'd be adding one more function that is a wrapper of a slightly longer function that already exists in ehrQL, but adds no additional functionality. Without the new function the code is still relatively short and very explicit.
Without get_events_between
function:
-
analysis/measures_definition_pf_breakdown.py
selected_events = clinical_events.where( clinical_events.date.is_on_or_between(INTERVAL.start_date, INTERVAL.end_date) ).where(clinical_events.consultation_id.is_in(pharmacy_first_ids))
With get_events_between
function:
-
analysis/pf_variables_library.py
def get_events_between(event_frame, start_date, end_date): selected_events = event_frame.where( event_frame.date.is_on_or_between(start_date, end_date) ) return selected_events
-
and
analysis/measures_definition_pf_breakdown.py
selected_events = get_events_between(clinical_events, INTERVAL.start_date, INTERVAL.end_date).where( clinical_events.consultation_id.is_in(pharmacy_first_ids) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I like the idea of wrapping up most of your select events operations up in functions, maybe a wrapper function for all sort of select operations would be an idea:
def select_events(event_frame, codelist=None, consultation_ids=None, start_date=None, end_date=None):
"""
Wrapper function to select events based on codelist, consultation IDs, or a date range.
Allows combining multiple selection criteria.
"""
selected_events = event_frame
if codelist is not None:
selected_events = select_events_from_codelist(selected_events, codelist)
if consultation_ids is not None:
selected_events = select_events_by_consultation_id(selected_events, consultation_ids)
if start_date is not None and end_date is not None:
selected_events = select_events_between(selected_events, start_date, end_date)
return selected_events
This wrapper function depends on our individual seelct functions:
# Function to get events linked to a specified codelist
def select_events_from_codelist(event_frame, codelist):
selected_events = event_frame.where(
event_frame.snomedct_code.is_in(codelist)
)
return selected_events
# Function to get events with specific consultation IDs
def select_events_by_consultation_id(event_frame, consultation_ids):
selected_events = event_frame.where(
event_frame.consultation_id.is_in(consultation_ids)
)
return selected_events
# Function to get events within a time frame
def select_events_between(event_frame, start_date, end_date):
selected_events = event_frame.where(
event_frame.date.is_on_or_between(start_date, end_date)
)
return selected_events
Do you think this would be an alternative?
First commit: Centralised all codelists and codes into codelists.py
Second commit: Added a config file which contains all the start dates and monthly intervals used. Also added a few helper functions to the pf_variables_library file to remove repeated variables e.g. selected events, pharmacy first ids etc.
Currently, the old ehrQL is commented out to help when reviewing the code to see if the functions do what they're supposed to. Will remove this after approval.
Closes #78
Closes #81