Transparently handle dot character in ICD-10/OPCS-4 codes #2333

Open
evansd opened this issue Dec 20, 2024 · 0 comments
Labels
ergonomics Making ehrQL easier to use and read without adding substantial new features

Canonically, ICD-10 and OPCS-4 codes are written with a dot between the 3rd and 4th characters, e.g. A01.1.

However, in the data we currently have, these dots are omitted and the equivalent code is written A011. (For example, see the apcs.all_procedures field.)

Our syntactic validation for these codes currently requires them in dotless format:

ehrql/ehrql/codes.py

Lines 91 to 111 in 784d011

class ICD10Code(BaseCode):
    "ICD-10"

    regex = re.compile(r"[A-Z][0-9]{2,3}")


class OPCS4Code(BaseCode):
    "OPCS-4"

    # The documented structure requires three digits, and a dot between the 2nd and 3rd
    # digit, but the codes we have in OpenCodelists omit the dot and sometimes have only
    # two digits.
    # https://en.wikipedia.org/wiki/OPCS-4#Code_structure
    regex = re.compile(
        r"""
        # Uppercase letter excluding I
        [ABCDEFGHJKLMNOPQRSTUVWXYZ]
        [0-9]{2,3}
        """,
        re.VERBOSE,
    )

But it would be nicer if we accepted strings either with or without the dot and converted them to the dotless format at the point we cast them to a code type.
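
A minimal sketch of what that normalisation could look like, assuming the ICD-10 shape from the snippet above (a letter followed by two or three digits). The names `DOTTED_OR_DOTLESS` and `to_dotless` are hypothetical and are not part of ehrQL; where exactly this would hook into the cast to a code type is left open.

```python
import re

# Hypothetical pattern, for illustration only: a letter and two digits,
# optionally followed by one more digit which may be preceded by a dot.
# A trailing dot with no digit after it (e.g. "A01.") is rejected.
DOTTED_OR_DOTLESS = re.compile(r"([A-Z][0-9]{2})(?:\.?([0-9]))?")


def to_dotless(code: str) -> str:
    """Accept a code with or without the dot and return the dotless form."""
    match = DOTTED_OR_DOTLESS.fullmatch(code)
    if match is None:
        raise ValueError(f"Invalid code: {code!r}")
    prefix, suffix = match.group(1), match.group(2)
    # `suffix` is None for three-character codes such as "A01"
    return prefix + (suffix or "")
```

With this, `to_dotless("A01.1")` and `to_dotless("A011")` both give `"A011"`, so a codelist written in either style would match the dotless data. An OPCS-4 variant would additionally need to exclude the letter I, as the existing regex does.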

If we ever end up with data that stores the codes in the dotted format, we can make a new type which does the reverse (i.e. converts dotless to dotted). This would allow us to use the existing codelists with the new field without having to worry about arbitrary syntactic variation.
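
The reverse conversion could be sketched along these lines, again assuming the dotless ICD-10 shape used by the current validation. `DOTLESS` and `to_dotted` are hypothetical names for illustration, not an existing ehrQL API.

```python
import re

# Same shape as the current dotless ICD-10 validation: a letter plus 2-3 digits
DOTLESS = re.compile(r"[A-Z][0-9]{2,3}")


def to_dotted(code: str) -> str:
    """Convert a dotless code to the canonical dotted form."""
    if DOTLESS.fullmatch(code) is None:
        raise ValueError(f"Invalid dotless code: {code!r}")
    # Three-character codes have nothing after the dot position, so leave them alone
    if len(code) == 3:
        return code
    # Insert the dot between the 3rd and 4th characters
    return f"{code[:3]}.{code[3:]}"
```

Here `to_dotted("A011")` gives `"A01.1"`, while a three-character code such as `"A01"` is returned unchanged.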

Slack thread:
https://bennettoxford.slack.com/archives/C069YDR4NCA/p1734627474856169

@evansd evansd added the ergonomics Making ehrQL easier to use and read without adding substantial new features label Jan 21, 2025