Citation parser fails for statutes with letters in the section number #146

Open
jmesserschmidt1 opened this issue Mar 13, 2023 · 3 comments

Comments

@jmesserschmidt1
U.S. Code statutes with letters in the section number appear to be unrecognized. So "18 U.S.C. § 1028" and "18 U.S.C. § 1028(a)" are parsed, but "18 U.S.C. § 1028A" is not. I've tried some variations, and the behavior seems to be consistent.

@mlissner
Member

Thanks for sending this along. I think this would be pretty easy to fix, but our code parsers aren't particularly advanced compared to our opinions parsers.

Do you want to take a stab at it?

@jmesserschmidt1
Author

> Thanks for sending this along. I think this would be pretty easy to fix, but our code parsers aren't particularly advanced compared to our opinions parsers.
>
> Do you want to take a stab at it?

Sure. I'm not very familiar with the code, but I suspect it needs a variation on the law_section regex, similar to the one that exists for page or volume, like here. This comes up with CFR cites as well (e.g., 17 CFR § 240.10b-5 is currently parsed as 17 CFR 240). So something like (?P<section>\\d+(?:[\\-.:]\\d+){,3}[a-zA-Z]{0,4})
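
A minimal, self-contained sketch of how the proposed variation behaves, using only Python's `re` module. Everything here other than the two section patterns is an assumption made for illustration: the `parse_section` helper, the simplified title/code prefix, and the trailing lookahead that stands in for whatever boundary check the real parser applies. It is not the library's actual code.

```python
import re

# Stand-in for the existing section pattern (assumed): digits followed by
# up to three separator-delimited numeric groups, with no letters allowed.
CURRENT_SECTION = r"(?P<section>\d+(?:[\-.:]\d+){,3})"

# Proposed variation from this thread: allow up to four trailing letters
# inside the named group.
PROPOSED_SECTION = r"(?P<section>\d+(?:[\-.:]\d+){,3}[a-zA-Z]{0,4})"


def parse_section(section_regex, text):
    """Hypothetical helper: return the captured section, or None if no match."""
    pattern = re.compile(
        r"(?P<title>\d+) (?:U\.S\.C\.|CFR) § "
        + section_regex
        # Assumed boundary: the section must not run straight into another
        # letter or digit.
        + r"(?![0-9A-Za-z])"
    )
    match = pattern.search(text)
    return match.group("section") if match else None


for cite in ("18 U.S.C. § 1028", "18 U.S.C. § 1028A", "17 CFR § 240.10b-5"):
    print(cite, "|", parse_section(CURRENT_SECTION, cite),
          "|", parse_section(PROPOSED_SECTION, cite))
```

Under these assumptions, the stand-in pattern rejects "18 U.S.C. § 1028A" and captures only "240" from the CFR cite, while the proposed pattern captures "1028A". For "17 CFR § 240.10b-5" the proposed pattern captures "240.10b" and still drops the trailing "-5", so a fix for CFR cites would likely need to allow a further separator group after the letters.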

@mlissner
Member

I don't know that part of the code very well either, but if you want to do a PR with tests that fixes this, I think we'd probably merge it (and release a new version, if desired).

Labels
None yet
Projects
Status: CourtListener Backlog
Development

No branches or pull requests

2 participants