Parsing error for citations with defendant 'Thompson' #174

Open
ERosendo opened this issue Mar 28, 2024 · 2 comments
Comments

@ERosendo
Contributor

ERosendo commented Mar 28, 2024

In issue #3924, we identified a bug in Eyecite's parsing method when the defendant's last name is 'Thompson'.

For example, for the citation 'Shapiro v. Thompson, 394 U. S. 618':

  • Expected output: volume: 394, reporter: 'U.S.', page: '618'
  • Actual output: volume: None, reporter: 'Thompson', page: '394'

Other examples of inputs that are incorrectly parsed are: Adams v. Thompson, 560 F. Supp. 894 and Mozena v. Thompson, 44 A.2d 276.

While debugging with the first example, I noticed that Eyecite identifies two tokens in the input string: "Thompson's Unreported Cases (TN)" and "United States Supreme Court Reports". These tokens overlap (both include "394"), and Eyecite's tokenize method prioritizes the rightmost token when it encounters overlaps, which leads to this result.
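
To make the overlap concrete, here is a minimal sketch that does not use Eyecite itself. The two regexes are simplified, hypothetical stand-ins for the real reporter patterns (which Eyecite builds from its reporter data), so this only illustrates why both candidates claim "394":

```python
import re

text = "Shapiro v. Thompson, 394 U. S. 618"

# Hypothetical, simplified patterns for illustration only;
# they are NOT the regexes Eyecite actually generates.
patterns = {
    "thompson": r"Thompson, (?P<page>\d+)",                # reporter followed by a page, no volume
    "us": r"(?P<volume>\d+) U\. ?S\. (?P<page>\d+)",       # volume, reporter, page
}

# Character span (start, end) of each candidate match in the input.
spans = {name: re.search(pattern, text).span() for name, pattern in patterns.items()}


def overlap(a, b):
    """Return the shared (start, end) range of two spans, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


shared = overlap(spans["thompson"], spans["us"])
shared_text = text[shared[0]:shared[1]]
print(spans)        # spans of the two candidate matches
print(shared_text)  # the text both candidates claim
```

Both candidates cover the characters "394", so any rule that keeps only one token per overlapping region has to pick a winner, and picking the wrong one yields the output shown above.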

@ERosendo ERosendo changed the title parsing error for citations with defendant 'Thompson' Parsing error for citations with defendant 'Thompson' Mar 28, 2024
@mlissner
Member

Any idea how easy this is to solve so that it identifies each?

@mlissner
Member

mlissner commented Apr 1, 2024

Per discussion today, this seems to happen when citations appear to overlap. The simple solution is to find the citations that overlap and then filter out the one that's incomplete.
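
A rough sketch of that idea, under stated assumptions: `Candidate`, `is_complete`, and `drop_incomplete_overlaps` are hypothetical names invented for this illustration, not part of Eyecite, and "incomplete" is assumed here to mean "missing a volume or a page". How this would hook into the real tokenizer is left open.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a candidate citation match (not an Eyecite class)."""
    start: int
    end: int
    reporter: str
    volume: Optional[str]
    page: Optional[str]

    def is_complete(self) -> bool:
        # Assumption: a complete citation has both a volume and a page.
        return self.volume is not None and self.page is not None


def overlaps(a: Candidate, b: Candidate) -> bool:
    """True if the two character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def drop_incomplete_overlaps(candidates: List[Candidate]) -> List[Candidate]:
    """Drop an incomplete candidate only when it overlaps a complete one."""
    kept = []
    for cand in candidates:
        beaten = not cand.is_complete() and any(
            other is not cand and other.is_complete() and overlaps(cand, other)
            for other in candidates
        )
        if not beaten:
            kept.append(cand)
    return kept


# Spans for 'Shapiro v. Thompson, 394 U. S. 618', matching the example in this issue.
candidates = [
    Candidate(11, 24, "Thompson", None, "394"),  # incomplete: no volume
    Candidate(21, 34, "U.S.", "394", "618"),     # complete
]
result = drop_incomplete_overlaps(candidates)
print([(c.volume, c.reporter, c.page) for c in result])
```

An incomplete candidate that overlaps nothing is kept, so this filter would only change results in the overlapping case described here.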

@mlissner mlissner moved this to Main Backlog in @erosendo's backlog Apr 1, 2024
@ERosendo ERosendo moved this from Main Backlog to Bots Backlog in @erosendo's backlog Apr 15, 2024
@flooie flooie moved this to General Backlog in Case Law Sprint Nov 19, 2024