Don't set court='scotus' for South Carolina citations #84

Closed
mattdahl opened this issue Jun 26, 2021 · 9 comments
@mattdahl
Contributor

Eyecite thinks that South Carolina citations are SCOTUS citations:

from eyecite import get_citations
text = 'Lee County School Dist. No. 1 v. Gardner,  263 F.Supp. 26 (SC 1967)'
cites = get_citations(text)
cites[0].metadata.court

# prints 'scotus'

The SC in the year could be ambiguous, but the F.Supp. reporter should automatically rule SCOTUS out as a possibility for the court here.
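The reporter check suggested above could look roughly like the following. This is a minimal sketch, not eyecite's actual implementation: `SCOTUS_REPORTERS`, `normalize_reporter`, and `resolve_court` are hypothetical names, and the set of reporters is an assumption for illustration.

```python
from typing import Optional

# Assumption: reporters that publish U.S. Supreme Court opinions.
SCOTUS_REPORTERS = {"U.S.", "S. Ct.", "L. Ed.", "L. Ed. 2d"}


def normalize_reporter(reporter: str) -> str:
    """Remove whitespace so 'S.Ct.' and 'S. Ct.' compare equal."""
    return "".join(reporter.split())


_SCOTUS_NORMALIZED = {normalize_reporter(r) for r in SCOTUS_REPORTERS}


def resolve_court(reporter: str, guessed_court: Optional[str]) -> Optional[str]:
    """Drop a 'scotus' guess when the reporter cannot carry SCOTUS opinions."""
    if (
        guessed_court == "scotus"
        and normalize_reporter(reporter) not in _SCOTUS_NORMALIZED
    ):
        return None  # leave the court unresolved rather than guess wrong
    return guessed_court


print(resolve_court("F.Supp.", "scotus"))  # None
print(resolve_court("U.S.", "scotus"))     # scotus
```

Returning `None` instead of a different court keeps the sketch from guessing what `SC` should map to; it only rules SCOTUS out.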

@mattdahl mattdahl changed the title from "Don't set court='scotus South Carolina cit" to "Don't set court='scotus' for South Carolina citations" Jun 26, 2021
@devlux76

You will never see SC in the year field of a Supreme Court decision (if the person who wrote it is citing things properly).
The court is only included when the relevant court is unclear from the reporter cited.
The Supreme Court will always be cited to the U.S. or the S.Ct. reporters unless it's a slip opinion, so there should never be any ambiguity.

Bluebook R. 10.4(b) State courts.

In general, indicate the state and court of decision. However, do not include the name of the court if the court of decision is the highest court of the state.
The Bluebook: A Uniform System of Citation R. 10.4(b), at 106 (Columbia L. Rev. Ass’n et al. eds., 21st ed. 2020).

@mlissner
Member

Thanks @devlux76. This looks like a great first bug. Any interest in trying to tackle it with a test and a fix?

@mlissner
Member

That'd be a question for @jcushman, but I suspect he wouldn't know anymore. At this point, it's worth just running with the example he gave. I'd make a test using it, make sure the test fails, then write the code to fix it.

@mattdahl
Contributor Author

The one I encountered it in was the Bowen case. Not sure if I got my version from CourtListener or Lexis, but in the CourtListener one you link you'll see it if you search 263 F. Supp. 26 (SC 1967). Thanks for working on this!!

devlux76 added a commit to devlux76/eyecite that referenced this issue Dec 25, 2021
@flooie flooie moved this to General Backlog in Case Law Sprint Nov 19, 2024
@flooie flooie moved this from General Backlog to Backlog Dec 16 - Dec 27th in Case Law Sprint Dec 16, 2024
@flooie flooie moved this from Backlog Dec 16 - Dec 27th to To Do in Case Law Sprint Dec 17, 2024
@flooie flooie moved this from To Do to Buffer Zone in Case Law Sprint Jan 13, 2025
@flooie
Contributor

flooie commented Jan 13, 2025

Let's review this during this sprint to see whether it is still occurring and still an issue.

@flooie flooie moved this from Buffer Zone to Backlog Jan 13 to Jan 24 in Case Law Sprint Jan 13, 2025
@quevon24
Member

It seems that's not the only failure: the plaintiff is also incorrect. It returns 1 instead of Lee County School Dist. No. 1.

If you parse something like 'Foo 12334 v. Bar, 1 U.S. 1', the plaintiff contains only the number; the defendant is correct.

I think this happens because the current approach takes the two "words" before v. as the plaintiff name, and whitespace tokens count toward those two.

For example, if we pass this string Smith v. Bar, 263 F.Supp. 26 (SC 1967) we get this list of "words":

['Smith', ' ', StopWordToken(data='v.', start=6, end=8, groups={'stop_word': 'v'}), ' ', 'Bar,', ' ', CitationToken(data='263 F.Supp. 26', start=14, end=28, groups={'volume': '263', 'reporter': 'F.Supp.', 'page': '26'}, exact_editions=(), variation_editions=(Edition(reporter=Reporter(short_name='F. Supp.', name='Federal Supplement', cite_type='federal', source='reporters', is_scotus=False), short_name='F. Supp.', start=datetime.datetime(1932, 1, 1, 0, 0), end=datetime.datetime(1988, 12, 31, 0, 0)),), short=False), ' ', '(SC', ' ', '1967)']

The current algorithm looks for the stop word v. (which means there is a plaintiff) and uses its index in the word list to take the two preceding elements, in this case ['Smith', ' '], which is correct. Here is the code:

if isinstance(word, StopWordToken):
    if word.groups["stop_word"] == "v" and index > 0:
        citation.metadata.plaintiff = "".join(
            str(w) for w in words[max(index - 2, 0) : index]
        ).strip()

But it fails when the plaintiff has more than one word, for example: Lee County School Dist. No. 1 in Lee County School Dist. No. 1 v. Gardner, 263 F.Supp. 26 (SC 1967)

That string produces this word list:

['Lee', ' ', 'County', ' ', 'School', ' ', 'Dist.', ' ', 'No.', ' ', '1', ' ', StopWordToken(data='v.', start=30, end=32, groups={'stop_word': 'v'}), ' ', 'Gardner,', ' ', CitationToken(data='263 F.Supp. 26', start=42, end=56, groups={'volume': '263', 'reporter': 'F.Supp.', 'page': '26'}, exact_editions=(), variation_editions=(Edition(reporter=Reporter(short_name='F. Supp.', name='Federal Supplement', cite_type='federal', source='reporters', is_scotus=False), short_name='F. Supp.', start=datetime.datetime(1932, 1, 1, 0, 0), end=datetime.datetime(1988, 12, 31, 0, 0)),), short=False), ' ', '(SC', ' ', '1967)']

and the two elements before v. are: ['1', ' ']
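The slicing behaviour described here can be reproduced with plain lists. In this small sketch the string "v." stands in for the StopWordToken, and `two_before` is a hypothetical helper that mirrors the slice in the snippet quoted earlier; it is not eyecite code.

```python
def two_before(words, marker="v."):
    """Mimic the quoted slice: join the two elements before the marker."""
    index = words.index(marker)
    return "".join(str(w) for w in words[max(index - 2, 0):index]).strip()


short = ["Smith", " ", "v.", " ", "Bar,"]
long_ = [
    "Lee", " ", "County", " ", "School", " ", "Dist.", " ",
    "No.", " ", "1", " ", "v.", " ", "Gardner,",
]

print(two_before(short))  # Smith
print(two_before(long_))  # 1
```

Because every other element is a space, a two-element window can never hold more than one real word.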

I'm guessing this was set to two elements before v. because it is common for plaintiffs to have short names.

That's why I'm thinking about how we can adjust this.
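One possible adjustment, sketched under assumptions rather than proposed as the fix: walk backwards from the stop word, skip whitespace elements, and keep collecting words while they start with a capital letter or a digit. `StopWord`, `plaintiff_before`, and the `max_words` limit are hypothetical names and values chosen for illustration.

```python
class StopWord:
    """Hypothetical stand-in for a non-string token such as StopWordToken."""

    def __init__(self, data):
        self.data = data


def plaintiff_before(words, index, max_words=8):
    """Collect up to max_words capitalized or numeric words before words[index]."""
    collected = []
    for w in reversed(words[:index]):
        if not isinstance(w, str):
            break  # another token (citation, stop word) ends the name
        if not w or w.isspace():
            continue  # whitespace does not count toward the limit
        if not (w[0].isupper() or w[0].isdigit()):
            break  # a lowercase word probably belongs to surrounding prose
        collected.append(w)
        if len(collected) == max_words:
            break
    return " ".join(reversed(collected))


V = StopWord("v.")
words = [
    "See", " ", "also", " ", "Lee", " ", "County", " ", "School", " ",
    "Dist.", " ", "No.", " ", "1", " ", V, " ", "Gardner,",
]

print(plaintiff_before(words, words.index(V)))  # Lee County School Dist. No. 1
```

The capitalization heuristic is deliberately crude: a name containing a lowercase word, such as "Board of Education", would be cut short, so a real change would need a better stopping rule.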

I'll move the plaintiff problem to a new issue so we can close this one, since SC now maps to South Carolina instead of scotus. A test case has been added.

@github-project-automation github-project-automation bot moved this to Done in Citator Jan 24, 2025
@github-project-automation github-project-automation bot moved this from Backlog Jan 13 to Jan 24 to Done in Case Law Sprint Jan 24, 2025