Improvements to `maybe_balance_style_tags` #231

grossir · 2025-02-21T16:40:41Z

Right now `maybe_balance_style_tags` only balances a tag if it sits immediately before the start of the token, allowing for some whitespace. However, from running the citation extractor, we have seen the following failure cases and possible improvements:

- Introductory words included in the style tags, mostly "see":
  - Example: "See id. at 642"
  - Another: "(see id. at 648-650"
  - Example for a reference citation: "see Luperon"
- Party names included in the style tags, before "supra":
  - Examples: "AT&T, supra"; "South Seas Yacht Club, supra"
- Full case names included in the style tags, when we are only looking for a party name:
  - Example of a reference: "it established in State v. Wingler"

From these examples, I think we could search for the matching style tag within a TOLERANCE number of characters of any kind (not just whitespace), and include it in the span if we find it. That would help catch case names and other "introductory" words besides "see".

This should not create more overlapping issues than already exist, because full span overlaps have already been resolved before this annotation step.
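A rough sketch of the idea, for discussion only. This is not the current implementation: `balance_span`, `STYLE_TAGS`, and the `TOLERANCE` value of 30 are hypothetical names and numbers chosen for illustration. When the span contains a closing tag without its opening tag, it looks back up to `tolerance` characters of any kind for the opening tag, and does the mirror-image search forward when the closing tag is the missing one.

```python
# Hypothetical sketch; names and the tolerance value are assumptions.
STYLE_TAGS = ("i", "em", "b")  # assumed set of style tags to balance
TOLERANCE = 30                 # assumed maximum distance, in characters


def balance_span(text: str, start: int, end: int,
                 tolerance: int = TOLERANCE) -> tuple[int, int]:
    """Widen (start, end) so a style tag cut by the span is balanced."""
    span = text[start:end]
    for tag in STYLE_TAGS:
        opening, closing = f"<{tag}>", f"</{tag}>"
        n_open, n_close = span.count(opening), span.count(closing)
        if n_close > n_open:
            # Opening tag is missing: search backwards through any characters.
            window_start = max(0, start - tolerance - len(opening))
            window = text[window_start:start]
            idx = window.rfind(opening)
            # Skip if that opening tag was already closed before the span.
            if idx != -1 and closing not in window[idx:]:
                start = window_start + idx
        elif n_open > n_close:
            # Closing tag is missing: search forwards through any characters.
            window = text[end:end + tolerance + len(closing)]
            idx = window.find(closing)
            # Skip if another opening tag appears before the closing tag.
            if idx != -1 and opening not in window[:idx]:
                end = end + idx + len(closing)
    return start, end


text = "as held in <em>AT&T, supra</em>, at 12."
start = text.index("supra")
end = text.index("</em>") + len("</em>")
new_start, new_end = balance_span(text, start, end)
print(text[new_start:new_end])  # <em>AT&T, supra</em>
```

With `tolerance=0` the same call leaves the span unchanged, since the party name "AT&T, " sits between the opening tag and the token.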


sentry-io · 2025-03-07T16:54:26Z

We briefly collected some data when the logger was active

Sentry Issue: COURTLISTENER-96A

Fixes #231. Allow searching beyond only whitespace for the missing style tag. This helps find the missing tag when a party name or an intro word ("see") is included in the style span. Added tests.

grossir added this to Case Law Sprint Feb 21, 2025

flooie moved this to Backlog Feb 24 to March 7 in Case Law Sprint Feb 24, 2025

grossir self-assigned this Feb 24, 2025

flooie moved this from Backlog Feb 24 to March 7 to To Do in Case Law Sprint Feb 27, 2025

grossir moved this from To Do to In progress in Case Law Sprint Mar 3, 2025

grossir linked a pull request Mar 7, 2025 that will close this issue

fix(maybe_balance_style_tags): search further for missing tag #239

Open

grossir moved this from In progress to PR'd Issues 🤞 in Case Law Sprint Mar 7, 2025
