Add Left and Right Pointing Angle Brackets as punctuation to ancient Greek #12829

Merged: 4 commits, Jul 20, 2023

Conversation

jmyerston
Contributor

Description

This PR adds 〈 (U+2329) and 〉 (U+232A) to the Ancient Greek punctuation. These two symbols are very common in Ancient Greek texts, where they mark editorial reconstructions. Currently, spaCy does not separate these symbols from adjacent words, which causes incorrect tokenization.

Types of change

Adds the two symbols as prefixes and suffixes in the punctuation file.
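
To illustrate what treating the two brackets as a prefix and a suffix means for tokenization, here is a minimal standard-library sketch. It is not spaCy's tokenizer and not the code in this PR: the `PREFIXES` and `SUFFIXES` lists and the `split_affixes` and `tokenize` helpers are hypothetical names used only for this example, and the real punctuation file contains many more entries.

```python
import re

# Hypothetical stand-ins for the two entries described above; the actual
# lists in the Ancient Greek punctuation file are much longer.
PREFIXES = ["\u2329"]  # left-pointing angle bracket
SUFFIXES = ["\u232a"]  # right-pointing angle bracket

# One regex anchored at the start for prefixes, one anchored at the end for suffixes.
prefix_re = re.compile("^(?:" + "|".join(re.escape(p) for p in PREFIXES) + ")")
suffix_re = re.compile("(?:" + "|".join(re.escape(s) for s in SUFFIXES) + ")$")


def split_affixes(chunk):
    """Peel known prefixes and suffixes off one whitespace-delimited chunk."""
    leading, trailing = [], []
    while chunk:
        match = prefix_re.search(chunk)
        if not match:
            break
        leading.append(match.group())
        chunk = chunk[match.end():]
    while chunk:
        match = suffix_re.search(chunk)
        if not match:
            break
        trailing.insert(0, match.group())
        chunk = chunk[:match.start()]
    return leading + ([chunk] if chunk else []) + trailing


def tokenize(text):
    """Split on whitespace, then separate affix punctuation from each chunk."""
    return [token for chunk in text.split() for token in split_affixes(chunk)]


if __name__ == "__main__":
    # The brackets come out as separate tokens instead of staying glued to the word.
    print(tokenize("\u2329λόγος\u232a ἐστί"))
```

Without the two entries, a chunk such as a bracketed reconstruction would stay a single token, which is the improper tokenization described above.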

Checklist

  • [x] I confirm that I have the right to submit this contribution under the project's MIT license.
  • [x] I ran the tests, and all new and existing tests passed.
  • [x] My changes don't require a change to the documentation, or if they do, I've added all required information.

add some missing commas in the greCy's description.
Add mathematical left and right angle brackets as punctuation for ancient Greek for better tokenization.
@svlandeg added the enhancement and lang / el labels Jul 17, 2023
@adrianeboyd changed the base branch from master to develop July 17, 2023 12:02
@adrianeboyd
Contributor

Ah, I'll switch this back until develop is back up-to-date so the history isn't incorrect. But we want this to target develop so this change is first in v3.7.0 and doesn't affect existing v3.6.x pipelines.

@adrianeboyd changed the base branch from develop to master July 17, 2023 12:03
@adrianeboyd added the v3.7 label Jul 17, 2023
@adrianeboyd changed the base branch from master to develop July 19, 2023 14:36
@adrianeboyd
Contributor

Thanks for the PR!

@adrianeboyd merged commit 4f8daa4 into explosion:develop Jul 20, 2023
40 checks passed
@adrianeboyd added the lang / grc label and removed the lang / el label Jul 20, 2023
Labels
enhancement, lang / grc, v3.7
3 participants