Change quoting-closing apostrophe to ´ #1026

Merged (15 commits) on Aug 20, 2024

@1313ou (Contributor) commented Jun 30, 2024

  1. Change quotation scheme from `quoted' to `quoted´
  2. Also includes some fixes made along the way (punctuation, aligned phoneme notation, typos, ...).
  3. Output is normalized YAML, to avoid non-deterministic changes (the YAML-dumping process has been fixed to be idempotent).

Using the apostrophe to close a quotation is ambiguous and makes quotations very difficult to parse.
Take for example:

He's too nice, that's his `Achilles' heel.

He's too nice, that's his `Achilles' heel'.

You cannot tell whether the quotation ends after Achilles or at the end of the sentence until you get there. This makes it difficult for processing tools to, for example, extract and style the quotation.
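To make the parsing problem concrete, here is a minimal Python sketch (the patterns and sample strings are illustrative, not part of this PR). A "close at the first apostrophe" pattern returns the same match for both sentences, so it cannot tell them apart. With a dedicated closing mark, a plain character class is enough.

```python
import re

ACUTE = "\u00b4"  # the proposed closing mark

# Old scheme: stop at the first apostrophe after the backtick.
OLD_QUOTE = re.compile(r"`([^']*)'")
# New scheme: the closing mark never means anything else.
NEW_QUOTE = re.compile(f"`([^{ACUTE}]*){ACUTE}")

old_1 = "He's too nice, that's his `Achilles' heel."
old_2 = "He's too nice, that's his `Achilles' heel'."
new_1 = f"He's too nice, that's his `Achilles{ACUTE} heel."
new_2 = f"He's too nice, that's his `Achilles' heel{ACUTE}."

# Both old sentences yield ['Achilles']; for the second one that is wrong.
print(OLD_QUOTE.findall(old_1), OLD_QUOTE.findall(old_2))
# The new scheme yields ['Achilles'] and ["Achilles' heel"] respectively.
print(NEW_QUOTE.findall(new_1), NEW_QUOTE.findall(new_2))
```

A greedy pattern would fix the second sentence but misfire whenever a later word such as "it's" follows the quotation on the same line.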

Instead of multiplexing the apostrophe character, I suggest using a dedicated character (´) to close quotations. It is in the Latin-1 ("extended ASCII") range (U+00B4) and mirrors the backtick/grave accent (`).
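The mirror relation and the code points can be checked with Python's standard `unicodedata` module. A small sketch (the `describe` helper is hypothetical); note that U+00B4 falls outside the 7-bit ASCII range (0x00 to 0x7F):

```python
import unicodedata

GRAVE = "`"        # opening mark, unchanged by this PR
ACUTE = "\u00b4"   # proposed closing mark


def describe(ch: str) -> tuple:
    """Return (code point, official Unicode name, is 7-bit ASCII)."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), ch.isascii()


for ch in (GRAVE, ACUTE):
    print(describe(ch))
# ('U+0060', 'GRAVE ACCENT', True)
# ('U+00B4', 'ACUTE ACCENT', False)
```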

Putting an end to this multiplexing requires sorting current uses of the apostrophe into 1) omission of characters (elision, contraction, possessive, ...) and 2) quotation closing. That is what this PR does, so only the latter use is affected.
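Part of the sorting could be automated. Below is a rough Python heuristic, given as an assumption for illustration and not as the procedure used in this PR. An apostrophe between two letters, or one before any opening backtick, is treated as an omission. The remaining apostrophes after a backtick are candidates for closing it: a single candidate is taken as the closing mark, and several are flagged as ambiguous for manual review. All names are hypothetical.

```python
from __future__ import annotations

ACUTE = "\u00b4"


def classify_apostrophes(text: str) -> dict[int, str]:
    """Map each apostrophe's index to 'omission', 'close' or 'ambiguous'."""
    labels: dict[int, str] = {}
    candidates: list[int] = []
    seen_backtick = False

    def settle() -> None:
        # One candidate must be the closing mark; several need review.
        label = "close" if len(candidates) == 1 else "ambiguous"
        for pos in candidates:
            labels[pos] = label
        candidates.clear()

    for i, ch in enumerate(text):
        if ch == "`":
            settle()  # a new quotation starts: settle the previous one
            seen_backtick = True
        elif ch == "'":
            internal = (
                0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()
            )
            if internal or not seen_backtick:
                labels[i] = "omission"  # e.g. don't, he's
            else:
                candidates.append(i)
    settle()
    return labels


def close_quotations(text: str) -> str:
    """Replace only the apostrophes labelled 'close' with the acute accent."""
    labels = classify_apostrophes(text)
    return "".join(
        ACUTE if labels.get(i) == "close" else ch for i, ch in enumerate(text)
    )


print(close_quotations("He's too nice, that's his `Achilles' heel."))
```

On the sentence ending in heel' the heuristic refuses to guess and leaves the text unchanged, which is exactly the case that needs a human decision.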

This change is easily reversible by automatic character substitution.

It opens the way to other quotation schemes (by automatic character substitution):

‟double quoted”
“double quoted”
„double quoted low”
❛heavy quoted❜
❟heavy quoted low❜
❝heavy double quoted❞
❠heavy double quoted low❠
«guillemet»
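Because ` and ´ would then be used only as quotation marks, both reverting and switching schemes reduce to character substitution. A minimal Python sketch, assuming the backtick never appears for any other purpose in the data (`SCHEMES`, `convert` and `revert` are hypothetical names):

```python
GRAVE, ACUTE = "`", "\u00b4"

# A few target schemes as (opening, closing) pairs.
SCHEMES = {
    "single": ("\u2018", "\u2019"),     # ‘quoted’
    "double": ("\u201c", "\u201d"),     # “quoted”
    "guillemet": ("\u00ab", "\u00bb"),  # «quoted»
}


def convert(text: str, scheme: str) -> str:
    """Swap the `...´ marks for another opening/closing pair."""
    opening, closing = SCHEMES[scheme]
    return text.translate(str.maketrans({GRAVE: opening, ACUTE: closing}))


def revert(text: str) -> str:
    """Go back to the old `...' convention."""
    return text.replace(ACUTE, "'")


sample = f"that's his `Achilles' heel{ACUTE}"
print(convert(sample, "single"))  # that's his ‘Achilles' heel’
print(revert(sample))             # that's his `Achilles' heel'
```

Apostrophes used for omission (that's, Achilles') are never touched, since only the two dedicated marks are substituted.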

In addition, the YAML is simpler: there are fewer instances where apostrophes have to be escaped.
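In YAML's single-quoted style, the only escape is doubling the quote character, so every apostrophe in a single-quoted value costs an extra character. A small illustration in Python (the helper is hypothetical and covers only that one rule; a real dumper such as PyYAML may choose another scalar style):

```python
def yaml_single_quoted(value: str) -> str:
    """Render value as a YAML single-quoted scalar: ' is written as ''."""
    return "'" + value.replace("'", "''") + "'"


old = "`Achilles' heel'"
new = "`Achilles' heel\u00b4"

print(yaml_single_quoted(old))          # '`Achilles'' heel'''
print(yaml_single_quoted(new))          # '`Achilles'' heel´'
print(old.count("'"), new.count("'"))   # 2 1
```

Both values need quoting anyway: YAML reserves the backtick as an indicator, so it cannot start a plain scalar.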

@jmccrae (Member) commented Jul 3, 2024

Thanks for this. I like the idea, but I wonder whether this choice of characters is the best way to implement it.

Introducing a lot of non-ASCII characters can cause issues. In particular, I am still not 100% sure how well legacy apps that use the WNDB format cope with these characters, so I would like to test it a bit more.

Secondly, if we do use Unicode characters for punctuation, wouldn't it be more appropriate to use U+2018 and U+2019 for quotes, rather than U+00B4, which is officially called "Acute Accent"?

@1313ou (Contributor, Author) commented Jul 4, 2024

I don't think non-ASCII will pose a problem nowadays: modern languages and libraries handle it seamlessly. Legacy applications that stumble on non-ASCII characters would already stumble on:

¬, °, ·, ×, ⁓, −, ∞, ̃, €, ½, á, à, ä, ā, ç, é, É, ê, ë, ﬁ, ʰ, ʻ, í, Ḳ, ñ, ó, ò, ö, ő, ś, š, ú, ü, ű, ū, α, β, γ, ρ, ъ, Ъ, ь, Ь,

not to mention the em-dash and ellipsis, all of which have already been imported into OEWN.
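To see which non-ASCII characters a legacy WNDB consumer would have to cope with, one can list them directly from the data. A Python sketch (`non_ascii_inventory` is a hypothetical helper; feed it the text of the data files):

```python
from __future__ import annotations

import unicodedata


def non_ascii_inventory(text: str) -> list[tuple[str, str, str]]:
    """Return (char, code point, Unicode name) for each distinct non-ASCII char."""
    chars = sorted({ch for ch in text if not ch.isascii()})  # by code point
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in chars
    ]


for row in non_ascii_inventory("na\u00efve caf\u00e9 \u2013 ok"):
    print(*row)
```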

As for the choice of quoting characters, I agree that ‘quoted’, with ‘ (U+2018) and ’ (U+2019), would be more appropriate. So would “quoted”, with the double quotation marks “ (U+201C) and ” (U+201D).

That was actually my first choice, but I fell back on the "Acute Accent" (U+00B4) because

  • it is in the Latin-1 ("extended ASCII") range, coded on one byte in 8-bit encodings such as Latin-1 (though it takes two bytes in UTF-8)
  • the move is less extensive: you just replace the closing mark
  • it is more conservative: the backtick stays in place, so code that detects quotations by their opening backtick will still work
  • the opening backtick is actually the "Grave Accent", and the grave and acute accents mirror each other.
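The "one byte" argument holds for legacy 8-bit encodings such as Latin-1 and Windows-1252, but not for UTF-8. A quick Python check of the encoded size of each candidate mark (the helper name is hypothetical; None means the encoding cannot represent the character):

```python
GRAVE = "`"
ACUTE = "\u00b4"   # ´
RSQUO = "\u2019"   # ’


def encoded_sizes(ch: str) -> dict:
    """Bytes needed for ch in a few encodings (None if unencodable)."""
    sizes = {}
    for encoding in ("ascii", "latin-1", "cp1252", "utf-8"):
        try:
            sizes[encoding] = len(ch.encode(encoding))
        except UnicodeEncodeError:
            sizes[encoding] = None
    return sizes


for ch in (GRAVE, ACUTE, RSQUO):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

In a UTF-8 file, then, ´ costs two bytes and ’ three. Both are outside ASCII, so a legacy reader that only accepts ASCII rejects either one.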

If you are open to the U+2018/U+2019 move (yielding ‘quoted’), so am I. I can easily adjust the PR to do just that.

@fcbond (Member) commented Jul 4, 2024 via email

@jmccrae merged commit db9791d into globalwordnet:main on Aug 20, 2024
1 check passed
3 participants