Change quoting-closing apostrophe to ´ #1026

1313ou · 2024-06-30T13:17:00Z

Change quotation scheme from `quoted' to `quoted´
Also includes some fixes along the way (punctuation, aligned phoneme notation, typos,...)
Output is normalized YAML to avoid non-deterministic changes (the YAML-dumping process used has been fixed to be idempotent).

An apostrophe used as a mark to close a quotation is ambiguous and makes quotations very difficult to parse.
Take for example:

He's too nice, that's his `Achilles' heel.

He's too nice, that's his `Achilles' heel'.

You don't know whether the quotation ends after Achilles or at the end of the sentence until you reach it. This makes it difficult for processing tools to extract and style the quotation, for example.
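A minimal sketch of the ambiguity, not taken from the PR itself: the helper `closing_candidates` is a hypothetical name, and the second sentence simply assumes the proposed ´ closer. With the apostrophe as closer, a tool sees two possible end positions; with a dedicated closer there is only one, and a non-greedy regular expression is enough to extract the quotation.

```python
import re

OPEN = "`"  # opening mark used by the current scheme


def closing_candidates(text: str, close: str) -> list[int]:
    """Return every index after the first opening mark where `close` occurs."""
    start = text.index(OPEN)
    return [i for i in range(start + 1, len(text)) if text[i] == close]


# Apostrophe as closer: the possessive and the real closer look the same.
ambiguous = "He's too nice, that's his `Achilles' heel'."
apostrophe_hits = closing_candidates(ambiguous, "'")
print(len(apostrophe_hits), "possible closing positions")

# Dedicated closer (U+00B4): only one position can end the quotation.
dedicated = "He's too nice, that's his `Achilles' heel\u00b4."
acute_hits = closing_candidates(dedicated, "\u00b4")
print(len(acute_hits), "possible closing position")

# Extraction becomes a plain non-greedy match.
quotations = re.findall("`(.+?)\u00b4", dedicated)
print(quotations)
```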

Instead of multiplexing the apostrophe character, I suggest using a dedicated character (´) to close quotations. It is extended ASCII (Latin-1, U+00B4) rather than 7-bit ASCII, and it mirrors the backtick/grave accent (`).

Putting an end to this multiplexing requires sorting current uses of the apostrophe into 1) omission of characters (elision, contraction, possessive ...) and 2) quotation ending. That sorting is done here, and the change affects only the latter use.

This change is easily reversible by automatic character substitution.

It opens the way to other quotation schemes (by automatic character substitution):

‟double quoted”
“double quoted”
„double quoted low”
❛heavy quoted❜
❟heavy quoted low❜
❝heavy double quoted❞
❠heavy double quoted low❞
«guillemet»
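As an illustration of the substitution idea, here is a rough sketch. The `SCHEMES` table and the `convert` function are hypothetical names, not part of the PR, and the sketch assumes that the opening and closing marks of the source scheme occur in the text only as quotation marks.

```python
# Hypothetical table of (opening, closing) marks per scheme.
SCHEMES = {
    "apostrophe": ("`", "'"),
    "acute": ("`", "\u00b4"),
    "single": ("\u2018", "\u2019"),
    "double": ("\u201c", "\u201d"),
    "guillemet": ("\u00ab", "\u00bb"),
}


def convert(text: str, source: str, target: str) -> str:
    """Swap the quotation marks of `source` for those of `target`.

    Only safe when the source marks are used for nothing but quoting,
    which is the property the dedicated closing character provides.
    """
    src_open, src_close = SCHEMES[source]
    dst_open, dst_close = SCHEMES[target]
    table = {ord(src_open): dst_open, ord(src_close): dst_close}
    return text.translate(table)


sentence = "He's too nice, that's his `Achilles' heel\u00b4."
print(convert(sentence, "acute", "double"))
print(convert(sentence, "acute", "guillemet"))
# Reverting to the old scheme is the same one-step substitution.
print(convert(sentence, "acute", "apostrophe"))
```

Note that going the other way, from the apostrophe scheme to any other, cannot be done by blind substitution, since contractions and possessives would be rewritten too.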

In addition, the YAML is simpler: there are fewer instances where apostrophes have to be escaped.

jmccrae · 2024-07-03T09:54:22Z

Thanks for this, I like the idea, but I wonder if the choice of characters is the best way to implement this.

Introducing a lot of non-ASCII characters can cause issues. In particular, I am still not 100% sure how well legacy apps that use the WNDB format can cope with these characters, so I would like to test it a bit more.

Secondly, if we do use Unicode characters for punctuation, wouldn't it be more appropriate to use U+2018 and U+2019 for quotes, rather than U+00B4, which is officially called "Acute Accent"?

1313ou · 2024-07-04T07:17:04Z

I don't think non-ASCII will pose a problem nowadays. All modern languages and libraries will handle this seamlessly. The legacy applications that will stumble on non-ASCII characters are likely to stumble on:

`¬, °, ·, ×, ⁓, −, ∞, ̃, €, ½, á, à, ä, ā, ç, é, É, ê, ë, ﬁ, ʰ, ʻ, í, Ḳ, ñ, ó, ò, ö, ő, ś, š, ú, ü, ű, ū, α, β, γ, ρ, ъ, Ъ, ь, Ь,

not to mention the em-dash and ellipsis, all of which have already been imported in OEWN.
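For anyone who wants to check which non-ASCII characters a given file already contains, a small sketch along these lines could be used. The function name `non_ascii_inventory` and the sample string are made up for illustration; only the standard `unicodedata` module is assumed.

```python
import unicodedata


def non_ascii_inventory(text: str) -> list[tuple[str, str]]:
    """List each distinct non-ASCII character as (code point, Unicode name)."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in found
    ]


# Made-up sample; in practice the text would be read from the YAML files.
sample = "caf\u00e9 \u2212 5\u00b0 \u00b4"
for code_point, name in non_ascii_inventory(sample):
    print(code_point, name)
```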

As for the choice of quoting characters, I agree with you that ‘quoted’ with ‘ (u2018) and ’ (u2019) for quotes would be more appropriate. Or “quoted” with “ (u201C) and ” (u201D) double quotation marks.

Actually that was my first choice, but I fell back on the "Acute Accent" (u00B4) because:

- it is Extended ASCII, coded on one byte in Latin-1 (two bytes in UTF-8)
- the move is less extensive: you just replace the closing mark
- it is more conservative: the backtick stays in place, so code that spots quotations by the backtick will still work
- the backtick used for opening is actually the "Grave Accent", and the acute accent and grave accent are in a mirror relation.

If you are open to the u2018-u2019 move (yielding ‘quoted’), so am I. I can easily adjust the PR to do just this.
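The byte-size point depends on the encoding, which the following sketch makes concrete. The `encoded_length` helper and the `marks` table are illustrative names; the sketch only relies on Python's built-in `str.encode`.

```python
from typing import Optional

# Candidate closing marks discussed in this thread.
marks = {
    "apostrophe": "'",
    "acute accent": "\u00b4",
    "right single quotation mark": "\u2019",
    "right double quotation mark": "\u201d",
}


def encoded_length(ch: str, encoding: str) -> Optional[int]:
    """Return the number of bytes for `ch`, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


for label, ch in marks.items():
    sizes = {enc: encoded_length(ch, enc) for enc in ("ascii", "latin-1", "utf-8")}
    print(f"{label:28} U+{ord(ch):04X} {sizes}")
```

In this sketch the acute accent is one byte only in Latin-1; in UTF-8 it takes two bytes, against three for the curly quotation marks.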

fcbond · 2024-07-04T09:13:32Z

I would also prefer “ (u201C) and ” (u201D), as people are less likely to confuse them with apostrophes.


-- Francis Bond <https://fcbond.github.io/>

1313ou added 15 commits June 27, 2024 09:32

Normalize

df2a171

Minor fixes, mainly spaces

a4a036e

Change quotation-closing character from apostrophe to acute accent

4cf971b

Fix incorrect apostrophe

8ebbf51

Normalize

4ca3006

Fix phoneme notation

9303d52

Fix question dialog tag

cf0271e

Fix punctuation with dialog tag

45b17de

Fix spce before comma

16f2487

Fix expression quoting. Add esh IPA for Ssssh sound

c00b8fd

Fix affix quoting. Fix some phoneme notations

a05eabf

Fix affix quoting. Fix some phoneme notations

6163faf

Align phoneme notation to /.../

a9014d7

Fix typo

cb8ec8e

Normalize

db9791d

This was referenced Jul 5, 2024

Quoting with single quotes #1027

Merged

Quoting with double quotes #1028

Closed

jmccrae merged commit db9791d into globalwordnet:main Aug 20, 2024
1 check passed
