Character entity larrpl isn't left-arrow-with-plus, to match rarrpl #3655

Closed
ediosyncratic opened this issue Apr 28, 2018 · 6 comments · Fixed by #7071
Labels
needs implementer interest (Moving the issue forward requires implementers to express interest), normative change, topic: parser

Comments

@ediosyncratic
Contributor

ediosyncratic commented Apr 28, 2018

Either of these pages:

tells me:

Compare U+02946: this is ⥆ LEFTWARDS ARROW WITH PLUS BELOW.
This seems a far better fit for &larrpl;!
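
The mismatch can be checked against the Unicode character names directly. This is a minimal sketch, assuming Python 3 and that its standard `html` module follows the same named character reference table as the spec; the helper `describe` is made up for illustration.

```python
import html
import unicodedata


def describe(entity):
    """Return (code point, Unicode name) for a named character reference."""
    # html.unescape resolves "&name;" using Python's HTML5 entity table.
    char = html.unescape(f"&{entity};")
    return f"U+{ord(char):05X}", unicodedata.name(char)


# What the two similarly named entities currently resolve to.
print("rarrpl:", describe("rarrpl"))
print("larrpl:", describe("larrpl"))

# The character the name larrpl would suggest by analogy with rarrpl.
print("U+02946:", unicodedata.name("\u2946"))
```

If the assumption holds, rarrpl reports an arrow with a plus below it, while larrpl reports an arc arrow with no plus at all.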

@ediosyncratic
Contributor Author

ediosyncratic commented Apr 28, 2018

The table of character reference names appears to be generated from a file, entities.inc or entities.json, which I do not find in the source tree, so I am unable to offer a patch - unless someone cares to tell me where to find that file.

@domenic
Member

domenic commented Apr 28, 2018

This isn't something we can fix; it's just how these entities were historically defined. Try them in your browser! &rarrpl; &larrpl; yields ⥅ ⤹ indeed.

@ediosyncratic
Contributor Author

ediosyncratic commented Apr 28, 2018

@domenic: close inspection of the source will show you that I used them in the bug report. So yes, I know browsers currently support the present broken spec; however, the thing that would fix it is precisely for this spec to recognise a clear mistake and introduce a backwards-incompatible change. I sincerely doubt more folk use &larrpl; deliberately for its specified meaning than use it expecting the meaning I suggest (and I'm fairly sure both groups are small, probably empty). The browsers aren't hand-coding this stuff either; they're surely using the entities.json file, or something derived from it, to generate their entity mappings, so a fix by WHATWG would soon enough fix all browsers.
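
For illustration only, here is roughly how a lookup table could be derived from such a file. The two-entry `SAMPLE` is a hand-written excerpt in the shape of entities.json, and `build_table` is a hypothetical helper, not any browser's actual code.

```python
import json

# Hand-written excerpt in the shape of entities.json (an assumption for this
# sketch): each key is the reference as written, each value lists the code
# points and the resulting characters.
SAMPLE = r'''
{
  "&larrpl;": {"codepoints": [10553], "characters": "\u2939"},
  "&rarrpl;": {"codepoints": [10565], "characters": "\u2945"}
}
'''


def build_table(text):
    """Map each reference name (without the leading '&') to its characters."""
    data = json.loads(text)
    table = {}
    for reference, entry in data.items():
        characters = entry["characters"]
        # Sanity check: the two representations in the file should agree.
        assert [ord(c) for c in characters] == entry["codepoints"]
        table[reference[1:]] = characters
    return table


table = build_table(SAMPLE)
for name, characters in sorted(table.items()):
    print(name, " ".join(f"U+{ord(c):05X}" for c in characters))
```

Under this kind of pipeline, changing one entry in the source file changes the generated mapping everywhere it is consumed, which is the point being argued above.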

@domenic
Member

domenic commented Apr 28, 2018

I see. Well, I'll tag this as "needs implementer interest", but I'm not sure how many browsers are going to be interested in changing the meaning of existing pages just to make entity names a bit more symmetrical.

domenic added the "normative change" and "needs implementer interest" labels Apr 28, 2018
@ediosyncratic
Contributor Author

It's not just a matter of being "a bit more symmetrical": there's a pattern in the naming that leads the reasonable reader to suppose &larrpl; means "left arrow (with) plus", but the spec wantonly has it mean something completely different.

As it happens, the given meaning for larrpl is the mirror image of what's given for cudarrl (U+02938), whose mirror image should clearly be cudarrr (U+02935, ARROW POINTING RIGHTWARDS THEN CURVING DOWNWARDS; apparently there is no matching left-then-down arrow). These are another pair whose names are incoherent: they should clearly be each other's mirror images, but both curve clockwise. cudarrr starts out pointing right and curves down, while cudarrl points down, initially a little to the right but curving to the left. One could justify either by a cunning reading of its name (cudarrl points down and curves left; cudarrr curves down from a rightwards start), but you need incompatible readings to make sense of both. (Contrast with the symmetry between rdca ⤷ and ldca ⤶, for all that their pattern implies a family, the rest of which is missing.) Still, this one really is just a borked symmetry (with incoherent naming); while it probably would make sense to change cudarrr to U+02939, this isn't as clear an error as the larrpl one, for which ⥆ is unequivocally the only sane meaning.
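
The arrows discussed above can be listed side by side with their Unicode names. This is a sketch under the assumption that Python's `html.entities.html5` table mirrors the spec's list; `name_of` is a made-up helper.

```python
import unicodedata
from html.entities import html5


def name_of(entity):
    """Describe the character a named reference maps to in Python's table."""
    # Keys in html5 carry the trailing semicolon, e.g. "cudarrl;".
    char = html5[entity + ";"]
    return f"U+{ord(char):05X} {char} {unicodedata.name(char)}"


# The curved-arrow pair with the incoherent names, then the symmetric pair.
for entity in ("cudarrl", "cudarrr", "rdca", "ldca"):
    print(f"{entity:8} {name_of(entity)}")
```

Reading the printed names next to the entity names makes it easier to judge which pairs really are mirror images.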

It may be worth checking ISO 8879 (I don't have access; there's always a paywall in the way), which is a source document for at least some of this mapping (e.g. the heinous there4 ∴, for which &thus; or &so; would have been terser and so much more apt). If the present situation is inherited from a mistake in it, then you really can give up on fixing this; conversely, if the present situation is a mis-copying from ISO 8879, that would strengthen the case for fixing it.

@annevk
Member

annevk commented Apr 29, 2018

Changes to the HTML parser need to meet a very high bar (basically only security issues, non-normative changes, and <template> have succeeded since things settled down). Unfortunate as it is, I think this proposal is made about a decade too late.
