
Halfwidth punctuation prevents ideographic line breaking #3658

Closed
1ec5 opened this issue Nov 18, 2016 · 8 comments

Comments

@1ec5
Contributor

1ec5 commented Nov 18, 2016

mapbox-gl-js version: 0.27.0

A point-placed label that contains only Chinese characters except for an interpunct (·) uses word-based line breaking, whereas it should use ideographic line breaking. Chinese uses an interpunct to delimit transliterated Western first and last names. As a common punctuation mark in point-of-interest names, it shouldn’t keep a POI from getting the improved line breaking behavior we added in #3420.

[Image: interpunct]

Technically, other punctuation marks such as the solidus (/) shouldn’t turn off ideographic line breaking either, but they are more likely to occur in bilingual labels, where we definitely do need to err on the side of word-based line breaking.

/cc @nickidlugash @xrwang

@1ec5
Contributor Author

1ec5 commented Nov 21, 2016

@ian29 spotted this case of an ASCII opening parenthesis, (, turning off ideographic line breaking:

[Image: pasted image at 2016_11_20 22_46]

ASCII parentheses can go either way. We probably should treat all the characters in charHasNeutralVerticalOrientation() as neutral for line breaking as well as uprightness, so that they only turn off ideographic line breaking if adjacent to a clearly non-ideographic character.
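A rough TypeScript sketch of the difference between an all-or-nothing check and one that treats certain punctuation as neutral. The Unicode ranges in `isIdeographic`, the contents of `NEUTRAL` (standing in for whatever `charHasNeutralVerticalOrientation()` covers), and the function names are all assumptions for illustration, not mapbox-gl-js code. It also simplifies the "adjacent to a clearly non-ideographic character" idea to a whole-label rule: neutral characters are ignored, and any other non-ideographic character disqualifies the label.

```typescript
// Rough "is ideographic" test covering common CJK blocks (assumption, not exhaustive).
function isIdeographic(ch: string): boolean {
  const cp = ch.codePointAt(0)!;
  return (
    (cp >= 0x4e00 && cp <= 0x9fff) || // CJK Unified Ideographs
    (cp >= 0x3400 && cp <= 0x4dbf) || // CJK Unified Ideographs Extension A
    (cp >= 0x3040 && cp <= 0x30ff) || // Hiragana and Katakana
    (cp >= 0x3000 && cp <= 0x303f)    // CJK Symbols and Punctuation
  );
}

// Hypothetical stand-in for characters with neutral vertical orientation.
const NEUTRAL = new Set(["·", "(", ")"]);

// Strict rule: every character must be ideographic, so a single
// interpunct or ASCII parenthesis disqualifies the whole label.
function qualifiesStrict(text: string): boolean {
  return Array.from(text).every(isIdeographic);
}

// Neutral-aware rule: skip neutral characters; only clearly
// non-ideographic characters turn ideographic breaking off.
function qualifiesWithNeutrals(text: string): boolean {
  const chars = Array.from(text).filter((ch) => !NEUTRAL.has(ch));
  return chars.length > 0 && chars.every(isIdeographic);
}
```

Under this sketch, `约翰·史密斯` fails `qualifiesStrict` but passes `qualifiesWithNeutrals`, while a bilingual label such as `Starbucks 星巴克` still falls back to word-based breaking.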

nickidlugash changed the title from "Interpunct prevents ideographic line breaking" to "Halfwidth punctuation prevents ideographic line breaking" on Dec 1, 2016
@nickidlugash

Per chat with @1ec5 and @ChrisLoer, we should implement better line breaking behavior for ASCII parentheses, beyond just treating them as neutral for line breaking. Options we discussed so far:

  1. Do balanced (ideographic) breaking but automatically move the line break from after any ( to before it, aka "deleting" any punctuation and adding them back in after figuring out where the line breaks should be for a balanced label.

  2. Treat ideographic characters like alphabetic characters within parentheses.

@1ec5 @ChrisLoer please feel free to expand upon my extremely rough summary of our chat! Thank you! 🙏

@1ec5
Contributor Author

1ec5 commented Dec 1, 2016

Do balanced (ideographic) breaking but automatically move the line break from after any ( to before it, aka "deleting" any punctuation and adding them back in after figuring out where the line breaks should be for a balanced label.

Here’s a more detailed explanation of what I’m imagining: around here, if the next character is a ( that would ordinarily be a line break point, preemptively make the current character the line break point. Also, if the current character is a ) that would ordinarily be a line break point, defer the break until the next character.
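One reading of this adjustment, sketched in TypeScript. It assumes a balanced breaker has already produced tentative break positions (indices where a new line starts) and only nudges them so that an ASCII ( never ends a line and an ASCII ) never starts one. `adjustBreaksForParentheses` is a hypothetical name, not a mapbox-gl-js function, and full-width parentheses are not handled.

```typescript
// Hypothetical post-processing step for option 1. A break position p
// means "a new line starts at chars[p]".
function adjustBreaksForParentheses(text: string, breaks: number[]): string[] {
  const chars = Array.from(text);
  const adjusted: number[] = [];
  for (let p of breaks) {
    if (chars[p - 1] === "(") {
      p -= 1; // break before "(" instead of after it
    } else if (chars[p] === ")") {
      p += 1; // defer the break until after ")"
    }
    const last = adjusted.length > 0 ? adjusted[adjusted.length - 1] : 0;
    // Drop breaks that would create empty or duplicate lines.
    if (p > last && p < chars.length) adjusted.push(p);
  }
  // Slice the text into lines at the adjusted positions.
  const lines: string[] = [];
  let start = 0;
  for (const p of adjusted) {
    lines.push(chars.slice(start, p).join(""));
    start = p;
  }
  lines.push(chars.slice(start).join(""));
  return lines;
}
```

With tentative breaks `[5, 9]`, `废话废话(废话废话)废话废话` becomes `废话废话` / `(废话废话)` / `废话废话`; breaks that do not touch a parenthesis are left alone.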

@ChrisLoer
Contributor

OK, I see: (1) just changes which side of a line break the parenthesis appears on. So 废话废话(废话废话)废话废话 with a maxWidth of about four characters would wrap to:

废话废话
(废话废话)
废话废话

While 废话废话废话(废话废话废话) would wrap to:

废话废话
废话(废话
废话废话)

Using "treat ideographic characters like alphabetic characters within parentheses", the second example would instead wrap to:

废话废话
废话
(废话废话废话)
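A TypeScript sketch of option (2), assuming every character is one unit wide and using simple greedy filling rather than balanced breaking. Ideographic text may break between any two characters, but a parenthesized run is kept together like a word. The function names are hypothetical.

```typescript
// Positions where a line may start: anywhere outside ASCII parentheses.
function breakOpportunities(chars: string[]): Set<number> {
  const allowed = new Set<number>();
  let depth = 0;
  for (let i = 0; i < chars.length; i++) {
    // Breaking before chars[i] is allowed only at parenthesis depth 0.
    if (i > 0 && depth === 0) allowed.add(i);
    if (chars[i] === "(") depth += 1;
    else if (chars[i] === ")") depth = Math.max(0, depth - 1);
  }
  return allowed;
}

// Greedy wrap: take the furthest allowed break that fits within maxWidth;
// if none fits, overflow to the nearest allowed break after it.
function wrapGreedy(text: string, maxWidth: number): string[] {
  const chars = Array.from(text);
  const allowed = breakOpportunities(chars);
  const lines: string[] = [];
  let start = 0;
  while (chars.length - start > maxWidth) {
    let end = -1;
    for (let p = start + 1; p <= start + maxWidth; p++) {
      if (allowed.has(p)) end = p;
    }
    if (end === -1) {
      for (let p = start + maxWidth + 1; p < chars.length; p++) {
        if (allowed.has(p)) { end = p; break; }
      }
    }
    if (end === -1) break; // no opportunity left: the rest stays on one line
    lines.push(chars.slice(start, end).join(""));
    start = end;
  }
  lines.push(chars.slice(start).join(""));
  return lines;
}
```

With `maxWidth` 4, this reproduces both layouts above: `废话废话` / `(废话废话)` / `废话废话` for the first string and `废话废话` / `废话` / `(废话废话废话)` for the second.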

@ian29

ian29 commented Dec 2, 2016

@1ec5 @nickidlugash @ChrisLoer do yall know when you expect to have a fix for this? totally fine if this is a v. rough estimate.

@ChrisLoer
Contributor

ChrisLoer commented Dec 2, 2016

@ian29 I talked with @nickidlugash this morning, and I think we're still trying to figure out what we want for handling punctuation in general. The "interpunct" fix that started this issue is in #3725, and hopefully we will merge that into gl-js soon.

ChrisLoer added a commit that referenced this issue Dec 5, 2016
* Optimize for minimal line width variation on multi-line labels
* Use same algorithm for all character types to support diglossic labels
* Avoid hanging parentheses in ideographic text
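The first commit message describes optimizing for minimal line width variation. As an illustration only (this is not the code from the PR), here is a small dynamic-programming sketch in TypeScript. It treats every character as one unit wide, aims for `ceil(n / maxWidth)` lines of equal width, and minimizes the sum of squared deviations from that target. The break rule is passed in as a predicate so the same search works for any script. `balancedLines`, `CanBreakBefore`, and `parenthesisRule` are hypothetical names.

```typescript
// Predicate: may a new line start at chars[i]?
type CanBreakBefore = (chars: string[], i: number) => boolean;

// Simplified rule (assumption): never end a line with "(" or start one with ")".
const parenthesisRule: CanBreakBefore = (chars, i) =>
  chars[i - 1] !== "(" && chars[i] !== ")";

function balancedLines(text: string, maxWidth: number, canBreakBefore: CanBreakBefore): string[] {
  const chars = Array.from(text);
  const n = chars.length;
  if (n <= maxWidth) return [text];
  // Target width: the average width if the text used the fewest lines that could fit.
  const lineCount = Math.ceil(n / maxWidth);
  const target = n / lineCount;
  // best[i]: lowest total badness for a layout whose last line ends before chars[i].
  const best: number[] = new Array(n + 1).fill(Infinity);
  const prev: number[] = new Array(n + 1).fill(-1);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    if (end < n && !canBreakBefore(chars, end)) continue;
    for (let start = 0; start < end; start++) {
      if (best[start] === Infinity) continue;
      // Squared deviation from the target penalizes both long and short lines.
      const badness = best[start] + (end - start - target) ** 2;
      if (badness < best[end]) {
        best[end] = badness;
        prev[end] = start;
      }
    }
  }
  // Walk back from the end to recover the chosen lines.
  const lines: string[] = [];
  for (let end = n; end > 0; end = prev[end]) {
    lines.unshift(chars.slice(prev[end], end).join(""));
  }
  return lines;
}
```

For ten ideographs with `maxWidth` 4, greedy filling would give lines of 4, 4, and 2 characters, whereas this sketch settles on lines of 3, 3, and 4.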
ChrisLoer added six more commits with the same message that referenced this issue: two on Dec 14, 2016 and four on Dec 15, 2016.
@ChrisLoer
Contributor

Closing with PR #3743: there's no longer any special treatment for ideographic characters, so there's no way for it to get turned "off". That PR does not include special handling to keep "brace-like" punctuation, such as the "title marks", from dangling across line breaks. @nickidlugash, if you get a chance to open an issue with the set of braces we want to handle, it will be easy for me to plug in to the code.
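A TypeScript sketch of how a configurable set of braces might plug into a break rule. The character sets below are assumptions for illustration: ASCII and full-width parentheses, the Chinese title marks 《 》 and 〈 〉, and corner brackets. The final set is whatever the follow-up issue settles on, and the function names are hypothetical.

```typescript
// Hypothetical brace sets; edit these to change which punctuation may not dangle.
const OPENERS = new Set(["(", "（", "《", "〈", "「", "『"]);
const CLOSERS = new Set([")", "）", "》", "〉", "」", "』"]);

// A line may start at chars[i] only if the previous line would not end
// with an opening brace and the new line would not start with a closing one.
function canBreakBefore(chars: string[], i: number): boolean {
  return !OPENERS.has(chars[i - 1]) && !CLOSERS.has(chars[i]);
}

// List every position where the label may be broken under this rule.
function breakPositions(text: string): number[] {
  const chars = Array.from(text);
  const positions: number[] = [];
  for (let i = 1; i < chars.length; i++) {
    if (canBreakBefore(chars, i)) positions.push(i);
  }
  return positions;
}
```

For `读《废话》`, this allows breaks only before `《` and between `废` and `话`, so neither title mark can be stranded at a line edge.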

@1ec5
Contributor Author

1ec5 commented Dec 15, 2016

if you get a chance to open an issue with the set of braces we want to handle, it will be easy for me to plug in to the code.

#3811
