This repository has been archived by the owner on Jun 30, 2018. It is now read-only.

Letter/Word spacing for CJK languages - SC 1.4.12 Text Spacing #677

Closed
MakotoUeki opened this issue Jan 10, 2018 · 44 comments

Comments

@MakotoUeki

In Japanese, we don't put white space between words. The same may be true of Chinese and Korean. So an exception may be needed, at least for web content in CJK languages.

We also need to investigate whether letter spacing (tracking) works for CJK characters.

@lauracarlson
Contributor

Hi @MakotoUeki ,

Thanks for the info!

I wonder if this issue is a duplicate of #657? If so, can it be closed and resolved over there? @awkawk asked you a question on that issue.

Thanks again,
Laura

@MakotoUeki
Author

It might be so. But most of web pages on Japanese websites are not using the vertical writing mode.

For example:
https://www.kantei.go.jp/
http://www.metro.tokyo.jp/
https://www.yahoo.co.jp/
http://www.adobe.com/jp/
https://waic.jp/

We use the horizontal writing mode for Japanese web pages in general. So I can say that letter/word spacing are the same issues as #657.

@lauracarlson lauracarlson self-assigned this Jan 10, 2018
@lauracarlson
Contributor

Hi @MakotoUeki,

I drafted a proposed response in the Wiki for WG consideration.

@MakotoUeki
Author

Hi @lauracarlson ,

Thanks so much for addressing this.

Is "Exception: Text in Chinese, Japanese, and Korean languages." an exception to this entire SC?

I can say that the exception is needed for word spacing. This is applicable for both horizontal and vertical text in Japanese.

As to letter spacing, it might be okay. Line height (line spacing) and spacing underneath paragraphs should be okay as well. We need to find a research basis for Japanese, especially for vertical text, though I'm not sure that kind of research has been done.

@lauracarlson
Contributor

lauracarlson commented Jan 11, 2018 via email

@steverep
Member

Have we verified that the bookmarklet would make Makoto's example sites unreadable? That is, do they add spacing inappropriately between words? I'm wondering if correctly specifying the language in the markup negates style properties like word-spacing (and it probably should if Japanese is simply not applicable).

@lauracarlson
Contributor

lauracarlson commented Jan 11, 2018 via email

@MakotoUeki
Author

@lauracarlson

At least, I can't take responsibility for Chinese, Korean, or any language other than Japanese.

At this moment, what I can suggest is the following only:
"Word spacing to at least 0.16 times the font size, except for text in Japanese."

One more thing. Is each value based on research for English? If so, we need to say something in a NOTE like:
"The values are taken from research on Roman text. For other text, such as CJK and Arabic, the "equivalent" values would be taken from the same kind of research for each language/text."

The appropriate values might be different among different language/text. The working group will not be able to specify all of the values for all of the languages.

There is a similar description in the "large-scale" definition in WCAG 2.0:
https://www.w3.org/TR/WCAG21/#dfn-large-scale

@johnfoliot

johnfoliot commented Jan 14, 2018 via email

@awkawk
Member

awkawk commented Jan 14, 2018

What I'm hearing from Makoto is not that it isn't supported (you can apply word-spacing) but that it isn't used, and therefore authors shouldn't be constrained to ensure that the layout doesn't break if it is.

@awkawk
Member

awkawk commented Jan 14, 2018

@r12a - any thoughts/advice on this?

@awkawk
Member

awkawk commented Jan 14, 2018

http://www.koreanwikiproject.com/wiki/Word_spacing - apparently not an issue in Korean
http://research.chtsai.org/dissertation/chapter-1.html - apparently is an issue in Chinese

@awkawk
Member

awkawk commented Jan 14, 2018

How about adding:
Exception: Languages which do not typically make use of one or more of these text style properties in written text can conform using only the properties that are typically used.

That way we are covered for other languages that the WG isn't well-versed in.

@MakotoUeki
Author

The addition works for "word spacing" in Japanese.

One more thing. I'd like to confirm if each value is common among any kind of languages/text including CJK, Arabic, etc. These properties can be used in Japanese.

  • Line height (line spacing) to at least 1.5 times the font size;
  • Spacing underneath paragraphs to at least 2 times the font size;
  • Letter spacing (tracking) to at least 0.12 times the font size;

@mraccess77

Personal comment: Has anyone looked at this study?
https://www.sciencedirect.com/science/article/pii/S0042698907002556

@MakotoUeki
Author

Hi @mraccess77 , thank you so much for sharing. I'll read it.

@r12a

r12a commented Jan 15, 2018

Normally, i'd point you to our typography index, and the section at http://w3c.github.io/typography/#graphemes, but we don't seem to point to anything relevant there yet.

You may therefore find it useful in general to look at these pages by myself, which are still in development.

Perhaps the most useful starting point is
https://r12a.github.io/scripts/featurelist/
which has a column entitled "Word separator" (you can click on the column heading to group the values together).

This only covers a selection of scripts, but you can see that the following don't separate words with spaces:
Balinese, Han+Kana (basically Chinese & Japanese), Javanese, Khmer, Lao, Myanmar (Burmese), Thai, and Tibetan.

Note that Korean (hangul) is not one of those, because it does separate 'words' with spaces.
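
As a rough illustration of the list above, the sketch below records which of the scripts named in this comment do not separate words with spaces, and uses that to decide whether a word-spacing requirement is meaningful. The table holds only the scripts mentioned here; the function name and the default for unlisted scripts are assumptions made for the example, not part of any specification.

```python
# Scripts named in the comment above as NOT separating words with spaces.
NO_WORD_SPACES = frozenset({
    "Balinese", "Han+Kana", "Javanese", "Khmer",
    "Lao", "Myanmar", "Thai", "Tibetan",
})


def word_spacing_applies(script: str) -> bool:
    """Return True when a word-spacing requirement is meaningful for `script`.

    Assumption: any script not listed above (for example Latin or Hangul)
    is treated as separating words with spaces. A real table would need
    per-script research rather than this crude default.
    """
    return script not in NO_WORD_SPACES


for name in ("Latin", "Hangul", "Han+Kana", "Thai"):
    print(name, word_spacing_applies(name))
```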

Watch out, though, because Khmer, Thai, and other SE Asian scripts do use spaces, but as phrase delimiters rather than word delimiters. This also applies to Tibetan, however (1) syllables are delimited rather than words (for which Tibetan uses a syllable separator (tsek)) and (2) Tibetans tend to prefer a no-break space rather than a normal space. These scripts tend to stretch spaces between characters, rather than between words, for things like justification. But there may be complications. For example, Japanese tends to stretch gaps of ink around things like punctuation before stretching inter-character spacing when justifying.

You may hear that SE Asian scripts use ZWSP (zero width space) between words. You may even find some text that does so, but the vast majority of the time they don't, and applications rely on dictionary lookup and parsing to detect word boundaries (which are important for things like line breaking, in a way that they aren't for Japanese and Chinese). And before you ask, no, i don't think that we should recommend use of ZWSP for accessibility ;-)

If you look a bit further down the list, you will see that Ethiopic does distinguish word boundaries, but with a special word-separator character, rather than with a space. There is some flexibility in the width of that character when justifying, but i don't know how that translates into an accessibility guideline to widen the spaces between words.

By the way, if you want more information about these behaviours, follow the links next to the script names (if there is one). When you reach the page linked to, look for a section called something like Text layout > Text delimiters.

Essentially, (and i think i may have already mentioned it) these guidelines probably ought to clarify the writing system and language that the metrics proposed are relevant for.

hope that helps.

@r12a

r12a commented Jan 15, 2018

Oh, and wrt Arabic script, you'll read that it is common to stretch the baseline between characters when justifying text (characters typically join at the baseline). Whether this provides an opportunity for more readable text, accessibility-wise, i don't know (but i doubt it). One has to also bear in mind that certain font styles (such as ruq'a) don't allow baseline elongation.

Urdu is also an interesting case, since the nastaliq font style it uses naturally reduces the gaps between words (in part because the nastaliq font style has a sloping baseline that is word-based, and there are word final letters that help identify word endings). I have no idea whether or not the requirement for extra space between words would translate to something useful for Urdu.

@alastc
Contributor

alastc commented Jan 15, 2018

Wow, great resource there, bookmarked.

I'm concluding from the comments that:

  1. Some or all of the scripted languages will need to be exceptions to the guideline, as changing the spacing could change the meaning.
  2. We need a better term than CJK. Is 'Languages written as script' a reasonable term? Or are we better off using something like: Non-latin based languages.

@r12a

r12a commented Jan 15, 2018

Well, i think the WAI guidelines need to say something along the lines of:

Increased inter-word spacing may be helpful, and studies for English have shown that ..., however this may not be appropriate for text using other writing systems, or even for Latin script text in other languages.

I find myself wondering whether the specific recommendations will even be useful for some well-known Latin script languages, such as German, Finnish and Dutch, since these languages tend to have long compound words, (such as Eingabeverarbeitungsfunktionen) which you wouldn't really want to split - and which would be difficult to split using CSS anyway, since there is no internal delimiter. Also, in the case of German, there are capital letters for all nouns, which may also help users get by better, given the conclusions about how kanji helps japanese readers in the article linked to by @mraccess77 above.

Basically, i think you can't extrapolate some research findings for English text to any other language. You can only suggest that inter-word space stretching may be useful, and cite evidence and recommendations for that on the basis of those languages that we know have been researched.

@awkawk
Member

awkawk commented Jan 15, 2018

Sounds like we need to pull the word spacing item.

Do the others work? And do we have a rational basis for the values being appropriate for languages across the board?

  • Line height (line spacing) to at least 1.5 times the font size;
  • Spacing underneath paragraphs to at least 2 times the font size;
  • Letter spacing (tracking) to at least 0.12 times the font size;

@alastc
Contributor

alastc commented Jan 15, 2018

Hi Richard,

I think it helps that the aim of the SC is to allow for more spacing, it is not intended to say that those values are what the user must have. These values are there to provide a baseline for testing (to say you passed or failed), but within that roughly 10% buffer on text, you could choose all letter-spacing, all word-spacing, or a larger font family.

This SC should also help internationalisation to some extent, as it means (as a designer/developer) you need to be conscious of allowing a buffer around text. Surely that is helpful for languages like German: there is no mechanism to split longer words (if that is even desirable), but spacing them out a little more should help from a physical reading point of view.

If we pull word-spacing out, can we increase the value of letter-spacing to compensate?

@mraccess77

It is my personal understanding that the intention of the SC is to allow more space for different font families or spacing if needed. Surely other languages allow multiple fonts. Even though we could not get the font family language into the SC for other reasons, allowing for more room supports more personalization of writing in many languages.

@mraccess77

mraccess77 commented Jan 15, 2018 via email

@mraccess77

My personal feeling is that if we limit this to English the SC will be abandoned and all the people who can benefit from changing font family or adjust some spacing will not have what is needed. I'd be very surprised that these changes would not benefit users with low vision or cognitive disabilities. But unfortunately I don't have research at my fingertips for each language to communicate this. Studies with the general population often don't extend well to people with low vision and cognitive disabilities so we can't rely on general population studies.

@joshueoconnor
Contributor

Thanks all, and especially @r12a for the resources and extra input. I'm also wondering, following @awkawk's question, whether the following help move the SC to a place where it covers, as far as possible, scripts/languages that are 'non-Latin' or use diacritics etc.

Line height (line spacing) to at least 1.5 times the font size;
Spacing underneath paragraphs to at least 2 times the font size;
Letter spacing (tracking) to at least 0.12 times the font size;

I'm no expert in this, and will defer to those who are, but we do need to at least try to increment the SC in a way that can accommodate as much variance in natural language styles as possible.

Regarding @alastc comment about not referring to CJK etc - OTTOMH - could we say 'Latin, Cyrillic, Devanagari, Semitic and diacritic type alphabets'. @r12a would know better the details of the classifications than me :-)

@alastc
Contributor

alastc commented Jan 16, 2018

Hi Josh,

The goal is to achieve a certain level of spacing, so taking out word spacing reduces that amount.

We can compensate by upping the letter spacing to 0.14 (going back to my previous testing and experimenting).
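
To see roughly what that trade-off looks like, here is a small arithmetic sketch. It assumes letter spacing is added once per character (spaces included) and word spacing once per space, and it ignores line ends and font metrics. The sample strings and the function name are made up for illustration.

```python
def extra_width_em(text: str, letter_spacing: float, word_spacing: float) -> float:
    """Extra horizontal space, in em, added to one line of `text`."""
    return len(text) * letter_spacing + text.count(" ") * word_spacing


english = "the quick brown fox"   # 19 characters, 3 of them spaces
japanese = "今日は天気がいいです"    # no spaces between words

# Letter spacing 0.12 plus word spacing 0.16, as listed in the SC.
current = extra_width_em(english, 0.12, 0.16)
# Word spacing dropped, letter spacing raised to 0.14.
letters_only = extra_width_em(english, 0.14, 0.0)

print(round(current, 2), round(letters_only, 2))

# Word spacing contributes nothing when the text contains no spaces.
print(extra_width_em(japanese, 0.12, 0.16) == extra_width_em(japanese, 0.12, 0.0))
```

For this sample the two settings differ by about 0.1 em, so 0.14 comes close to compensating here, though the result depends on word length.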

However, it does feel very late in the day to be fiddling with that type of thing. Wayne did some sterling analysis on size of typefaces and the relationship to letter/word spacing, I really don't want to repeat that process without him.

I think the safe term would be scoping it to Latin-based languages, I don't know what Devanagari is or how words in Semitic are put together. Languages (human ones anyway) have always been a weakness for me, so long as we don't drop the SC I'm all ears about the right terminology.

@r12a r12a closed this as completed Jan 16, 2018
@r12a r12a reopened this Jan 16, 2018
@joshueoconnor
Contributor

+1 @alastc to restricting it to Latin based languages until we sort out what is needed for the rest. I also agree with you against last minute tinkering at this stage.

@lauracarlson
Contributor

lauracarlson commented Jan 16, 2018

Hi @MakotoUeki, @steverep, @alastc , and all,

Makoto wrote:

It might be so. But most of web pages on Japanese websites are not using the vertical writing mode.

For example:
https://www.kantei.go.jp/
http://www.metro.tokyo.jp/
https://www.yahoo.co.jp/
http://www.adobe.com/jp/
https://waic.jp/

I used Alastair's spacing bookmarklet tool on your examples. It automatically sets spacing to what is specified in the SC. That is:

  • line height (line spacing) to at least 1.5 times the font size
  • spacing underneath paragraphs to at least 2 times the font size
  • letter spacing (tracking) to at least 0.12 times the font size
  • word spacing to at least 0.16 times the font size
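
For readers who want to reproduce this kind of check, the following is a minimal sketch of a style sheet that applies those four values. It is not the bookmarklet's actual source; the selectors, the use of `!important`, and the function name are assumptions. The CSS properties themselves (`line-height`, `letter-spacing`, `word-spacing`, `margin-bottom`) are standard.

```python
def text_spacing_css(include_word_spacing: bool = True) -> str:
    """Build a user style sheet with the SC 1.4.12 test values.

    Unitless line-height and em lengths are both relative to the font size,
    which matches the "times the font size" wording of the SC.
    """
    declarations = [
        "line-height: 1.5 !important;",
        "letter-spacing: 0.12em !important;",
    ]
    if include_word_spacing:
        declarations.append("word-spacing: 0.16em !important;")
    body = "\n".join("  " + d for d in declarations)
    # Hypothetical selectors: "*" for all text, "p" for paragraph spacing.
    return f"* {{\n{body}\n}}\np {{\n  margin-bottom: 2em !important;\n}}\n"


print(text_spacing_css())
```

Passing `include_word_spacing=False` gives a variant for text such as Japanese, where word spacing is not used.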

The bookmarklet didn't seem to work on metro.tokyo or yahoo.co. The bookmarklet script refused to load. This should not be a problem if/when the script is put into a proper extension.

However, the following are screenshots of the other 3 :

  1. Screenshot of waic.jp
  2. Screenshot of adobe.com/jp
  3. Screenshot of kantei.go.jp

Makoto, do you detect any loss of content?

@MakotoUeki
Author

@lauracarlson There is no problem. I'm still not sure if each value is valid enough for Japanese text. But I can say those values would be reasonable as the minimum requirements. Increasing spacing benefits Japanese users with low vision or dyslexia. One exception for Japanese is word spacing.

@lauracarlson
Contributor

@MakotoUeki,

Thank you for checking them.

@r12a

r12a commented Jan 17, 2018

If we pull word-spacing out, can we increase the value of letter-spacing to compensate?

See #390. I assume that one expects increases to be applied sensibly and as a minimum, so pre-existing text with extra letter-spacing (eg. Hebrew emphasis) will be taken into account by the person adding the letter-spacing increase. There's still a question around whether the SC is relevant to cursive scripts, such as Arabic, N'Ko, Mongolian, etc. since it's not clear what tracking (ie. letter-spacing) means for those scripts, if anything, given that letters are joined. I am also assuming that we are assuming that tracking does the right thing for complex scripts that add space around syllables rather than individual letters, eg. Devanagari, Bengali, Thai, Khmer, etc.

Regarding @alastc comment about not referring to CJK etc - OTTOMH - could we say 'Latin, Cyrillic, Devanagari, Semitic and diacritic type alphabets'. @r12a would know better the details of the classifications than me :-)

Why not say something like "For languages that separate words using spaces...." Would that work?

@awkawk
Member

awkawk commented Jan 17, 2018

@r12a - we are looking at the following for an exception. Would appreciate your thoughts:

Exception: Human languages and scripts which do not make use of one or more of these text style properties in written text can conform using only the properties that are used.

@r12a

r12a commented Jan 17, 2018

I think that makes sense.

@alastc
Contributor

alastc commented Jan 17, 2018

Hi @r12a,

The intent is that the value is a maximum increase for the purpose of testing. The user can then use any value they like, but the site is responsible for making it work up to the set point. If the user goes beyond the value, things might break, but that isn't their issue.

For a language like Hebrew, it seems like the spacing value provides meaning, so we don't want to impact that. (I'm curious how that works on the web though? spans around everything?)
Again though, if the user increased the spacing proportionately, it is up to the site to test for an increase up to that point.

@awkawk given that some languages convey meaning with spacing, "do not typically make use of" might not cover it. Oh, and Steve hates "typically", so:

Exception: Human languages and scripts which do not make use of, or convey meaning with, one or more of these text style properties can conform using only the properties that are used and do not impact the meaning.

Or perhaps I misunderstood and that isn't needed? Banging head cold today, I could be confused.

@lauracarlson
Contributor

lauracarlson commented Jan 17, 2018

@alastc wrote:

For a language like Hebrew, it seems like the spacing value provides meaning, so we don't want to impact that. (I'm curious how that works on the web though? spans around everything?)
Again though, if the user increased the spacing proportionately, it is up to the site to test for an increase up to that point.

I wonder if @lseeman may be able to advise on spacing and languages like Hebrew?

@r12a

r12a commented Jan 17, 2018

I'm curious how that works on the web though? spans around everything?

Here's a picture: w3c/type-samples#19 (red lines aren't in the original)

I assume you would just put the relevant text in an em tag and apply CSS letter-spacing to your em tags.
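
A minimal sketch of that approach, with the style sheet text held in Python strings. The `0.25em` emphasis value and the helper name are hypothetical; the point is only that when emphasis spacing is attached to `em`, a user style sheet can add its increase on top rather than overwrite it.

```python
AUTHOR_EM_SPACING = 0.25   # hypothetical author value for emphasised text, in em
USER_INCREASE = 0.12       # letter-spacing value from the SC, in em


def user_rules(author_em_spacing: float, increase: float) -> str:
    """Return user style sheet rules that keep emphasised text wider than the rest."""
    emphasised = round(author_em_spacing + increase, 2)
    return (
        f"body {{ letter-spacing: {increase}em; }}\n"
        f"em {{ letter-spacing: {emphasised}em; }}\n"
    )


print(user_rules(AUTHOR_EM_SPACING, USER_INCREASE))
```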

@alastc
Contributor

alastc commented Jan 17, 2018

Ah, if it's applied semantically that would be ok, as the user-style sheet / plugin could allow for that, assuming it is done reasonably consistently across sites.

I don't think there would be much user benefit in that scenario, but it's not harmful (would pass easily).

I.e. no need for my tweak.

@allanj-uaag
Contributor

allanj-uaag commented Jan 17, 2018 via email

@alastc
Contributor

alastc commented Jan 17, 2018

Um, could you understand whether the meaning had changed in every instance?

No matter, if those sorts of changes are applied with markup, my adjustment isn't needed.

@allanj-uaag
Contributor

allanj-uaag commented Jan 17, 2018 via email

@lauracarlson
Contributor

@awkawk I think we are all in agreement that your exception wording is good to go.

@joshueoconnor
Contributor

Great stuff all.

@awkawk
Member

awkawk commented Jan 21, 2018

(Official WG Response)
Thank you for your comment. These properties are defined in CSS and in typographical principles, and they adhere to the meaning used in CSS. We will modify the description for paragraph spacing to "spacing following paragraphs" instead of "spacing underneath paragraphs" to address the implicit western language bias in this item. We will also add the following exception to address cases such as in Japanese where word-spacing is not typically used.

Exception: Human languages and scripts which do not make use of one or more of these text style properties in written text can conform using only the properties that are used.
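
Read as a rule, the exception amounts to filtering the list of requirements by the properties a language or script actually uses. The sketch below is one possible reading, not normative text; the dictionary keys and the function name are invented for the example.

```python
# The four spacing requirements discussed in this thread.
REQUIREMENTS = {
    "line-height": "at least 1.5 times the font size",
    "paragraph-spacing": "at least 2 times the font size",
    "letter-spacing": "at least 0.12 times the font size",
    "word-spacing": "at least 0.16 times the font size",
}


def applicable_requirements(unused_properties=()):
    """Return the requirements left after dropping properties the script does not use."""
    unused = set(unused_properties)
    return {name: value for name, value in REQUIREMENTS.items() if name not in unused}


# Japanese does not use word spacing, so only three requirements remain.
japanese = applicable_requirements(unused_properties={"word-spacing"})
print(sorted(japanese))
```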

#729


10 participants