Merge pull request #146 from aphillips/aphillips-w3c-example
Fix examples
aphillips authored Dec 21, 2024
2 parents 46babdc + 99752fb commit 21bf5db
Showing 2 changed files with 101 additions and 22 deletions.
96 changes: 74 additions & 22 deletions index.html
@@ -757,7 +757,7 @@ <h3>Background information</h3>
<h4>Important definitions</h4>
<p>In order to correctly display text written in a 'right-to-left' script or left-to-right text containing bidirectional elements, it is important to establish the <a href="https://www.w3.org/International/articles/inline-bidi-markup/uba-basics#context" class="termref">base direction</a> that will be used to dictate the order in which elements of the text will be displayed.</p>
<p>If you are not familiar with what the Unicode Bidirectional Algorithm (UBA) does and doesn't do, and why the base direction is so important, read <a href="https://www.w3.org/International/articles/inline-bidi-markup/uba-basics">Unicode Bidirectional Algorithm basics</a>.</p>
<aside class="example">
<aside class="example" id="sec-dir-example">
<p>For example, the following annotation will not display correctly unless the application doing the display knows that the base direction needs to be right-to-left.</p>
<pre>{
"@context": "http://www.w3.org/ns/anno.jsonld",
@@ -772,10 +772,25 @@ <h4>Important definitions</h4>
"target": "http://example.org/photo1"
}
</pre>
<p>You would expect the phrase in the <code class="kw" translate="no">text</code> property value to be displayed as</p>
<p><span dir="rtl">פעילות הבינאום, W3C</span></p>
<p>however, if there is no indication that the base direction should be right-to-left the following incorrect display will be produced:</p>
<p>פעילות הבינאום, W3C</p>
<p>If there is no indication that the [=base direction=] is right-to-left, the display of the item <code>text</code> will be incorrect if the text is placed into a left-to-right context (such as the table below):</p>

<table dir="ltr" class="bidi-example-table">
<thead>
<tr><th>Description</th><th>HTML</th><th style="width:25%">Appearance</th></tr>
</thead>
<tbody>
<tr>
<td>Incorrect:<br>(without <code>dir</code>)</td>
<td><pre class="html">&lt;span lang="he"&gt;פעילות הבינאום, W3C&lt;/span&gt;</pre></td>
<td class="spilloverExample"><span lang="he">פעילות הבינאום, W3C</span></td>
</tr>
<tr>
<td>Correct:<br>(with <code>dir</code>)</td>
<td><pre class="html">&lt;span lang="he" dir="rtl"&gt;פעילות הבינאום, W3C&lt;/span&gt;</pre></td>
<td class="spilloverExample"><span dir="rtl" lang="he">פעילות הבינאום, W3C</span></td>
</tr>
</tbody>
</table>
</aside>
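
Reviewer sketch (not part of the commit): the two table rows above differ only in whether the consumer applies direction metadata when it emits the markup. A minimal Python illustration of that step follows. The helper name `wrap_with_direction` and its parameters are hypothetical, chosen only for this example.

```python
from html import escape


def wrap_with_direction(text, lang, direction=None):
    """Wrap text in a span, adding dir only when a base direction is known.

    Hypothetical helper for illustration; not an API from the spec.
    """
    attrs = f' lang="{escape(lang, quote=True)}"'
    if direction in ("ltr", "rtl"):
        attrs += f' dir="{direction}"'
    # escape() protects markup-significant characters in the text itself.
    return f"<span{attrs}>{escape(text)}</span>"


title = "\u05e4\u05e2\u05d9\u05dc\u05d5\u05ea \u05d4\u05d1\u05d9\u05e0\u05d0\u05d5\u05dd, W3C"
print(wrap_with_direction(title, "he"))          # the "Incorrect" row: no dir
print(wrap_with_direction(title, "he", "rtl"))   # the "Correct" row: dir="rtl"
```

Without the `direction` argument the span inherits whatever base direction surrounds it, which is how the "Incorrect" row arises in a left-to-right table.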

<p>In this section, the word <dfn class="lint-ignore">paragraph</dfn> indicates a run of text followed by a hard line-break in plain text, but may signify different things in other situations. In CSV it equates to 'cell', so a single line of comma-separated items is actually a set of comma-separated paragraphs.&nbsp; In HTML it equates to the lowest level of block element, which is often a <code class="kw" translate="no">p</code> element, but may be things such as <code class="kw" translate="no">div</code>, <code class="kw" translate="no">li</code>, etc., if they only contain text and/or inline elements. In JSON, it often equates to a quoted string value, but if a string value uses markup then paragraphs are associated with block elements, and if the string value is multiple lines of plain text then each line is a paragraph.</p>
@@ -878,36 +893,63 @@ <h4>Problems with control characters</h4>
<h4>Strong directional formatting characters: RLM, LRM, and ALM</h4>
<p>A word about the Unicode characters <span class="codepoint" translate="no"><img alt="RLM" src="images/200F.png"><code class="uname">U+200F RIGHT-TO-LEFT MARK</code></span> (RLM), <span class="codepoint" translate="no"><img alt="LRM" src="images/200E.png"><code class="uname">U+200E LEFT-TO-RIGHT MARK</code></span> (LRM), and <span class="codepoint" translate="no"><img alt="ALM" src="images/061C.png"><code class="uname">U+061C ARABIC LETTER MARK</code></span> (ALM) is warranted at this point.</p>
<p>The first point to be clear about is that these three characters do not establish the base direction for a range of text. They are simply invisible characters with strong directional properties.</p>
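
Reviewer sketch (not part of the commit): one way to see the point made above is to look up the Bidi_Class of each mark. The snippet below uses Python's standard `unicodedata` module, which is an assumption of this note rather than anything the spec references. The paired controls RLE and RLI are included only for contrast.

```python
import unicodedata

# Characters discussed in this section, plus two paired controls for contrast.
CHARS = {
    "RLM": "\u200f",  # RIGHT-TO-LEFT MARK
    "LRM": "\u200e",  # LEFT-TO-RIGHT MARK
    "ALM": "\u061c",  # ARABIC LETTER MARK
    "RLE": "\u202b",  # RIGHT-TO-LEFT EMBEDDING (paired control)
    "RLI": "\u2067",  # RIGHT-TO-LEFT ISOLATE (paired control)
}


def bidi_classes(chars):
    """Map each name to the Bidi_Class of its character."""
    return {name: unicodedata.bidirectional(ch) for name, ch in chars.items()}


classes = bidi_classes(CHARS)
print(classes)
# {'RLM': 'R', 'LRM': 'L', 'ALM': 'AL', 'RLE': 'RLE', 'RLI': 'RLI'}
```

The three marks report the same strong classes (`R`, `L`, `AL`) as ordinary letters, whereas the paired controls have dedicated classes of their own. That matches the statement that the marks behave like invisible strong characters and do not open a directional range.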
<p>This means that you cannot use RLM for example, to make the text <kbd>W3C</kbd> appear to the left of the Hebrew text in the following example.</p>
<p>The title is "<span dir="rtl" lang="he">פעילות הבינאום, W3C</span>".</p>
<p>For this you can only use metadata or the paired control characters.</p>
<p>Of course, if you are detecting base direction using first-strong heuristics (such as <code>dir="auto"</code> in HTML), then inserting an RLM, ALM, or LRM can be useful for influencing the base direction detected where the text in question begins with something that would otherwise give the wrong result. For example:</p>
<p>"<span dir="rtl" lang="ar">نشاط التدويل</span>" is how you say "i18n Activity" in Arabic.</p>
<p>Here an LRM could be placed at the start of the text, before the strong right-to-left Arabic characters, to prevent the algorithm from assuming that the text should be right-to-left. (Remember that if metadata is used to set the base direction, the strong directional formatting character is ignored, unless the metadata specifically says that first-strong heuristics should be used.)</p>
<p>Recalling an <a href="#sec-dir-example">earlier example</a>, this means that you cannot use RLM, for example, to make the text <kbd>W3C</kbd> appear to the left of the Hebrew text. Only using metadata or paired control characters results in the correct display.</p>

<aside class="example" id="rlm-not-working" title="Use metadata instead of strongly directional formatting characters">

<table dir="ltr" class="bidi-example-table">
<thead>
<tr><th>Description</th><th>HTML</th><th style="width:25%">Result</th></tr>
</thead>
<tbody>
<tr>
<td>With RLM<br>(incorrect)</td>
<td><pre class="html">&lt;span lang="he"&gt;&#x05E4;&#x05E2;&#x05D9;&#x05DC;&#x05D5;&#x05EA; &#x05D4;&#x05D1;&#x05D9;&#x05E0;&#x05D0;&#x05D5;&#x05DD;, W3C&amp;rlm;&lt;/span&gt;</pre></td>
<td class="spilloverExample"><span lang="he">&#x05E4;&#x05E2;&#x05D9;&#x05DC;&#x05D5;&#x05EA; &#x05D4;&#x05D1;&#x05D9;&#x05E0;&#x05D0;&#x05D5;&#x05DD;, W3C&rlm;</span></td>
</tr>
<tr>
<td>With metadata<br>(correct)</td>
<td><pre class="html">&lt;span lang="he" dir="rtl"&gt;&#x05E4;&#x05E2;&#x05D9;&#x05DC;&#x05D5;&#x05EA; &#x05D4;&#x05D1;&#x05D9;&#x05E0;&#x05D0;&#x05D5;&#x05DD;, W3C&lt;/span&gt;</pre></td>
<td class="spilloverExample"><span lang="he" dir="rtl">&#x05E4;&#x05E2;&#x05D9;&#x05DC;&#x05D5;&#x05EA; &#x05D4;&#x05D1;&#x05D9;&#x05E0;&#x05D0;&#x05D5;&#x05DD;, W3C</span></td>
</tr>
</tbody>
</table>
</aside>

<p>Of course, if you are detecting base direction using first-strong heuristics (such as <code>dir="auto"</code> in HTML), then inserting an RLM, ALM, or LRM can be useful for influencing the base direction detected where the text in question begins with something that would otherwise give the wrong result.</p>
<aside class="example" title="Using a strong directional formatting character to assist first-strong heuristics">
<p>This HTML has strongly right-to-left Arabic characters near the start, where they will be picked up by a first-strong heuristic. Notice that there is a neutral character right at the start:</p>
<p><pre class="html">&lt;p dir="auto"&gt;"نشاط التدويل" is how you say "i18n activity" in Arabic.&lt;/p&gt;</pre></p>
<p>This produces the wrong result:</p>
<p dir="auto" class="spilloverExample">"&#x0646;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062a;&#x062f;&#x0648;&#x064a;&#x0644;" is how you say "i18n Activity" in Arabic.</p>

<p>Here an LRM could be placed at the start of the text to prevent the algorithm from assuming that the text should be right-to-left.</p>
<p><pre class="html">&lt;p dir="auto"&gt;&amp;lrm;"نشاط التدويل" is how you say "i18n activity" in Arabic.&lt;/p&gt;</pre></p>
<p dir="auto" class="spilloverExample">&lrm;"&#x0646;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062a;&#x062f;&#x0648;&#x064a;&#x0644;" is how you say "i18n Activity" in Arabic.</p>

</aside>
<p>Remember that if metadata is used to set the base direction, the strong directional formatting character is ignored, unless the metadata specifically says that first-strong heuristics should be used.</p>
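
Reviewer sketch (not part of the commit): the first-strong behaviour in the example above can be approximated in a few lines of Python. This is a simplified illustration under stated assumptions. The function name `first_strong_direction` is hypothetical, and unlike the full Unicode Bidirectional Algorithm rules it does not skip characters inside isolate sequences.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def first_strong_direction(text):
    """Return 'rtl' or 'ltr' based on the first strong character, else None.

    Simplified: scans every character in order and ignores isolates.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return None  # no strong character; the caller must pick a default


# Same text as the example: a neutral quote mark, then Arabic letters.
arabic = "\u0646\u0634\u0627\u0637 \u0627\u0644\u062a\u062f\u0648\u064a\u0644"
sentence = f'"{arabic}" is how you say "i18n activity" in Arabic.'

print(first_strong_direction(sentence))        # rtl (the wrong result)
print(first_strong_direction(LRM + sentence))  # ltr (LRM is seen first)
```

The opening quotation mark is neutral, so it is skipped and the Arabic letter after it decides the direction. Prefixing an LRM puts a strong left-to-right character first, which is the fix the example describes.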
<p>Finally, a note about the use of <span class="codepoint" translate="no"><img alt="ALM" src="images/061C.png"><code class="uname">U+061C ARABIC LETTER MARK</code></span> (ALM). This character is used to influence the display of sequences of numbers in Arabic script text in cases where no Arabic letters occur before the number.</p>
<aside class="example" title="Example of ALM usage">
<p>In some Arabic-script languages the range <code dir="rtl">100-200</code> should appear as <code dir="rtl">&#x061c;100-200</code>. If no Arabic letters appear before the numbers, the [=Unicode Bidirectional Algorithm=] will not perform this reordering. Note that the character sequence in both cases is "100-200" and that both are wrapped in a <kbd>code</kbd> element with <code>dir="rtl"</code>. In the third example, an ALM is used to provide the necessary hint, like so:</p>
<table>
<table class="bidi-example-table">
<thead>
<tr><th>Description</th><th>HTML / Appearance</th></tr>
<tr><th>Description</th><th>HTML</th><th>Appearance</th></tr>
</thead>
<tbody>
<tr>
<td rowspan="2">Preceded by Arabic letters</td>
<td><pre class="html">&lt;code dir="rtl" lang="ar"&gt;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062A;&#x062F;&#x0648;&#x064A;&#x0644; 100-200&lt;/code&gt;</pre></td>
</tr><tr>
<td dir="rtl" class="spilloverExample"><code dir="rtl" lang="ar">&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062A;&#x062F;&#x0648;&#x064A;&#x0644; 100-200</code></td>
<td>Preceded by Arabic letters</td>
<td><pre class="html">&lt;code dir="rtl" lang="ar"&gt;&#x0646;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062A;&#x062F;&#x0648;&#x064A;&#x0644; 100-200&lt;/code&gt;</pre></td>
<td dir="rtl" class="spilloverExample"><code dir="rtl" lang="ar">&#x0646;&#x0634;&#x0627;&#x0637; &#x0627;&#x0644;&#x062A;&#x062F;&#x0648;&#x064A;&#x0644; 100-200</code></td>
</tr>
<tr>
<td rowspan="2">Without ALM</td>
<td>Without ALM</td>
<td><pre class="html">&lt;code dir="rtl" lang="ar"&gt;100-200&lt;/code&gt;</pre></td>
</tr><tr>
<td dir="rtl" class="spilloverExample"><code dir="rtl" lang="ar">100-200</code></td>
</tr>
<tr>
<td rowspan="2">With ALM</td>
<td>With ALM</td>
<td><pre class="html">&lt;code dir="rtl" lang="ar"&gt;&amp;#x061C;100-200&lt;/code&gt;</pre></td>
</tr><tr>
<td dir="rtl" class="spilloverExample"><code dir="rtl" lang="ar" >&#x061C;100-200</code></td>
</tr>
</tbody>
@@ -5567,7 +5609,17 @@ <h2> Revision Log</h2>
<section class="appendix" id="ack">
<h2>Acknowledgements</h2>
<p>Thanks to Addison Phillips for help reviewing old reviews for recommendations.</p>
<p>Other people who contributed through reviews or issues include Steve Atkin, Andrew Cunningham, Martin Dürst, Asmus Freytag, John Klensin, Tomer Mahlin, Chaals McCathieNevile, Florian Rivoal. Some material about locale-neutral representation was adapted from [[DWBP]].</p>
<p>Other people who contributed through reviews or issues include
Steve Atkin,
Andrew Cunningham,
Martin Dürst,
Asmus Freytag,
John Klensin,
Tomer Mahlin,
Chaals McCathieNevile,
Florian Rivoal,
Najib Tounsi.
Some material about locale-neutral representation was adapted from [[DWBP]].</p>
</section>


27 changes: 27 additions & 0 deletions local.css
@@ -455,9 +455,36 @@ td.exampleChar {
font-size: 140%;
}

.spilloverExample :lang(ar) {
font-family: Noto Sans Arabic, Tahoma, sans-serif;
}

.localdef {
background-color:white;
border: 1px solid brown;
margin:0.5em;
padding:0.5em;
}

table.bidi-example-table {
background-color: white;
border-collapse: collapse;
padding: 0;
width: 98%;
}

table.bidi-example-table td {
padding: 0;
}

table.bidi-example-table th {
text-align: center;
}

table.bidi-example-table tr {
border-bottom: 1px solid #ddd;
}

table.bidi-example-table tr td:last-child {
white-space: nowrap;
}
