feat: Permit non-ASCII within <t> without the use of <u> #895
So far this will make it so that non-ASCII UTF-8 can appear in `<t>`. I'm not sure that's the intended result of the PR? Unless we have some other tool to let the rfc-editor know they've got UTF-8 to look at for reasonableness in the chain, having it pass quietly seems a step too far. |
The change only kicks in when the RFC has a number. Up to then, xml2rfc works like before. |
and sometimes big swaths of text get added after the RFC has a number. |
Yes, and having a tool that gives the RFC editor a histogram of character usage is probably quite useful. SMOP. |
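```ruby
require 'unicode/name'   # from the unicode-name gem

# Count every non-ASCII character in the input files (or stdin).
hist = Hash.new(0)
ARGF.read.each_char do |c|
  hist[c] += 1 unless c.ord < 128
end

# Print Latin-script characters first, then everything else.
cl = "*** Latin"
hist.keys.sort.partition {|c| c =~ /\A\p{Latin}\z/}.each do |l|
  puts cl
  l.each do |c|
    puts "#{c}: #{"%4d" % hist[c]} #{Unicode::Name.correct(c)}"
  end
  cl = "*** Non-Latin"
end
```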
|
I need to think about this a bit more but 2 things come to mind: |
@rjsparks, @cabo: I have committed a new change to allow bare Unicode in the `<t>` element. Have a look at the following test results (in a different branch; I haven't merged the results into this PR yet): kesara@3998d8e#diff-c2d2944d758bb3948e7924d76edf0e05292cda738d52c51636723fbbd444cfa9 Are those test results acceptable? |
Here is another example why this needs to be fixed: |
```diff
@@ -34,6 +34,7 @@
     'street',
     'title',
     'u',
+    't',
```
Please add, both here and below: dd, dt, li, blockquote, and any other block-level elements I missed. Then add, both here and below, the "inline" elements: cref (?), em, eref (?), iref (?), relref, strong, sub, sup, tt, and xref. (I didn't think much about the cross reference stuff, but I think they can contain text.)
I have immediate use for many of these. I can wrap in `<t>` for some of the block-level elements, but not the inline-level ones.
Note that with a list this long, you might consider how the latter list (bare_unicode_tags_with_notice) can be merged into the former (unicode_content_tags).
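As a rough sketch of what such a merge could look like (not the actual xml2rfc code): a single table maps each tag name to a policy, so "allowed" and "allowed with notice" live in one place. The names `UnicodePolicy`, `UNICODE_TAG_POLICY`, and `policy_for` are hypothetical, and the entries are only the tags visible in the diff context; which policy each tag should get is an assumption made for illustration.

```python
from enum import Enum


class UnicodePolicy(Enum):
    """Hypothetical policy levels for non-ASCII text inside an element."""
    FORBIDDEN = "forbidden"          # non-ASCII must be wrapped, e.g. in <u>
    ALLOWED = "allowed"              # non-ASCII passes silently
    ALLOWED_WITH_NOTICE = "notice"   # non-ASCII passes but is reported


# Illustrative entries only; the real lists in xml2rfc/util/unicode.py are longer,
# and the policy assigned to each tag here is an assumption.
UNICODE_TAG_POLICY = {
    "street": UnicodePolicy.ALLOWED,
    "title": UnicodePolicy.ALLOWED,
    "u": UnicodePolicy.ALLOWED,
    "t": UnicodePolicy.ALLOWED_WITH_NOTICE,
}


def policy_for(tag: str) -> UnicodePolicy:
    """Look up the policy for a tag; tags not in the table default to FORBIDDEN."""
    return UNICODE_TAG_POLICY.get(tag, UnicodePolicy.FORBIDDEN)
```

With one table, adding `dd`, `dt`, `li`, and the inline elements requested above would be a single edit per tag rather than an edit in two lists.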
I agree with the goal of the request. But I don't think we can do that without re-engaging the RSAB for a revised interpretation, or get a doc from RSWG through the RSAB that is explicit.
Unfortunately, I have to agree with @rjsparks here. RFC 7997 allows non-ASCII in examples, but not e.g. in Mathematical formulæ or other cases in general text. Of course I agree with @martinthomson that it would be highly desirable to use non-ASCII for these cases, but I think that requires a change in policy.
That sounds counter-intuitive to me. If it's an error until AUTH48, how will it be used by draft authors? |
My original idea was to minimize the appearance of a change: use the workaround until RFC editing stage, and use the RFC editor's authority to decide what actually should be done. |
@cabo wrote:
I think separating Latin and non-Latin is a good first step, but for the RPC, some other statistics may also be helpful:
|
On 21. Dec 2022, at 21:17, Robert Sparks ***@***.***> wrote:
@rjsparks commented on this pull request.
In xml2rfc/util/unicode.py:
> @@ -34,6 +34,7 @@
>      'street',
>      'title',
>      'u',
> +    't',
>
> I agree with the goal of the request. But I don't think we can do that without re-engaging the RSAB for a revised interpretation, or get a doc from RSWG through the RSAB that is explicit.
I’m repeating myself here: This fixes a bug in xml2rfc.
There is no policy change.
(This is unfortunate, but we have to make these steps one after another.)
Whether support for implementing the actual policy is added to xml2rfc or to some linter tool (or some diagnostic aid like the unicode-histo sketch I put in above) is orthogonal to fixing this bug.
Grüße, Carsten
|
I don't disagree with you, but it's clear that the nascent RSWG/RSAB engine feels it has a role. I misremembered the status of this PR during the RSAB meeting at 116 (claiming it was already merged), and there were several people who were very upset about it. So, yes, where Martin points is the right place to go. But I think trying to go there without nods from that new engine will make landing it take longer. |
This change allows bare Unicode in the `<t>` element. It also introduces a new flag, `--warn-bare-unicode`; when set, `xml2rfc` warns about bare Unicode in the `<t>` element. By default, this is set to False.
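To illustrate the kind of check such a flag implies, here is a minimal standalone sketch. It is not xml2rfc's implementation: the function names `find_bare_unicode` and `warn_bare_unicode`, the `enabled` parameter, and the warning wording are all assumptions. "Bare" is taken here to mean non-ASCII characters that sit directly in a `<t>` element's own text, not inside a child such as `<u>`.

```python
import xml.etree.ElementTree as ET


def find_bare_unicode(xml_text: str, tag: str = "t"):
    """Return one sorted list of non-ASCII characters per <tag> element that has any.

    Only text directly inside the element is inspected: its leading text and
    the tails of its children. Text inside child elements (e.g. <u>) is skipped.
    """
    root = ET.fromstring(xml_text)
    findings = []
    for elem in root.iter(tag):
        pieces = [elem.text or ""] + [child.tail or "" for child in elem]
        non_ascii = sorted({c for piece in pieces for c in piece if ord(c) > 127})
        if non_ascii:
            findings.append(non_ascii)
    return findings


def warn_bare_unicode(xml_text: str, enabled: bool = False):
    """Hypothetical stand-in for the flag: emit warnings only when enabled."""
    if not enabled:
        return []
    return [
        "Warning: bare Unicode in <t>: " + " ".join(chars)
        for chars in find_bare_unicode(xml_text)
    ]
```

Called with `enabled=False` (mirroring the default described above), the sketch returns no warnings, so existing documents would process as before.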
Note: I gave up trying to make the tests run -- too many dependencies before I ran out of time to waste on this. But at least xml2rfc itself seems to work.

Note: I gave up trying to update pycountry to 22.3.5 as listed in the xml2rfc requirements.txt, because the pycountry 22.3.5 tests failed; instead I patched xml2rfc to accept the version in pkgsrc for now. Since xml2rfc had been rendered completely nonfunctional by updates to its dependencies, I believe this is a better state than it was in even if the tests can't be run.

Changes:

- [`dbdda51`](ietf-tools/xml2rfc@dbdda51) - Lighter styling on internal iref links *(PR [#963](ietf-tools/xml2rfc#963) by [@martinthomson](https://github.com/martinthomson))*
- [`ff1c061`](ietf-tools/xml2rfc@ff1c061) - Add support for Noto Math font *(PR [#971](ietf-tools/xml2rfc#971) by [@kesara](https://github.com/kesara))*
- [`636dd08`](ietf-tools/xml2rfc@636dd08) - update CHANGELOG.md + py file versions for v3.16.0 [skip ci] *(commit by [@kesara](https://github.com/kesara))*
- [`7cae8ad`](ietf-tools/xml2rfc@7cae8ad) - Remove mt.css and mt.js *(PR [#976](ietf-tools/xml2rfc#976) by [@martinthomson](https://github.com/martinthomson))*
- [`ad2e035`](ietf-tools/xml2rfc@ad2e035) - Permit non-ASCII within `<t>` without the use of `<u>` *(PR [#895](ietf-tools/xml2rfc#895) by [@cabo](https://github.com/cabo))*
- [`6b9aede`](ietf-tools/xml2rfc@6b9aede) - Add editorial stream *(PR [#958](ietf-tools/xml2rfc#958) by [@kesara](https://github.com/kesara))*
  - ↘️ *addresses issue #896 opened by [@alicerusso](https://github.com/alicerusso)*
- [`5b687b1`](ietf-tools/xml2rfc@5b687b1) - Add 'auto' class for (most) parenthesized xref links *(PR [#948](ietf-tools/xml2rfc#948) by [@martinthomson](https://github.com/martinthomson))*
- [`388d4b9`](ietf-tools/xml2rfc@388d4b9) - Update to pypdf>=3.2.1 on base docker file *(PR [#954](ietf-tools/xml2rfc#954) by [@kesara](https://github.com/kesara))*
- [`6c9be77`](ietf-tools/xml2rfc@6c9be77) - Expand a problematic reference *(PR [#959](ietf-tools/xml2rfc#959) by [@kesara](https://github.com/kesara))*
- [`b811bfd`](ietf-tools/xml2rfc@b811bfd) - update CHANGELOG.md + py file versions for v3.15.3 [skip ci] *(commit by [@kesara](https://github.com/kesara))*
- [`5bbf3f7`](ietf-tools/xml2rfc@5bbf3f7) - **deps**: Move from PyPDF2 to pypdf>=3.2.1 *(PR [#953](ietf-tools/xml2rfc#953) by [@kesara](https://github.com/kesara))*
- [`1381bb8`](ietf-tools/xml2rfc@1381bb8) - Move sourcecode classes *(PR [#839](ietf-tools/xml2rfc#839) by [@martinthomson](https://github.com/martinthomson))*
- [`592ab81`](ietf-tools/xml2rfc@592ab81) - Only overwrite font-family when producing PDFs *(PR [#937](ietf-tools/xml2rfc#937) by [@martinthomson](https://github.com/martinthomson))*
- [`a3adb84`](ietf-tools/xml2rfc@a3adb84) - Fix margin issue with dl after p inside a li *(PR [#941](ietf-tools/xml2rfc#941) by [@kesara](https://github.com/kesara))*
- [`9308e40`](ietf-tools/xml2rfc@9308e40) - Update walkpdf to fix PyPDF deprecation warnings *(PR [#934](ietf-tools/xml2rfc#934) by [@kesara](https://github.com/kesara))*
- [`0d3958c`](ietf-tools/xml2rfc@0d3958c) - Include OpenPGP certificates for signing the project in each release *(PR [#931](ietf-tools/xml2rfc#931) by [@dkg](https://github.com/dkg))*
- [`b451ded`](ietf-tools/xml2rfc@b451ded) - Add support for Python 3.11 *(PR [#942](ietf-tools/xml2rfc#942) by [@kesara](https://github.com/kesara))*
- [`9ff2476`](ietf-tools/xml2rfc@9ff2476) - Include all changes in Changelog *(PR [#944](ietf-tools/xml2rfc#944) by [@kesara](https://github.com/kesara))*
- [`d86b1f2`](ietf-tools/xml2rfc@d86b1f2) - update CHANGELOG.md + py file versions for v3.15.2 [skip ci] *(commit by [@kesara](https://github.com/kesara))*
- [`af9d83e`](ietf-tools/xml2rfc@af9d83e) - Skip Weasyprint 57.0 in tests *(PR [#932](ietf-tools/xml2rfc#932) by [@kesara](https://github.com/kesara))*
- [`908365f`](ietf-tools/xml2rfc@908365f) - Use wcwidth to determine the monospace textual length of a string *(PR [#914](ietf-tools/xml2rfc#914) by [@Flowdalic](https://github.com/Flowdalic))*
- [`0b42319`](ietf-tools/xml2rfc@0b42319) - Drop dependency on kitchen *(PR [#913](ietf-tools/xml2rfc#913) by [@Flowdalic](https://github.com/Flowdalic))*
- [`1a910d9`](ietf-tools/xml2rfc@1a910d9) - Expand table columns in text output *(PR [#919](ietf-tools/xml2rfc#919) by [@kesara](https://github.com/kesara))*
- [`4f9e700`](ietf-tools/xml2rfc@4f9e700) - Add Noto Sans Symbols 2 font to PDF template *(PR [#926](ietf-tools/xml2rfc#926) by [@kesara](https://github.com/kesara))*
- [`18b34d8`](ietf-tools/xml2rfc@18b34d8) - Fix PDF tests *(PR [#920](ietf-tools/xml2rfc#920) by [@kesara](https://github.com/kesara))*
- [`7337517`](ietf-tools/xml2rfc@7337517) - Correct spelling mistakes *(PR [#917](ietf-tools/xml2rfc#917) by [@jsoref](https://github.com/jsoref))*
- [`08605de`](ietf-tools/xml2rfc@08605de) - Improve PDF generation debug logs *(PR [#907](ietf-tools/xml2rfc#907) by [@kesara](https://github.com/kesara))*
- [`12a960e`](ietf-tools/xml2rfc@12a960e) - Use specified font families on SVG *(PR [#910](ietf-tools/xml2rfc#910) by [@kesara](https://github.com/kesara))*
- [`70de803`](ietf-tools/xml2rfc@70de803) - Use noto fonts for non-latin unicode monospaced characters *(PR [#909](ietf-tools/xml2rfc#909) by [@kesara](https://github.com/kesara))*
- [`dd2b0fe`](ietf-tools/xml2rfc@dd2b0fe) - Add bottom margin to .artwork > pre *(PR [#912](ietf-tools/xml2rfc#912) by [@kesara](https://github.com/kesara))*
- [`58706b8`](ietf-tools/xml2rfc@58706b8) - Remove redundant code labels from CSS *(PR [#916](ietf-tools/xml2rfc#916) by [@kesara](https://github.com/kesara))*
- [`055d64d`](ietf-tools/xml2rfc@055d64d) - Add xml2rfc class to HTML body element *(PR [#847](ietf-tools/xml2rfc#847) by [@martinthomson](https://github.com/martinthomson))*
- [`7fec225`](ietf-tools/xml2rfc@7fec225) - Add classes to xref *(PR [#867](ietf-tools/xml2rfc#867) by [@martinthomson](https://github.com/martinthomson))*
- [`cc6b083`](ietf-tools/xml2rfc@cc6b083) - Fix table colspan issue in text format *(PR [#886](ietf-tools/xml2rfc#886) by [@kesara](https://github.com/kesara))*
- [`2475447`](ietf-tools/xml2rfc@2475447) - Include the published date when ipr is none *(PR [#897](ietf-tools/xml2rfc#897) by [@kesara](https://github.com/kesara))*
- [`20cdb44`](ietf-tools/xml2rfc@20cdb44) - Fix odd page break inside rows in PDF output *(PR [#879](ietf-tools/xml2rfc#879) by [@kesara](https://github.com/kesara))*
- [`2c9dfaf`](ietf-tools/xml2rfc@2c9dfaf) - Return orgnization for orgnization only contacts *(PR [#837](ietf-tools/xml2rfc#837) by [@kesara](https://github.com/kesara))*
- [`9821dc6`](ietf-tools/xml2rfc@9821dc6) - RTL unicode issue in PDF *(PR [#884](ietf-tools/xml2rfc#884) by [@kesara](https://github.com/kesara))*
- [`c67f5fd`](ietf-tools/xml2rfc@c67f5fd) - Align center aligned ASCII art correctly *(PR [#838](ietf-tools/xml2rfc#838) by [@kesara](https://github.com/kesara))*
- [`701d5ce`](ietf-tools/xml2rfc@701d5ce) - Add github issue templates *(commit by [@kesara](https://github.com/kesara))*
- [`c6343a9`](ietf-tools/xml2rfc@c6343a9) - Update WeasyPrint *(PR [#802](ietf-tools/xml2rfc#802) by [@kesara](https://github.com/kesara))*
- [`95dba00`](ietf-tools/xml2rfc@95dba00) - Fix typo in README file *(PR [#843](ietf-tools/xml2rfc#843) by [@bkmgit](https://github.com/bkmgit))*
- [`0f06e27`](ietf-tools/xml2rfc@0f06e27) - Prevent submission date warnings for RFCs *(PR [#842](ietf-tools/xml2rfc#842) by [@kesara](https://github.com/kesara))*
- [`e5c45d4`](ietf-tools/xml2rfc@e5c45d4) - Add an option to disable rfc-local.css link *(PR [#840](ietf-tools/xml2rfc#840) by [@martinthomson](https://github.com/martinthomson))*
- [`41b177a`](ietf-tools/xml2rfc@41b177a) - Fix tests to adapt bib.ietf.org *(PR [#852](ietf-tools/xml2rfc#852) by [@kesara](https://github.com/kesara))*
- [`c9b9d09`](ietf-tools/xml2rfc@c9b9d09) - Update valid tests for --no-rfc-local option *(PR [#854](ietf-tools/xml2rfc#854) by [@kesara](https://github.com/kesara))*
- [`63de72a`](ietf-tools/xml2rfc@63de72a) - Use bib.ietf.org for citations *(PR [#804](ietf-tools/xml2rfc#804) by [@kesara](https://github.com/kesara))*
- [`ad44bb8`](ietf-tools/xml2rfc@ad44bb8) - Render unicode characters in SVG elements correctly *(PR [#832](ietf-tools/xml2rfc#832) by [@kesara](https://github.com/kesara))*
- [`6938d80`](ietf-tools/xml2rfc@6938d80) - Drop support for Python 3.6 *(PR [#796](ietf-tools/xml2rfc#796) by [@kesara](https://github.com/kesara))*
- [`47270ba`](ietf-tools/xml2rfc@47270ba) - Handle date type errors gracefully *(PR [#795](ietf-tools/xml2rfc#795) by [@cabo](https://github.com/cabo))*
- [`79fd4d9`](ietf-tools/xml2rfc@79fd4d9) - Stop crashing when author element doesn't have a name *(PR [#800](ietf-tools/xml2rfc#800) by [@cabo](https://github.com/cabo))*
- [`d5f8a1c`](ietf-tools/xml2rfc@d5f8a1c) - Use bib.ietf.org for citations *(PR [#799](ietf-tools/xml2rfc#799) by [@kesara](https://github.com/kesara))*
- [`b94d6bb`](ietf-tools/xml2rfc@b94d6bb) - **deps**: Update Python dependencies *(PR [#797](ietf-tools/xml2rfc#797) by [@kesara](https://github.com/kesara))*
- [`f73ece7`](ietf-tools/xml2rfc@f73ece7) - Update setuptools metadata *(PR [#789](ietf-tools/xml2rfc#789) by [@kesara](https://github.com/kesara))*
- [`1643d68`](ietf-tools/xml2rfc@1643d68) - Display long ASCII art correctly in PDF *(PR [#788](ietf-tools/xml2rfc#788) by [@kesara](https://github.com/kesara))*
- [`51e8b24`](ietf-tools/xml2rfc@51e8b24) - Add support for Python 3.10 *(PR [#772](ietf-tools/xml2rfc#772) by [@dkg](https://github.com/dkg))*
- [`46399d7`](ietf-tools/xml2rfc@46399d7) - Implement emboldening primary iref entries *(PR [#778](ietf-tools/xml2rfc#778) by [@cabo](https://github.com/cabo))*
- [`e0095fd`](ietf-tools/xml2rfc@e0095fd) - Remove Python version specific test results *(PR [#780](ietf-tools/xml2rfc#780) by [@kesara](https://github.com/kesara))*
- [`42568b3`](ietf-tools/xml2rfc@42568b3) - evaluate date.today() on class init, not import *(PR [#774](ietf-tools/xml2rfc#774) by [@jennifer-richards](https://github.com/jennifer-richards))*
- [`07ef95e`](ietf-tools/xml2rfc@07ef95e) - Fix warnings in text and manpage *(PR [#775](ietf-tools/xml2rfc#775) by [@kesara](https://github.com/kesara))*
- [`6b32a5d`](ietf-tools/xml2rfc@6b32a5d) - Render text without toc *(PR [#766](ietf-tools/xml2rfc#766) by [@cabo](https://github.com/cabo))*
- [`384399c`](ietf-tools/xml2rfc@384399c) - Display ASCII names for authors in references *(PR [#771](ietf-tools/xml2rfc#771) by [@kesara](https://github.com/kesara))*
- [`8436c2f`](ietf-tools/xml2rfc@8436c2f) - Make index sort case insensitive *(PR [#763](ietf-tools/xml2rfc#763) by [@kesara](https://github.com/kesara))*
- [`0884e8d`](ietf-tools/xml2rfc@0884e8d) - Don't attempt to select initials when fullname contains non Latin characters *(PR [#760](ietf-tools/xml2rfc#760) by [@kesara](https://github.com/kesara))*
- [`9e12093`](ietf-tools/xml2rfc@9e12093) - Make long sourcecode sections breakable *(PR [#764](ietf-tools/xml2rfc#764) by [@kesara](https://github.com/kesara))*
- [`24406e5`](ietf-tools/xml2rfc@24406e5) - Bug fix in tests/input/draft-miek-test.v3.xml *(PR [#738](ietf-tools/xml2rfc#738) by [@kesara](https://github.com/kesara))*
- [`72255eb`](ietf-tools/xml2rfc@72255eb) - Pin PyPDF2 to 2.16.* versions *(PR [#762](ietf-tools/xml2rfc#762) by [@kesara](https://github.com/kesara))*
- [`8fc7efb`](ietf-tools/xml2rfc@8fc7efb) - Update deprecated tox configuration option *(PR [#746](ietf-tools/xml2rfc#746) by [@kesara](https://github.com/kesara))*
Permit non-ASCII within `<t>` without the use of `<u>`
See RSAB decision and RSWG discussion.
Original text on PR:
(See discussion in auth48archive and others.)