Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal for formatting contents of <note> field #85

Closed
cmyr opened this issue Aug 2, 2021 · 6 comments
Closed

Proposal for formatting contents of <note> field #85

cmyr opened this issue Aug 2, 2021 · 6 comments

Comments

@cmyr
Copy link
Contributor

cmyr commented Aug 2, 2021

I would like to acknowledge that this particular issue is likely of little significance, and I am not hugely invested in my position. Ultimately, to me, this is about something like "correctness".

Breaking this out from #84, now that I have had some time to dig into the code and have a more informed opinion.

current status

The current situation: currently, ufonormalizer does a bunch of opinionated formatting of the contents of <note>. Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent <note> element, plus 1. This current text, so formatted, looks like:

<note>
	currently, `ufonormalizer` does a bunch of opinionated formatting of
	the contents of `&gt;note&lt;'`. Essentially it breaks the text into 
	72 character lines, and then indents each of these lines to the level 
	of the parent `&gt;note&lt;` element, plus 1. This current text, so 
	formatted, looks like:
</note>

This has the advantage that it looks nice, but it has the disadvantage that it is destructive: the actual text of the note will change, and other tools don't have any way of telling, for a given UFO, whether or not the contents of the file have been normalized (and so should be "denormalized"?) or if they are the original intended data.

The problem with this approach is that it modifies the user's data. Although this may not matter often, it can certainly matter sometimes. A trivial example of where this would matter is if the note includes something with significant whitespace, like a bit of python code, or a boxart diagram, or a poem.

Proposal

Basically: if the note contains only whitespace, we discard it; this is the current behaviour. Otherwise we escape the string as usual, and just write a "simple element".

This does mean that the XML will be "uglier" in the case where there are long unbroken lines in the note, but there are a bunch of advantages:

  • guaranteed no surprises
  • we delete code and do less work
  • existing normalized UFOs don't change: we just transparently pass through the existing formatting

In this world, the XML for our sample paragraph looks like (with the parent tags for context),

<glif>
    <note>currently, `ufonormalizer` does a bunch of opinionated formatting of the contents of `&lt;note&gt;`. Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent `&lt;note&gt;` element, plus 1. This current text, so formatted, looks like:</note>
</glif>

And if it contained some manual linebreaks, those lines would not respect the outer indent level, like

<glif>
    <note>currently, `ufonormalizer` does a bunch of opinionated formatting of the contents of `&lt;note&gt;`.
Essentially it breaks the text into 72 character lines, and then indents each of these lines to the level of the parent `&lt;note&gt;` element, plus 1.
This current text, so formatted, looks like:</note>
</glif>

An alternative I had considered is stripping leading and trailing whitespace and then always putting the note body on the next line, with a new intent level, but ultimately I think this is unnecessary fanciness and extra logic for minimal concrete gain.

... okay, I'm done. If anyone has any thoughts, feel free to chime in :)

@benkiel
Copy link
Contributor

benkiel commented Aug 2, 2021

I am ok with this change. Will wait for others to chime in (@miguelsousa @typesupply), but if ok, would you be able to PR the changes (and tests)?

@typesupply
Copy link
Contributor

I'm fine with this, but I have another idea that would retain the current behavior and enable the proposed behavior:

The line length is governed by the xmlTextMaxLineLength variable in the module. We could allow this to be overridden during the public normalization functions and those would pass the setting to XMLWriter. From there, if the desired xmlTextMaxLineLength value is None or 0 the text wouldn't be passed to textwrap for wrapping and wpould be completely unchanged.

I have no strong feeling about this so I have no objection to the proposal above to not pass any text to textwrap like we currently do.

@cmyr
Copy link
Contributor Author

cmyr commented Aug 2, 2021

I am ok with this change. Will wait for others to chime in (@miguelsousa @typesupply), but if ok, would you be able to PR the changes (and tests)?

If we agree I am happy to write the patch.

@miguelsousa
Copy link
Member

No objections from me.

@benkiel
Copy link
Contributor

benkiel commented Aug 2, 2021

Alright, @cmyr, however you want to write the PR (do see what @typesupply proposes above) is cool. I think we have enough agreement on this to make the change.

@cmyr
Copy link
Contributor Author

cmyr commented Aug 2, 2021

Okay, I'll get this ready shortly. I'm not totally sure about the change to allow specifying a custom wrap width; it feels like in practice that would only possibly be used for this case? it also means that there are two codepaths to maintain.

So for now I'll start prepare a patch that does the dumbest thing possible (no line formatting, ever) and if there is a strong desire for this to be an option I can make that change during review.

Thanks all!

cmyr added a commit to cmyr/ufoNormalizer that referenced this issue Aug 2, 2021
This changes the behaviour around normalizing the <note> field of .glif
files. Previously the contents of this element would be broken into
lines and reindented, but this was a potentially destructive operation.

With this patch, any content of a <note> element is passed through
unchanged.

For more discussion, see:
unified-font-object#85
cmyr added a commit to cmyr/ufoNormalizer that referenced this issue Aug 3, 2021
This changes the behaviour around normalizing the <note> field of .glif
files. Previously the contents of this element would be broken into
lines and reindented, but this was a potentially destructive operation.

With this patch, any content of a <note> element is passed through
unchanged.

For more discussion, see:
unified-font-object#85
benkiel pushed a commit that referenced this issue Aug 3, 2021
* Do not modify contents of <note> tag

This changes the behaviour around normalizing the <note> field of .glif
files. Previously the contents of this element would be broken into
lines and reindented, but this was a potentially destructive operation.

With this patch, any content of a <note> element is passed through
unchanged.

For more discussion, see:
#85

* Address code review comments

- make sure we're escaping XML correctly, and test that
- remove dedent_tabs function
- add a test for correctly encoding 'é'
@benkiel benkiel closed this as completed Aug 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants