Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trouble normalizing the indention of new lines in Glyph Notes #71

Closed
colinmford opened this issue Jul 8, 2020 · 12 comments · Fixed by #72
Closed

Trouble normalizing the indention of new lines in Glyph Notes #71

colinmford opened this issue Jul 8, 2020 · 12 comments · Fixed by #72

Comments

@colinmford
Copy link
Contributor

colinmford commented Jul 8, 2020

Hi,

We've been noticing that if a glyph has line returns in a note, the Normalizer has some trouble formatting it.

For instance, here is how a note started out after it was added, and its first Normalization:

<note>
  	Line1
  	Line2
</note>

and with each round of normalization, all the lines after the first get indented an extra level

Once:

<note>
  	Line1
  		  	Line2
</note>

Twice:

<note>
  	Line1
  		  		  	Line2
</note>

Thrice:

<note>
  	Line1
  		  		  		  	Line2
</note>

etc.

@colinmford
Copy link
Contributor Author

colinmford commented Jul 8, 2020

I think I tracked it down to XMLWriter.text() and XMLWriter.raw()

def text(self, text):
text = text.strip()
text = xmlEscapeText(text)
paragraphs = []
for paragraph in text.splitlines():
if not paragraph:
paragraphs.append("")
else:
paragraph = textwrap.wrap(
paragraph.rstrip(),
width=xmlTextMaxLineLength,
expand_tabs=False,
replace_whitespace=False,
drop_whitespace=False,
break_long_words=False,
break_on_hyphens=False
)
paragraphs.extend(paragraph)
for line in paragraphs:
self.raw(line)

def raw(self, line):
if self._indentLevel:
i = xmlIndent * self._indentLevel
line = i + line
self._lines.append(line)

if the text starts out as something like this

<note>
  	Line1
  	Line2
  	Line3
</note>

it enters the XMLWriter.text() like this

"\n\tLine1\n\tLine2\n\tLine3\n"

it's immediately stripped on line 1181, making it

"Line1\n\tLine2\n\tLine3"

then, it's split into lines on line 1184, making the lines

("Line1", "\tLine2", "\tLine3")

At the end of the function, on line 1199, each line goes to XMLWriter.raw() and gets re-indented on line 1172. Making the self._lines list this:

["\tLine1", "\t\tLine2", "\t\tLine3"]

So, in #72 I use re.split() to split the text on this regex: r"[\n|\r|\r\n]\t*" instead of using .splitlines() on line 1184. It results in a nice clean list:

("Line1", "Line2", "Line3")

If there's a better way I would be open to trying again, but I think we can agree that the current behavior is not correct.

@alerque
Copy link

alerque commented Jul 8, 2020

Why is the case of one tab being treated specially here? Why is whitespace internal to a free-form content string being munged at all? If this is to normalize indentation shouldn't all strings be parsed to find the shortest whitespace prefix first and then replace all occurrences of that whitespace string at the start of each line with the desired normalized indent?

@colinmford
Copy link
Contributor Author

@alerque This test appears to making sure that white space isn't removed here, and I didn't want to contradict it.

If the decision is made that this is no longer expected behavior, then I would be totally ok with removing all whitespace prefixes. As a first-time contributor I didn't want to make that assumption though.

element = ET.fromstring(
"<note> Line1 \t\n\n Line3\t </note>")
writer = XMLWriter(declaration=None)
_normalizeGlifNote(element, writer)
self.assertEqual(
writer.getText(),
"<note>\n\tLine1\n\t\n\t Line3\n</note>")

@colinmford
Copy link
Contributor Author

colinmford commented Jul 8, 2020

@alerque Ah, ok I think I take your meaning.

textwrap.dedent() does what you're talking about, and fixes my issue, but it seems to interfere with the test I posted above... (there's 3 spaces* at the start of Line1, and 4 at the start of Line3, so it leaves Line3 with an indent of 1 space, instead of all 4 in the test):

textwrap.dedent("   Line1  \t\n\n    Line3\t  ")
>>> 'Line1  \t\n\n Line3\t  '

Maybe that test needs to be reconsidered? Or we might need a custom function based on dedent.

  • 3 spaces sounds like a mistake, no?

@colinmford
Copy link
Contributor Author

Since @moyogo wrote that test (according to Blame), maybe he can weigh in on what that test is specifically testing for?

@moyogo
Copy link
Collaborator

moyogo commented Jul 8, 2020

This was checking the behaviour of the ufoNormalizer at that time.
If the behaviour changes, so should the test.

@colinmford
Copy link
Contributor Author

@moyogo I don't really want to change other Normalizer behavior so I would like to keep all tests as-is as much as possible, but if you think that test is outdated we should discuss how it should change... but to rephrase my question:

  1. Is it possibly a mistake that in
    "<note> Line1 \t\n\n Line3\t </note>")
    the first line has 3 spaces and the other has 4? i.e. were you intending to test for a 4-space indention vs tabs, and one of the spaces accidentally was removed?
  2. Is it the intention of this test that all whitespace prefixing all lines beyond the first line be kept intact?

I would really like to come up with an answer to this issue that works for everyone!

@moyogo
Copy link
Collaborator

moyogo commented Jul 24, 2020

@colinmford The behaviour of the normalizer needs to change since it changes notes that have already been normalized. I think the difference in the number of space was supposed to replicate the case where a user adds 3 spaces at the beginning of their note.

@benkiel
Copy link
Contributor

benkiel commented Jul 24, 2020

My opinion is that it should keep those three spaces, as they may have semantic meaning to the designer, so I would do the least invasive thing —that you did first— or managing the extra tabs, I think that's the right solution to this, as it's a narrow, specific issue. Does that make sense @moyogo?

@benkiel
Copy link
Contributor

benkiel commented Jul 24, 2020

@colinmford I would just test for the case of a line with a user added tab, so something like:

<note>
  		Line1
  		  	Line2
  		Line3
</note>

The normalizer should keep the tab of Line2

@colinmford
Copy link
Contributor Author

colinmford commented Aug 6, 2020

Hi @benkiel @moyogo

I updated the method and now all tests are passing (including the additional test case you mentioned @benkiel, and a couple other related ones).

More info in #72.

Let me know if you think it's suitable!

@colinmford
Copy link
Contributor Author

colinmford commented Aug 26, 2020

Hi @benkiel @moyogo please don't forget to review #72, this is a really annoying issue I would like to fix (or is there someone else that should review it?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants