be more selective about escaping special characters #122
This is a refinement of #118 (thanks @jsm28!).
The current solution escapes every instance of every special character. Although conservative, this can lead to unnecessary escaping. For example,
In our use case, our input content is technical documentation (many special characters) and the content is subsequently edited by humans, so it is desirable to minimize unnecessary escaping.
This pull request seeks to strike a balance between the following:
The tests cover a variety of required and unnecessary escaping cases, which can hopefully avoid any future regressions in escaping behavior.
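To illustrate the difference between conservative and selective escaping, here is a minimal sketch. It is not the code in this pull request: the function names `escape_everything` and `escape_selectively` and the specific rules are hypothetical, chosen only to show the idea of escaping a character where it could actually be read as Markdown syntax.

```python
import re

def escape_everything(text):
    """Conservative approach: backslash-escape every special character."""
    return re.sub(r'([*_#+\-.])', r'\\\1', text)

def escape_selectively(text):
    """Illustrative selective approach (hypothetical rules, not markdownify's)."""
    # Emphasis markers can matter anywhere in a line, so always escape them.
    text = re.sub(r'([*_])', r'\\\1', text)
    # '#' only starts a heading at the beginning of a line (1-6 of them,
    # followed by whitespace or end of line).
    text = re.sub(r'^([ ]{0,3})(#{1,6}(?:\s|$))', r'\1\\\2', text, flags=re.M)
    # '+' and '-' only start a list item at the beginning of a line,
    # followed by whitespace.
    text = re.sub(r'^([ \t]*)([+-])(\s)', r'\1\\\2\3', text, flags=re.M)
    # '1.' or '1)' only starts an ordered list item at the beginning of a line.
    text = re.sub(r'^([ \t]*\d{1,9})([.)])(\s)', r'\1\\\2\3', text, flags=re.M)
    return text

sample = "C# 1.0 - a+b"
print(escape_everything(sample))   # every special character gets a backslash
print(escape_selectively(sample))  # nothing here needs escaping
```

Under these assumed rules, mid-line characters such as the `#` in `C#` are left alone, while a leading `# `, `- `, or `1. ` is still escaped so that it is not rendered as a heading or list item.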
This approach is not foolproof. Markdownify processes each text fragment in isolation, and thus the beginning of a particular string might not be the beginning of an output line. As a result, patterns are not applied across text fragment boundaries (such as adjacent `<span>` elements). Handling this probably requires a larger rework of the text processing code.

I also noticed that the original code had `def text_misc()` instead of `def test_misc()`, which caused the tests never to run.
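The fragment-boundary limitation can be reproduced with a small sketch. The helper `escape_list_marker` below is hypothetical and far simpler than the real escaping logic; it only assumes that a line-start pattern is applied to each text fragment separately, as described above.

```python
import re

def escape_list_marker(fragment):
    """Escape a leading '+' or '-' followed by whitespace (hypothetical rule)."""
    return re.sub(r'^([+-])(\s)', r'\\\1\2', fragment)

# Case 1: the fragment starts with "- ", but it is not at the start of the
# output line, so escaping it per fragment is unnecessary.
fragments = ["x ", "- y"]
per_fragment = "".join(escape_list_marker(f) for f in fragments)
whole_line = escape_list_marker("".join(fragments))

# Case 2: the pattern straddles two fragments (for example, adjacent inline
# elements), so per-fragment processing misses it entirely.
split_marker = ["-", " item"]
missed = "".join(escape_list_marker(f) for f in split_marker)
caught = escape_list_marker("".join(split_marker))

print(repr(per_fragment), repr(whole_line))
print(repr(missed), repr(caught))
```

Both cases come from the same cause: the escaping step does not know where a fragment will land in the final output line.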