Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to preserve special character escapes (for #51) #71

Merged
merged 11 commits into from
Jun 10, 2024

Conversation

SichangHe
Copy link
Contributor

This is my stab at #51.

This implementation adds a derivative cmark_resume_with_source_range_and_options function that behaves like cmark_resume_with_options but preserves special character escapes, i.e., only escapes if the Markdown source escapes, [`Vec`][`Vec`] and \[`Vec`\]\[`Vec`\]. To achieve that, this implementation:

  1. Takes the Markdown source text as an additional argument, and takes an optional "source range" accompanying each event to map back to the source. This is necessary because otherwise we can't know what the escapes were. The source ranges come with pulldown_cmark::OffsetIter; it is optional because we anticipate that the user might manipulate the events, in which case additional events would not has any accompanying ranges.
  2. Adds a field last_event_end_index to State to keep track of the end index of previous events. A gap between the start of the current Event::Text and the end of the previous event indicates there is something in between, which could be a \ or markup like [ before a link or > before a quote. In case of an event being deleted during manipulation, there is an artificial gap, but I bet it would be extremely rare if the source before ends with \ by accident.
  3. In the case of a gap before an Event::Text, this implementation checks if the last character in the gap is \; if not, it performs a hack to prevent the escape. The hack is to set State::is_in_code_block to true. Maybe we should also do back off (check if the \ is escaped, etc.).

Regarding the complex function signature, we could definitely make a trait and implement it for both Iterator<Item = (E, Option<Range<usize>>)> and Iterator<Item = (E, Range<usize>)> so people can simply pass in pulldown_cmark::OffsetIter without the .map(|(e, r)| (e, Some(r))).

And, yes, the names could be improved.

What do you guys think? @Byron, @ytmimi.

@Byron
Copy link
Owner

Byron commented May 25, 2024

Wow, this is a big one, thanks so much for tackling this!

All tests pass, additional tests have been added, and the spec now passes 435 of 649 tests, previously it was 435 as well.

Tomorrow I will take a closer look, while hoping that @ytmimi can also join in as I am not too invested into this crate at the moment so I don't necessarily know what's desirable, all I can do is check code.

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for tackling this!

I have a few initial comments, as I ran out of time during review, I couldn't get through all the changes just yet. Disposition is merge, I'd just like to work on a few details.

Note to self: don't forget to make this a breaking release due to the added field in state.

src/lib.rs Outdated
@@ -62,6 +69,8 @@ pub struct State<'a> {
pub current_shortcut_text: Option<String>,
/// A list of shortcuts seen so far for later emission
pub shortcuts: Vec<(String, String, String)>,
/// Index of the end of the last event.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify in the docstring what this index points into? It should probably contain docstring-links to the function that matters to it, where I believe it would point into source.

src/lib.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,112 @@
use super::*;

/// Serialize a stream of [pulldown-cmark-Events][Event] each with source string into a string-backed buffer.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Serialize a stream of [pulldown-cmark-Events][Event] each with source string into a string-backed buffer.
/// Serialize a stream of [pulldown-cmark-Events][Event] while preserving escape characters using `source` for lookup.

///
/// Different from [`cmark_resume_with_options`], which always escape Markdown special characters like `#` or `[`,
/// this function preserves special character escapes.
/// That is, it only escapes a special character if it is escaped in the source.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be clearer? With the sentence preceding it, it feels like this says the opposite - it's supposed to preserve special character escapes?

src/source_range.rs Outdated Show resolved Hide resolved
src/source_range.rs Show resolved Hide resolved
src/source_range.rs Outdated Show resolved Hide resolved
@Byron Byron added the question Further information is requested label May 28, 2024
@ytmimi
Copy link
Contributor

ytmimi commented May 28, 2024

I can try to review this PR, though I probably won't be free to take a look at this until later this week

@Byron
Copy link
Owner

Byron commented May 28, 2024

Thanks for letting me know - I will wait with merging after your review is in, thanks a lot for your help!

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this lovely contribution, it's much appreciated and great work!

I have one more question regarding the added tests: Can it be that these are mostly a duplicate of the existing tests?

If so, would it be possible to adjust the existing fmts (in the old tests) to run the source tests instead (or in addition)? It would definitely be preferable to not introduce code-duplication here.

src/source_range.rs Outdated Show resolved Hide resolved
};

fn fmts(s: &str) -> (String, State<'_>) {
let mut buf = String::new();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here fmts_with_options() could be called, right? That would avoid this duplication.

Copy link
Contributor Author

@SichangHe SichangHe May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would assume cmark_with_source_range calls cmark_resume_with_source_range_and_options with the default option. This not is defined in the test module itself, so I would avoid such an assumption.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point! Thank you.

tests/source_range_fmt.rs Outdated Show resolved Hide resolved
tests/source_range_fmt.rs Outdated Show resolved Hide resolved
tests/source_range_fmt.rs Outdated Show resolved Hide resolved
@SichangHe
Copy link
Contributor Author

Hi @Byron, I have attempted to address your clean-up suggestions. You might still want to double check the wording for some of the docstrings, etc., and maybe come up with better names for the new public functions…

Copy link
Owner

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! Looks great now and I am happy we could get rid of most of the duplication.

I will wait for @ytmimi as another set of eyes, but I also ask @SichangHe to ping me for a merge in case that doesn't happen within a week or so as I think whenever the second review happens, changes can always be made in a second PR at least as long the public API isn't affected.

};

fn fmts(s: &str) -> (String, State<'_>) {
let mut buf = String::new();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point! Thank you.

This is marked as breaking change as the field itself is new in `State`
and could be breaking for some.
Comment on lines +6 to +7
/// Different from [`cmark_resume_with_options`], which always escape Markdown special characters like `#` or `[`,
/// this function only escapes a special character if it is escaped in `source`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we could avoid adding escapes in the other cmark variants as well, or does this fundamentally require the source?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@max-heller, this variant "preserves escapes", i.e., adds escapes based on whether the source adds escapes, so it requires the source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should've been more explicit, sorry. Obviously we can't preserve escapes without the source, but could we prevent adding them without it? For instance, keep [foo] as [foo] but turn \[foo\] to [foo]? Or would that run in to correctness issues

@Byron Byron removed the question Further information is requested label Jun 10, 2024
@Byron Byron merged commit 386e663 into Byron:main Jun 10, 2024
1 check passed
@Byron
Copy link
Owner

Byron commented Jun 10, 2024

Thanks again for this fantastic contribution!

Here is the new release: https://github.com/Byron/pulldown-cmark-to-cmark/releases/tag/v14.0.0 .

max-heller added a commit to max-heller/mdbook-pandoc that referenced this pull request Jun 10, 2024
…ce (#95)

Using Byron/pulldown-cmark-to-cmark#71, attempts
to preserve escape characters (or lack thereof) from the Markdown
source, e.g. keeping ``[`Vec`]`` as ``[`Vec`]`` and ``\[`Vec`\]`` as
``\[`Vec`\]``.

Closes #91
@ytmimi
Copy link
Contributor

ytmimi commented Jun 12, 2024

Sorry I didn't get a chance to look this one over, but glad to see that it got merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants