-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Option to preserve special character escapes (for #51) #71
Conversation
new `State::last_event_end_index` to track event end source_mapping → source_range
Wow, this is a big one, thanks so much for tackling this! All tests pass, additional tests have been added, and the spec now passes 435 of 649 tests, previously it was 435 as well. Tomorrow I will take a closer look, while hoping that @ytmimi can also join in as I am not too invested into this crate at the moment so I don't necessarily know what's desirable, all I can do is check code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for tackling this!
I have a few initial comments, as I ran out of time during review, I couldn't get through all the changes just yet. Disposition is merge, I'd just like to work on a few details.
Note to self: don't forget to make this a breaking release due to the added field in state
.
src/lib.rs
Outdated
@@ -62,6 +69,8 @@ pub struct State<'a> { | |||
pub current_shortcut_text: Option<String>, | |||
/// A list of shortcuts seen so far for later emission | |||
pub shortcuts: Vec<(String, String, String)>, | |||
/// Index of the end of the last event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify in the docstring what this index points into? It should probably contain docstring-links to the function that matters to it, where I believe it would point into source
.
src/source_range.rs
Outdated
@@ -0,0 +1,112 @@ | |||
use super::*; | |||
|
|||
/// Serialize a stream of [pulldown-cmark-Events][Event] each with source string into a string-backed buffer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Serialize a stream of [pulldown-cmark-Events][Event] each with source string into a string-backed buffer. | |
/// Serialize a stream of [pulldown-cmark-Events][Event] while preserving escape characters using `source` for lookup. |
src/source_range.rs
Outdated
/// | ||
/// Different from [`cmark_resume_with_options`], which always escape Markdown special characters like `#` or `[`, | ||
/// this function preserves special character escapes. | ||
/// That is, it only escapes a special character if it is escaped in the source. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be clearer? With the sentence preceding it, it feels like this says the opposite - it's supposed to preserve special character escapes?
I can try to review this PR, though I probably won't be free to take a look at this until later this week |
Thanks for letting me know - I will wait with merging after your review is in, thanks a lot for your help! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this lovely contribution, it's much appreciated and great work!
I have one more question regarding the added tests: Can it be that these are mostly a duplicate of the existing tests?
If so, would it be possible to adjust the existing fmts
(in the old tests) to run the source tests instead (or in addition)? It would definitely be preferable to not introduce code-duplication here.
}; | ||
|
||
fn fmts(s: &str) -> (String, State<'_>) { | ||
let mut buf = String::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here fmts_with_options()
could be called, right? That would avoid this duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would assume cmark_with_source_range
calls cmark_resume_with_source_range_and_options
with the default option. This not is defined in the test module itself, so I would avoid such an assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point! Thank you.
Hi @Byron, I have attempted to address your clean-up suggestions. You might still want to double check the wording for some of the docstrings, etc., and maybe come up with better names for the new public functions… |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot! Looks great now and I am happy we could get rid of most of the duplication.
I will wait for @ytmimi as another set of eyes, but I also ask @SichangHe to ping me for a merge in case that doesn't happen within a week or so as I think whenever the second review happens, changes can always be made in a second PR at least as long the public API isn't affected.
}; | ||
|
||
fn fmts(s: &str) -> (String, State<'_>) { | ||
let mut buf = String::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid point! Thank you.
This is marked as breaking change as the field itself is new in `State` and could be breaking for some.
c25bf76
to
6c8297c
Compare
/// Different from [`cmark_resume_with_options`], which always escape Markdown special characters like `#` or `[`, | ||
/// this function only escapes a special character if it is escaped in `source`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we could avoid adding escapes in the other cmark
variants as well, or does this fundamentally require the source?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@max-heller, this variant "preserves escapes", i.e., adds escapes based on whether the source adds escapes, so it requires the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should've been more explicit, sorry. Obviously we can't preserve escapes without the source, but could we prevent adding them without it? For instance, keep [foo]
as [foo]
but turn \[foo\]
to [foo]
? Or would that run in to correctness issues
Thanks again for this fantastic contribution! Here is the new release: https://github.com/Byron/pulldown-cmark-to-cmark/releases/tag/v14.0.0 . |
…ce (#95) Using Byron/pulldown-cmark-to-cmark#71, attempts to preserve escape characters (or lack thereof) from the Markdown source, e.g. keeping ``[`Vec`]`` as ``[`Vec`]`` and ``\[`Vec`\]`` as ``\[`Vec`\]``. Closes #91
Sorry I didn't get a chance to look this one over, but glad to see that it got merged! |
This is my stab at #51.
This implementation adds a derivative
cmark_resume_with_source_range_and_options
function that behaves likecmark_resume_with_options
but preserves special character escapes, i.e., only escapes if the Markdown source escapes,[`Vec`]
⇒[`Vec`]
and\[`Vec`\]
⇒\[`Vec`\]
. To achieve that, this implementation:pulldown_cmark::OffsetIter
; it is optional because we anticipate that the user might manipulate the events, in which case additional events would not has any accompanying ranges.last_event_end_index
toState
to keep track of the end index of previous events. A gap between the start of the currentEvent::Text
and the end of the previous event indicates there is something in between, which could be a\
or markup like[
before a link or>
before a quote. In case of an event being deleted during manipulation, there is an artificial gap, but I bet it would be extremely rare if the source before ends with\
by accident.Event::Text
, this implementation checks if the last character in the gap is\
; if not, it performs a hack to prevent the escape. The hack is to setState::is_in_code_block
totrue
. Maybe we should also do back off (check if the\
is escaped, etc.).Regarding the complex function signature, we could definitely make a trait and implement it for both
Iterator<Item = (E, Option<Range<usize>>)>
andIterator<Item = (E, Range<usize>)>
so people can simply pass inpulldown_cmark::OffsetIter
without the.map(|(e, r)| (e, Some(r)))
.And, yes, the names could be improved.
What do you guys think? @Byron, @ytmimi.