Additional paragraph when using Markdown in raw HTML #595
Comments
Actually this is the correct behavior. The problem is that the Markdown-in-raw-HTML behavior is not well documented. For example, it may help to understand that a As a reminder, the rules state:
That rule is strictly enforced. You only avoid the extra (unwanted) Therefore, this is what you want as input:
<div markdown="1">
<p markdown="1">Hello _World!_</p>
</div> |
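A minimal sketch of running the input above through Python-Markdown, for anyone who wants to reproduce it. It assumes the `markdown` package is installed and that the Markdown-in-HTML handling is enabled through the `markdown.extensions.extra` extension; the variable names are illustrative only, and the exact whitespace of the output may differ between library versions.

```python
import markdown

# The input recommended above: markdown="1" on both the wrapper
# and the inner paragraph.
source = (
    '<div markdown="1">\n'
    '<p markdown="1">Hello _World!_</p>\n'
    '</div>'
)

# Assumption: Markdown-in-HTML is provided via the Extra extension.
html = markdown.markdown(source, extensions=["markdown.extensions.extra"])
print(html)
```

If the emphasis is rendered, the printed HTML contains `<em>World!</em>` inside the paragraph.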
I had already suspected that this might be a user error. Thanks for your answer and the background details. |
Is the following expected to work?
html = md.convert("""<div markdown="1">
<div markdown="1">
<p markdown="1">Hello _World!_<p>
</div>
</div>""")
I get an
Edit: This is probably #584. |
We have a policy that an error should never be raised when parsing, which makes this a bug regardless of what behavior is expected there. Could you provide the error? Unfortunately, we don't use a proper HTML parser, but a simplistic set of regular expressions (for historical reasons I won't get into here). For standard Markdown, that is sufficient, but when using the |
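To illustrate why a regex-only approach tends to struggle with nested raw HTML, here is a small standard-library sketch. The pattern is a made-up example for demonstration and is not taken from Python-Markdown's source.

```python
import re

# Two nested <div> elements, as in the example earlier in this thread.
nested = "<div><div><p>Hello</p></div></div>"

# A naive "match one div block" pattern (hypothetical, for illustration).
naive = re.compile(r"<div\b[^>]*>.*?</div>", re.DOTALL)

matched = naive.match(nested).group(0)

# The lazy match stops at the FIRST closing tag, so the outer block
# is cut short and its own </div> is left behind.
print(matched)
print(matched == nested)
```

The match ends at the inner `</div>`, so the outer element is never seen as a complete block. Regular expressions have no notion of nesting depth, which is the kind of limitation a real parser avoids.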
Is the issue because the opening and closing tags of |
I just tried it with the tags being on separate lines and I got the same error. The error is
|
Cool. That raw markdown parsing is a bit of a mess. I'll take a look at it and at least figure out why it is failing. I still haven't had time to come up with a final solution on this though. |
I'm wondering if it makes sense to replace the contents of the RawHtml preprocessor with something like this, which uses the |
Maybe. I haven't played with your implementation of this, but I'd love anything that is easier to navigate than what we currently have for raw processing. It is one of the reasons #585 is still open; I'm dreading digging into that code, so I keep procrastinating. |
Just some investigation (based on my experience with raw HTML), this works:
The parsing of HTML has always been a little janky. It is really sensitive to the spacing of the HTML elements and such. Even the spacing of Markdown content between elements can be weird. Anyways, I haven't dug deep enough into the parser to find the actual failure yet, but I plan to.
To be honest, rewriting all of this HTML handling and block handling is going to be key to fixing a lot of issues with Python Markdown. The current block processors and the HTML processing really need an overhaul, as they don't handle things very well. Nested indented code blocks lose newlines when they contain multiple consecutive newlines. Raw HTML is kind of funky.
I kind of feel HTML processing should be a block processor. Or maybe a line processor (which doesn't currently exist). Python Markdown sorely needs a way to identify the beginning of a block and be able to process the lines until it knows where the block's end is instead of relying on
Anyways, hopefully this issue will be patchable, but I feel the next iteration needs an overhaul in this area. |
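As a rough sketch of the "find the start of a block, then consume lines until its end is known" idea, the snippet below uses the standard library's `html.parser.HTMLParser` to track nesting depth line by line. `BlockEndFinder` and `find_block_end` are hypothetical names invented for this illustration, not Python-Markdown APIs, and the sketch deliberately ignores comments, mismatched tags, and most other edge cases.

```python
from html.parser import HTMLParser


class BlockEndFinder(HTMLParser):
    """Track tag nesting until the first top-level element is closed."""

    # Elements without a closing tag; skipped so they don't skew the depth.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.closed = False

    def handle_starttag(self, tag, attrs):
        if not self.closed and tag not in self.VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.closed or tag in self.VOID or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.closed = True


def find_block_end(lines):
    """Return the 0-based index of the line that closes the block, or None."""
    finder = BlockEndFinder()
    for index, line in enumerate(lines):
        finder.feed(line + "\n")
        if finder.closed:
            return index
    return None


lines = [
    '<div markdown="1">',
    '<div markdown="1">',
    '<p markdown="1">Hello _World!_</p>',
    "</div>",
    "</div>",
    "",
    "A paragraph after the block.",
]
end_index = find_block_end(lines)
print(end_index)
```

Because the depth counter only returns to zero at the outer `</div>`, the block boundary is found without depending on blank lines or on how the tags are spaced.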
I agree. In fact, this was part of the original plan for 3.0. However, I just don't have the time to do the work right now and don't expect to be able to any time in the foreseeable future. |
First, thank you for this implementation! 👍
I provide a minimal working example of some unexpected behavior I encountered:
The output was
But I expected
My workaround is to use two line breaks. (Edit: this does not include the additional paragraph, but the Markdown is not replaced.)