How to self modify parse method when htmltomarkdown #353
Comments
@gaofeiseu, I tried the HTML you gave and the converted Markdown seems to be correct. The Markdown is on the first line and the HTML is on the last:
Can you create a small test, with options, that does not work for you? You can use the sample as a starting point and add the configuration you use in your code:
@gaofeiseu, sorry, I just realized that what you really wanted was to add the "https:" prefix to the image URL when it is missing. The easiest way to do this in the current implementation is to use the standard HTML parser to get the Markdown, parse that Markdown, and replace the URLs in the AST with what you want before passing the AST document node to the formatter, which will then output the changed Markdown. The sample FormatterWithMods.java shows how to change the URLs in the AST so that the formatted Markdown has the replaced URLs. All you need to do is replace the logic in FormatterWithMods.java: Lines 68-71 with:
To have all URLs starting with |
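As a rough illustration of the prefixing rule discussed above (add "https:" when an image URL starts with "//"), here is a minimal plain-Java sketch. It is not the code from FormatterWithMods.java and does not use any flexmark API; the class name `UrlSchemeFixer` and the method `addHttpsIfMissing` are hypothetical names chosen for this example.

```java
// Hypothetical helper for illustration only; not part of flexmark-java.
final class UrlSchemeFixer {
    private UrlSchemeFixer() {}

    /**
     * Prepends "https:" to protocol-relative URLs such as "//host/path".
     * Any other value, including null, is returned unchanged.
     */
    static String addHttpsIfMissing(String url) {
        if (url != null && url.startsWith("//")) {
            return "https:" + url;
        }
        return url;
    }

    public static void main(String[] args) {
        String[] samples = {
            "//abc.com/cde/efg.jpg",   // protocol-relative: gets the prefix
            "https://abc.com/a.png",   // already has a scheme: unchanged
            "images/local.png"         // relative path: unchanged
        };
        for (String url : samples) {
            System.out.println(url + " -> " + addHttpsIfMissing(url));
        }
    }
}
```

A check like this would presumably be applied to each link or image URL found while walking the AST, before the document node is handed to the formatter.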
@vsch |
@gaofeiseu, what you need to do is simply combine the steps: convert the HTML to Markdown, parse the Markdown to an AST, replace the URLs in the AST, and render the AST as Markdown using the formatter. This combines the two samples I mentioned into a single process. The modified FormatterWithMods shows the needed steps: FormatterWithMods2.java
The current HTML to Markdown implementation is not extensible, so there is no easy way to modify the Markdown it generates. I am working on a new version that supports extensions, similar to the HTML Renderer and Markdown Formatter, which will allow some customization of the generated Markdown without needing to re-parse it, but this is not yet available.
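For readers who only need to see the effect of the "fix the URLs after conversion" step, the sketch below post-processes Markdown text that has already been produced by the HTML converter. It deliberately swaps the AST pass described above for a plain regular expression, so it is a stand-in rather than the approach in FormatterWithMods2.java: it only handles inline links and images, ignores reference-style definitions, and would also rewrite matches inside code spans. The class name `MarkdownLinkSchemeFixer` is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical regex-based stand-in for the AST URL replacement; not flexmark code.
final class MarkdownLinkSchemeFixer {
    // Group 1: the "[text](" or "![alt](" opener.
    // Group 2: a destination that starts with "//" (protocol-relative URL).
    private static final Pattern PROTOCOL_RELATIVE =
            Pattern.compile("(!?\\[[^\\]]*\\]\\()(//[^)\\s]+)");

    private MarkdownLinkSchemeFixer() {}

    /** Inserts "https:" in front of every protocol-relative inline link or image destination. */
    static String fix(String markdown) {
        return PROTOCOL_RELATIVE.matcher(markdown).replaceAll("$1https:$2");
    }

    public static void main(String[] args) {
        // Markdown as it might come out of the HTML to Markdown step (assumed input).
        String markdown = "![](//abc.com/cde/efg.jpg) and [docs](https://example.com/docs)";
        System.out.println(fix(markdown));
    }
}
```

Working on the AST, as the comment above recommends, avoids the blind spots of a regex because the parser has already decided what is a link, an image, or code.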
@gaofeiseu, a new module with an extension API for HTML to Markdown conversion has been implemented. See #313; the last comment has a link to a sample which modifies some link URLs during conversion.
Is your feature request related to a problem? Please describe.
Hi, I'm from China. flexmark is a really good tool, but during development I ran into a problem.
I need to convert HTML to Markdown. When I convert, some tags in the HTML have an unusual src, like this:
<img src="//img.alicdn.com/tfscom/TB1mR4xPpXXXXXvapXXXXXXXXXX.jpg" >
A src like this is not converted to Markdown that behaves correctly.
Describe the solution you'd like
How can I modify the way the src of an img tag is parsed, using an extension or an option,
and get a result like this:
<img src="//abc.com/cde/efg.jpg" >
converted to
![](https://abc.com/cde/efg.jpg)
Describe alternatives you've considered
Is there an extension, or an existing option that I overlooked?
Additional context