prevent <code> from being parsed in Pages render #232

Open
vincent-peugnet opened this issue Oct 14, 2022 · 6 comments
Labels: enhancement (New feature or request), render engine

Comments

vincent-peugnet commented Oct 14, 2022

EDIT by @n-peugnet:

One idea was to stash the content of the code blocks using the PHP DOM extension: remove each block's content and give the block a codeblock ID, then parse the body with W's various replacement tools. The final step would be to put the content of each code block back, based on its ID.
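
The stash-and-restore idea above could look roughly like the following. This is a minimal sketch, not W's actual code: the function names `stashCodeBlocks()` and `restoreCodeBlocks()` and the `data-codeblock-id` attribute are assumptions made up for illustration, and the restore step uses a regular expression rather than the DOM for brevity.

```php
<?php
// Sketch only: stashCodeBlocks(), restoreCodeBlocks() and the
// data-codeblock-id attribute are hypothetical, not part of W.

/** Empty every <code> element, tag it with an ID, return [html, stash]. */
function stashCodeBlocks(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose HTML
    // Wrap the fragment in a full document that declares UTF-8.
    $dom->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body>' . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $stash = [];
    // Copy the live node list into an array before modifying the tree.
    foreach (iterator_to_array($dom->getElementsByTagName('code'), false) as $i => $code) {
        $id = 'codeblock-' . $i;
        $inner = '';
        foreach ($code->childNodes as $child) {
            $inner .= $dom->saveHTML($child); // serialized, still entity-escaped
        }
        $stash[$id] = $inner;
        while ($code->firstChild !== null) {
            $code->removeChild($code->firstChild);
        }
        $code->setAttribute('data-codeblock-id', $id);
    }

    // Serialize only the children of <body>, dropping the wrapper.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return [$out, $stash];
}

/** Put each stashed content back and drop the temporary ID attribute. */
function restoreCodeBlocks(string $html, array $stash): string
{
    return preg_replace_callback(
        '/(<code\b[^>]*?)\s+data-codeblock-id="([^"]+)"([^>]*>)/',
        function (array $m) use ($stash): string {
            return $m[1] . $m[3] . ($stash[$m[2]] ?? '');
        },
        $html
    );
}
```

Under these assumptions, the render would call `stashCodeBlocks()` on the HTML, run the W replacements on the returned HTML, then call `restoreCodeBlocks()` with the returned stash. Note that libxml may insert newlines between block elements when serializing, so the output is not guaranteed to be byte-identical to the input outside the code blocks.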

vincent-peugnet added the enhancement and render engine labels Oct 14, 2022
n-peugnet changed the title from "prevent <code> to be parsed in Pages render" to "prevent <code> from being parsed in Pages render" Oct 31, 2024
vincent-peugnet self-assigned this Nov 1, 2024
vincent-peugnet commented Dec 10, 2024

I did some tests on this tonight.

It is quite easy to get the basics working.

But... it creates many little problems 😬.

To make it work, we have to parse Markdown first, so that the <code> tags are ready to be extracted as early as possible.

As the graph at https://github.com/vincent-peugnet/wcms/blob/master/RENDER.md shows, Markdown currently runs after all the W inclusions and the crazy everylink function.

So I'm trying to put it first to see what happens!

According to the unit tests and my own tests, two things are broken:

  1. In markdown-test, the summary/header ID tool fails, but I do not understand why. It happens in the footnotes part. It's not related to swapping the W inclusions and the MD render; it really is caused by my <code> tag extractor. But I cannot identify precisely what causes this bug.
  2. The everylink function is broken: as it is designed to exclude words that are inside/between brackets, it does nothing on HTML.

@vincent-peugnet

  1. In markdown-test, the summary/header ID tool fails, but I do not understand why. It happens in the footnotes part. It's not related to swapping the W inclusions and the MD render; it really is caused by my <code> tag extractor. But I cannot identify precisely what causes this bug.

This is fixed by 81b028d !! (Took me some time 🥵)

@vincent-peugnet

I think that for now, I could activate this only for the markdown code elements.

It follows this logic: Markdown is for easy, automatic usage, unlike raw HTML, where you have more control. With plain HTML, it makes more sense to use HTML entities to avoid parsing.

vincent-peugnet commented Dec 22, 2024

I think that for now, I could activate this only for the markdown code elements.

I thought this was easy, but in fact, I don't know how to differentiate Markdown-created <code> tags from original HTML <code> tags.

What I could easily do would be to "prevent <code> from being parsed" only if Markdown is enabled. But that's less satisfying.

Markdown library

I thought I could find a solution by looking at the current MD library used in W. From Michel Fortin's PHP-Markdown configuration doc:

code_block_content_func = null

The content of code blocks can be transformed to HTML with a custom function. A simple version of this function will only change the special characters to HTML entities:

$parser->code_block_content_func = function ($code, $language) {
    return htmlspecialchars($code);
};

A more advanced version could use the $language parameter and perform syntax coloring. The result returned by this function is automatically wrapped in a <pre><code>.

I've tried to use this with htmlentities(), but it didn't work (the percent signs were left untouched).
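
A quick check illustrates the behaviour described above. It is only a sketch, assuming htmlentities() is called with its default flags; the input string is made up for the example.

```php
<?php
// With default flags, htmlentities() converts < and > but leaves % alone,
// so W's %...% inclusions would still be picked up afterwards.
$out = htmlentities('%TITLE% <b>');
echo $out; // %TITLE% &lt;b&gt;
```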

@vincent-peugnet

Mmhh,
I've managed to do this thanks to this code:

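function encode2($str) {
    // Convert to fixed-width UTF-32 (big-endian): each code point is exactly 4 bytes.
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    // "N*" unpacks every 4-byte big-endian integer, i.e. every code point.
    $t = unpack("N*", $str);
    // Turn each code point into a numeric HTML entity, e.g. 37 becomes &#37;.
    $t = array_map(function($n) { return "&#$n;"; }, $t);
    return implode("", $t);
}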

W input

    %TITLE%
    https://coucou.com
    <a href="https://club1.fr">club1</a>
    <vincent@club1.fr>

HTML output

<pre><code>%TITLE%
https://coucou.com
&lt;a href="https://club1.fr"&gt;club1&lt;/a&gt;
&lt;vincent@club1.fr&gt;


</code></pre>

This is almost perfect! But there's one problem: this adds some trailing newlines!! 😭 I can trim them, but that is not very good 😵‍💫

There are no problems with the back-tick code block syntax.

@vincent-peugnet

Maybe I could just escape the % characters in markdown code blocks.

There will still be some problems with the URL-linker (but at least it can be disabled).
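
A minimal sketch of that last idea, escaping only the % characters on top of the usual HTML escaping. The helper name `escapeCodeForW()` is hypothetical; something like it might be assigned to PHP Markdown's code_block_content_func, but that wiring is an assumption and is not shown here.

```php
<?php
// Hypothetical helper, not part of W or PHP Markdown.
function escapeCodeForW(string $code): string
{
    // Escape HTML special characters first; doing it second would
    // re-escape the & of the &#37; entities added below.
    $escaped = htmlspecialchars($code);
    // Hide percent signs behind a numeric entity so %...% is not matched later.
    return str_replace('%', '&#37;', $escaped);
}
```

For example, `escapeCodeForW('%TITLE% <b>')` returns `&#37;TITLE&#37; &lt;b&gt;`, which a browser displays as `%TITLE% <b>`. URLs inside the code are left as-is, so the URL-linker caveat above still applies.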
