prevent <code> from being parsed in Pages render #232

vincent-peugnet · 2022-10-14T10:24:36Z

One idea was to set aside the content of the code blocks using the PHP DOM extension: remove their content and give each one a code-block ID, then parse the body with W's various replacement tools. The final step would be to put back the content of each code block based on its ID.
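Here is a minimal sketch of this idea. It assumes PHP's DOM extension (DOMDocument) and that code blocks contain only text. The helper names (loadFragment, saveFragment, extractCode, restoreCode), the data-codeblock attribute used as the ID, and the str_replace() stand-in for W's replacement tools are all hypothetical. This is not W's actual code.

```php
<?php
// Hypothetical sketch (not W's code): set aside <code> text, run W's tools,
// then put the text back by ID.

// Parse an HTML fragment; the meta tag makes libxml decode it as UTF-8.
function loadFragment(string $html): DOMDocument
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about HTML5 tags
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" '
        . 'content="text/html; charset=utf-8"></head><body>' . $html . '</body></html>');
    libxml_clear_errors();
    return $doc;
}

// Serialize only the children of <body>, i.e. the original fragment.
function saveFragment(DOMDocument $doc): string
{
    $html = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }
    return $html;
}

// Step 1: empty every <code>, store its text, tag it with a numeric ID.
function extractCode(string $html, array &$store): string
{
    $doc = loadFragment($html);
    foreach ($doc->getElementsByTagName('code') as $code) {
        $id = count($store);
        $store[$id] = $code->textContent;
        while ($code->firstChild !== null) {
            $code->removeChild($code->firstChild);
        }
        $code->setAttribute('data-codeblock', (string) $id);
    }
    return saveFragment($doc);
}

// Step 3: put each stored text back into the <code> that carries its ID.
function restoreCode(string $html, array $store): string
{
    $doc = loadFragment($html);
    foreach ($doc->getElementsByTagName('code') as $code) {
        if (!$code->hasAttribute('data-codeblock')) {
            continue; // a <code> that was not set aside
        }
        $id = (int) $code->getAttribute('data-codeblock');
        $code->removeAttribute('data-codeblock');
        // A text node is escaped on output, so "<" in code stays "&lt;".
        $code->appendChild($doc->createTextNode($store[$id] ?? ''));
    }
    return saveFragment($doc);
}

$store = [];
$safe = extractCode('<p>%TITLE%</p><pre><code>%TITLE%</code></pre>', $store);
// Step 2: stand-in for W's replacement tools.
$rendered = str_replace('%TITLE%', 'Home', $safe);
$out = restoreCode($rendered, $store);
echo $out, "\n";
```

Storing only the text of each block keeps the restore step simple. Code blocks that contain nested markup would instead need their inner HTML stored and re-parsed.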

vincent-peugnet · 2024-12-10T02:08:47Z

I did some tests on this tonight.

Getting the basic version working is quite easy.

But... it creates many little problems 😬.

To make it work, we have to parse Markdown first, so that the <code> tags are ready to be extracted as early as possible.

As this graph shows (https://github.com/vincent-peugnet/wcms/blob/master/RENDER.md), Markdown currently runs after all the W inclusions and the crazy everylink function.

So I'm trying to move it first and see what happens!

According to the unit tests and my own tests, two things are broken:

1. In markdown-test, the summary/header ID tool fails, but I do not understand why. It happens in the footnotes part. It's not related to swapping the order of the W inclusions and the MD render; it's really caused by my <code> tag extractor. But I cannot identify precisely what causes this bug.
2. The everylink function is broken: since it is designed to exclude words that are inside/between brackets, it does nothing on HTML.

vincent-peugnet · 2024-12-21T23:50:44Z

> In markdown-test, the summary/header ID tool fails, but I do not understand why. It happens in the footnotes part. It's not related to swapping the order of the W inclusions and the MD render; it's really caused by my <code> tag extractor. But I cannot identify precisely what causes this bug.

This is fixed by 81b028d!! (It took me some time 🥵)

vincent-peugnet · 2024-12-22T00:00:56Z

I think that, for now, I could enable this only for Markdown code elements.

This follows the logic that Markdown is for easy, automatic usage, unlike raw HTML, where you have more control. With plain HTML, it makes more sense to use HTML entities to avoid parsing.
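The entity approach for hand-written HTML is illustrated in the sketch below. replaceTitle() is a hypothetical stand-in for one of W's replacement tools (here it swaps %TITLE% for the page title). It is not W's real implementation.

```php
<?php
// Hypothetical stand-in for a W replacement tool: swap %TITLE% for a title.
function replaceTitle(string $html, string $title): string
{
    return str_replace('%TITLE%', $title, $html);
}

// Writing '%' as the numeric entity &#37; hides the token from the tool,
// while a browser still displays a literal '%'.
$plain   = '<code>%TITLE%</code>';
$escaped = '<code>&#37;TITLE&#37;</code>';

echo replaceTitle($plain, 'Home'), "\n";   // <code>Home</code>
echo replaceTitle($escaped, 'Home'), "\n"; // <code>&#37;TITLE&#37;</code>
echo html_entity_decode($escaped), "\n";   // <code>%TITLE%</code>
```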

vincent-peugnet · 2024-12-22T13:26:56Z

> I think that for now, I could activate this only for the markdown code elements.

I thought this would be easy, but in fact I don't know how to tell Markdown-generated <code> tags apart from original HTML <code> tags.

What I could easily do is "prevent <code> from being parsed" only if Markdown is enabled. But that's less satisfying.

Markdown library

I thought I could find a solution by looking at the current MD library used in W. From Michel Fortin's PHP-Markdown configuration doc:

code_block_content_func = null

The content of code blocks can be transformed to HTML with a custom function. A simple version of this function will only change the special characters to HTML entities:
$parser->code_block_content_func = function ($code, $language) {
    return htmlspecialchars($code);
};
A more advanced version could use the $language parameter and perform syntax coloring. The result returned by this function is automatically wrapped in a <pre><code>.

I tried using this with htmlentities(), but it didn't work (percent signs were left untouched).
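The htmlentities() result can be reproduced in isolation. With its default flags (HTML 4.01 entity table), it only converts characters that have an HTML entity equivalent, and % has none.

```php
<?php
// htmlentities() with default flags converts '<', '>', '&' and '"',
// but '%' has no entity in the HTML 4.01 table, so %TITLE% passes through
// unchanged and W's replacement tools still pick it up afterwards.
$code = "%TITLE%\n<a href=\"https://club1.fr\">club1</a>";

echo htmlentities($code), "\n";
// %TITLE%
// &lt;a href=&quot;https://club1.fr&quot;&gt;club1&lt;/a&gt;
```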

vincent-peugnet · 2024-12-22T14:11:08Z

Mmhh,
I've managed to do this thanks to this code:

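// Encode every character of $str as a numeric HTML entity (&#NNN;)
function encode2(string $str): string {
    // UTF-32BE: each code point becomes exactly 4 big-endian bytes
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8');
    // "N*" reads unsigned 32-bit big-endian integers: one per code point
    $t = unpack("N*", $str);
    $t = array_map(fn($n) => "&#$n;", $t);
    return implode("", $t);
}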

W input

    %TITLE%
    https://coucou.com
    <a href="https://club1.fr">club1</a>
    <vincent@club1.fr>

HTML output

<pre><code>%TITLE%
https://coucou.com
&lt;a href="https://club1.fr"&gt;club1&lt;/a&gt;
&lt;vincent@club1.fr&gt;


</code></pre>

This is almost perfect! But there's one problem: it adds some trailing newlines!! 😭 I can trim them, but that's not very clean 😵‍💫

There are no problems with the backtick code block syntax.
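The following sketch shows where the trimming could happen. It assumes that the extra newlines are already part of the $code string PHP-Markdown passes to code_block_content_func. That assumption has not been verified here. The closure name $codeBlockContent is hypothetical. encode2() is copied from above with the byte order spelled out (UTF-32BE) so that the snippet runs on its own. Trimming has to happen before encoding: once encoded, a newline becomes &#10; and rtrim() can no longer strip it.

```php
<?php
// Copy of encode2() from above: every character becomes a numeric entity.
function encode2(string $str): string
{
    $str = mb_convert_encoding($str, 'UTF-32BE', 'UTF-8'); // 4 bytes per code point
    return implode('', array_map(fn($n) => "&#$n;", unpack('N*', $str)));
}

// Hypothetical callback: drop trailing newlines, then encode what remains.
$codeBlockContent = function (string $code, $language): string {
    return encode2(rtrim($code, "\n"));
};

echo $codeBlockContent("%TITLE%\n\n\n", null), "\n"; // &#37;&#84;&#73;&#84;&#76;&#69;&#37;
```

The closure would then be assigned to $parser->code_block_content_func, the option quoted from the PHP-Markdown documentation above.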

vincent-peugnet · 2024-12-31T13:13:28Z

Maybe I could just escape the % characters in markdown code blocks.

There will still be some problems with the URL-linker (but at least it can be disabled).
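Here is a minimal sketch of that lighter option, written as a hypothetical closure for code_block_content_func. It escapes the special HTML characters like the documented example, then replaces each % with its numeric entity &#37;. Order matters here. If the % replacement ran first, htmlspecialchars() would turn &#37; into &amp;#37;. URLs are left as they are, so the URL linker caveat still applies.

```php
<?php
// Hypothetical code_block_content_func callback: escape <, >, & and quotes,
// then hide every '%' from W by writing it as the numeric entity &#37;.
$escapePercent = function (string $code, $language): string {
    // htmlspecialchars() must run first, otherwise it would re-escape
    // the '&' of '&#37;' into '&amp;#37;'.
    return str_replace('%', '&#37;', htmlspecialchars($code));
};

echo $escapePercent('%TITLE% <b>', null), "\n"; // &#37;TITLE&#37; &lt;b&gt;
```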

vincent-peugnet added enhancement New feature or request render engine labels Oct 14, 2022

vincent-peugnet mentioned this issue Oct 30, 2024

Allow escaping of W parsing #476

Open

n-peugnet changed the title ~~prevent <code> to be parsed in Pages render~~ prevent <code> from being parsed in Pages render Oct 31, 2024

vincent-peugnet self-assigned this Nov 1, 2024

vincent-peugnet added a commit that referenced this issue Dec 22, 2024

WIP try to escape <code> in render see #232

ca15515
