Option to parse HTML without removing newlines #807

Open
ga-jrich opened this issue Jun 17, 2021 · 6 comments

Comments

@ga-jrich

I have a table where every other row is user input that comes from a textarea, so its lines are separated by \n rather than <br>. In the code below, I see that you strip \n and then replace <br> with \n afterwards.

```ts
// Remove extra space and line breaks in markup to make it more similar to
// what would be shown in html
cell.innerHTML = cell.innerHTML.replace(/\n/g, '').replace(/ +/g, ' ')
// Preserve <br> tags as line breaks in the pdf
cell.innerHTML = cell.innerHTML
  .split(/\<br.*?\>/) //start with '<br' and ends with '>'.
  .map((part: string) => part.trim())
  .join('\n')
```

I understand the reasoning behind this decision, but it would be nice to have the option to preserve the original line breaks. As it stands, I have to replace \n in my text with <br> only for it to be swapped back under the hood.

@simonbengtsson
Owner

True! Do you mean you have a textarea inside the HTML table? If so, I think there are two good options:

  1. Handle textareas in a special way when parsing the html. Would love to merge a pull request for this.
  2. Recommend that users use the didParseCell hook to parse the textarea content manually.
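Option 2 could look roughly like the sketch below. It assumes that, when a table is built with the `html` option, `data.cell.raw` is the source table cell element and `data.cell.text` is an array with one entry per line. The small interfaces are structural stand-ins for the real DOM and hook types so the logic can run outside a browser, and `useTextareaContent` is a hypothetical name.

```typescript
// Structural stand-ins (assumptions) for the parts of the DOM and of
// autotable's hook data that this helper touches.
interface TextareaLike {
  value: string
}
interface CellElementLike {
  querySelector(selector: string): TextareaLike | null
}
interface HookDataLike {
  section: 'head' | 'body' | 'foot'
  cell: { raw: unknown; text: string[] }
}

// Split on every common newline convention so each line becomes one entry.
function splitLines(value: string): string[] {
  return value.split(/\r\n|\r|\n/)
}

// Hypothetical didParseCell body: if the source cell contains a textarea,
// use its value (which keeps the user's line breaks) as the cell text.
function useTextareaContent(data: HookDataLike): void {
  if (data.section !== 'body') return
  const raw = data.cell.raw as Partial<CellElementLike> | null
  if (!raw || typeof raw.querySelector !== 'function') return
  const textarea = raw.querySelector('textarea')
  if (textarea) {
    data.cell.text = splitLines(textarea.value)
  }
}
```

In a real call, the body of `useTextareaContent` would go inside the `didParseCell` option passed to `autoTable`; cells without a textarea keep whatever text the HTML parser produced.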

@ga-jrich
Author

I have content that comes from a <textarea>. Currently I'm just rendering it straight into the table cell, but it wouldn't be too bad to put it in a <textarea>. Originally I was thinking it would be easiest to just pass autoTable() a boolean flag telling it whether or not the input is already formatted.

In that case, would it make sense to instead skip the newline removal for text within a <pre> tag? That might align better with the original philosophy of making the output look the way it would in HTML.

@simonbengtsson
Owner

Got it! Yes, I think the goal for autotable should be to make the output look as close to the HTML as possible, meaning line breaks in the markup etc. would not show up. I don't know how it handles <pre> tags today, but I would assume we need to update the HTML parsing for it to handle pre tags correctly.
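One piece such an update might need is a pre-aware version of the quoted `replace(/\n/g, '').replace(/ +/g, ' ')` step, so that whitespace inside `<pre>...</pre>` survives while everything else is normalized as before. The sketch below works on the cell's markup as a plain string; `stripNewlinesOutsidePre` is a hypothetical helper, not existing autotable code, and a regex is only an approximation of real HTML parsing (nested or malformed `<pre>` tags are not handled). Walking the DOM would be more robust.

```typescript
// Hypothetical pre-aware variant of the quoted normalization step:
// newlines and repeated spaces inside <pre>...</pre> are kept,
// everywhere else they are normalized exactly as before.
function stripNewlinesOutsidePre(html: string): string {
  // The capturing group keeps each matched <pre> block in the split result
  // at an odd index; even indices hold the markup between those blocks.
  const parts = html.split(/(<pre\b[^>]*>[\s\S]*?<\/pre>)/i)
  return parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/\n/g, '').replace(/ +/g, ' '),
    )
    .join('')
}
```

Because only the segments outside `<pre>` are touched, tables that do not use `<pre>` would be parsed exactly as they are today.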

@ga-jrich
Author

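A quick test shows that <pre> has the desired effect in some, but not all, situations. For each case, the first block is the source and the second is what the browser renders.

1. Literal \n characters in the HTML source:

```html
<pre>multi\nline\ntext</pre>
```

```
multi\nline\ntext
```

2. Real line breaks in the HTML source:

```html
<pre>multi
line
text</pre>
```

```
multi
line
text
```

3. br elements in the HTML source:

```html
<pre>multi<br>line<br>text</pre>
```

```
multi
line
text
```

4. Using Vue.js, with multilineText === 'multi\nline\ntext':

```html
<pre>{{ multilineText }}</pre>
```

```
multi
line
text
```

5. Using Vue.js, with multilineText === 'multi<br>line<br>text':

```html
<pre>{{ multilineText }}</pre>
```

```
multi<br>line<br>text
```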

Within autotable, the newline characters are removed, so all of the above options result in "multilinetext" (no line break or space) being rendered into the document.

An implementation of <pre> that matches its raw HTML appearance exactly would need to:

  • display HTML elements and entities from the HTML source as usual
  • display injected HTML as raw text (as in the last example)
  • treat \n as two literal characters when it comes from the HTML source
  • treat \n as a line break when it is injected (as in the fourth example)
  • render all text in the browser's default monospace font

This adds up to a lot, and as far as I know, autotable has no way of knowing whether content was injected. Combined with the assumption that most users will want their multiline text in the document's font rather than a monospace one, that makes actually implementing <pre> seem counterproductive.
Instead, it might be more user-friendly to return to the idea I opened this issue with: give the user a boolean flag to indicate whether autotable should remove newlines from the string.
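The flag could be sketched roughly as below. It models the two quoted parsing statements on a plain string instead of `cell.innerHTML`, and `preserveNewlines` is a hypothetical option name, not something autotable currently accepts. With the flag off it reproduces today's behaviour; with it on, newlines already in the text survive alongside the existing <br> conversion.

```typescript
// Hypothetical option; jspdf-autotable does not currently expose it.
interface ParseOptions {
  preserveNewlines?: boolean
}

// String-only model of the quoted parsing steps, with an opt-out
// for the newline removal.
function normalizeCellMarkup(markup: string, options: ParseOptions = {}): string {
  let html = markup
  if (!options.preserveNewlines) {
    // Current behaviour: treat newlines in the markup as insignificant whitespace.
    html = html.replace(/\n/g, '')
  }
  // Collapse runs of spaces, as the quoted code does.
  html = html.replace(/ +/g, ' ')
  // Turn <br>, <br/> and <br /> into line breaks and trim each resulting line.
  return html
    .split(/<br.*?>/)
    .map((part) => part.trim())
    .join('\n')
}
```

Keeping the default off means existing tables render exactly as before, which matches the goal of looking like the HTML unless the user opts out.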

This may look blown out of proportion for the sake of removing a small redundancy, so I'd like to clarify. At its core, the issue is that currently, getting line breaks from dynamic content requires putting <br> elements in the content. That creates the redundancy, but more importantly, if the content is allowed to contain HTML elements, it is open to XSS attacks. This is the real reason I want to get this change done.
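Until something like that exists, the XSS risk of the <br> workaround can be reduced by escaping the user's text first and only then inserting the <br> elements, so the only real markup in the cell is markup you added. A minimal sketch; `escapeHtml` and `textToSafeHtml` are hypothetical helper names.

```typescript
// Escape the characters that are significant in HTML, so user input
// cannot introduce elements or attributes of its own.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
}

// Escape first, then add the only markup we trust: one <br> per line break.
function textToSafeHtml(text: string): string {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, '<br>')
}
```

The order matters: escaping after inserting the <br> elements would turn them into literal text, and inserting them without escaping lets user input smuggle in its own tags.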

@simonbengtsson
Owner

Sounds reasonable to me as well. The main question, then, is whether it is best to add a new option for this or whether handling this case manually with didParseCell is enough. Have you tried that?

@Maxhirez

Maxhirez commented Aug 14, 2023

I'm having a similar issue, but I'm not sure I'm arriving at it the same way as @ga-jrich. Inside a didParseCell hook, I have something along these lines:

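```js
// Inside didParseCell: column 3 holds per-row notes from a textarea
if (data.column.index == 3) {
  let cid = "#textarea-id-prefix" + data.row.index.toString();
  try {
    // Throws a TypeError if no textarea exists for this row
    let cResult = document.querySelector(cid).value;
    data.cell.text = cResult;
  } catch (er) {
    console.log("No notes.");
  }
}
```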

Replacing the newline \n with <br/>, though, just inserts the literal <br/> string into the text.

Since the original issue doesn't seem to be using this method, did I overcomplicate things? Is there a solution that would work better for my use case?
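For the textarea-by-id approach above, one thing worth trying is to keep the raw newlines and hand autotable an array of lines rather than a single string or <br/> markup. This sketch assumes `data.cell.text` accepts a `string[]` with one entry per line; `fillNotesCell` and its `lookup` parameter are hypothetical, with `lookup` standing in for `document.querySelector` so the logic can run outside a browser. It also swaps the `try`/`catch` for an explicit null check.

```typescript
// Structural stand-ins (assumptions) for the hook data fields used here.
interface TextareaLike {
  value: string
}
interface HookDataLike {
  column: { index: number }
  row: { index: number }
  cell: { text: string[] }
}

// In the browser, `lookup` would be
// (sel) => document.querySelector<HTMLTextAreaElement>(sel)
function fillNotesCell(
  data: HookDataLike,
  lookup: (selector: string) => TextareaLike | null,
): void {
  if (data.column.index !== 3) return
  const textarea = lookup('#textarea-id-prefix' + data.row.index)
  if (textarea === null) {
    console.log('No notes.')
    return
  }
  // Keep the user's line breaks: one array entry per line.
  data.cell.text = textarea.value.split(/\r\n|\r|\n/)
}
```

If this assumption about `cell.text` does not hold for your autotable version, the same structure still applies; only the final assignment would change.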
