Allow safe tags in escapeHtml #3998
$title = $this->fields['title_short'] ?? '';
$words = explode(' ', $title);
if (count($words) > 3) {
    $words[2] = '<em>' . $words[2] . '</em>';
Just so the effect is easily demoed, on any record with a reasonably long title.
I wouldn't do this with escapeHtml, but I'd be happy to contribute our cleanHtml helper, which does what you want. It uses a proper HTML purifier to do the heavy lifting. Longer explanation: escapeHtml is meant for situations where the input is not supposed to be HTML. If we start treating that input as HTML, we will corrupt content by interpreting as markup things that were never meant to be markup. Consider this fictitious plain-text example: ."
For what it's worth, in the Finding Augustine bibliography instance I maintain, it is possible for titles to contain italics, so I created an I definitely do not think we want to change the default behavior of escapeHtml for all the reasons you cite -- and there may even be situations where we really want to display HTML markup, so we don't want to make that impossible. I think the best bet might be to build a couple of different helpers: 1.) unescapeHtml(string $html, array $tagsToUnescape) - the signature should be self-explanatory. We could potentially put a configuration in theme.config.php to list default safe tags that get used if a list is not provided to the helpers. It might also be possible to incorporate this into the RecordDataFormatter configuration somehow to make it somewhat automatic for the majority of cases (though things like search results, related links, recommendations, etc. would require separate handling since they don't use the formatter).
That all makes sense, and please do contribute. But the problem here is that I do now expect these non-HTML fields to contain HTML, so I would have to clone every template and change each potential escapeHtml to cleanHtml. I'm hoping there is some alternative I'm not thinking of.
$escaped = $this->laminasEscapeHtml->__invoke($str);

// Revert ok chars
foreach ($except as $tag) {
It might be more efficient to create arrays of searches and replacements and then do a single str_replace call, instead of doing multiple str_replace calls in a loop.
...but if @EreMaijala's solution is the best way forward, it's a moot point. :-)
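As a rough illustration of the single-call suggestion above, here is a minimal sketch. The function name escapeExcept, its default tag list, and the use of PHP's built-in htmlspecialchars as a stand-in for the Laminas escaper are all assumptions made for the example; this is not the code from the PR.

```php
<?php

/**
 * Hypothetical helper: escape a string, then restore a list of
 * attribute-free tags. htmlspecialchars() stands in for the Laminas
 * escapeHtml helper used in the PR (an assumption for this sketch).
 */
function escapeExcept(string $str, array $except = ['em']): string
{
    $escaped = htmlspecialchars($str, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Build the search and replacement lists once...
    $search = [];
    $replace = [];
    foreach ($except as $tag) {
        $search[] = '&lt;' . $tag . '&gt;';
        $replace[] = '<' . $tag . '>';
        $search[] = '&lt;/' . $tag . '&gt;';
        $replace[] = '</' . $tag . '>';
    }

    // ...then revert every allowed tag with one str_replace() call.
    return str_replace($search, $replace, $escaped);
}

echo escapeExcept('A <em>fine</em> title & <script>x</script>'), PHP_EOL;
// prints: A <em>fine</em> title &amp; &lt;script&gt;x&lt;/script&gt;
```

Only the exact forms <tag> and </tag> are restored, so a tag carrying attributes (for example an onclick handler) stays escaped.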
One other thought: maybe we need to create some kind of wrapper class that we can use as a marker for text that is expected to contain HTML. It could have a toString method so that it would seamlessly integrate with existing code, but then we could detect the wrapper class and escape it differently. This is only a half-baked idea, and it would likely require either significant view helper changes or a lot of template changes... but it might be worth thinking about, and it might offer a more seamless way to support your use case if we do it correctly. (See TranslatableString for an example of the kind of pattern I'm thinking about).
Yeah, the idea would be, is there a way we could make these changes to the core (non-backend-specific) templates and RecordDataFormatter common classes, etc., so it doesn't become a maintenance nightmare, and yet doesn't have all the negative performance etc. side effects. Because I realize this is an unusual issue, on a backend that most will never even use, and even I don't know the extent of which fields can have the HTML in them or what specific tags there might be. But cloning all of those templates also seems like a really bad approach.
I like this idea! If I understand correctly, the new record driver could wrap its own strings in that marker class before returning them. Then as you say we would need a lot of changes to the templates or helpers to check for the wrapper and treat it differently, but at least those changes could be made to core files without actually affecting the other backends. It's a thought. The performance impact would be negligible for unaffected backends. It could also be implemented incrementally. We'd still need consensus though on which fields should actually allow those formatting HTML tags through. And which tags, but maybe that's not a real challenge if @EreMaijala's purifier function has already thought that through.
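To make the marker-class idea concrete, here is a minimal sketch assuming PHP 8. The names HtmlSafeString and escapeOrClean are hypothetical, and the escape-then-restore step is only a crude stand-in for a real HTML purifier; none of this is existing VuFind code.

```php
<?php

/**
 * Hypothetical marker class for text that is expected to contain HTML.
 */
final class HtmlSafeString
{
    public function __construct(private string $html)
    {
    }

    // PHP's magic method lets the wrapper pass through code that
    // expects a plain string.
    public function __toString(): string
    {
        return $this->html;
    }
}

/**
 * Hypothetical escaping entry point: plain strings are fully escaped,
 * wrapped strings keep a small list of attribute-free tags.
 */
function escapeOrClean(string|HtmlSafeString $value, array $allowedTags = ['em']): string
{
    $escaped = htmlspecialchars((string)$value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    if (!$value instanceof HtmlSafeString) {
        // Unwrapped values keep today's behavior.
        return $escaped;
    }
    // Stand-in for a purifier: restore only bare <tag> and </tag> forms.
    $search = $replace = [];
    foreach ($allowedTags as $tag) {
        array_push($search, "&lt;$tag&gt;", "&lt;/$tag&gt;");
        array_push($replace, "<$tag>", "</$tag>");
    }
    return str_replace($search, $replace, $escaped);
}

// A record driver for the new backend would wrap only its own fields:
echo escapeOrClean(new HtmlSafeString('An <em>italic</em> word')), PHP_EOL;
echo escapeOrClean('An <em>italic</em> word'), PHP_EOL;
```

Because only the new backend's driver would return wrapped values, other backends would pay just the cost of one instanceof check per call.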
Thanks, @maccabeelevine, this sounds like the beginnings of a plan. One other thought, though: as a short-term solution, can you just use
Yes, great idea, will do so. And I will close this PR since we obviously won't be replacing escapeHtml, but I'll look forward to @EreMaijala's code for the purifying.
Allow safe HTML that does not pose any XSS risk to pass through escapeHtml for display output.
This is an idea, just as a starting point to see if any form of it could be practical. I have my doubts. But the alternative is annoying.
I'm working on a new backend that returns "MARC", except the fields like title sometimes have <em> tags in them. I could write a new helper "EscapeExcept" that does the basic logic this one does, allowing those specific fields to pass through unencoded. But because I don't know which "MARC" fields will have this HTML, and because record fields are displayed all over the place, I would have to clone many different templates and search/replace escapeHtml in each of them, and then that would be a maintenance nightmare in the future.
Wouldn't It Be Nice (TM) if we could just replace escapeHtml altogether? No, self, obviously it would not be nice because
But I'm hoping there is a middle ground I'm not thinking of.