Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add filters for htmlEscape and htmlUnescape #1061

Merged
merged 4 commits into from
Dec 7, 2022

Conversation

ad8lmondy
Copy link
Contributor

This PR adds the filters htmlEncode and htmlDecode, inspired by the urlEncode and urlDecode filters.

This allows content like: "a > b && a < c" to be encoded to "a &gt; b &amp;&amp; a &lt; c" and back again.

I've added the filter to existing tests, and all are passing.

Regarding the info here: #1060 (comment) (thanks for that, btw!)

  • Smaller PRs: I've split the PR into 4 commits of relatively logical blocks, but I'm not entirely sure if it's appropriate to make each a PR. I'm not sure the tests would pass without some of the code, so a certain size is going to be required, afaik. Please let me know if you'd prefer something else, happy to do as recommended.
  • Adding a new crate: This PR adds in a new crate html-escape. I took inspiration from the existing code for the urlEncode filters, which uses the percent-encoding crate. I don't say this as justification for adding another crate, just that I'm not sure my skills in rust are up to task of implementing it myself 😅

Thanks!

@jcamiel
Copy link
Collaborator

jcamiel commented Dec 6, 2022

Thanks for the PR @ad8lmondy

  • regarding using an external crates: fair enough, I didn't check closely what we did on urlEncode. I think we're still going to remove this particular crate (need to check with @fabricereix) because, if I'm not missing anything, that's not a particular difficult task.
    For instance, in the Tera code, the following function escapes HTML:
pub fn escape_html(input: &str) -> String {
    let mut output = String::with_capacity(input.len() * 2);
    for c in input.chars() {
        match c {
            '&' => output.push_str("&amp;"),
            '<' => output.push_str("&lt;"),
            '>' => output.push_str("&gt;"),
            '"' => output.push_str("&quot;"),
            '\'' => output.push_str("&#x27;"),
            '/' => output.push_str("&#x2F;"),
            _ => output.push(c),
        }
    }

    // Not using shrink_to_fit() on purpose
    output
}

It's small code and easy to understand. I much prefer having this code in our own source code than using an (even small) dependency. Unescaping is a "bit more" difficult but it is also not a "hard" tasks: for instance, HTML escape and unescape in Python standard lib. That's say, we can merge this PR and implement escaping / unescaping latter,

  • regarding the naming. I think I prefer htmlEscape and htmlUnescape. I understand this is not symetric with urlEncode / urlDecode but, for instance, in Python you have urlEncode and htmlEscape/Unescape. The Tera function is also named escape_html. What do you think? Did you name it because of the existing functions urlEncode and urlDecode or another reasons?

@ad8lmondy
Copy link
Contributor Author

Yes, I suppose you're right about the difficulty - not too bad, particularly for the encoding side. In saying that, abstracting any potential nuances to a seperate crate is appealing too 😄

In regard to the naming - I agree, it should htmlEscape and htmlUnescape. I think I named it as encode/decode because I was literally copying chunks of code. Only when I found the html-escape crate did I realise it was a better name. I meant to go back and rename it, but forgot.

I'll update the PR tomorrow with the new and better name.

@jcamiel
Copy link
Collaborator

jcamiel commented Dec 6, 2022

Great, just don't forget to rebase your commits to appeal our CI and everything should be ok!

@ad8lmondy
Copy link
Contributor Author

Filter names updated, all rebased, and quick test from my end looks to be working.

@ad8lmondy
Copy link
Contributor Author

Oh yes, do let me know if there are any docs I should be updating. I wasn't sure if the grammar section on the website was generated or not - if not, I can update it 👍

@jcamiel
Copy link
Collaborator

jcamiel commented Dec 7, 2022

/accept

@hurl-bot
Copy link
Collaborator

hurl-bot commented Dec 7, 2022

🕗 /accept is running, please wait for completion.

@hurl-bot
Copy link
Collaborator

hurl-bot commented Dec 7, 2022

✅ Pull request accepted and closed by jcamiel with fast forward merge..

# List of commits merged from ad8lmondy/hurl/htmldecode branch into Orange-OpenSource/hurl/master branch:

  • 43487b0 Add integration tests for htmlEscape and htmlUnescape filters
  • e6028f3 Add support for htmlEscape and htmlUnescape in hurlfmt
  • db486f9 Add htmlEscape and htmlUnescape filters
  • 7289930 Add grammar to support htmlEscape and htmlUnescape

@github-actions github-actions bot merged commit 43487b0 into Orange-OpenSource:master Dec 7, 2022
@jcamiel
Copy link
Collaborator

jcamiel commented Dec 7, 2022

Everything is good, once again thanks for the PR!

@jcamiel jcamiel changed the title Add filters for htmlEncode and htmlDecode Add filters for htmlEscape and htmlUnescape Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

String manipulation and/or HTML Character Entities encode/decode support?
3 participants