String manipulation and/or HTML Character Entities encode/decode support? #1038

Closed
ad8lmondy opened this issue Nov 29, 2022 · 3 comments · Fixed by #1061

@ad8lmondy
Contributor

Hello,

I've got perhaps a specific use-case, but hopefully it's something that hurl can do.

So I've got a form like this:

 <form method="post" name="hiddenform" action="https://url.com/login/callback">
     <input type="hidden" name="wa" value="wsignin1.0">
     <input type="hidden" 
            name="wresult"
            value="eyJ0eXAiOiJKiOiJSUzI1NiJ9.eyJ1c2VyX2">
     <input type="hidden" name="wctx" value="{&#34;strategy&#34;:&#34;auth&#34;,&#34;authClient&#34;:&#34;eyJjQifX0=&#34;}">
     <noscript>
         <p>
             Script is disabled. Click Submit to continue.
         </p><input type="submit" value="Submit">
     </noscript>
</form>

Using a Captures block, I can get out the values:

[Captures]
action: regex "action=\"(.*)\""
wa_val: regex "name=\"wa\" value=\"(.*)\">"
wresult_val: regex "(?s)\"wresult\"[^a-z]*value=\"(.*?)\">"
wctx_val: regex "name=\"wctx\" value=\"(.*)\">" urlDecode
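For anyone who wants to sanity-check these patterns outside of hurl, here is a rough Python sketch that runs the same four regexes against the form above. It assumes Python's `re` module behaves like hurl's regex engine for these particular patterns, which I believe holds here but have not verified in general. The `form` and `patterns` names are just for illustration.

```python
import re

# The form from the response body, copied as-is.
form = '''<form method="post" name="hiddenform" action="https://url.com/login/callback">
     <input type="hidden" name="wa" value="wsignin1.0">
     <input type="hidden"
            name="wresult"
            value="eyJ0eXAiOiJKiOiJSUzI1NiJ9.eyJ1c2VyX2">
     <input type="hidden" name="wctx" value="{&#34;strategy&#34;:&#34;auth&#34;,&#34;authClient&#34;:&#34;eyJjQifX0=&#34;}">
</form>'''

# Same patterns as in the [Captures] block (filters omitted).
patterns = {
    "action": r'action="(.*)"',
    "wa_val": r'name="wa" value="(.*)">',
    "wresult_val": r'(?s)"wresult"[^a-z]*value="(.*?)">',
    "wctx_val": r'name="wctx" value="(.*)">',
}

# Keep the first capture group of the first match for each pattern.
captures = {name: re.search(p, form).group(1) for name, p in patterns.items()}

for name, value in captures.items():
    print(f"{name}: {value}")
```

Note that `wctx_val` still contains the `&#34;` sequences at this point; the regex only extracts the attribute value, it does not decode it.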

However, you may have noticed that the value of wctx is HTML-escaped rather than URL-encoded: it's actually JSON data, but each " has been escaped as &#34;.

Unfortunately, when I try to submit the form with the values:

POST {{action}}
[FormParams]
wa: {{wa_val}}
wresult: {{wresult_val}}
wctx: {{wctx_val}}

hurl URL-encodes the form values, which re-encodes the wctx value into: %7B%26%2334%3Bstrategy%26%2334%3B%3A%26%2334%3Bauth%26%2334%3B%2C%26%2334%3BauthClient%26%2334%3B%3A%26%2334%3BeyJjQifX0%3D%26%2334%3B%7D (I had to modify the value a bit for this example).
Essentially: the ampersand codes that existed in the original form value data are themselves being encoded.
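To make the double encoding concrete, here is a small Python sketch. It is not what hurl does internally; it only assumes that a form encoder percent-encodes reserved characters, which `urllib.parse.quote` with `safe=""` approximates. The `captured` name is made up for the example.

```python
from urllib.parse import quote

# The wctx value exactly as captured from the HTML attribute (still HTML-escaped).
captured = "{&#34;strategy&#34;:&#34;auth&#34;,&#34;authClient&#34;:&#34;eyJjQifX0=&#34;}"

# Percent-encode every reserved character, roughly as a form encoder would.
encoded = quote(captured, safe="")
print(encoded)

# Each "&#34;" becomes "%26%2334%3B": the '&', '#' and ';' of the entity
# are encoded literally, so the server never sees a plain double quote.
print("%26%2334%3B" in encoded)
print("%22" in encoded)
```

The server would need to see `%22` (an encoded double quote) for the JSON to be valid after decoding, and that never appears.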

If I use the 'raw' format, like

Content-Type: application/x-www-form-urlencoded
```
name=John%20Doe&key1=value1```

The re-encoding does not occur, but then the original ampersand codes are still there, and seem to mess things up.

The two solutions I can see are:

  • A filter to decode/encode HTML ampersand chars. Seems like a robust solution, but perhaps outside the scope of hurl?
  • A way to use regex to replace strings - e.g., if I do s/&#34;/"/g on the original value string, it would be native JSON, which can then be properly encoded. This seems like a fairly brittle solution to my specific problem, but it would indeed work, and perhaps be more useful to the broader hurl community?
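Both ideas can be sketched in Python to show what the result should look like. This is only an illustration of the desired behaviour, not a proposal for hurl's syntax; `html.unescape` stands in for a generic decode filter and `re.sub` for a regex replace.

```python
import html
import json
import re
from urllib.parse import quote

# The wctx value as captured from the form (still HTML-escaped).
captured = "{&#34;strategy&#34;:&#34;auth&#34;,&#34;authClient&#34;:&#34;eyJjQifX0=&#34;}"

# Option 1: generic HTML entity decoding.
decoded_generic = html.unescape(captured)

# Option 2: targeted replacement, equivalent to s/&#34;/"/g.
decoded_targeted = re.sub(r"&#34;", '"', captured)

# Either way the result is plain JSON...
payload = json.loads(decoded_generic)
print(payload)

# ...which can then be percent-encoded exactly once for the form body.
form_value = quote(decoded_generic, safe="")
print(form_value)
```

For this particular value the two options give the same string. The replace approach would miss any other entity (for example `&amp;`) that happened to appear in the data, which is why it feels more brittle.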

But I'd love to know if there are other (existing) solutions too!

Thanks!

@jcamiel
Collaborator

jcamiel commented Nov 29, 2022

Hi,
Yes, your two solutions are totally in the scope of Hurl and we want to address them. You've already seen that there are urlEncode and urlDecode filters. It makes total sense to have both:

  • htmlEscape / htmlUnescape (like in the Python standard library, for instance)
  • a filter to replace parts of a string

Relates to #1028 and #1029, which propose other filters. We'll add those really soon.

@jcamiel jcamiel added the enhancement New feature or request label Nov 29, 2022
@fabricereix fabricereix linked a pull request Dec 7, 2022 that will close this issue
@fabricereix
Collaborator

closed with #1061

@fabricereix fabricereix added this to the 2.0.0 milestone Dec 7, 2022
@jcamiel
Collaborator

jcamiel commented Dec 12, 2022

Noted: the current implementation mandates the semicolon when unescaping.
For instance, in Python:

import html
t = "&#65 foo"
print(html.unescape(t)) # A foo

In Go:

var s string = `&#65 foo`
fmt.Println(html.UnescapeString(s))  // A foo

While with html-escape crate:

use html_escape;
let t = "&#65 foo";
println!("{}", html_escape::decode_html_entities(t)); // &#65 foo

It's worth noting that all major browsers accept a missing semicolon, so when we re-implement the escaping / unescaping (see #1093) we could add support for an optional semicolon.
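To spell out the difference, here is a short Python sketch comparing the lenient behaviour of `html.unescape` with a strict decoder that requires the semicolon. `unescape_strict` is a hypothetical helper written for this comparison; it only handles decimal numeric references and is not how Hurl or the html-escape crate is implemented.

```python
import html
import re


def unescape_strict(text: str) -> str:
    # Hypothetical helper: decode only decimal references that end with ';'.
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


with_semicolon = "&#65; foo"
without_semicolon = "&#65 foo"

# Lenient: the semicolon is optional, as in browsers.
lenient = [html.unescape(with_semicolon), html.unescape(without_semicolon)]

# Strict: the reference without a semicolon is left untouched.
strict = [unescape_strict(with_semicolon), unescape_strict(without_semicolon)]

print(lenient)  # ['A foo', 'A foo']
print(strict)   # ['A foo', '&#65 foo']
```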
