Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encoded forward slash (%2F) in parameter not routing correctly #22125

Closed
Ben-Russell opened this issue Nov 17, 2017 · 38 comments
Closed

Encoded forward slash (%2F) in parameter not routing correctly #22125

Ben-Russell opened this issue Nov 17, 2017 · 38 comments
Labels

Comments

@Ben-Russell
Copy link

  • Laravel Version: 5.5
  • PHP Version: 7.2

Description:

Passing an encoded forward slash as a route parameter will be interpreted by the router as a decoded forward slash.

Steps To Reproduce:

Example Code:

Route::get('search/{search}/{params?}', 'SearchController@searchItems')
    ->where('params', '.*');
class SearchController extends Controller
{
    public function searchItems(string $search, string $params = null)
    {
        dd($search);
    }
}

Using the route:
/api/search/10%2F22/param1/5

Will result in the output of 10 instead of the expected 10%2F22.
Dying on $params shows 22/param1/5 confirming that the slash is being interpreted as a real one.

@themsaid
Copy link
Member

This is the intended behaviour, check #4338

@renlok
Copy link

renlok commented Dec 5, 2017

This shouldn't be intended behaviour

@Ben-Russell
Copy link
Author

I don't understand how this is intended. This is contradictory to the entire purpose of encoding a parameter.

https://en.wikipedia.org/wiki/Percent-encoding

@crazy4chrissi
Copy link

crazy4chrissi commented Jan 26, 2018

Please read https://tools.ietf.org/html/rfc3986 and explain why this should be the intended behavior. I understand that the RFC specifically introduces the percentage-encoding like %2F for this kind of problem.

Just don't use slashes in the URLs. Use something else.

Following this argumentation, you could also say: Don't use spaces, % or ?, ... in your URLs. Use something else. Then we would not need percentage-encoding at all.

Plus, it is not always the laravel-developer who designs how URLs look like. For example, it might be that laravel replaces a legacy system that used URLs like this and the new system should be capable of working with the same URLs, as otherwise all links would be broken. "Use something else" is not possible if already thousands of links point to URLs like this.

@ekeyte
Copy link

ekeyte commented Mar 13, 2018

To piggy-back off crazy4chrissi's comments:

In our case, we don't really have a choice, since certain strings that compose our URL segments come from data provided by our vendors. The best we can do is URL-encode them, but we shouldn't have to dream up our own encoding to get around this issue, when the intention of encoding the slashes in the first place is to prevent them being treated as URI segments.

@jrnickell
Copy link

https://tools.ietf.org/html/rfc3986#section-2.4

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters. The only exception is for percent-encoded octets corresponding to characters in the unreserved set, which can be decoded at any time.

@ZYSHAQ
Copy link

ZYSHAQ commented Mar 18, 2018

Just ran into this problem. Someone placed an '/' which got encoded in their post title and the website goes down.

@d-luk
Copy link

d-luk commented May 3, 2018

The current behaviour is highly unintuitive. Consider the following example of a GET /search route:

Route::get('search/{query}', 'SearchController@search')
    ->where('params', '.*');

Expected behaviour:

  • /search/a%2Fb will go to the SearchController.
  • /search/a/b will return a 404.

Actual behaviour (exact opposite):

  • /search/a%2Fb returns a 404.
  • /search/a/b will go to the SearchController.

@crazy4chrissi
Copy link

@d-luk Search is a pretty good example use case where the argument "Just don't use slashes in the URLs. Use something else." would mean you cannot use the GET method for search-forms, as you cannot forbid users to enter slashes. But using POST is just not the right way and creates other problems. And its not like search is a rare use case.
@taylorotwell @themsaid Please comment on this.

@SeriousKen
Copy link

I work in a school and we want to integrate with our third party calendar. The calendars IDs are of the form 2018/2019 etc. so this problem rears its head for me too.

@laynes
Copy link

laynes commented Jun 14, 2018

Major facepalm on this one. Come on, guys. Please fix this ASAP. This should NOT be intended behavior.

@devcircus
Copy link
Contributor

Looking through github issues for all major php frameworks, this "issue" isn't just a laravel problem. I have to assume its not as simple as some think. There's also security concerns and web server considerations.

@crazy4chrissi
Copy link

crazy4chrissi commented Jun 15, 2018

I analyzed a couple of presumably Laravel based websites. Laravel.com uses POST for searches (via JavaScript). Without JavaScript, laravel.com does not have search functionality at all.

All other sites I analyzed that have search functionality use urls like this:
/search/?q=search%2Fslash

Of course, this works in Laravel, as it does not put the parameter into the route.
For search, this may even be good practice, as q is kind of a standard parameter for search.
So you might consider this option when you define new URLs with parameters that might contain slashes.

Still I see no reason why this should be the intended behavior.

@crazy4chrissi
Copy link

crazy4chrissi commented Jun 15, 2018

@devcircus That's interesting. Can you provide links to issues from other php frameworks?

Edit: I found these for Symfony:
symfony/symfony#13017
symfony/symfony#6442

Regarding "security considerations" you mean this?
https://www.owasp.org/index.php/Double_Encoding

I'd say if your app has a "security check filter that refuses requests containing characters like “../”.", i.e. you filter based on a black-list, then your filter is bad. There is stuff like realpath in PHP to handle such problems properly.
Moreover, if you have another security filter that refuses < to avoid XSS, this would mean you should not allow %3C. So why does %3C work but %2F doesn't? Because we cannot disallow everything on the URL level that might be bad on some level below if not handled there correctly.

Regarding web server considerations: Just because a few web servers (mainly Apache) don't support all RFC compliant URLs by default, this does not mean laravel needs to add another layer that is RFC incompatible. Moreover, Apache can be configured to allow such URLs, and the Apache documentation does not mention any security implications.

It may be that there are good reasons why this is considered the intended behavior, but I have not read any so far. Just saying doing otherwise is not good practice is not enough, you would need to explain why it is bad practice.

@web-engineer
Copy link

Apache does support / in the URL's you just need to turn it on in the Vhosts section 0

AllowEncodedSlashes On

This is absurd that Laravel doesn't honour the escaped part of the URL - its feels wrong to need to have to escape these with something different to handle "data" in the URL when there is already an escape sequence for data.

Double encoding is not related - this is not what were doing here.

The confusion is that this is part of the path parameter not the data component so therefore does the RFC apply?

Surely the best approach would be to allow a route definition to accept encoded slashes at the point it is defined, thereby enabling the current behaviour and giving the option of handling URL's in this way should the developer wish to.

@driesvints
Copy link
Member

driesvints commented Nov 26, 2018

Hey everyone. I did some research on this and this is already possible atm.

@Ben-Russell in your example you also need to define a wildcard character for the search parameter which would allow forward slashes. If you change your route to the following it will pass:

This is the same way as Symfony works: https://symfony.com/doc/current/routing/slash_in_parameter.html

I'll send a note to the docs which will make this more clear if anyone else is looking for this but going to close this as this is possible if you want to. I hope you understand why we don't want this as a default behavior.

@SebastianS90
Copy link

@driesvints Sorry, but that solution does not help for encoded slashes.
Try the following Route in a new Laraval application:

Route::get('/band/{band}/{song?}', function ($band, $song = null) {
    if ($song) {
        return "Show song '$song' of band '$band'";
    }
    return "List all songs of band '$band'";
})->where('band', '.*')->where('song', '.*');

Now access http://localhost:8000/band/AC%2fDC/T.N.T.

I would expect to see: Show song 'T.N.T.' of band 'AC/DC'.
But the result is: List all songs of band 'AC/DC/T.N.T.'

The typical use case: The user enters his favorite band name into a text field and the request is generated using javascript something like '/band/' + encodeURIComponent(userInput) + '/'

So please reopen this issue.

@driesvints
Copy link
Member

Hey @SebastianS90 you're right. I've removed my example above to prevent confusion.

I did some further testing and I've come to the following conclusion: it's only possible to allow encoded slashes in the last parameter. It's also impossible in Symfony (as far as I can see) to allow this for multiple segments after each other. I've added a note to my PR to the docs about this: laravel/docs#4785

@SebastianS90
Copy link

@SDekkers The %7F from your example is ascii character 128, i.e. the DEL control character.
But we are looking for %2F which is the encoded forward slash /. The problem ist that Laravel/Symphony treats %2F the same as /. That should not happen.

@SDekkers
Copy link
Contributor

SDekkers commented Nov 27, 2018

@SebastianS90 Ah, that's what I did wrong then. Looking at the regex it generates to obtain the parameters from the route:

"#^/band/(?P<band>.*)/(?P<song>.*)$#sDu"

I would say you should try to only use forward slashes in the last parameter as @driesvints mentioned, as the regex does not use the Ungreedy U modifier, and even if it would, it would be guessing what belongs to which parameter.

@web-engineer
Copy link

I think the status quo makes sense, haven't had a chance to test tha approach yet, as long as we have a prefix the remaining data should be encoded / decoded within your application with any symbol on the url using @driesvints example - I can kinda see how it needs to be one thing or the other, the frustration for me was that i wanted "/" in may parameter value - if your route then parses everything to the right then surely this is optimal?

What I missed was to look at the symphony docs that underpin Laravel, I think trying to escape the URL otherwise is a mistake, the URL should work regardless of escaping however and that I'd like to test when I get the chance.

I ended up replacing the / for an alternate character in my code (this was to generate a perma link) but it would have been better if that link was not limited to the characters it could contain.

@driesvints
Copy link
Member

So as for now we're going the same way as Symfony. We've updated the docs to indicate that it's possible for the last route segment. This also reflects the Symfony docs.

@iget-master
Copy link

Very confusing to me how this is an intended behavior... any plans for changing this behavior?

@driesvints
Copy link
Member

@iget-master no

@mvaljento
Copy link

mvaljento commented Dec 18, 2019

So let me get this straight: If i have product codes with slashes in them (a real-world example "19NA-MSD/NAV+44XG"), I can't make a RESTful GET route where one parameter is the url encoded product code?

@alcaitiff
Copy link

Yes. @mvaljento this bug hunts us for years.

The router should resolve the route and after that decode parameters, but it does decode the url parameters before resolving the route. Therefore we must use parameters that may have "%2F" after the ?

You can't do myroute/xxx%2Fzzz it will try to find a route decoded to myroute/xxx/zzz
But you can do myroute?parameter=xxx%2Fzzz it will find the route myroute and send the parameter decoded just right

Like @renlok said in 2017

This shouldn't be intended behaviour

@alcaitiff
Copy link

@mvaljento I changed the Artistan/Urlencode to work on Laravel 5.

You can get it here https://github.com/alcaitiff/laravel-urlencode

@andrewfinnell
Copy link

For anyone still interested, I've solved this in the past by double encoding the /. Encode it to %2F and then encode it again. This will result in the string %2F being in the URL, which you can then fix in a pre-processor for the web server.

@simonlucalandi
Copy link

@andrewfinnell this saved my day

@DRodero
Copy link

DRodero commented Nov 25, 2020

Amazing @andrewfinnell, finally, this has worked!!! Thanks!!

@AsafMazuz1
Copy link

Hey,
Facing this problem with moving old website to laravel.
I solved it by making an extra route with:
Origin Route:
Route::get('/{query}', [\App\Http\Controllers\Front\User\CategoryController::class, 'category'])->name('category');
Extra Route on the end of the route file:
Route::get('/{query}/{more?}', [\App\Http\Controllers\Front\User\CategoryController::class, 'category'])->name('category');

And now everything works fine :)

@mholt
Copy link

mholt commented Aug 9, 2022

I just want to chime in as a fellow developer, but I'm not a Laravel user myself. I found this issue while doing research about this very problem, because I want to make sure we address this in the Caddy web server as best as possible. So don't mind me, I'm kind of writing this post to work out my own thoughts on the matter.

My understanding is: the complaint is that servers/server-side apps/routers (Laravel) treat %2f as equivalent to /, but developers want %2f to be preserved as application data and not decoded or acted on by the server/router/what-have-you.

I have spent days in the weeds on this very topic, and let me assure you, it is really quite tricky in the general case (and Laravel is a general-purpose framework) to strike a proper balance between:

  • logical correctness
  • intuitive behavior
  • spec compliance
  • security

If we have all put down our pitchforks by now... I will say I can understand why the Laravel authors would rather keep Laravel's URI handling in normalized space. And I can also understand why there hasn't been much explanation given or response from them, because frankly this issue is exhausting.

It should be noted that this problem is not Laravel's. It is not Caddy's. It is the result of a complex mixture of legacy software, bugs, evolving Internet standards, and application requirements. A server has it tough these days.

I'd like to offer one server developer's perspective for a moment if I may.

People use Caddy to both match/route requests, and manipulate them too. Any given URI has multiple valid encoded forms. %2F%66%6F%6F%2F%62%61%72 can be decoded to /foo/bar just as well as /foo%2Fbar can, and everything in between can, too. If routers matched on non-normalized URIs, there would be plenty more security bugs to deal with: a pattern of /foo/*, which is expected to be authenticated, would no longer match /foo%2Fbar even though they are, according to ratified RFCs, equivalent.

RFC 9110, "HTTP Semantics," has a section on HTTP URI normalization, which says:

Two HTTP URIs that are equivalent after normalization (using any method) can be assumed to identify the same resource, and any HTTP component MAY perform normalization. As a result, distinct resources SHOULD NOT be identified by HTTP URIs that are equivalent after normalization.

In other words, /foo%2Fbar and /foo/bar are equivalent after normalizing, and thus they SHOULD NOT be used for distinct resources. So at this point in my journey, my recommendation is: If you are able, design a different API. It is not robust in the harsh HTTP environment.

Note that several RFCs, notably RFCs 3986 and 9110, continually repeat that URI parsing is dependent upon scheme. That's one other problem: we all use the http:// or https:// scheme and yet expect applications to handle URIs differently. So of course there's going to be head-butting, we're fighting the design.

To clarify, it is definitely possible for a URI path such as /band/AC%2fDC/T.N.T to be "properly" handled by a server application, as noted by all of you above. Simply write a server that decodes everything after /band/ except %2f. 🤷‍♂️ The problem is that this is difficult in general. Depending on what situations you do this, you may be opening yourself to bugs and security holes. This is why Caddy is handling URIs in the unescaped space: it's the "one true" representation of a URI, and normalized HTTP URIs are more or less clearly defined nowadays.

The proposed solution by others above to double-encode URIs is definitely a hack, and it violates spec:

Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.

I mean, if it works for you, cool, but just beware of the non-conforming behavior and highlight it very prominently in documentation so you can avoid bugs.

One other interesting comment, by @alcaitiff:

The router should resolve the route and after that decode parameters, but it does decode the url parameters before resolving the route.

I can't speak for Laravel or what it's doing, but the Go standard library (what Caddy uses), for example, does do URL parsing correctly and still has this problem. Go does exactly what you and the spec recommend: it splits the URI into its components and then decodes reserved characters after parsing. It preserves the original, "raw" path in the RawPath field and offers the decoded path in the Path field. Its EscapedPath() method uses RawPath if it is a valid encoding of Path, which is interesting because any given path has multiple valid encodings as I noted above. So if I want to truly "normalize" the URI in Go, I have to call url.PathEscape(req.URL.Path) myself and ignore RawPath entirely (AFAIK). And guess what, this converts /foo/bar to... /foo/bar. In other words, decoding /foo%2Fbar is not reversible without loss of precision. (Unless the HTTP server knows your business logic, more on that in a moment.)

We can write our own logic, though, that uses RawPath as a "hint" (as the Go docs say) to maybe replace / with %2f, but if we've manipulated/rewritten the URI at all, this becomes infeasible because we don't know where or if that instance still exists in the string.

RFC 3986 section 2 states:

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component. If a reserved character is found in a URI component and
no delimiting role is known for that character, then it must be
interpreted as representing the data octet corresponding to that
character's encoding in US-ASCII.

The / is in the reserved set. Thus it is up to the implementation to determine whether it is data or a delimiter. I guess Laravel doesn't know, and it's frankly safer to assume it's a delimiter and treat it in its normalized form.

So I understand the frustration of everyone here. On my end I feel like I need to write a server that can read people's minds: is this slash data or is it a delimiter? The router needs more information, because both are very valid ways of interpreting a URI!

I'm currently thinking of ways that maybe Caddy can be configured by the developer to route requests with some "hint" that certain slashes should be treated as data, not delimiters. If the HTTP server or router has a clue as to your business logic, that can help solve the routing problem, but I'm still stumped on how to let users configure rewrites. Maybe I'll save that for another day.

@AsafMazuz1
Copy link

mplaint is that servers/server-side apps/routers (Laravel) t

Hey,
I don't know how the routes file (or dashboard screen to manage routes) looks at Caddy,
but maybe you can add a special route section for UnParsed strings like foo%2Fbar.
And then users will be able to write a route like this:
/category/{UnParsedString}
And then you can configure this route on the server somehow to ignore this section.
You can show a warning to the user before using this option and tell him about the security risks but with this approach,
you will be able to answer that need for some cases.
In this solution, you will be able to support / as data only as the last section of the URL.
Routes like category/{UnParsedString}/add for example will need to have a custom solution.
Maybe generate middleware for this route that will split the known parts from the route variables and when you have a request you will try to find the best matching to the known parts and build the route together with the variable.
Sure this is a complex topic and will require a lot of testing and adapting for Caddy and Go With the server configuration but I hope I was able to help a little bit with those ideas and directions :)

@mholt
Copy link

mholt commented Aug 9, 2022

@AsafMazuz1 Indeed, good thoughts! That's pretty close to what I'm experimenting with already. 😃

If you want to match a route like /bands/:artist/:song/ (we don't use the : syntax for routing vars in Caddy -- at least not yet -- but you get the idea), you'd currently write a matcher like /bands/*/*/ -- which will not work for /bands/AC%2fDC/T.N.T because we match in the normalized space for reasons I mentioned above. (So you'd end up trying to match /bands/AC/DC/T.N.T which doesn't match the pattern.)

But maybe if we could write a matcher pattern like /bands/%*/*/, the %* could mean "wildcard, but scan the RAW/ESCAPED path until the next slash." In other words, this way the developer can convey which part of the URI they expect to be data and not delimiter.

I'm still testing this approach to see if it's viable/feasible, since I don't know what kinds of parsing ambiguities or security problems may arise, but if it works maybe it's something that other frameworks or servers like Laravel could benefit from too.

@mholt
Copy link

mholt commented Aug 9, 2022

It works.

I have a solution that works, that doesn't resort to "compare all URIs in the normalized space only". It allows the developer to convey their intention for certain parts of the path. If the route pattern is crafted correctly, I believe this solution is even robust against common path attacks and security flaws -- though I need to test this more, I'm probably overlooking something stupid.

In Caddy, you might have configured something like this:

php_fastcgi /band/*/* 127.0.0.1:9000

This would send all band song pages to your PHP backend (i.e. /band/Pink/Try). However, this would fail as mentioned above by @SebastianS90 if the band name is "AC/DC", because AC%2fDC would be compared in the normalized space. Which, as we discussed, is generally the right and secure thing to do. But that would result in a pattern of /band/AC/DC/T.N.T which would result in a song name of "DC" instead of "T.N.T" (or fail to match entirely because too many path components).

However, while I was writing my first post above I was already hacking on a solution similar to @AsafMazuz1's idea, since what we need is a way for the developer to tell the HTTP server or router to leave some part of the path escaped. It obviously can't know this without some help.

I now have tests passing for an input path like /band/AC%2fDC/T.N.T with this config:

php_fastcgi /band/%*/%* 127.0.0.1:9000

which tells Caddy's router ("path matcher") to leave those parts of the path URI-encoded. The hint is the %* wildcard instead of the regular * wildcard.

What other servers/libraries let you give it a hint as to which specific parts of the path should be treated as data (versus delimiters)? I only know of solutions like AllowEncodedSlashes that turn on/off normalization entirely, rather than normalizing only the desired parts of the path.

Anyway I'm excited to test this more and I hope it can be a source of inspiration for Laravel and other frameworks/libraries that may want to do the same.

@AsafMazuz1
Copy link

AsafMazuz1 commented Aug 10, 2022

It works.

I have a solution that works, that doesn't resort to "compare all URIs in the normalized space only". It allows the developer to convey their intention for certain parts of the path. If the route pattern is crafted correctly, I believe this solution is even robust against common path attacks and security flaws -- though I need to test this more, I'm probably overlooking something stupid.

In Caddy, you might have configured something like this:

php_fastcgi /band/*/* 127.0.0.1:9000

This would send all band song pages to your PHP backend (i.e. /band/Pink/Try). However, this would fail as mentioned above by @SebastianS90 if the band name is "AC/DC", because AC%2fDC would be compared in the normalized space. Which, as we discussed, is generally the right and secure thing to do. But that would result in a pattern of /band/AC/DC/T.N.T which would result in a song name of "DC" instead of "T.N.T" (or fail to match entirely because too many path components).

However, while I was writing my first post above I was already hacking on a solution similar to @AsafMazuz1's idea, since what we need is a way for the developer to tell the HTTP server or router to leave some part of the path escaped. It obviously can't know this without some help.

I now have tests passing for an input path like /band/AC%2fDC/T.N.T with this config:

php_fastcgi /band/%*/%* 127.0.0.1:9000

which tells Caddy's router ("path matcher") to leave those parts of the path URI-encoded. The hint is the %* wildcard instead of the regular * wildcard.

What other servers/libraries let you give it a hint as to which specific parts of the path should be treated as data (versus delimiters)? I only know of solutions like AllowEncodedSlashes that turn on/off normalization entirely, rather than normalizing only the desired parts of the path.

Anyway I'm excited to test this more and I hope it can be a source of inspiration for Laravel and other frameworks/libraries that may want to do the same.

Glad to hear that you manage to work this out.
I believe that this problem accrued on other frameworks/libraries but I didn't encounter this kind of solution yet.
Glad to hear that my ideas help even if just by little :)
Hope that more frameworks will think about this situation that is often accused on real production websites when the SEO and URL formats are a really important part of the project. (Like when I found this problem back then)

schu added a commit to openmultiplechoice/openmultiplechoice that referenced this issue Feb 20, 2023
The current routes with a pattern like `subjects/byname/{name}' don't
work for names that include encoded characters, for example `foo%2Fbar`
(i.e. `foo/bar`).

laravel/framework#22125

Use a query parameter `...?name=...` instead.
@AMCerasoli
Copy link

This is crazy... "Intended"...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests