Windows host handling #25

Closed
15 changes: 11 additions & 4 deletions lib/functions.php
@@ -197,6 +197,15 @@ function($matches) {
if (!$result) {
$result = _parse_fallback($uri);
}
else
{
// Add empty host and leading slash to Windows file paths (file:///C:/path)
if (isset($result['scheme']) && $result['scheme'] === 'file' && isset($result['path']) &&
preg_match('/^(?<windows_path> [a-zA-Z]:(\/(?![\/])|\\\\)[^?]*)$/x', $result['path'])) {
Contributor

The regex also matches something like C:\\abc\def.txt and C:\\abc/def.txt - as well as C:/abc/def.txt

In what situations does C:\\ happen?

@peterpostmann @staabm @evert - I have rebased this in #71 and want to make sure I understand all the combinations and why they need to be handled. (And I will write some comment lines in the code in that PR, to help future devs who have forgotten all the detail of Windows paths)

Member

I'm afraid I don't really have the context anymore to provide useful input =(

Contributor

@phil-davis phil-davis Aug 16, 2022

https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats is one place where I can find C:\\:
Console.WriteLine("Setting current directory to 'C:\\'");

But that is C# code, and the \\ is just to put a literal \ in the string. The actual string is:
Setting current directory to 'C:\'

So that does not give me an example of how code can get here with C:\\ in the string and still validly represent something in the MS world.

And I tried a few combinations of strings to send to parse_url, and the returned value in "path" has exactly whatever \ characters are provided in the input string - there is no "magic escaping" going on in the returned "path" element. So I still don't understand why we need the check in the regex to match things like C:\\
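
A probe along these lines can reproduce that check. It is only a sketch: the two sample inputs are illustrative assumptions, and it compares the number of backslashes in each input with the number in the "path" that PHP's built-in parse_url returns.

```php
<?php
// Probe sketch: does parse_url() add or remove backslashes in the "path"?
// The sample inputs are assumptions for illustration. In single-quoted PHP
// strings, \\ in the source stands for one literal backslash.
$inputs = [
    'file:///C:\\abc\\def.txt',      // one backslash per separator
    'file:///C:\\\\abc\\\\def.txt',  // two backslashes per separator
];

foreach ($inputs as $input) {
    // PHP_URL_PATH asks parse_url() for the path component only.
    $path = (string) parse_url($input, PHP_URL_PATH);
    printf(
        "%s => %s (backslashes in: %d, out: %d)\n",
        $input,
        $path,
        substr_count($input, '\\'),
        substr_count($path, '\\')
    );
}
```

If the counts match for every input, the path is passed through without any escaping.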

Contributor Author

Parsing file:///C:\\abc\\def.txt is something you probably want to do to process path strings with escaped backslashes. But revisiting this, I would simplify the logic and remove the restrictions. I think the idea was to process only C:\ (valid path), C:\\ (valid escaped path), and C:/ (valid path with forward slashes), but not C://, because that is not a valid Windows path anyway. But since the fallback parses C:// regardless, we should do the same to avoid inconsistency.

// Add empty host and leading slash to Windows file paths (file:///C:/path)
if (isset($result['scheme']) && $result['scheme'] === 'file' && isset($result['path']) &&
     preg_match('/^(?<windows_path> [a-zA-Z]:(\/|\\\\).*)$/x', $result['path'])) {
    $result['path'] = '/' . $result['path'];
    $result['host'] = '';
}

This just looks for a single letter followed by a colon and a slash (forward or backward), which is consistent with the fallback:

=== file:///C:\abc\def.txt:
old parse: scheme: file, path: C:\abc\def.txt
fallback:  scheme: file, path: /C:\abc\def.txt
new parse: scheme: file, path: /C:\abc\def.txt

=== file:///C:\abc/def.txt:
old parse: scheme: file, path: C:\abc/def.txt
fallback:  scheme: file, path: /C:\abc/def.txt
new parse: scheme: file, path: /C:\abc/def.txt

=== file:///C:/abc/def.txt:
old parse: scheme: file, path: C:/abc/def.txt
fallback:  scheme: file, path: /C:/abc/def.txt
new parse: scheme: file, path: /C:/abc/def.txt

=== file:///C://abc/def.txt:
old parse: scheme: file, path: C://abc/def.txt
fallback:  scheme: file, path: /C://abc/def.txt
new parse: scheme: file, path: /C://abc/def.txt
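
The simplified check can also be tried in isolation. The sketch below wraps it in a standalone function; the name normalizeWindowsFilePath and the hand-built input arrays are hypothetical, used only so the four paths from the list above can be exercised without the rest of the library.

```php
<?php
// Sketch only: normalizeWindowsFilePath() is a hypothetical helper name,
// not part of the library. It applies the simplified check to an array
// shaped like the result of parse_url().
function normalizeWindowsFilePath(array $result): array
{
    if (isset($result['scheme']) && $result['scheme'] === 'file' && isset($result['path']) &&
        preg_match('/^(?<windows_path> [a-zA-Z]:(\/|\\\\).*)$/x', $result['path'])) {
        // Prepend the leading slash and set an empty host.
        $result['path'] = '/' . $result['path'];
        $result['host'] = '';
    }

    return $result;
}

// The four "old parse" paths listed above (\\ is one literal backslash here).
$paths = [
    'C:\\abc\\def.txt',
    'C:\\abc/def.txt',
    'C:/abc/def.txt',
    'C://abc/def.txt',
];

foreach ($paths as $path) {
    $out = normalizeWindowsFilePath(['scheme' => 'file', 'path' => $path]);
    echo $path, ' => ', $out['path'], "\n";
}
```

Each of the four paths comes back with a single leading slash, matching the "new parse" lines. A path without a drive letter, or a result whose scheme is not file, is returned unchanged.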

Contributor

@peterpostmann I implemented this kind of thing now in #71 - that seems to work fine.

$result['path'] = '/' . $result['path'];
$result['host'] = '';
}
}

return
$result + [
@@ -348,15 +357,13 @@ function($matches) {
$result['host'] = '';
} elseif (substr($uri, 0, 2) === '//') {
// Uris that have an authority part.
- $regex = '
- %^
+ $regex = '%^
//
(?: (?<user> [^:@]+) (: (?<pass> [^@]+)) @)?
(?<host> ( [^:/]* | \[ [^\]]+ \] ))
(?: : (?<port> [0-9]+))?
(?<path> / .*)?
- $%x
- ';
+ $%x';
if (!preg_match($regex, $uri, $matches)) {
throw new InvalidUriException('Invalid, or could not parse URI');
}
15 changes: 14 additions & 1 deletion tests/ParseTest.php
@@ -184,8 +184,21 @@ function parseData() {
'fragment' => 'foo',
]

]
],
// Windows Paths
[
'file:///C:/path/file.ext',
[
'scheme' => 'file',
'host' => '',
'path' => '/C:/path/file.ext',
'port' => null,
'user' => null,
'query' => null,
'fragment' => null,
]

],
];

}
7 changes: 6 additions & 1 deletion tests/ResolveTest.php
@@ -103,7 +103,12 @@ function resolveData() {
'#',
'http://example.org/path.json',
],

// Windows Paths
[
'file:///C:/path/file_a.ext',
'file_b.ext',
'file:///C:/path/file_b.ext',
],
];

}