Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows host handling #25

Closed

Conversation

peterpostmann
Copy link
Contributor

Hi,

I think a found a bug on windows. For the uri 'file:///C:/path/file_a.ext' parse returns

array (size=2)
  'scheme' => string 'file' (length=4)
  'path' => string 'C:/path/file_a.ext' (length=18)

This causes resolve('file:///C:/path/file_a.ext', 'file_b.ext') to generate wrong URIs:
file://C:/path/file_b.ext (one slash is missing, because the empty host is missing).

parse_url produces the same result, which is correct with reagards to http://php.net/manual/en/wrappers.file.php, but it cannot be used to reconstruct file uris directly.

The behavior of _parse_fallback is correct and has this output:

array (size=3)
  'scheme' => string 'file' (length=4)
  'path' => string '/C:/path/file_a.ext' (length=19)
  'host' => string '' (length=0)

parse should also produce this output.

Note: I had to remove the white spaces before and after the regex because I couldn't get it tested otherwise (using PHP 7.1)

else
{
// Fix windows paths
if (isset($result['path']) && preg_match("#^(\w:)#", $result['path']))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that this change is a bit too aggressive. Do we really want to add a slash to every single uri that doesn't start with a slash? Probably not.

So if this is the case for file:// uris (I think this is true) then this can be simplified significantly with:

if (isset($result['scheme']) && $result['scheme']==='file' && $result['path'][0]!=='/') {
  $result['path'] = '/' + $result['path'];
}

Aside from that tweak, I would also say that it can use a better comment explaining exactly why this needs to be done, because I'm worried fix windows paths might not make sense to a developer maintaining this.

@peterpostmann
Copy link
Contributor Author

Hi,

thanks for your feedback. Yes, I think it can be simplified, but it doesn't add a slash to every file uri which doesn't start with a slash: It adds only a slash if it matches \w: lile "C:", etc. I would even expand this to match \w:/ ("C:")

if ( isset($result['scheme']) && $result['scheme']==='file' && 
     isset($result['path']) && preg_match("#^(\w:)\/#", $result['path']) ) {
  $result['path'] = '/' + $result['path'];
  $result['host'] = '';
}

or without regex:

if ( isset($result['scheme']) && $result['scheme']==='file' && 
     isset($result['path']) && ctype_alpha($result['path'][0]) && $result['path'][1] ===':' && $result['path'][2] === '/' ) {
  $result['path'] = '/' + $result['path'];
  $result['host'] = '';
}

Which version do you think is better?

Sorry for the bad comment. I'll be more descriptive when I update the PR.

@peterpostmann
Copy link
Contributor Author

Hi, I just updated the PR with the discussed changes

@mlambley
Copy link

This PR will fix issues #31 and #32

@evert
Copy link
Member

evert commented Jul 9, 2019

Sorry for the slow responses here, but I'm down with merging this once the conflicts are resolved. Sorry for being over a year late, but thank you for making this change.

@staabm
Copy link
Member

staabm commented Jul 10, 2019

tried to resolve the merge-conflict... lets see

{
// Add empty host and trailing slash to windows file paths (file:///C:/path)
if (isset($result['scheme']) && $result['scheme'] === 'file' && isset($result['path']) &&
preg_match('/^(?<windows_path> [a-zA-Z]:(\/(?![\/])|\\\\)[^?]*)$/x', $result['path'])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex also matches something like C:\\abc\def.txt and C:\\abc/def.txt - as well as C:/abc/def.txt

In what situations does C:\\ happen?

@peterpostmann @staabm @evert - I have rebased this in #71 and want to make sure I understand all the combinations and why they need to be handled. (And I will write some comment lines in the code in that PR, to help future devs who have forgotten all the detail of Windows paths)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really have the context anymore to provide useful input im afraid =(

Copy link
Contributor

@phil-davis phil-davis Aug 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.microsoft.com/en-us/dotnet/standard/io/file-path-formats is a place where I can find C:\\ ;
Console.WriteLine("Setting current directory to 'C:\\'");

But that is C# code, and the \\ is just to put a literal \ in the string. The actual string is:
Setting current directory to 'C:\'

So that does not give me an example of how code can get here and have C:\\ in the string, and be valid representing something in the MS world.

And a tried a few combinations of strings to send to parse_url and the returned value in "path" has exactly whatever \ characters are provided in the input string - there is no "magic escaping" going on in the returned "path" element. So I still don't understand why we need to have the check in the regex to match things like C:\\

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parsing file:///C:\\abc\\def.txt is something you probably want to do to process path strings with escaped backslashes. But revisiting, I would simplify the logic and remove the restrictions. I think the idea was to only process C:\ (valid path), C:\\ (valid escaped path), C:/ (valid path with forward slashes), but not C:// because this is not a valid Windows path anyways. But since the fallback is parsing C:// nevertheless, we should do the same to avoid inconstancy.

// Add empty host and trailing slash to windows file paths (file:///C:/path)
if (isset($result['scheme']) && $result['scheme'] === 'file' && isset($result['path']) &&
     preg_match('/^(?<windows_path> [a-zA-Z]:(\/|\\\\).*)$/x', $result['path'])) {
    $result['path'] = '/' . $result['path'];
    $result['host'] = '';
}

This will just look for a single letter followed by a colon and a slash, which is consistent with the fallback

=== file:///C:\abc\def.txt:
old parse: scheme: file, path: C:\abc\def.txt
fallback:  scheme: file, path: /C:\abc\def.txt
new parse: scheme: file, path: /C:\abc\def.txt

=== file:///C:\abc/def.txt:
old parse: scheme: file, path: C:\abc/def.txt
fallback:  scheme: file, path: /C:\abc/def.txt
new parse: scheme: file, path: /C:\abc/def.txt

=== file:///C:/abc/def.txt:
old parse: scheme: file, path: C:/abc/def.txt
fallback:  scheme: file, path: /C:/abc/def.txt
new parse: scheme: file, path: /C:/abc/def.txt

=== file:///C://abc/def.txt:
old parse: scheme: file, path: C://abc/def.txt
fallback:  scheme: file, path: /C://abc/def.txt
new parse: scheme: file, path: /C://abc/def.txt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peterpostmann I implemented this kind of thing now in #71 - that seems to work fine.

@phil-davis
Copy link
Contributor

Rebased and adjusted code and merged in PR #71 - thanks @peterpostmann

@phil-davis phil-davis closed this Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants