Windows UNC Paths Broken #5127

Closed
KeenRivals opened this issue Dec 5, 2018 · 31 comments

Comments

@KeenRivals

When running Pandoc 2.5 against a UNC path in Windows, I get a file not found error. The problem does not occur in Pandoc 2.2.3.2.

Here's a sample Pandoc 2.2.3.2 session, which works:

C:\Users\me>pandoc -t html5 -s -o \\example-nas.org\private\out.html \\example-nas.org\private\in.txt

C:\Users\me>

Here's Pandoc 2.5:

C:\Users\me>pandoc -t html5 -s -o \\example-nas.org\private\out.html \\example-nas.org\private\in.txt
pandoc.exe: \\example-nas.org\private\in.txt: openBinaryFile: does not exist (No such file or directory)

C:\Users\me>

I'm running Windows 10 64-bit, 1809. Using 64-bit versions of pandoc, but I've had the same behavior on 32-bit versions. The problem also occurs if the cwd is inside the network folder. I was also able to recreate the issue on Pandoc 2.4. The target network shares I've had it happen on are a samba share and a Windows file server.

@jgm
Owner

jgm commented Dec 6, 2018

I'd be grateful if someone who has access to a Windows box with network shares and understands Haskell could help me debug this.

@cdelker

cdelker commented Jan 15, 2019

Can confirm the issue occurs on 2.3.1 as well. Interestingly if you use "Map Network Drive" to assign a drive letter to the path, and then use the drive letter instead of \\example-nas.org, it seems to work ok.

@agusmba
Contributor

agusmba commented Jan 16, 2019

Having recently installed haskell and stack in order to compile pandoc, I could try to lend a hand, although I'm far from really "understanding" Haskell.

@agusmba
Contributor

agusmba commented Jan 17, 2019

I'm not sure how to debug this, but the first version with this error is 2.3.1. I tested both that one and 2.3 and the latter one works:

$ pandoc -v
pandoc 2.3
Compiled with pandoc-types 1.17.5.1, texmath 0.11.1, skylighting 0.7.2
Default user data directory: d:\Usuarios\user\AppData\Roaming\pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

$ pandoc -t native \\server\mypath\test.md
[Header 1 ("my-title",[],[]) [Str "My",Space,Str "Title"]
,Para [Str "My",Space,Str "content"]]

$ pandoc -v
pandoc 2.3.1
Compiled with pandoc-types 1.17.5.1, texmath 0.11.1.1, skylighting 0.7.3
Default user data directory: d:\Usuarios\user\AppData\Roaming\pandoc
Copyright (C) 2006-2018 John MacFarlane
Web:  http://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

$ pandoc -t native \\server\mypath\test.md --verbose
pandoc: \\server\mypath\test.md: openBinaryFile: does not exist (No such file or directory)

Note that I tested with both forward and backward slashes and the result was the same (2.3 works, 2.3.1 doesn't).

Could it be a ghc 8.6 issue? It seems the biggest changes between those versions are in the build process.

See: 2.3...jgm:2.3.1

@mb21
Collaborator

mb21 commented Jan 17, 2019

Could it be a ghc 8.6 issue?

I suppose you could try to build pandoc 2.3 with the older ghc, to see whether that's the case.

To rule out that it's something wrong in the travis build server: you can reproduce this when you build pandoc from source on Windows, correct?

@agusmba
Contributor

agusmba commented Jan 17, 2019

I suppose you could try to build pandoc 2.3 with the older ghc, to see whether that's the case.

To rule out that it's something wrong in the travis build server: you can reproduce this when you build pandoc from source on Windows, correct?

My build setup is a bit cumbersome. At work I can only build pandoc inside a 32bit windows VM (I will do that and report back). I'll test it with a proper 64bit version at home tonight.

I normally compile with stack build. Is there an easy way of switching to an older ghc?

@mb21
Collaborator

mb21 commented Jan 17, 2019

Is there an easy way of switching to an older ghc?

Change the resolver field in the stack.yaml to an older stackage lts.

@agusmba
Contributor

agusmba commented Jan 17, 2019

OK, a fairly recent pandoc compiled as-is does not work (it shows the UNC path problem).
If I change to a newer compiler (lts-13.3), it also doesn't work.
If I change to an older compiler (lts-11.22), it won't compile. First it complains about dependency versions, and after I use allow-newer, it complains about an ambiguous occurrence of <>...

@jgm
Owner

jgm commented Jan 18, 2019

Hard to believe that it's ghc 8.6 that causes this -- but I agree, I don't see anything else in the changelog that seems relevant.

To build with stack lts-11, you can try creating stack.lts11.yaml like this, and then compile with --stack-yaml=stack.lts11.yaml.

flags:
  pandoc:
    trypandoc: false
    embed_data_files: true
  pandoc-citeproc:
    bibutils: true
    embed_data_files: true
    unicode_collation: false
    test_citeproc: false
    debug: false
packages:
- '.'
extra-deps:
- pandoc-citeproc-0.14.8
- skylighting-0.7.2
- skylighting-core-0.7.2
- cmark-gfm-0.1.6
- pandoc-types-1.17.5.1
- texmath-0.11.1.2
- haddock-library-1.7.0
- HsYAML-0.1.1.1
- hs-bibutils-6.6.0.0
- yaml-0.9.0
- hslua-1.0.1
- hslua-module-text-0.2.0
- skylighting-0.7.4
- skylighting-core-0.7.4
ghc-options:
   "$locals": -fhide-source-paths -XNoImplicitPrelude
resolver: lts-11.17

@agusmba
Contributor

agusmba commented Jan 18, 2019

OK, I had to tweak the skylighting dependencies a bit, but I got it to compile (using version 0.7.5).

Initially it also failed but with a different message:

\serverVolume_1pandocpandoc-issue-3034pandoc-issue-3034.md: openBinaryFile: does not exist (No such file or directory)

It looked fishy so I tried with double-slashes on the command line:

$ stack run -- -t native \\\\server\\Volume_1\\pandoc\\pandoc-issue-3034\\pandoc-issue-3034.md

and it worked!

Note that previously I had also tried the double-slash and it still didn't work.
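The mangled path in the error message above is consistent with how a POSIX-style shell treats unquoted backslashes: each one escapes the following character and is then dropped. A minimal sketch using Python's `shlex` module, which approximates POSIX word splitting. It is an assumption here that git-bash follows the same rules, and the path is shortened for illustration:

```python
import shlex

# One backslash per separator, as you would type it in cmd.exe.
# A POSIX-style shell consumes every backslash as an escape character.
single = shlex.split(r"pandoc -t native \\server\Volume_1\test.md")

# Every backslash doubled: each pair collapses to one literal backslash.
doubled = shlex.split(r"pandoc -t native \\\\server\\Volume_1\\test.md")

print(single[-1])   # \serverVolume_1test.md
print(doubled[-1])  # \\server\Volume_1\test.md
```

Only the doubled form hands the program a path that still looks like a UNC path.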

@agusmba
Contributor

agusmba commented Jan 18, 2019

Since in Windows the path separators in Haskell are \\ or /, I tried again the lts-11.17 compiled pandoc:

stack run -- -t native //server/Volume_1/pandoc/pandoc-issue-3034/pandoc-issue-3034.md

and it also works

UPDATE:

Interestingly enough, I can't seem to reproduce locally #4283 which was one of the causes for the change in the ghc version (might be because now I'm on windows 10? I'll test again tomorrow, it's too late now)

UPDATE2:

Tested the lts-11.17 version on my windows7 and pandoc -h does not segfault.

@agusmba
Contributor

agusmba commented Jan 21, 2019

@KeenRivals could you check this out?

I got a tip by reading some documentation

while

pandoc -t native \\server\path\test.md

did not work, this one does:

pandoc -t native \\?\UNC\server\path\test.md

I tested it with pandoc 2.5, could you try it on your end?

Note that this is the case for calling pandoc from cmd. If you use git-bash, you need to double-escape:

pandoc -t native \\\\?\\UNC\\server\\path\\test.md

So it seems that in the later versions of pandoc we need to add the ?\UNC\ part to our UNC paths (probably due to some core haskell library change)

@agusmba
Contributor

agusmba commented Jan 21, 2019

Weird, I just compiled locally, with stack, a recent pandoc devel version (incorporating a couple of unrelated PRs) and it just works, without the need for ?\UNC.

Just in case I downloaded today's build, and... it doesn't work.

@jgm could we test a normal "stack build" from appveyor?

Currently we have

  • a potential problem in the official builds (if locally compiled versions work but the appveyor ones don't)
  • a (possible) workaround for non-working pandoc versions (?\UNC\)

This could still be related to ghc, my local stack build uses lts-12.23 and thus ghc 8.4.4, while I see that appveyor is using 8.6.2.

I could try using lts-13.4 to see if ghc 8.6.3 exhibits the same issues...

EDIT:

Nope, compiling with stack locally with lts-13.4 (ghc 8.6.3) is not working with \\network\shares\xxx.
It looks like something is different between ghc 8.4.4 and 8.6.x regarding network file access.

@agusmba
Contributor

agusmba commented Jan 21, 2019

It looks like this was a conscious decision to remove MAX_PATH limitation from ghc. See here and here.

I'm not sure if this means that what we are experiencing here is a direct result of this change (and we need to adapt in our command-lines), or if there is some kind of file-path management in pandoc that can get in the way of the new automations.

@jgm
Owner

jgm commented Jan 21, 2019

OK, this is great to know! I still don't understand this change fully, but at least we know where to look.

What does pandoc do with the paths? First, the raw command line args are obtained with System.Environment.getArgs. Then, System.Console.GetOpt.getOpt' is called to parse options. Any arguments remaining (non-options) are returned from getOpt' and populate optInputFiles. (All of this in T.P.App.CommandLineOptions.) Then in T.P.App, these paths are passed to readSource. readSource first attempts to parse a path as a URI. If this succeeds, it calls readURI; if it fails, it calls readTextFile, which slurps up the file as UTF-8 text.

One thought I had: try adding --dump-args to your command line; this will cause pandoc to dump the arguments so you can see what they look like by the time they get to T.P.App.

@jgm
Owner

jgm commented Jan 21, 2019

The "\\?\" prefix can also be used with paths constructed according to the universal naming convention (UNC). To specify such a path using UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share", where "server" is the name of the computer and "share" is the name of the shared folder. These prefixes are not used as part of the path itself. They indicate that the path should be passed to the system with minimal modification, which means that you cannot use forward slashes to represent path separators, or a period to represent the current directory, or double dots to represent the parent directory. Because you cannot use the "\\?\" prefix with a relative path, relative paths are always limited to a total of MAX_PATH characters.

Tip. Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.

A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:

<application xmlns="urn:schemas-microsoft-com:asm.v3">
    <windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
        <ws2:longPathAware>true</ws2:longPathAware>
    </windowsSettings>
</application>

Because it's opt-in, it's hard to see why it should be affecting us at all...

@agusmba
Contributor

agusmba commented Jan 21, 2019

Because it's opt-in, it's hard to see why it should be affecting us at all...

That particular case is not affecting us. It only says that in Win10 xx you may configure it so that the old APIs don't have the MAX_PATH limitation (thus you wouldn't need the \\?\ prefix for long paths using old APIs).

One thought I had: try adding --dump-args to your command line; this will cause pandoc to dump the arguments so you can see what they look like by the time they get to T.P.App.

  • git-bash with lts-12.23 (works):
$ pandoc -t native \\\\server\\share\\path\\test.md --dump-args
-
\\server\share\path\test.md
  • cmd with lts-12.23 (works):
>pandoc -t native \\server\share\path\test.md --dump-args
-
\\server\share\path\test.md

  • git-bash with lts-13.4 (doesn't work):
$ pandoc -t native \\\\server\\share\\path\\test.md --dump-args
-
\\server\share\path\test.md
  • cmd with lts-13.4 (doesn't work):
>pandoc -t native \\server\share\path\test.md --dump-args
-
\\server\share\path\test.md

  • cmd with lts-13.4 (?\UNC trick to make it work):
>pandoc -t native \\?\UNC\server\share\path\test.md --dump-args
-
\\?\UNC\server\share\path\test.md

So the arguments look ok in both cases.

@KeenRivals
Author

while

pandoc -t native \\server\path\test.md

did not work, this one does:

pandoc -t native \\?\UNC\server\path\test.md

I tested it with pandoc 2.5, could you try it on your end?

Testing with pandoc 2.5 and using the \\?\UNC\ trick works. Windows 10 64-bit, 1809. Using 64-bit version of pandoc.

@jgm
Owner

jgm commented Jan 23, 2019 via email

@agusmba
Contributor

agusmba commented Jan 23, 2019

I don't know if we could create a very simple minimum failing case in Haskell, compiled with stack, that reads a file passed as argument and spits it back. If we experience the same issue, we could open an issue in ghc and ask if the regression is intended or not. If we don't experience the same issue, we could review what is different in pandoc.

@KeenRivals
Author

I fired up Process Monitor and did a quick comparison of Pandoc 2.2.3.2 vs 2.5. One thing that jumped out to me is pandoc 2.5 is adding an extra \ to the UNC path.

Given pandoc -t html "\\example.org\share\test.txt", Pandoc 2.5 looks for \\\example.org\share\test.txt. Pandoc 2.2.3.2 looks for \\example.org\share\test.txt.

@jgm
Owner

jgm commented Jan 26, 2019

Quoting from ghc docs linked above:

The NT kernel however allows you ways to opt out of this path preprocessing by the Win32 APIs. This is done by explicitly using the desired namespace in the path.

The namespaces are:

file namespace: \\?\
device namespace: \\.\
NT namespace: \

Each of these turn off path processing completely by the Win32 API and the paths are passed untouched to the filesystem.

OK, that explains why it works when you do \\?\UNC.

Paths with a drive letter are legacy paths. The drive letters are actually meaningless to the kernel. Just like Unix operating systems, drive letters are just a mount point. You can view your mount points by using the mountvol command.

Since GHC 8.6.1, the Haskell I/O manager automatically promotes paths in the legacy format to Win32 file namespace. By default the I/O manager will do two things to your paths:

replace \ with \\
expand relative paths to absolute paths

Does this explain why it adds the extra \?

If you want to opt out of all preprocessing just explicitly use namespaces in your paths. Due to this change, if you need to open raw devices (e.g. COM ports) you need to use the device namespace explicitly. (e.g. \\.\COM1). GHC and Haskell programs in general no longer support opening devices in the legacy format.

So my question is whether pandoc should attempt to normalize the paths before using readFile. For example, if the path starts with \\, we could change it to \\?\ -- would this work?

@agusmba
Contributor

agusmba commented Jan 26, 2019

So my question is whether pandoc should attempt to normalize the paths before using readFile. For example, if the path starts with \\, we could change it to \\?\ -- would this work?

I'll try to test this out and report back. Not sure if the "UNC" part is necessary or not.

Note that the check would need to take into account paths that are already in the new format (\\?\... shouldn't be changed).

@agusmba
Contributor

agusmba commented Jan 26, 2019

Ok, at least on my win10 box \\?\ is not enough for pandoc 2.5 to work with UNC paths.
I need the full \\?\UNC\...

So if we want to "normalize" the paths on windows we need to check for paths
starting with \\ but not with \\?\UNC\ and add the ?\UNC part.

I see that previously (with ghc 8.4), we could also use / as a filepath separator on Windows, but the UNC workaround prevents these types of paths from working (I guess it's because an internal conversion was taking place, but that's prevented by the ? bit). I hope this will be acceptable for those working with pandoc and UNC paths on Windows.
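The rule proposed in this comment can be sketched as follows. This is only an illustration in Python, not pandoc's actual (Haskell) code; the function name `normalize_unc` and the constants are hypothetical, and forward slashes are deliberately left alone:

```python
NAMESPACE_PREFIXES = ("\\\\?\\", "\\\\.\\")  # the \\?\ and \\.\ namespaces
UNC_PREFIX = "\\\\?\\UNC\\"                   # the \\?\UNC\ prefix


def normalize_unc(path: str) -> str:
    # Paths that already name a namespace explicitly are left untouched.
    if path.startswith(NAMESPACE_PREFIXES):
        return path
    # A plain UNC path (\\server\share\...) gets the \\?\UNC\ prefix.
    if path.startswith("\\\\"):
        return UNC_PREFIX + path[2:]
    # Drive-letter and relative paths are returned unchanged.
    return path


print(normalize_unc("\\\\server\\share\\test.md"))  # \\?\UNC\server\share\test.md
```

Checking for the namespace prefixes first is what keeps an already-normalized path from being prefixed a second time.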

jgm added a commit that referenced this issue Jan 27, 2019
When pandoc is compiled with ghc 8.6, Windows paths are treated
differently, and paths beginning `\\server` no longer work.
This commit rewrites such paths to `\\?\UNC\server` which works.

The change operates at the level of argument parsing, so it
only affects the command line program.

See #5127 and the discussion there.
@jgm
Owner

jgm commented Jan 27, 2019

Okay, I've pushed a preliminary change. It seems kludgy, but probably worth doing to keep 8.6- and 8.4-compiled pandoc from behaving too differently.

It would not be hard to convert all forward slashes to backwards slashes in the Windows path normalization; should I add that too?

@agusmba
Contributor

agusmba commented Jan 27, 2019

It would not be hard to convert all forward slashes to backwards slashes in the Windows path normalization; should I add that too?

Since / still works as a file separator as long as the path is not UNC, I'd say don't convert them. People dealing with UNC paths on windows would probably be using the standard form with back-slash anyway.

@jgm
Owner

jgm commented Jan 27, 2019

OK, I'm considering this issue closed by the fix above.

@jgm jgm closed this as completed Jan 27, 2019
@102jon

102jon commented Mar 18, 2019

It looks like this issue is still not resolved. I can see that pandoc is trying to prepend the \\?\UNC\ fix to the path, but it still isn't able to fetch the files it needs. Here is what the output looks like:

"C:/PROGRA~1/Pandoc/pandoc" +RTS -K512m -RTS test.utf8.md --to html4 --from markdown+autolink_bare_uris+ascii_identifiers+tex_math_single_backslash+smart --output test.html --email-obfuscation none --self-contained --standalone --section-divs --template "\\server\rlib\packrat\lib\x86_64-w64-mingw32\3.5.1\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable "theme:bootstrap" --include-in-header "C:\Users\jmartin\AppData\Local\Temp\1\Rtmpukbskq\rmarkdown-str25d47ca87fec.html" --mathjax --variable "mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --metadata pagetitle=test.utf8.md
Could not fetch http://?/UNC/server/rlib/packrat/lib/x86_64-w64-mingw32/3.5.1/rmarkdown/rmd/h/default.html
HttpExceptionRequest Request {
  host                 = ""
  port                 = 80
  secure               = False
  requestHeaders       = []
  path                 = "/"
  queryString          = "?/UNC/server/rlib/packrat/lib/x86_64-w64-mingw32/3.5.1/rmarkdown/rmd/h/default.html"
  method               = "GET"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (InvalidDestinationHost "")
Error: pandoc document conversion failed with error 61

@jgm
Owner

jgm commented Mar 18, 2019

I'll reopen. Unlike the original issue, this concerns the argument to --template. Pandoc is evidently parsing this as a URL.
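One way to see how such a path can look like a URL: if the backslashes in `\\?\UNC\server\...` end up as forward slashes, the result starts with `//` followed by `?`, which a generic URI parser reads as an empty host followed by a query string. That matches the `host = ""` and `queryString = "?/UNC/..."` fields in the error above. The sketch below uses Python's `urllib.parse` purely as an illustration; it is an assumption that pandoc's own URI handling takes a comparable route, and the path is shortened:

```python
from urllib.parse import urlsplit

# \\?\UNC\server\rlib\default.html, written with escaped backslashes
windows_path = "\\\\?\\UNC\\server\\rlib\\default.html"

# Assumed step: separators become forward slashes before URI parsing.
as_slashes = windows_path.replace("\\", "/")

parts = urlsplit(as_slashes)
print(as_slashes)          # //?/UNC/server/rlib/default.html
print(repr(parts.netloc))  # '' (empty host)
print(parts.query)         # /UNC/server/rlib/default.html
```

Under that assumption, a check for the `\\?\` prefix would have to run before any attempt to treat the argument as a URI.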

@jgm jgm reopened this Mar 18, 2019
@jgm jgm added this to the 2.7.1.1 milestone Mar 18, 2019
@jgm jgm closed this as completed in 97acf15 Mar 22, 2019
@jgm
Owner

jgm commented Mar 22, 2019

I think this is fixed by 97acf15, but I'd appreciate it if someone who uses Windows and UNC paths could test. (@102jon)

@jgm
Owner

jgm commented Aug 11, 2020

Re-opening this, because the fix doesn't cover all the places where filenames can occur (e.g. not --reference-doc). Probably we should put this code in the basic file manipulation functions in Class, rather than where it is now.
