Skip to content
This repository has been archived by the owner on Jul 30, 2020. It is now read-only.

Windows paths are not always properly normalized #543

Open
Boddlnagg opened this issue Mar 21, 2018 · 30 comments
Open

Windows paths are not always properly normalized #543

Boddlnagg opened this issue Mar 21, 2018 · 30 comments

Comments

@Boddlnagg
Copy link

I'm having a problem where cquery is not able to find certain header files, and I'm suspecting that the reason has something to do with paths in Windows (maybe case-insensitivity?).

I found that all the files where it does not work have their cache files in
c@@Develop@git@src with relative file paths within, while those that work have their cache files in @c@@Develop@git@src with absolute file paths (such as C@@Develop@git@src@some_file.cpp) within.

And I'm getting messages of the form:

[Error - 09:20:41] Request textDocument/codeLens failed.
  Message: Unable to find file c:/Develop/git/src/some_file.cpp
  Code: -32603 

... although c:/Develop/git/src/some_file.cpp exists.

@jacobdufault
Copy link
Owner

Message: Unable to find file c:/Develop/git/src/some_file.cpp

We should change this message to be more clear, ie, 'File <foo.cc> is not indexed'. Are you blacklisting certain files from being indexed? A header file is only indexed if a cc file in the project which is indexed includes it.

@Boddlnagg
Copy link
Author

I don't even know how I would blacklist a file from being indexed. The message appears for cc/cpp files, which are explicitly listed in compile_commands.json.

@Boddlnagg
Copy link
Author

Boddlnagg commented Mar 21, 2018

Another interesting observation: Things break only after I open the affected files in VS Code. When cquery starts indexing and I don't yet have the files open, I see the correct warning for the files, which means that the files can be analyzed correctly (and all headers are found). Only once I open the file in the editor, something breaks and the file becomes red all over because of header files which cannot be found.

@jacobdufault
Copy link
Owner

Can you reproduce on a small sample project that I can try locally?

@Boddlnagg
Copy link
Author

Boddlnagg commented Mar 22, 2018

Probably not, but I have some more information, which hopefully helps you to find the root cause:

As I said, the cache file that's working is in @c@@Develop@git@src\C@@Develop@git@src@source_file.cpp, and the one that does not work is in c@@Develop@git@src\source_file.cpp. I haven't yet understood under what conditions cquery/VSCode loads the one or the other, but it seems that it loads the bad file when I actually open source_file.cpp in the editor (but sometimes after a while it "fixes itself" and loads the correct one??).

I analyzed the difference between these two json files:

  • The good one has absolute paths starting with C:/, whereas the bad one has them starting with c:/
  • The bad one contains different (wrong!) args (see below)
  • The "includes" array contains all the included headers in the good one, but only some of them in the bad one. The ones that are missing are exactly those of which VSCode then reports that it can't find them (and subsequent errors in the file are reported for things that refer to definitions out of those headers).
  • The comments field in the types array contains has newlines encoded as \n in the good one, and \r\n\ in the bad one (and the bad one is missing types from the headers which it could not find).

Further analyzing the args:
What's in compile_commands.json:

{
    "directory": "C:\\Develop\\git\\out",
    "command": "cl.exe ... some other args ... -c -Fosource_file.obj C:/Develop/git/src/source_file.cpp",
    "file": "C:/Develop/git/src/source_file.cpp"
}

What's in the "good" cache file:

[
"cl.exe",
"-working-directory=C:\\Develop\\git\\out",
... some other args ...
"-FC:/Develop/git/out/osource_file.obj",
"C:/Develop/git/src/source_file.cpp",
"-resource-dir=C:/Develop/cquery/build/release/lib/LLVM-6.0.0-win64/lib/clang/6.0.0",
"-Wno-unknown-warning-option",
"-fparse-all-comments"
]

What's in the bad cache file:

[
"cl.exe",
"-working-directory=C:\\Develop\\git\\out",
"-fms-extensions",
"-fms-compatibility",
"-fdelayed-template-parsing",
... some other args taken from a different TU `other_source_file.cpp` from the same compile_commands.json ...
"-FC:/Develop/git/out/oother_source_file.obj",
"c:/Develop/git/src/source_file.cpp",
"-resource-dir=C:/Develop/cquery/build/release/lib/LLVM-6.0.0-win64/lib/clang/6.0.0",
"-Wno-unknown-warning-option",
"-fparse-all-comments"
]

There are at least two problems here:

  • The bad one contains arguments from another TU, which is ... weird
  • The -Fo parameter is parsed incorrectly (it is parsed as -F with o being part of its value, which turns -Fosource_file.obj into -FC:/Develop/git/out/osource_file.obj instead of -FoC:/Develop/git/out/source_file.obj)

@dschaefer
Copy link

Saw the same thing, I changed the case of C: in the compile_commands.json file to c: and it got further. Doesn't look like the lookup is taking case insensitivity on Windows (and to some degree on Mac) into account. But haven't dug further into it yet.

@Boddlnagg
Copy link
Author

Indeed, when I change the case in compile_commands.json, it seems to work.

@jacobdufault
Copy link
Owner

I'm a bit surprised changing the case fixes this, but it sounds like we're missing normalization somewhere. Maybe project.cc is not normalizing.

@jacobdufault jacobdufault changed the title Problem with paths in Windows Windows paths are not always properly normalized Mar 26, 2018
@jacobdufault
Copy link
Owner

There have been a bunch of normalization changes. Please reopen if this is still an issue.

@Boddlnagg
Copy link
Author

I just pulled from master, and the issue perists:

I'm still getting messages like Unable to find file c:/Develop/git/source_file.cpp, even though the file exists, and analyzing it fails because of missing headers (just like before).

Postprocessing compile_commands.json with sed --expression='s/C:\//c:\//g' makes everything work.

I'm also getting warnings Failed to normalize C:\Develop\git\out/osource_file.obj (related to wrong parsing of the -Fo parameter as mentioned above). I don't know if these have been there before or not.

@Boddlnagg
Copy link
Author

Okay, I was finally able to build an minimal example to reproduce this (it's slightly extended from the one for #594):

cquery-test.zip

The point is that compile_commands.json (with uppercase C:/) doesn't work, while the version in compile_commands.fixed.json (with lowercase c:/) does work.

"doesn't work" means that I get Message: Unable to find file c:/Develop/cquery-test/src/test.cpp even though that file exists. In this small example it doesn't lead to headers not being found, but I guess that you'll still be able to analyze the root cause with this reproduction.

@decimad
Copy link

decimad commented Apr 9, 2018

message_handler.cc:~65 does use a "NormalizedPath", which just doesn't normalize at all, consequently failing for files with "incorrect" casing... VScode seems to supply lower case drive letters to the language server when opening/switching focus to source files
Even if it was used, the true normalization function is not yet robust enough to handle all this on windows , currently either.

@jacobdufault
Copy link
Owner

The most robust fix is most likely to update paths on output to match what we received on input, even if internally we are storing something else.

@decimad
Copy link

decimad commented Apr 9, 2018

I had the impression that this normalization mainly serves the purpose for efficient lookup in the file<->index lookups?

@jacobdufault
Copy link
Owner

jacobdufault commented Apr 10, 2018

Yep, that's right. But we get paths from different sources (libclang, compile_commands.json, editor) which may spell the file differently (ie, casing, .., etc), and what we really need to do is match the client, but we cannot do that completely robustly without updating paths on output based on editor input.

@decimad
Copy link

decimad commented Apr 11, 2018

Would that be problematic to do reactively with a reverse lookup because cquery has to supply paths it didn't yet hear from the client? I remember that there were times with the ms tools where I looked up a definition and it would show them in a different tab than the already opened, just that the paths were displayed differently. I wonder if it's really the language server who should handle this, or if that's a problem only the client can truly solve.

@jacobdufault
Copy link
Owner

This is an issue in the client cquery is working around. Typically cquery has been willing to work around external bugs to provide a better end-user experience.

Would that be problematic to do reactively with a reverse lookup because cquery has to supply paths it didn't yet hear from the client?

Sorry, I'm having trouble parsing this. But for paths we haven't seen yet from the client, we will supply what the OS realpath is (hoping that matches with the client).

@decimad
Copy link

decimad commented Apr 11, 2018

You seem to have parsed it just like I meant it to be parsed ;) You probably know it, but maybe there is a way to query certain clients for how they represent paths to files within the opened folders, and then do a client-specific normalization on the path server->client?

@jacobdufault
Copy link
Owner

and then do a client-specific normalization on the path server->client

The only broken client I'm aware of is vscode; it should be feasible to do this client-side.

@jacobdufault
Copy link
Owner

(if it ends up that only vscode is affected by this, I'm good with handling it client-side)

@decimad
Copy link

decimad commented Apr 11, 2018

That sounds reasonable. Are the vscode people aware of this?

@Boddlnagg
Copy link
Author

There is microsoft/vscode#12448, which might be related

@decimad
Copy link

decimad commented Apr 16, 2018

Okay, the driver letter needs to be normalized by hand - in my experimental build I do this in a not so robust way, but it does the trick, I've had no cquery-side errors anymore. If no one else overtakes me, I'll see that better normalization is part of the unqlite pr... As to the vscode side, I don't really know much about ts/js to be of any help there. But paths really need to be prepared for vscode to consume as well. Since the issue is open for ages, I don't count on them doing anything about it anytime soon....

@jacobdufault
Copy link
Owner

Please give this a try again on the latest master. It should be fixed by b728cc0. Reopen if not. Thanks!

@Boddlnagg
Copy link
Author

I have tried the latest master, and now there's a somewhat different problem:

When I use "go to definition" on something which is defined in different file, the file that's opened is C:\path\to\file even though the file path might actually be C:\Path\to\File. When the currently opened project folder in VS Code is C:\Path, it will recognize C:\Path\to\File as belonging to that project, but not C:\path\to\file. So as far as cquery is concerned, it will now find all the files, but lowercasing everything breaks VS Code even more than before, because it considers paths with different casing as two different files (microsoft/vscode#12448 strikes back).

I admit that this should be fixed in VS Code and I'm not sure if cquery can improve the situation ...

jacobdufault added a commit that referenced this issue May 8, 2018
@jacobdufault
Copy link
Owner

I've pushed a commit that should help with that; let me know if there are still issues.

@Boddlnagg
Copy link
Author

The problem persists in master. It also materializes in another way: The Problems view in VS Code shows warnings for files that are not opened (I don't know if that's intended, but I actually like it), but the paths of those files also have the wrong casing: They are shown as C:\path\to\file instead of C:\Path\to\File.

Furthermore, when I look at the cquery output inside VS Code ("Output" panel), I can see cl.exe invocations with mixed casing: There are some -I flags with fully lowercased paths like -Ic:\path\to\include\directory (including the drive letter!) and in the same invocation other flags with original casing like -IC:\Path\to\More\includes. The difference seems to be that the directories which are not all-lowercased don't actually exist. (In compile_commands.json the paths have their original correct casing.)

jacobdufault added a commit that referenced this issue May 13, 2018
@jacobdufault
Copy link
Owner

Can you give it another try?

@jacobdufault jacobdufault reopened this May 13, 2018
@Boddlnagg
Copy link
Author

I don't notice any change ... the problem persists.

Actually, the behavior before the paths were normalized to all-lowercase was better, because the problems with VS Code didn't occur. There was only the problem that I needed to preprocess compile_commands.json, but I'd prefer that, if you can't fix it in a better way ... a real fix probably needs to be done in VS Code itself.

@jacobdufault
Copy link
Owner

Yea, unfortunately I'm not sure what else I can do on cquery's side.

decimad pushed a commit to decimad/cquery that referenced this issue Jul 13, 2018
decimad pushed a commit to decimad/cquery that referenced this issue Jul 13, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants