-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
win32: improve loading files from long paths #12119
Conversation
1a0574a
to
b4c8b43
Compare
b4c8b43
to
2b4b675
Compare
osdep/io.c
Outdated
char *filename_copy = talloc_strdup(talloc_ctx, path); | ||
if (strlen(filename_copy) >= MAX_PATH && | ||
strncmp("\\\\?\\", filename_copy, 4) != 0) { | ||
for (int i = 0; i < strlen(filename_copy); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop condition i < strlen(filename_copy)
does strlen
on each iteration (each char of that string), which makes it O(N^2). Big no.
Similar loop which is more correct would be (untested):
char *s = filename_copy;
while ((s = strchr(s, '/')))
*s++ = '\\';
There might also be some existing string replace function, but not sure about that.
The commit messages are missing an explanation. In general, please explain the exact problem and solution at the commit messages. You can add more "meta" info at the PR web page, but the commit messages should be sufficient to understand what gets fixed, why, and how (where applicable and/or where it's not obvious). The lua and js commits effectively fix identical code in identical way. They can be squashed into a single commit with a single useful commit message, or you can keep them separate, have a useful commit message at the first of them, and at the second refer to the first commit for full explanation.
This conversion tries to avoid touching existing UNC paths, but it only detects one form of a single UNC scheme - "dos drive". E.g. it detects that Also, as far as I know the Without giving it too much thought, I would think that a good enough detection is probably when the string begins with I'm not too thrilled with making a copy unconditionaly even if no change is required, but let's put that asside for now. I think that a useful UNC normalization procedure would be anytime the length reaches MAX_PATH:
I can see the PR changes plain concat of [EDIT] or maybe you mean that
Did you analyze this issue? I'm also asking because while I believe that report, I don't understand why it happens after a quick look at the source code of I don't think I see anywhere that MAX_PATH limit is applied to DIR plus FILE together... But maybe I'm looking at it wrong. I do believe the report, but didn't try it out yet. [EDIT] Ah, I see where it breaks, @rossy thoughts on this PR approach, scope, and details? |
Thanks for very informative comments. I looked again at WinAPI docs and found something that I missed yesterday. Since Windows 10 1607 it should be possible to enable handling of long path without converting them to UNC. It requires change to registry and to application manifest. I'll test that later today Would that be better approach? On other hand it will still be broken for older Windows versions. |
To summarize the changes and addressed issues in this PR so far:
Known issues:
|
If this requires the registry change, then no. mpv won't make that chage, and we can't expect the user to do that either.
True as well, or maybe even prevent running on win7 (not sure, but some manifest changes can make the binary incompatible with some earlier versions of windows). The UNC normalization code is not huge, so if it can be kept small and useful, it should be OK IMO. |
2b4b675
to
a36c424
Compare
Updated code to address issues.
|
OK, assuming this is correct, then I'm not sure it's worth making So it would be a very partial fix without For what it's worth, I don't think Fixing So there's that... Specifically about your changes: I don't think we need the first commit ("player: use mp_path_join to construct path in script_readdir "), because the UNC normalization should ensure there are no double [back]slashes (except at the begining). The logic is this: the UNC normalization takes a path which, if it wasn't too long, would work fine, and then converts it to UNC. UNC paths are more strict than non-UNC paths, and so the normalization process should process them to make them work as UNC. It already converts And so the burden on making it work is on the normalization function, not on the caller which uses a path which is generally correct, but happens to be not strict enough for UNC. So if we add this commit (which I think we should not), it would be after it already works without it because the normalization makes it work, and only if the join function does a better job in some sense than DIR +
Not sure I get this. You want to normalize short relative paths at At which stage exactly is CWD concatenated with the relative path after you normalize it? mpv generally tries very hard to work with the user-provided paths and not extend them in artificial ways. I don't think it cares about CWD when it opens a relative dir. In general, the UNC normalization is very low level at lowest libc wrappers (like And so, if it gets concatenated and becomes too long - then you normalize it when the combined path is used with some low-level API - like Do correct me if I'm wrong.
It seems to me you apply it to all normalized paths, which includes long absolute paths, and all relative paths? I might understand if you do that for long relative paths to make them absolute, so that you can make them UNC, but I don't understand exactly what it does with absolute paths. Does it convert Also, are you sure it's working? from the
But I don't see you prepend How does this length calculation work? size_t buffer_size = strlen(path) + 1;
wchar_t *buffer = talloc_array(talloc_ctx, wchar_t, buffer_size + 4); It looks to me that you pre-allocate for the At the very least, it's not obvious how you go from length of And then, if I'll stop here for now. It's been too long already... |
First commit ("player: use mp_path_join to construct path in script_readdir ") is still needed. When starting file that has long path from explorer it will pass UNC to mpv and Regarding GetFullPathNameW. It also normalizes full path if it is passed, so removes multiple slashes and replaces Regarding CWD + filename. As for usefulness of this change. It may look like partial solution without changing |
All the more reason to ensure it's getting normalized. Many places use Similarly, code which deals with paths doesn't pay much attention to We can't expect every piece of mpv code or 3rd party script to handle all paths with the care which UNC paths require (and if we do expect that, we should be ready for some dissappointments). If a path happens to be UNC, then it needs to be normalized at
Right, missed that. Thanks. I'm pretty sure at the very least it's guaranteed to become longer if it's relative, but not a big issue - it will be called twice. That's OK.
Huh. So the docs on both of these APIs are wrong, and in different ways? The It doesn't mention anywhere (that I could notice) that the length depends on CWD somehow. Do you know whether this "long CWD path can break short relative paths" apply elsewhere too? like in stat, or other APIs which takes paths/files? Or is it only a Can you find any references, preferably official, to this notion other than from your experiments? If eventually we accept that the docs are wrong as described, then this requires a comment at the source code which describes how it works, and why things are needed or not needed, and what we assume/accept/etc, on both those APIs. So the answer to whether normalization is needed would probably be:
Does that sound about right? Then, for the normalization itself, do we need anything other than ensuring it doesn't have If that's about it, then I prefer to do this in mpv code, because it's not entirely clear to me what Except for relative paths (which are too long with CWD), because making an absolute path from a relative one is, IMHO, best left for official APIs.
It doesn't look partial. It is very much partial.
That is probably true.
Sure, but you do realize it's a very very narrow use case of "support long paths" - where the path is just shy of MAX_PATH, and when adding few more chars of the filename then it becomes bigger than MAX_PATH. If the original path was only few chars longer then the solution in this PR won't help it, and if it was only few chars shorter then there probably wouldn't be any issue to begin with. To me, it's so narrow that it couldn't be seriously considered a solution, not even partial. To work (with directory listings, and opening a directory) It needs the stars to align way more than they are likely to be aligned. So I'd consider directory listing generally broken with long paths, except for some very lucky coincidences. But we could say that it solves UNC inputs, and that it solves long absolute paths, and that it solves relative paths which are too long when considering CWD. And this does have value, so let's continue with that premise. |
a36c424
to
dcb0afa
Compare
You're right. I did some experiments and it seems
I think the link I posted in previous comment to issue comment on official dotnet repo by Microsoft employee seems to be closest to official information. WinAPI docs seem to omit weird edge cases and have some parts of description copy-pasted. |
Allow mp_open and mp_stat to work with paths which reach or exceed MAX_PATH, by normalizing and changing them to UNC if needed. This allows to open long absolute dos paths from CLI or the client interface. Relative paths are also converted because CreateFileW fails when cwd + path >= MAX_PATH. Using GetFullPathNameW has also nice side effect of removing duplicated path separators and converting them to \.
dcb0afa
to
533eaf1
Compare
As I said, mpv tries very hard to not mess with paths more than necessary, and when we do mess with them, it should be as minimal and controlled as possible, therefore I disagree with that statement.
First of all, this doesn't actually need normalization, right? But if it had UNC prefix, or if it was too long, then the only way it's broken is that "it has So please first reply to my question if the suggested normalization test and procedure cover the cases we know about, and if not - what's missing. For reference, here's a possible implementation of that procedure (without prepending // convert all '/' to '\\', and then replace any sequence of '\\' with
// a single '\\' - except at the begining where up to two are allowed.
void normalize_path_construction(char *path)
{
char *s, *out;
for (s = path; (s = strchr(s, '/')); *s++ = '\\') /* empty */;
if (*path && (out = s = strstr(path + 1, "\\\\"))) {
while ((*out++ = *s++)) {
while (*s == '\\' && s[-1] == '\\')
++s;
}
}
}
mpv doesn't do that. How is this relevant?
Here's how I interpret this statement: Calling If that's what you meant, then we should first try to fix If that's not what you meant, then please be very specific on what you mean, and what exactly it implies about the solution. Also, as I said, any implementation detail which is based on undocumented assumptions or knowledge must be explained in a comment (not the commit message), together with the best references we can find (links, etc), or else it will not be possible in the future to fix issues, because it would look incorrect by the official docs, and the actual assumptions for the code would be unknown and unverifiable. According to you, these are the undocumented things the implementation is based on (feel free make corrections or add more):
Also, extrapolating from the |
Same as with UNC, needs additional path resolution if too long
How would you like that estimation to work?
I think those are all possible cases. So should mpv have code that would also do full path resolution, or should
It is not broken. It seems to be internal Windows limitation that process can't be started with or have CWD longer than
Explorer will always work because it uses absolute path. In case of CLI it will only work with absolute paths. It will never work with relative paths due to Windows limitation. Explorer and Powershell seem to be aware of that limitation and just change CWD to be able to start new process. So
I also did some research on alternate |
Please separate two things: normalization-test, and normalization-procedure. The goal of the test is to rule out cases which don't need any modifications of the path (normalization). It's allowed to err on the side of caution, i.e. decide that something needs normalization even if it actually doesn't (but preferably that's won't happen much). Currently mpv doesn't normalize paths and it generally works well, so hopefully the test will be able to rule out the majiority of cases from the need for normalization. That's the goal with the test - to rule out the need for normalization, hopefully most of the time. Then there's the normalization procedure, which modifies the path, if needed, so that it can be used with the target underlaying API (assuming the path actually exists/valid/etc). So in your reply to my questions about the test and normalization, you added the fact that normalized (UNC) path also cannot have relative components ( That's fine. For now, let's say the normalization procedure is (logically) this:
Is this a good normalization procedure in your opinion which works as good as possible with all paths? (UNC and not-UNC, absolute/relaive, short/long, etc.) Or is there anything missing here? at which case, what would be a better procedure? Now that we have a good normalization procedure, let's get back to the test which decides whether to normalize. Let's ignore relative paths for now, because this seems to be a big subject on its own. So the overall test might be:
Assuming the normalization procedure is as described above (and the relative-path test and normalization work), does this test cover our test needs? most importantly, is there any case where it will decide that there's no need for normalization but it actually won't work without normalization? Now that we have an outline of the test, let's get back to the relative path thing.
That's prettty much what I had in mind, yes. something like if (relative && (CWDLENGTH + 1 + strlen(path) >= MAX_PATH))
// we need to normalize
This is where I get a bit lost, so first, some focus. For the sake of this discussion, let's assume the mpv binary is Now, there's no problem with So our problematic use case is relative path (e.g. of a media file), and where the current working dir is (or we expect it to be) longer than MAX_PATH, and where we need to decide whether the relative path needs normalization, right?
Do correct me if I'm wrong, but when launching a media file in mpv from explorer (double clicking the media if mpv is associated with the type, or draging a media file or directory onto mpv.exe), then the media path[s] are always absolute, so this is irrelevant to the case of relative media/file path with long CWD, right? Or am I missing the relevance?
So you're saying that you can't Or that once you cd into a long path, you can't launch any process, not even a trivial hello-world C program? Or that you can launch some programs but not mpv?
Am I correct to assume that in such case normalization won't help either, because if Basically, our ability to normalize a relative path depends on whether And so the bottom line is that there's no realistic use case where mpv both needs and can deal with long CWD and relative path (explorer always launches with absolute paths, CMD.exe can't launch mpv at all, and with powershell normalization doesn't help anyway). The only case where mpv can test and/or normalize it is when Apparently, currently it only works when CWDLENGTH < MAX_PATH. And so, as far as I can tell, the only relative-paths case we can improve compared to the current behavior is when CWD is shorter than MAX_PATH, but CWD + / + FILE is longer than MAX_PATH? And for this case (and any other case where if (relative && (strlen(getcwd(...)) + 1 + strlen(path) >= MAX_PATH))
// need to normalize Do you agree? Does this conclude the discussion about normalization test and procedure? If not, what's missing?
Nice, but forget it for now. It only adds noise. Let's assume that it works with the same limitations as
So, you say this test and maybe normalization would need to happen at Let's say we did all this, and now let's see what we've gained (do correct me if I'm wrong):
Is that about right? |
Wait a minute, how does one even create a dir-chain longer than 260? I can't do that in windows exporer, and I can't do that in cmd.exe. In explorer, if I create this dir at the root of some NTFS drive (that's 203 chars long) and then inside this dir I try to create a dir longer than ~50 chars then explorer truncates it. In cmd.exe it refuses to create long dir inside it. How realistic is this long-path use case at all? What are the use cases? how do users end up with long paths? |
It can be done with PowerShell |
What if we just enable long paths in manifset ( |
Is this actually true? can mpv be launched with an absolute path to a media file where the absolute dir chain (before the final I have a hunch that as is, the most it can potentially improve is cases which previously were limited to ABS_DIR + FILE < 260 and now they would become limited a bit less - to ABS_DIR < 260. I.e. that if the sbsolute dir is more than 260 then it just won't work even with the PR. And the reason it might not work, is maybe because mpv doesn't have the manifest thingy to enable long paths, and so it's still limited to dir-chains of at most 260 chars, even if the system has the registry thing to enable long paths. If indeed that's the case, then I imagine
I don't think it's an instead thing. My hunch is that long UNC paths (where the dir chain > 260) just wouldn't work without the manifest. |
Long correct UNC paths work in current mpv master, but only to load file. I didn't test manifest thing, but if I understand correctly it would make |
UNC paths supported longer paths since ever. The manifest and registry opt-in for long paths is for common file APIs to make them work with longer paths. This is a new thing, since Windows 10 1607 and opt-in for application is done only for backward compatibility, so that application have to explicit say it is ok, I can work with longer paths.
That's correct. I think manifest opt-in is a clear way to support it instead of converting everything to UNC. |
With the spate of recent commits improving mpv on Windows (Window affinity, etc.), wouldn't it be time to finally address this one here? The relevant link to the WinAPI docs has been posted earlier in this thread here, but here is it again, direct to the relevant section:
Yes, this is also how I understood this. We don't have to deal with extended-length paths by using the First of all, I agree that mpv never should have to deal with the Windows registry, let alone make changes to it, god forbid. [1] But we don't have to care about this, let that be the responsibility of the user. All that needs to be done, as I understand it, is to embed the application manifest as a resource in [1]But, on the other hand, I think a very legitimate argument could be made in at least support of mpv doing a simple check here, and then showing a warning to the terminal, or something. I mean, mpv should already have everything included to do this. We have the (Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem\' -Name 'LongPathsEnabled' -ErrorAction Ignore).LongPathsEnabled If this returns |
I found this official documentation that should clear some things up:
|
Long paths support has been enabled in #13134 |
Long path support on Windows in MPV didn't work in all scenarios:
To fix that, function that converts to UNC path when needed was added. It appends
\\?\
and converts/
to\
. It is used to convert file path before passing it toCreateFileW
.Lua and JS
mp.utils.split_path
returns directory with trailing slash.mp.utils.readdir
used simple string append which caused double slash in path. UNC paths do not support this, somp_path_join
was used instead.