Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UTF Characters for relative path #27534

Closed
JSMonk opened this issue May 2, 2019 · 11 comments · Fixed by #53991
Closed

UTF Characters for relative path #27534

JSMonk opened this issue May 2, 2019 · 11 comments · Fixed by #53991
Labels
confirmed-bug Issues with confirmed bugs. path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform.

Comments

@JSMonk
Copy link

JSMonk commented May 2, 2019

Version: 10.15.3
Platform: Windows 64bit
Subsystem: "path" module

const path = require("path");

const rootPath = "İ"; // Turkish character, char code 304
const absolute = "c:\\Users\\root\\Documents\\İ\\test.txt";

const relative = path.relative(rootPath, absolute); // est.txt

Expected result: test.txt
Actual result: est.txt

@JSMonk JSMonk changed the title UTF Character for relative path UTF Characters for relative path May 2, 2019
@targos
Copy link
Member

targos commented May 3, 2019

The problem here is that path.relative computes everything using the lowercase versions of the paths but returns the result using the original case.
"İ" has one char code while "İ".toLowerCase() has two.

There's an easy fix for this case:

console.log(path.win32.relative('c:\\a\\İ', 'c:\\a\\İ\\test.txt'));
// est.txt
diff --git a/lib/path.js b/lib/path.js
index 2301a6ebb8..e76513f0f3 100644
--- a/lib/path.js
+++ b/lib/path.js
@@ -499,7 +499,7 @@ const win32 = {
         if (to.charCodeAt(toStart + i) === CHAR_BACKWARD_SLASH) {
           // We get here if `from` is the exact base path for `to`.
           // For example: from='C:\\foo\\bar'; to='C:\\foo\\bar\\baz'
-          return toOrig.slice(toStart + i + 1);
+          return toOrig.slice(toStart + i + 1 - from.length + fromOrig.length);
         }
         if (i === 2) {
           // We get here if `from` is the device root.
console.log(path.win32.relative('c:\\a\\İ', 'c:\\a\\İ\\test.txt'));
// test.txt

However this case is trickier:

console.log(path.win32.relative('c:\\İ\\a\\İ', 'c:\\İ\\b\\İ\\test.txt'));
// ..\..b\İ\test.txt

@targos targos added confirmed-bug Issues with confirmed bugs. path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform. labels May 3, 2019
@targos
Copy link
Member

targos commented May 3, 2019

/cc @BridgeAR @mscdex @cjihrig

@BridgeAR
Copy link
Member

BridgeAR commented May 3, 2019

This should be easy to fix: instead of using the lower cased variation it's possible to use the upper case one. These two ways do not work identical and the upper case variation does not show the negative sides as the lower case. I am compiling right now to see if that works as expected.

@BridgeAR
Copy link
Member

BridgeAR commented May 3, 2019

> console.log(path.win32.relative('c:\\a\\İ', 'c:\\a\\İ\\test.txt'));
test.txt
> console.log(path.win32.relative('c:\\İ\\a\\İ', 'c:\\İ\\b\\İ\\test.txt'));
..\..\b\İ\test.txt

@targos
Copy link
Member

targos commented May 3, 2019

@BridgeAR what about: console.log(path.win32.relative('c:\\İ\\a\\İ'.toLowerCase(), 'c:\\İ\\b\\İ\\test.txt'.toLowerCase())); ?

@targos
Copy link
Member

targos commented May 3, 2019

Sorry ^. That's not the example I meant. Isn't there the same risk with toUpperCase but with other characters?

@targos
Copy link
Member

targos commented May 3, 2019

@BridgeAR Make sure this case works as well: console.log(path.win32.relative('c:\\ß\\a\\ß', 'c:\\ß\\b\\ß\\test.txt'));

@BridgeAR
Copy link
Member

BridgeAR commented May 3, 2019

@targos ß is converted to SS when using toUpperCase, so it has a similar problem like the initial issue. Unicode is pretty ugly and we have problems with Turkish, German and likely more. There is the toLocaleLowerCase() function and it should work for all cases.

@mathiasbynens PTAL.

@targos
Copy link
Member

targos commented May 3, 2019

I'm afraid the behavior of toLocaleLowerCase() depends on the user's locale.

@BridgeAR
Copy link
Member

BridgeAR commented May 3, 2019

@targos yes, that's correct. As far as I know a device name may also contain unicode characters and if that's the case also path.win32.resolve() would fail (and likely more).

I have to think about this a bit closer when I find to do so. We probably also have issues even without changing the casing since we expect characters to always have a length of 1. This is mostly not an issue, since we almost always just check for slashes and take everything in-between but we expect the device root to always be of length 2.

mscdex added a commit to mscdex/io.js that referenced this issue May 12, 2019
monoblaine added a commit to monoblaine/node that referenced this issue Jul 7, 2020
This commit changes the way two paths are compared in path.relative:
Instead of comparing each char code in path strings one by one, which
causes problems when the number of char codes in lowercased path string
does not match the original one (e.g. path contains certain Unicode
characters like 'İ'), it now splits the path string by backslash and
compares the parts instead.

Fixes: nodejs#27534
@sr258
Copy link

sr258 commented Aug 18, 2021

This is still an issue that can lead to bugs in August 2021. Any news?

monoblaine added a commit to monoblaine/node that referenced this issue Jul 20, 2024
This commit changes the way two paths are compared in path.relative:
Instead of comparing each char code in path strings one by one, which
causes problems when the number of char codes in lowercased path string
does not match the original one (e.g. path contains certain Unicode
characters like 'İ'), it now splits the path string by backslash and
compares the parts instead.

Fixes: nodejs#27534
monoblaine added a commit to monoblaine/node that referenced this issue Jul 20, 2024
This commit changes the way two paths are compared in path.relative:
Instead of comparing each char code in path strings one by one, which
causes problems when the number of char codes in lowercased path string
does not match the original one (e.g. path contains certain Unicode
characters like 'İ'), it now splits the path string by backslash and
compares the parts instead.

Fixes: nodejs#27534
targos pushed a commit that referenced this issue Aug 14, 2024
PR-URL: #53991
Fixes: #27534
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this issue Sep 21, 2024
PR-URL: #53991
Fixes: #27534
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
targos pushed a commit that referenced this issue Oct 2, 2024
PR-URL: #53991
Fixes: #27534
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. path Issues and PRs related to the path subsystem. windows Issues and PRs related to the Windows platform.
Projects
None yet
4 participants