-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distinguish between URL paths and file paths at the type level #881
Conversation
82b8c8e
to
7221d92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
two small questions for now
export function unFilePath(path: FilePath | string): string { | ||
return path as unknown as string; | ||
} | ||
|
||
export function unUrlPath(path: UrlPath | string): string { | ||
return path as unknown as string; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these no-op functions are here only for the sake of typing (or, untyping), there should be a better way to do this with typescript only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that I was disappointed about was that I couldn't figure out how to safely tell TypeScript that UrlPath
and FilePath
were actually just type safe strings. If we remove the overrides of replace
and similar in the definitions of the path types, it can understand it, but then we erase types when working on the path.
I think if we wanted to eliminate these functions, we'd have to inline the as unknown as string
text everywhere we call this function. I chose not to do that because I treat as
-casts as dangerous code. I'd prefer to isolate it to as few places as possible. Then those isolated places can be thoroughly vetted, either through tests or through reading the code. In this case the cast is trivial to verify, so just having this function is enough.
I agree though that I would prefer some better way to do this, like maybe a "type level function". I don't know of one though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the wrapping worries me, but I can't express why. It feels excessive or invasive maybe? I'm definitely wishing we didn't have to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern, of having a new type that is just an existing type with a new name is something that I've used in other languages. In those languages there are often nice features built into the language that help make this easier, like implicit conversion, operator overloading, and monad-style-map. TypeScript doesn't really give us any of those things, so it is all a bit clunky.
I could imagine taking out most of these wrappers now (except for the places where we need slash conversion). I don't know how we would prevent new changes to the code base from getting it wrong in the future. Maybe just test coverage and Windows in CI? That feels much less robust than TypeScript to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, this PR includes 3 changes that are almost independent:
- generalizes some system call (
rimraf
andcross-env
) - types all paths as either
filePath
orurlPath
- carries
path.sep
into the internal representation of file paths
(It also adds platform-specific tests—in CI, and in unit tests.)
Generalizations/abstractions seem like the right thing to do. Adding platform-specific tests is obviously needed.
For step 2, I think it's great that you're typing these paths as "filesystem path" or "url path", as it helps bring clarity to the code, where it's easy otherwise to get a bit confused between what the paths are for. It certainly helps with step 3. However in my mind it's almost an orthogonal concern.
Step 3 feels overwhelming. Our app code now sees different strings depending on the OS it runs on, and has to reason about them differently. On windows, file paths will be a string docs\sub\file.md
, and on linux an macos it will be docs/sub/file.md
. This representation is carried throughout, meaning that at every step we need to convert between filepath and urlpath (which always use /
). In a sense, now every line of code needs to know about windows.
Could we consider an approach where we would normalize the internal representation of all paths to forward slashes, and adapt/convert only when talking to the filesystem? Basically an abstraction layer on top of fs that would use /
-delimited paths as inputs and outputs, restricting the use of \
to the underlying OS calls.
An analogy that comes to mind is to use an ORM if we want to talk to different database flavors. If that is possible, it means we could do a much smaller step 3.
More importantly, when some logic needs to reason about a path, say "if this file is foo/bar, do something", it can be written directly rather than having to call conversion functions. For a concrete example: when we create the search index (in search.ts
), we attribute an id
to each page, which corresponds to its relative path to the docs root. That id is also used as a link[href]
when we present the results in the browser. I’d rather keep this logic insulated from having to care about windows-style paths.
If that approach proves possible, we could even skip step 2 altogether for now (and do it separately later, independently of windows support).
We could also land step 1 separately.
Finally, abstracting fs
might also help with unicode file names (for international text or even emoji #783).
Thanks for the comments, Fil. Your first two summary points I agree with. I'm not sure if it makes sense to split 1 out, since the project is still quite broken even with those changes in place. I don't quite agree about the third though. I like the idea of using internally normalized paths. For me the biggest hurdle is that it would require writing wrappers around all of The other concern I have is about the appropriateness of abstraction. Framework has to deal with both files and HTTP very directly. There are a few places where we need to understand the differences between win32 and posix paths. I'm not sure if that is fundamental complexity, or if we would prefer to work in an abstracted space where we don't have to worry about it. Given that, I plan to continue with this "slash-aware" version, and get it passing in CI. After I'm done with that, I can explore the idea of internally normalized paths, if someone else doesn't beat me to it. Regarding the |
Claro. (I did not mean to imply that it was working before.) My question is whether we can control it at the file system interface, and continue ignoring it in the app code (but, confident in that it is safe and controlled). |
const time = Date.UTC(2023, 11, 1) / 1000; | ||
await utimes(loader.path, atime, time); | ||
// remove the cache set by another test (unless we it.only this test). | ||
try { | ||
await unlink("test/.observablehq/cache/dataloaders/data1.txt"); | ||
} catch { | ||
// ignore; | ||
} | ||
// populate the cache (missing) | ||
await loader.load(outputEffects); | ||
// run again (fresh) | ||
await loader.load(outputEffects); | ||
// touch the loader | ||
await utimes(loader.path, atime, Date.now() + 100); | ||
// run it with useStale=true (using stale) | ||
const loader2 = Loader.find("test", "dataloaders/data1.txt", {useStale: true})!; | ||
await loader2.load(outputEffects); | ||
// run it with useStale=false (stale) | ||
await loader.load(outputEffects); | ||
// revert the loader to its original mtime | ||
await utimes(loader.path, atime, mtime); | ||
assert.deepStrictEqual( | ||
// eslint-disable-next-line no-control-regex | ||
out.map((l) => l.replaceAll(/\x1b\[[0-9]+m/g, "")), | ||
[ | ||
"load test/dataloaders/data1.txt.js → ", | ||
"[missing] ", | ||
"load test/dataloaders/data1.txt.js → ", | ||
"[fresh] ", | ||
"load test/dataloaders/data1.txt.js → ", | ||
"[using stale] ", | ||
"load test/dataloaders/data1.txt.js → ", | ||
"[stale] " | ||
] | ||
); | ||
}); | ||
}; | ||
const loader = Loader.find(FilePath("test"), FilePath("dataloaders/data1.txt"))!; | ||
// save the loader times. | ||
const {atime, mtime} = await stat(loader.path); | ||
// set the loader mtime to Dec. 1st, 2023. | ||
const time = new Date("2023-11-01"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this from "seconds since epoch" to a Date because non-date values throw an error on Windows: nodejs/node#5561 (comment)
This reverts commit 1b6fdbc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I converted this file to use LF endings instead of CRLF. If modifying this file isn't ok, I could add an exception to .gitattributes
for it.
With this branch, I was able to create, preview, build, login, and deploy in Windows. It's a big change, but I think most of the changes only affect their immediate area. It works for me, and it works on CI, though I would certainly appreciate more testing on this, both on Windows machines as well as Non-Windows. I'm going to start another PR based on this to explore Fil's idea of using forward slashes everywhere internally, and only converting to OS-correct paths at the last moment. |
bin/observable-init.js
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can sort of use shebangs on Windows, but we can't do complicated things like pass arguments with -S
. I think that this new version is equivalent to what we had before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a part of my .gitattributes changes, I converted the line endings of this file (and the next one) from \r\n
to \n
. I can put them back and update .gitattributes if it matters.
if (!color) color = (s) => s; | ||
if (!color) color = (s) => `${s}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lets us directly print branded strings, since they implement { toString(): string }
.
cwd() { | ||
return join("/", "opt", "projects", "acme-bi"); | ||
// it is an important detail that this is not inside the home dir | ||
return isWindows ? fileJoin("D:", "Projects", "acme-bi") : fileJoin("/", "opt", "projects", "acme-bi"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fil this is one thing I'm not sure how to handle in a "normalized internal paths" approach. Would we use hybrid Windows/Posix paths, like C:/Projects/acme-bi
?
export function unFilePath(path: FilePath | string): string { | ||
return path as unknown as string; | ||
} | ||
|
||
export function unUrlPath(path: UrlPath | string): string { | ||
return path as unknown as string; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pattern, of having a new type that is just an existing type with a new name is something that I've used in other languages. In those languages there are often nice features built into the language that help make this easier, like implicit conversion, operator overloading, and monad-style-map. TypeScript doesn't really give us any of those things, so it is all a bit clunky.
I could imagine taking out most of these wrappers now (except for the places where we need slash conversion). I don't know how we would prevent new changes to the code base from getting it wrong in the future. Maybe just test coverage and Windows in CI? That feels much less robust than TypeScript to me.
Just a lurker here - and don't want to get into the whole functional v OOP debate - but wouldn't this change be a lot cleaner with a FilePath/URLPath Classes (or more use of the Node URL Class)? |
@GordonSmith thanks for the comments. I'm not sure if a class based approach would help. The original intent was to do this entirely at the TypeScript, except where conversion was needed. It would be nice if this had as little runtime overhead as possible. I'm not sure how well that worked out though. I think that a class would only be a change of "spelling" without really changing the overall cleanliness. I'm working on an alternative approach that fill Fil recommended that doesn't use any wrapping at all. I think it is promising, but we'll see once I get it working. As for the URL class, as far as I know that requires full, absolute URLs, including hostname. I don't think that would work well here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should disable autocrlf.
* text=false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to remove our runtime dependency on tsx. #850 Also, we’ll need the same fix in the @observablehq/create
repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let’s leave these as-is to minimize this PR.
@@ -92,6 +94,7 @@ | |||
"c8": "^8.0.1", | |||
"chai": "^4.3.10", | |||
"chai-http": "^4.4.0", | |||
"cross-env": "^7.0.3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is in dependencies
already, we shouldn’t also need it in devDependencies
.
"cross-env": "^7.0.3", |
Closing in favor of #944. |
This is an attempt to improved Windows support. The idea here is that we can use TypeScript to tag some strings as being either
FilePath
s orUrlPath
s. From there we can make a series of helpers around the Node filesystem and path functions that are stricter about which one they take.I'm putting this up as a draft now to get some early feedback about this approach. I think there is some inherent boilerplate here to tag strings as the two types. I think there may also be some boilerplate in this PR that we could get rid of. Any comments or suggestions about how we could improve this are welcome!
As of now, I have all the tests passing on Windows locally.
Besides the path annotations, I've introduced some libraries to ease in cross-platform compatibility:
cross-env
- Handles setting environment variables inline inpackage.json
scriptsrimraf
- A cross platformrm
-like toolcross-spawn
- For executing files more easily in a cross platform way.I'm not sure how we can get
.sh
data loaders working, so for now I've simply disabled those tests on Windows. I think we can save that for another PR.In order for our snapshot tests to work, I added a
.gitattributes
file that insists on posix-style line endings everywhere. I based that off of these GitHub docs and this blogpost.To Test
yarn install
to update packagesyarn observable create
to create a new projectcd
into the project directoryyarn remove @observablehq/framework
yarn add link:..
yarn dev
,yarn build
,yarn observable login
andyarn deploy
.