Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between URL paths and file paths at the type level #881

Closed
wants to merge 32 commits into from

Conversation

mythmon
Copy link
Member

@mythmon mythmon commented Feb 20, 2024

This is an attempt to improved Windows support. The idea here is that we can use TypeScript to tag some strings as being either FilePaths or UrlPaths. From there we can make a series of helpers around the Node filesystem and path functions that are stricter about which one they take.

I'm putting this up as a draft now to get some early feedback about this approach. I think there is some inherent boilerplate here to tag strings as the two types. I think there may also be some boilerplate in this PR that we could get rid of. Any comments or suggestions about how we could improve this are welcome!

As of now, I have all the tests passing on Windows locally.

Besides the path annotations, I've introduced some libraries to ease in cross-platform compatibility:

  • cross-env - Handles setting environment variables inline in package.json scripts
  • rimraf - A cross platform rm-like tool
  • cross-spawn - For executing files more easily in a cross platform way.

I'm not sure how we can get .sh data loaders working, so for now I've simply disabled those tests on Windows. I think we can save that for another PR.

In order for our snapshot tests to work, I added a .gitattributes file that insists on posix-style line endings everywhere. I based that off of these GitHub docs and this blogpost.


To Test

  • Check out this branch
  • Run yarn install to update packages
  • Run yarn observable create to create a new project
  • cd into the project directory
  • Edit the project to use the branch instead of the npm published version of Framework:
    • yarn remove @observablehq/framework
    • yarn add link:..
  • Use Framework with commands like yarn dev, yarn build, yarn observable login and yarn deploy.

@mythmon mythmon force-pushed the mythmon/240216/branded-paths branch from 82b8c8e to 7221d92 Compare February 21, 2024 23:32
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two small questions for now

Comment on lines +28 to +35
export function unFilePath(path: FilePath | string): string {
return path as unknown as string;
}

export function unUrlPath(path: UrlPath | string): string {
return path as unknown as string;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these no-op functions are here only for the sake of typing (or, untyping), there should be a better way to do this with typescript only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that I was disappointed about was that I couldn't figure out how to safely tell TypeScript that UrlPath and FilePath were actually just type safe strings. If we remove the overrides of replace and similar in the definitions of the path types, it can understand it, but then we erase types when working on the path.

I think if we wanted to eliminate these functions, we'd have to inline the as unknown as string text everywhere we call this function. I chose not to do that because I treat as-casts as dangerous code. I'd prefer to isolate it to as few places as possible. Then those isolated places can be thoroughly vetted, either through tests or through reading the code. In this case the cast is trivial to verify, so just having this function is enough.

I agree though that I would prefer some better way to do this, like maybe a "type level function". I don't know of one though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the wrapping worries me, but I can't express why. It feels excessive or invasive maybe? I'm definitely wishing we didn't have to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern, of having a new type that is just an existing type with a new name is something that I've used in other languages. In those languages there are often nice features built into the language that help make this easier, like implicit conversion, operator overloading, and monad-style-map. TypeScript doesn't really give us any of those things, so it is all a bit clunky.

I could imagine taking out most of these wrappers now (except for the places where we need slash conversion). I don't know how we would prevent new changes to the code base from getting it wrong in the future. Maybe just test coverage and Windows in CI? That feels much less robust than TypeScript to me.

src/config.ts Outdated Show resolved Hide resolved
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, this PR includes 3 changes that are almost independent:

  1. generalizes some system call (rimraf and cross-env)
  2. types all paths as either filePath or urlPath
  3. carries path.sep into the internal representation of file paths

(It also adds platform-specific tests—in CI, and in unit tests.)

Generalizations/abstractions seem like the right thing to do. Adding platform-specific tests is obviously needed.

For step 2, I think it's great that you're typing these paths as "filesystem path" or "url path", as it helps bring clarity to the code, where it's easy otherwise to get a bit confused between what the paths are for. It certainly helps with step 3. However in my mind it's almost an orthogonal concern.

Step 3 feels overwhelming. Our app code now sees different strings depending on the OS it runs on, and has to reason about them differently. On windows, file paths will be a string docs\sub\file.md, and on linux an macos it will be docs/sub/file.md. This representation is carried throughout, meaning that at every step we need to convert between filepath and urlpath (which always use /). In a sense, now every line of code needs to know about windows.

Could we consider an approach where we would normalize the internal representation of all paths to forward slashes, and adapt/convert only when talking to the filesystem? Basically an abstraction layer on top of fs that would use /-delimited paths as inputs and outputs, restricting the use of \ to the underlying OS calls.

An analogy that comes to mind is to use an ORM if we want to talk to different database flavors. If that is possible, it means we could do a much smaller step 3.

More importantly, when some logic needs to reason about a path, say "if this file is foo/bar, do something", it can be written directly rather than having to call conversion functions. For a concrete example: when we create the search index (in search.ts), we attribute an id to each page, which corresponds to its relative path to the docs root. That id is also used as a link[href] when we present the results in the browser. I’d rather keep this logic insulated from having to care about windows-style paths.

If that approach proves possible, we could even skip step 2 altogether for now (and do it separately later, independently of windows support).

We could also land step 1 separately.

Finally, abstracting fs might also help with unicode file names (for international text or even emoji #783).

@mythmon
Copy link
Member Author

mythmon commented Feb 22, 2024

Thanks for the comments, Fil.

Your first two summary points I agree with. I'm not sure if it makes sense to split 1 out, since the project is still quite broken even with those changes in place.

I don't quite agree about the third though. path.sep was already in the internal representation of these paths when running on Windows. However, it was there in an uncontrolled, chaotic way. When working through the test cases, I occasionally found paths like foo/bar\baz where different slashes had been mixed within one path. What this PR does is make the existing code explicitly handle that.

I like the idea of using internally normalized paths.

For me the biggest hurdle is that it would require writing wrappers around all of node:path and node:fs. However, as a part of this implementation I ended up writing writing these kind of wrappers anyways in the form of the branded path and fs modules.

The other concern I have is about the appropriateness of abstraction. Framework has to deal with both files and HTTP very directly. There are a few places where we need to understand the differences between win32 and posix paths. I'm not sure if that is fundamental complexity, or if we would prefer to work in an abstracted space where we don't have to worry about it.

Given that, I plan to continue with this "slash-aware" version, and get it passing in CI. After I'm done with that, I can explore the idea of internally normalized paths, if someone else doesn't beat me to it.

Regarding the FilePath/UrlPath annotations: Given that the work is fundamentally done in that area, I think it would make sense to keep it. Even when we aren't mixing slashes, it made the intent of some of the code clearer for me.

@Fil
Copy link
Contributor

Fil commented Feb 22, 2024

path.sep was already in the internal representation

Claro. (I did not mean to imply that it was working before.) My question is whether we can control it at the file system interface, and continue ignoring it in the app code (but, confident in that it is safe and controlled).

Comment on lines -59 to +63
const time = Date.UTC(2023, 11, 1) / 1000;
await utimes(loader.path, atime, time);
// remove the cache set by another test (unless we it.only this test).
try {
await unlink("test/.observablehq/cache/dataloaders/data1.txt");
} catch {
// ignore;
}
// populate the cache (missing)
await loader.load(outputEffects);
// run again (fresh)
await loader.load(outputEffects);
// touch the loader
await utimes(loader.path, atime, Date.now() + 100);
// run it with useStale=true (using stale)
const loader2 = Loader.find("test", "dataloaders/data1.txt", {useStale: true})!;
await loader2.load(outputEffects);
// run it with useStale=false (stale)
await loader.load(outputEffects);
// revert the loader to its original mtime
await utimes(loader.path, atime, mtime);
assert.deepStrictEqual(
// eslint-disable-next-line no-control-regex
out.map((l) => l.replaceAll(/\x1b\[[0-9]+m/g, "")),
[
"load test/dataloaders/data1.txt.js → ",
"[missing] ",
"load test/dataloaders/data1.txt.js → ",
"[fresh] ",
"load test/dataloaders/data1.txt.js → ",
"[using stale] ",
"load test/dataloaders/data1.txt.js → ",
"[stale] "
]
);
});
};
const loader = Loader.find(FilePath("test"), FilePath("dataloaders/data1.txt"))!;
// save the loader times.
const {atime, mtime} = await stat(loader.path);
// set the loader mtime to Dec. 1st, 2023.
const time = new Date("2023-11-01");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this from "seconds since epoch" to a Date because non-date values throw an error on Windows: nodejs/node#5561 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I converted this file to use LF endings instead of CRLF. If modifying this file isn't ok, I could add an exception to .gitattributes for it.

@mythmon mythmon marked this pull request as ready for review February 23, 2024 00:00
@mythmon
Copy link
Member Author

mythmon commented Feb 23, 2024

With this branch, I was able to create, preview, build, login, and deploy in Windows. It's a big change, but I think most of the changes only affect their immediate area. It works for me, and it works on CI, though I would certainly appreciate more testing on this, both on Windows machines as well as Non-Windows.

I'm going to start another PR based on this to explore Fil's idea of using forward slashes everywhere internally, and only converting to OS-correct paths at the last moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can sort of use shebangs on Windows, but we can't do complicated things like pass arguments with -S. I think that this new version is equivalent to what we had before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a part of my .gitattributes changes, I converted the line endings of this file (and the next one) from \r\n to \n. I can put them back and update .gitattributes if it matters.

if (!color) color = (s) => s;
if (!color) color = (s) => `${s}`;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lets us directly print branded strings, since they implement { toString(): string }.

Comment on lines 64 to 67
cwd() {
return join("/", "opt", "projects", "acme-bi");
// it is an important detail that this is not inside the home dir
return isWindows ? fileJoin("D:", "Projects", "acme-bi") : fileJoin("/", "opt", "projects", "acme-bi");
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fil this is one thing I'm not sure how to handle in a "normalized internal paths" approach. Would we use hybrid Windows/Posix paths, like C:/Projects/acme-bi?

Comment on lines +28 to +35
export function unFilePath(path: FilePath | string): string {
return path as unknown as string;
}

export function unUrlPath(path: UrlPath | string): string {
return path as unknown as string;
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern, of having a new type that is just an existing type with a new name is something that I've used in other languages. In those languages there are often nice features built into the language that help make this easier, like implicit conversion, operator overloading, and monad-style-map. TypeScript doesn't really give us any of those things, so it is all a bit clunky.

I could imagine taking out most of these wrappers now (except for the places where we need slash conversion). I don't know how we would prevent new changes to the code base from getting it wrong in the future. Maybe just test coverage and Windows in CI? That feels much less robust than TypeScript to me.

@mythmon mythmon mentioned this pull request Feb 23, 2024
@GordonSmith
Copy link

GordonSmith commented Feb 24, 2024

Just a lurker here - and don't want to get into the whole functional v OOP debate - but wouldn't this change be a lot cleaner with a FilePath/URLPath Classes (or more use of the Node URL Class)?

@mythmon
Copy link
Member Author

mythmon commented Feb 24, 2024

@GordonSmith thanks for the comments. I'm not sure if a class based approach would help. The original intent was to do this entirely at the TypeScript, except where conversion was needed. It would be nice if this had as little runtime overhead as possible. I'm not sure how well that worked out though. I think that a class would only be a change of "spelling" without really changing the overall cleanliness.

I'm working on an alternative approach that fill Fil recommended that doesn't use any wrapping at all. I think it is promising, but we'll see once I get it working.

As for the URL class, as far as I know that requires full, absolute URLs, including hostname. I don't think that would work well here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should disable autocrlf.

* text=false

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to remove our runtime dependency on tsx. #850 Also, we’ll need the same fix in the @observablehq/create repo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let’s leave these as-is to minimize this PR.

@@ -92,6 +94,7 @@
"c8": "^8.0.1",
"chai": "^4.3.10",
"chai-http": "^4.4.0",
"cross-env": "^7.0.3",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is in dependencies already, we shouldn’t also need it in devDependencies.

Suggested change
"cross-env": "^7.0.3",

@mythmon
Copy link
Member Author

mythmon commented Mar 2, 2024

Closing in favor of #944.

@mythmon mythmon closed this Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants