Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is ts.sys.readDirectory even supposed to do? #46788

Open
andrewbranch opened this issue Nov 11, 2021 · 3 comments
Open

What is ts.sys.readDirectory even supposed to do? #46788

andrewbranch opened this issue Nov 11, 2021 · 3 comments
Labels
Needs Investigation This issue needs a team member to investigate its status.

Comments

@andrewbranch
Copy link
Member

Following up on #46673 (comment), I started a more thorough investigation of ts.matchFiles (which is just ts.sys.readDirectory without the system host) and discovered that even its long-standing behavior prior to 4.4 is confusing and probably undesigned. Consider this file system:

projects/
├─ a/
│  ├─ subfolder/
│  │  └─ 1.ts
│  └─ 2.ts
└─ b/
   └─ 3.ts

Things tend to work as expected if you do three things: (1) use relative paths, (2) read the same directory as your CWD, and (3) don’t use any includes that access a sibling folder with "../". If you start doing those things, especially in combination, the behavior is a bit unpredictable.

Let’s start with a simple example where I think things are working as intended:

> process.cwd()
'/projects/a'

> ts.sys.readDirectory(".", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*"])
[
  '2.ts',
  'subfolder/1.ts'
]

Note that the entries are relative to our CWD / the directory we’re reading (which one? We don’t know yet because they’re the same), but without the leading ./. This seems probably intentional because it matches the format of fs.readdir. Also, replacing the first argument with the empty string yields the same results.

Ok, let’s see what happens if we include something outside of the directory we’re reading:

> process.cwd()
'/projects/a'

> ts.sys.readDirectory(".", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*", "../b/**/*"])
[
  '2.ts',
  'subfolder/1.ts'
]

No change. This seems to make sense with the name readDirectory. Now let’s find out whether our results are relative to our CWD or to the directory we’re reading. We’ll cd up a level first:

> process.cwd()
'/projects'

> ts.sys.readDirectory("a", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*"])
[
  'a/2.ts',
  'a/subfolder/1.ts'
]

Interesting—they’re relative to our CWD, not the directory we’re reading, which is a divergence from fs.readdir. Does this mean we can now get includes outside of the directory we’re reading?

> ts.sys.readDirectory("a", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*", "../b/**/*"])
[
  'a/2.ts',
  'a/subfolder/1.ts',
  'b/3.ts'
]

Yes it does! readDirectory can return results outside of the target directory, but not outside the CWD. This seems very odd to me. Let’s cd back into a and try absolute paths:

> process.cwd()
'/projects/a'

> ts.sys.readDirectory("/projects/a", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*"])
[
  '/projects/a/2.ts',
  '/projects/a/subfolder/1.ts'
]

Absolute in, absolute out. Quite possibly an accident? Now for the final test, an absolute input path with an includes outside the CWD:

> ts.sys.readDirectory("/projects/a", [".ts"], /*excludes*/ undefined, /*includes*/ ["**/*", "../b/**/*"])
[
  '/projects/a/2.ts',
  '/projects/a/subfolder/1.ts',
  '/projects/b/3.ts'
]

👀

@andrewbranch andrewbranch added the Needs Investigation This issue needs a team member to investigate its status. label Nov 11, 2021
@andrewbranch
Copy link
Member Author

@rbuckton in #46673, @weswigham said you might have some general context on this.

@rbuckton
Copy link
Member

Unfortunately matchFiles's inconsistency is consistent with our other path operations in the compiler and how Program expected to read files.

For one, it assumes the currentDirectory argument is an absolute path. It doesn't validate this, however, as there are quite a few of places in our tests where we just return "" for getCurrentDirectory.

Each result that it matches is pre-combined with path using our existing combinePaths function:

  • combinePaths(".", "2.ts") -> "2.ts"
  • combinePaths("a", "2.ts") -> "a/2.ts"
  • combinePaths("/projects/a", "2.ts") -> "/projects/a/2.ts"

Finally, readDirectory was originally just readDirectory(path: string) with no support for globs. The include/exclude functionality was added later in a way intended to be compatible with existing API consumers and our own internal use cases. We didn't return the absolute path (derived from the current directory), because the results of this function were consumed by upstream functions that expected the results in a specific format (see ParsedCommandLine.fileNames).

Essentially, matchFiles (and readDirectory) were purpose-built to fit into a specific set of use cases. This is one of the reasons the API hasn't been made public despite requests from the community (#34545, #13793).

@fatcerberus
Copy link

fatcerberus commented Nov 13, 2021

For what it's worth I was rather surprised that I had to do all this in my readDirectory implementation just to even get close to tsc's behavior:

readDirectory(directoryName, extensions, excludePaths, includePaths, depth)
{
	return from(new DirectoryStream(directoryName, { recursive: true }))
		.where(it => !it.isDirectory)
		.where(it => extensions.includes(FS.extensionOf(it.fullPath)))
		.where(it => {
			// implicitly exclude node_modules, etc.
			return !from(TS.commonPackageFolders)
				.any(dirName => it.fullPath.includes(`${dirName}/`) && !it.fullPath.includes("@types/"));
		})
		.where(it => !FS.match(it.fullPath, excludePaths))
		.where(it => FS.match(it.fullPath, includePaths))
		.select(it => it.fullPath)
		.toArray();
}

I expected to just have to return a dumb directory listing, but it seems that readDirectory has a lot more responsibilities than its name would suggest. In particular, having to manually exclude node_modules (but include @types/!) was quite surprising, and I only figured that out through trial and error and studying tsc's source code.

The compiler API really needs to be documented better. I would take a stab, but even I'm not confident I have all the intricacies right and I don't want to mislead people.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Investigation This issue needs a team member to investigate its status.
Projects
None yet
Development

No branches or pull requests

3 participants