Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For duplicate source files of the same package, make one redirect to the other #16274

Merged
19 commits merged into from
Aug 9, 2017

Conversation

ghost
Copy link

@ghost ghost commented Jun 5, 2017

Fixes #6496
When we resolve modules, we will now track what package name and version we loaded them from based on package.json.
If we see another package with an identical name and version, we will create a source file that redirects to the previously-created source file for that package.
We will still load in the underlying source file so that services can detect changes, but program.getSourceFileByPath will return the redirect, so imports to two different locations will return source files with the same content, meaning that any classes with privates will have the same identity.

@ghost ghost force-pushed the package-id branch from 211a2cf to 60b9aa3 Compare June 6, 2017 00:11
@ghost
Copy link
Author

ghost commented Jun 6, 2017

On our baselines performance is the same, because they don't have duplicate packages and checking for them costs almost nothing.
I generated a sample project designed to have lots of duplication using the following script:

// @ts-check

const { existsSync,mkdirSync, writeFileSync } = require("fs");
const maxL = 50;
const maxX = 100;

main();

/** @return {void} */
function main() {
    writeFileSync("tsconfig.json", JSON.stringify({ "compilerOptions": { "noEmit": true } }));
    writeFileSync("src/index.ts", genIndex());

    ensureDir("node_modules");
    for (let li = 0; li < maxL; li++) {
        const ldir = `node_modules/l${li}`;
        ensureDir(ldir);
        writeFileSync(`${ldir}/index.d.ts`, genLIndex());

        ensureDir(`${ldir}/node_modules`);
        for (let xi = 0; xi < maxX; xi++) {
            const xdir = `${ldir}/node_modules/x${xi}`;
            ensureDir(xdir);
            writeFileSync(`${xdir}/index.d.ts`, `export class X${xi} { private x; }`);
            writeFileSync(`${xdir}/package.json`, JSON.stringify({ name: `x${xi}`, version: "1", }));
        }
    }
}

/** @return {string} */
function genIndex() {
    const lines = [];
    for (let li = 0; li < maxL; li++) {
        const imports = [];
        for (let xi = 0; xi < maxX; xi++) {
            imports.push(`X${xi} as L${li}X${xi}`);
        }
        lines.push(`import { ${imports.join(", ")} } from "l${li}";`);
    }

    lines.push(`let x = L0X0;`);
    for (let li = 1; li < maxL; li++) {
        lines.push(`x = L${li}X0;`);
    }

    return lines.join("\n");
}

/** @return {number} */
function genLIndex() {
    const lines = [];
    for (let xi = 0; xi < maxX; xi++) {
        lines.push(`export { X${xi} } from "x${xi}"`);
    }
    return lines.join("\n");
}

/**
 * @param {string} dirpath
 * @return {void}
 */
function ensureDir(dirPath) {
    if (existsSync(dirPath)) return;
    mkdirSync(dirPath);
}

I then compared tsc --diagnostics on master vs on this branch.

  • In master, there were 28004 types generated, and only 8404 on this branch, due to the deduplication.
  • Parse time and bind time are nearly identical. (The duplicate packages still get parsed.)
  • Memory used: 14% less (137201K -> 118066K)
  • Check time: 18.7% less (Fewer files checked, and also we there are no compile errors in this branch.)

After that I tested using Object.assign({}, redirectTo) instead of Object.create(redirectTo). It increased memory use by 28.5% (118065.75 -> 151612.5) and didn't affect total time. Based on this I would recommend continuing to use Object.create.

@mhegazy
Copy link
Contributor

mhegazy commented Jun 7, 2017

We also need to add some tests to src\harness\unittests\reuseProgramStructure.ts to ensure that we are not reusing after edits in cases like:

  • redirect was there but target changed
  • redirect was there but underlying changed
  • duplicate packageid change cause program structure invalidation

const newSourceFile = host.getSourceFileByPath
? host.getSourceFileByPath(oldSourceFile.fileName, oldSourceFile.path, options.target)
: host.getSourceFile(oldSourceFile.fileName, options.target);

const packageId = oldProgram.getPackageIdOfSourceFile(oldSourceFile);
if (packageId) sourceFileToPackageId.set(newSourceFile.path, packageId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would move this to the end, when we are sure that we are completely reusing the program.

@@ -765,11 +775,42 @@ namespace ts {
const modifiedSourceFiles: { oldFile: SourceFile, newFile: SourceFile }[] = [];
oldProgram.structureIsReused = StructureIsReused.Completely;

for (const oldSourceFile of oldProgram.getSourceFiles()) {
const oldSourceFiles = oldProgram.getSourceFiles();
for (const oldSourceFile of oldSourceFiles) {
const newSourceFile = host.getSourceFileByPath
? host.getSourceFileByPath(oldSourceFile.fileName, oldSourceFile.path, options.target)
: host.getSourceFile(oldSourceFile.fileName, options.target);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should assert here that the sourceFile we got from the host never has a redirect set.

continue;
}
}
else if (oldProgram.sourceFileIsRedirectedTo(oldSourceFile)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isSourceFileTargetOfRedirect ?

return oldProgram.structureIsReused = StructureIsReused.Not;
}
else {
sourceFileIsRedirectedToSet.set(oldSourceFile.path, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want to move this to the end, and just copy them all instead of one by one.

sourceFileIsRedirectedToSet.set(oldSourceFile.path, true);
filePaths.push(oldSourceFile.path);
newSourceFiles.push(oldSourceFile);
continue;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would let this fall through, and then we will add it to filePaths, and newSoruceFiles

if (packageId) {
// If this has a package id, and some other source file has the same package id, they are candidates to become redirects.
// In that case we must rebuild the program.
const hasDuplicate = oldSourceFiles.some(o => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider putting this in a map.

@@ -1517,8 +1574,28 @@ namespace ts {
}
}

function createRedirectSourceFile(redirectTo: SourceFile, underlying: SourceFile, fileName: string, path: Path): SourceFile {
sourceFileIsRedirectedToSet.set(redirectTo.path, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider putting all the map setting in this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, it would be better to do all of that outside since it makes more sense in the context of findSourceFile.

@ghost
Copy link
Author

ghost commented Jun 7, 2017

@mhegazy Comments should all be resolved now.

@ghost ghost requested a review from amcasey July 10, 2017 21:07
@amcasey
Copy link
Member

amcasey commented Jul 11, 2017

It seems like another possible approach would be to dirty projects referencing a file sharing the same package ID as a modified file when the file watcher for that file fires. Would that allow greater structure reuse?

@ghost ghost requested a review from aozgaa July 13, 2017 20:41
Copy link
Member

@amcasey amcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, comments stay pending until I "submit".

@@ -1551,8 +1597,26 @@ namespace ts {
}
}

function createRedirectSourceFile(redirectTo: SourceFile, underlying: SourceFile, fileName: string, path: Path): SourceFile {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add some comments about the various SourceFiles involved? I think redirectTo is pointed to by the link and the result does the pointing. I'm not sure what underlying is. Maybe it's the file we would use if we weren't redirecting? If so, maybe "redirectionTarget" and "unredirected" would be clearer?

// This lets us know if the underlying file has changed. If it has we should break the redirect.
if (newSourceFile !== oldSourceFile.redirect.underlying) {
// Underlying file has changed. Might not redirect anymore. Must rebuild program.
return oldProgram.structureIsReused = StructureIsReused.Not;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SafeModules isn't conservative enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, because we haven't set the dirty bit on the affected projects? Couldn't we? We're watching the files anyway, aren't we?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, we use StructureIsReused.Not in situations where files are added or removed, which would include the case of redirects being broken or created.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our offline discussion, I agree that this is an appropriate way to handle changes.

return oldProgram.structureIsReused = StructureIsReused.Not;
}
}
else if (oldProgram.isSourceFileTargetOfRedirect.has(oldSourceFile.path)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find it strange to call a set "is_".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

@ghost ghost force-pushed the package-id branch from 8bf33cb to 09e1c92 Compare July 13, 2017 20:57
Copy link
Member

@amcasey amcasey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't know much about module resolution.

@@ -2330,12 +2330,12 @@ namespace ts {
*/
/* @internal */ redirect?: {
/** Source file this redirects to. */
readonly redirectTo: SourceFile,
readonly redirectTarget: SourceFile,
/**
* Source file for the duplicate package. This will not be used by the Program,
* but we need to keep this around so we can watch for changes in underlying.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to update the comments that still refer to "underlying".

}
}
return extensions;
return deduplicate([...allSupportedExtensions, ...extraFileExtensions.map(e => e.extension)]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own edification, does this have some advantage over calling concat?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #17076

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cute.

/** Result of trying to resolve a module at a file. Needs to have 'packageId' added later. */
interface PathAndExtension {
path: string;
// (Use a different name than `extension` to make sure Resolved isn't assignable to PathAndExtension.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that they not be assignable to each other? Is it just to avoid confusion or something more?

Copy link
Author

@ghost ghost Jul 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents us from computing packageId and then not using it. See #12936

@@ -891,13 +915,18 @@ namespace ts {
// Even if extensions is DtsOnly, we can still look up a .ts file as a result of package.json "types"
const nextExtensions = extensions === Extensions.DtsOnly ? Extensions.TypeScript : extensions;
// Don't do package.json lookup recursively, because Node.js' package lookup doesn't.
return nodeLoadModuleByRelativeName(nextExtensions, file, failedLookupLocations, onlyRecordFailures, state, /*considerPackageJson*/ false);
const result = nodeLoadModuleByRelativeName(nextExtensions, file, failedLookupLocations, onlyRecordFailures, state, /*considerPackageJson*/ false);
if (result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What gets returned otherwise? Is return undefined implied?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, JS functions always return undefined if they don't reach a return statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I like to be explicit, but I'm fine if this is idiomatic JS.

// they might become redirects. So we must rebuild the program.
const prevKind = seenPackageNames.get(packageName);
const newKind = oldSourceFile === newSourceFile ? SeenPackageName.Exists : SeenPackageName.Modified;
if (prevKind !== undefined && newKind === SeenPackageName.Modified || prevKind === SeenPackageName.Modified) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would parenthesize the && subexpression.

(oldResolution.packageId === newResolution.packageId || oldResolution.packageId && newResolution.packageId && packageIdIsEqual(oldResolution.packageId, newResolution.packageId));
}

function packageIdIsEqual(a: PackageId, b: PackageId): boolean {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly this is following a precedent, but why not have packageIdIsEqual handle undefined?

getScriptFileNames(): string[] { return this.getFilenames(); }
getScriptFileNames(): string[] {
return this.getFilenames().filter(f =>
ts.isAnySupportedFileExtension(f) !== undefined);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why check for undefined? Doesn't it return a boolean?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! We need --strictNullChecks...


it("Target changes -> redirect broken", () => {
const program_1 = createRedirectProgram();
assert.lengthOf(program_1.getSemanticDiagnostics(), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would deepEquals provide more useful output in the event of a failure?

assert.lengthOf(program_2.getSemanticDiagnostics(), 1);
});

it("Previously duplicate packages -> program structure not reused", () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have missed it, but is there a case where redirects are used and the structure is reused?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

});

function updateProgramText(files: ReadonlyArray<NamedSourceText>, fileName: string, newProgramText: string) {
const file = find(files, f => f.name === fileName)!;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my own edification, what does the ! mean?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asserts (to the type-checker) that a value is not null or undefined. Has no effect as we don't have --strictNullChecks turned on, but I like to pretend. 😃

@amcasey
Copy link
Member

amcasey commented Jul 13, 2017

What diagnostics, if any, would be reported if the duplicated module contained an error?

@ghost
Copy link
Author

ghost commented Jul 14, 2017

See the test duplicatePackage_withErrors.

@amcasey
Copy link
Member

amcasey commented Jul 14, 2017

Thanks for making those changes!

let fileChanged: boolean;
if (oldSourceFile.redirect) {
// We got `newSourceFile` by path, so it is actually for the unredirected file.
// This lets us know if the unredirected file has changed. If it has we should break the redirect.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this comment inside of the inner if

Copy link
Contributor

@aozgaa aozgaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a couple nits after our review yesterday.


export function getSupportedExtensions(options?: CompilerOptions, extraFileExtensions?: ReadonlyArray<JsFileExtensionInfo>): ReadonlyArray<string> {
const needAllExtensions = options && options.allowJs;
if (!extraFileExtensions || extraFileExtensions.length === 0 || !needAllExtensions) {
return needAllExtensions ? allSupportedExtensions : supportedTypeScriptExtensions;
// TODO: Return a ReadonlyArray<string> from this function to avoid casts. https://github.com/Microsoft/TypeScript/issues/16312
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Return a ReadonlyArray<string> to avoid casts. see https://..."

@@ -109,7 +109,8 @@ namespace ts {
function createTestCompilerHost(texts: NamedSourceText[], target: ScriptTarget, oldProgram?: ProgramWithSourceTexts): TestCompilerHost {
const files = arrayToMap(texts, t => t.name, t => {
if (oldProgram) {
const oldFile = <SourceFileWithText>oldProgram.getSourceFile(t.name);
let oldFile = <SourceFileWithText>oldProgram.getSourceFile(t.name);
if (oldFile && oldFile.redirect) oldFile = oldFile.redirect.unredirected;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use braces.


const program_2 = updateRedirectProgram(program_1, files => {
updateProgramText(files, bxIndex, "export default class X { private x: number; private y: number; }");
updateProgramText(files, bxPackage, JSON.stringify('{ name: "x", version: "1.2.4" }'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you pass a string to JSON.stringify? This appears inconsistent across usages.

});

function updateProgramText(files: ReadonlyArray<NamedSourceText>, fileName: string, newProgramText: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this declaration next to updateProgram?

@@ -93,7 +105,8 @@ namespace ts {
}
}

function readJson(path: string, host: ModuleResolutionHost): { typings?: string, types?: string, main?: string } {
interface PackageJson { name?: string; version?: string; typings?: string; types?: string; main?: string; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just an implementation detail (as it appears to be for me), then move this declaration up with the others at the top of this file. Otherwise, move it to types.ts.

Please add newlines.

@ghost ghost merged commit 37b20ee into master Aug 9, 2017
@ghost ghost deleted the package-id branch August 9, 2017 21:39
uniqueiniquity pushed a commit to uniqueiniquity/TypeScript that referenced this pull request Aug 9, 2017
…the other (microsoft#16274)

* For duplicate source files of the same package, make one redirect to the other

* Add reuseProgramStructure tests

* Copy `sourceFileToPackageId` and `isSourceFileTargetOfRedirect` only if we completely reuse old structure

* Use fallthrough instead of early exit from loop

* Use a set to efficiently detect duplicate package names

* Move map setting outside of createRedirectSourceFile

* Correctly handle seenPackageNames set

* sourceFileToPackageId -> sourceFileToPackageName

* Renames

* Respond to PR comments

* Fix bug where `oldSourceFile !== newSourceFile` because oldSourceFile was a redirect

* Clean up redirectInfo

* Respond to PR comments
@ghost ghost mentioned this pull request Apr 10, 2018
@microsoft microsoft locked and limited conversation to collaborators Jun 14, 2018
This pull request was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate type declarations with npm link
4 participants