The source code for the TypeScript scanner is located entirely in scanner.ts. The scanner is controlled internally by the parser to convert the source code to an AST. Here is the desired outcome:
SourceCode ~~ scanner ~~> Token Stream ~~ parser ~~> AST
There is a singleton scanner created in parser.ts to avoid the cost of creating scanners over and over again. The parser primes this scanner on demand using the initializeState function.
Here is a simplified version of the actual code in the parser that you can run to demonstrate this concept:
code/compiler/scanner/runScanner.ts
import * as ts from "ntypescript";

// TypeScript has a singleton scanner
const scanner = ts.createScanner(ts.ScriptTarget.Latest, /*skipTrivia*/ true);

// That is initialized using a function `initializeState` similar to
function initializeState(text: string) {
    scanner.setText(text);
    scanner.setOnError((message: ts.DiagnosticMessage, length: number) => {
        console.error(message);
    });
    scanner.setScriptTarget(ts.ScriptTarget.ES5);
    scanner.setLanguageVariant(ts.LanguageVariant.Standard);
}

// Sample usage
initializeState(`
var foo = 123;
`.trim());

// Start the scanning
let token = scanner.scan();
while (token !== ts.SyntaxKind.EndOfFileToken) {
    console.log(ts.formatSyntaxKind(token));
    token = scanner.scan();
}
This will print out the following:
VarKeyword
Identifier
FirstAssignment
FirstLiteralToken
SemicolonToken
After you call scan, the scanner updates its local state (position in the scan, current token details, etc.). The scanner provides a number of utility functions to get the current scanner state. In the sample below, we create a scanner and then use it to identify the tokens as well as their positions in the code.
code/compiler/scanner/runScannerWithPosition.ts
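// Uses the same `scanner` and `initializeState` as in runScanner.ts above

// Sample usage
initializeState(`
var foo = 123;
`.trim());

// Start the scanning
let token = scanner.scan();
while (token !== ts.SyntaxKind.EndOfFileToken) {
    const currentToken = ts.formatSyntaxKind(token);
    // Start of the current token, including any whitespace in front of it
    const tokenStart = scanner.getStartPos();
    token = scanner.scan();
    // The next token's start is where the current token ended
    const tokenEnd = scanner.getStartPos();
    console.log(currentToken, tokenStart, tokenEnd);
}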
This will print out the following:
VarKeyword 0 3
Identifier 3 7
FirstAssignment 7 9
FirstLiteralToken 9 13
SemicolonToken 13 14
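The start positions above include the whitespace in front of each token, which is why Identifier is reported as 3 to 7 rather than 4 to 7: getStartPos reports where a token's leading trivia begins. One way to see this is to turn skipTrivia off so that whitespace comes back as tokens of its own. The following is a minimal sketch rather than code from the compiler. It assumes the stock typescript package instead of ntypescript, and scanTokens and ScannedToken are hypothetical names introduced for illustration.

```typescript
import * as ts from "typescript";

// Hypothetical record type for this sketch
interface ScannedToken {
    kind: ts.SyntaxKind;
    text: string;
    isWhitespace: boolean;
}

// Hypothetical helper: scan `text` with trivia kept and collect every token
function scanTokens(text: string): ScannedToken[] {
    // skipTrivia = false makes scan() return whitespace and comments as tokens too
    const triviaScanner = ts.createScanner(ts.ScriptTarget.Latest, /*skipTrivia*/ false);
    triviaScanner.setText(text);

    const result: ScannedToken[] = [];
    let token = triviaScanner.scan();
    while (token !== ts.SyntaxKind.EndOfFileToken) {
        result.push({
            kind: token,
            text: triviaScanner.getTokenText(),
            isWhitespace: token === ts.SyntaxKind.WhitespaceTrivia,
        });
        token = triviaScanner.scan();
    }
    return result;
}

const scanned = scanTokens("var foo = 123;");
for (const t of scanned) {
    // Print the kind name next to the exact source text of the token
    console.log(ts.SyntaxKind[t.kind], JSON.stringify(t.text));
}
```

For this input, three single-space whitespace tokens appear between the other five, and concatenating the token texts gives back the original source.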
Even though the TypeScript parser has a singleton scanner, you can create a standalone scanner using createScanner and use its setText/setTextPos to scan at different points in a file for your amusement.
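As a rough sketch of scanning from the middle of a file, the example below creates a standalone scanner and starts it at the second statement by passing a start offset to setText. It assumes the stock typescript package rather than ntypescript, and tokenTextsFrom is a hypothetical helper name. Calling setText(text) followed by setTextPos(start) should position the scanner the same way on versions that expose setTextPos.

```typescript
import * as ts from "typescript";

// Hypothetical helper: collect the source text of every token found from
// offset `start` onwards (optionally limited to `length` characters).
function tokenTextsFrom(text: string, start: number, length?: number): string[] {
    // A standalone scanner, independent of the parser's singleton
    const standalone = ts.createScanner(ts.ScriptTarget.Latest, /*skipTrivia*/ true);
    // The optional second argument makes scanning begin at `start`
    standalone.setText(text, start, length);

    const texts: string[] = [];
    let token = standalone.scan();
    while (token !== ts.SyntaxKind.EndOfFileToken) {
        texts.push(standalone.getTokenText());
        token = standalone.scan();
    }
    return texts;
}

const source = "var foo = 123; var bar = 456;";
const secondStatement = source.indexOf("var bar");

// Only the tokens of the second statement are scanned
console.log(tokenTextsFrom(source, secondStatement));
```

Creating a fresh scanner per call keeps the sketch simple. Code that scans many times would more likely keep one scanner around and reset it, which is the same trade-off the parser makes with its singleton.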