Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docgen: remove doctrine #18045

Closed
oandregal opened this issue Oct 21, 2019 · 10 comments · Fixed by #28615
Closed

docgen: remove doctrine #18045

oandregal opened this issue Oct 21, 2019 · 10 comments · Fixed by #28615
Labels
Needs Dev Ready for, and needs developer efforts [Tool] Docgen /packages/docgen

Comments

@oandregal
Copy link
Member

doctrine is the JSDoc parser we use in docgen to get structured information out of the JSDoc comments. It was created and maintained by the ESLint people, although it's no longer actively developed. We should migrate out of doctrine for that reason.

The suggested replacement (eslint-plugin-jsoc) uses a different JSDoc parser with support for many other syntaxes (typescript, etc). We should look into trying jsdoctypeparser or any other that's actively developed.

@oandregal oandregal added the [Tool] Docgen /packages/docgen label Oct 21, 2019
@oandregal
Copy link
Member Author

oandregal commented Oct 21, 2019

Some notes:

The plan would be to update the get-jsdoc-from-token exported function to use whatever doctrine alternative we settle on.

@oandregal
Copy link
Member Author

oandregal commented Oct 21, 2019

Relevant to this is the research I did at the time on doctrine alternatives (see comment on Feb 22): #13329 (comment)

@gziolo gziolo added the Needs Dev Ready for, and needs developer efforts label Oct 22, 2019
@aduth
Copy link
Member

aduth commented Nov 30, 2019

I started exploring this, initially aiming to use comment-parser, seeing as it's the same in use by eslint-plugin-jsdoc, and on the assumption that we don't really need the type parsing (vs. just the verbatim name). It worked okay, but unlike Doctrine, there's no built-in understanding of "typed" or "named" tags, so these had to be manually adapted from Doctrine. Even then, however, there's some data loss that occurs when trying to reconstruct unnamed tags (like an @example tag, losing whitespace).

This is about as far as I got with the implementation, but while it technically passes the unit tests, it fails in creating identical documentation for many of the packages for the above reason:

(Click to expand get-jsdoc-from-token.js source)
/**
 * External dependencies.
 */
const parse = require( 'comment-parser' );

/**
 * Internal dependencies.
 */
const getLeadingComments = require( './get-leading-comments' );

/**
 * Set of tags which can assign a name.
 *
 * @see https://github.com/eslint/doctrine/blob/0e8eba7/lib/doctrine.js#L64-L66
 *
 * @type {Set}
 */
const NAMED_TAGS = new Set( [
	'param',
	'argument',
	'arg',
	'property',
	'prop',
	'alias',
	'this',
	'mixes',
	'requires',
	'const',
	'constant',
] );

/**
 * Set of tags which can assume a type.
 *
 * @see https://github.com/eslint/doctrine/blob/0e8eba7/lib/doctrine.js#L85-L90
 *
 * @type {Set}
 */
const TYPED_TAGS = new Set( [
	'throws',
	'const',
	'constant',
	'property',
	'prop',
	'namespace',
	'member',
	'var',
	'module',
	'constructor',
	'class',
	'extends',
	'augments',
	'public',
	'private',
	'protected',
	'return',
	'returns',
	'define',
	'enum',
	'implements',
	'this',
	'type',
	'typedef',
	'param',
	'argument',
	'arg',
] );

/**
 * @typedef {Object} DocgenJSDocTag
 *
 * @property {string}  title       Tag title.
 * @property {?string} description Tag description. Null if empty.
 * @property {string=} type        Tag type, if relevant for tag name.
 * @property {string=} name        Tag name, if of a named tag title.
 */

/**
 * @typedef {Object} DocgenJSDoc
 *
 * @property {string}           description Comment description.
 * @property {DocgenJSDocTag[]} tags        Comment tags.
 */

/**
 * Function that takes an Espree token and returns a object representing the
 * leading JSDoc comment of the token, if any.
 *
 * @param {Object} token Espree token.
 *
 * @return {DocgenJSDoc|undefined} Object representing the JSDoc comment.
 */
module.exports = function( token ) {
	const comments = getLeadingComments( token );
	if ( ! comments ) {
		return;
	}

	const parsed = parse( '/*' + comments + '*/' );
	if ( ! parsed.length ) {
		return;
	}

	const { description, tags } = parsed[ 0 ];

	return {
		description,
		tags: tags.map( ( tag ) => {
			const isNamedTag = NAMED_TAGS.has( tag.tag );
			const isTypedTag = TYPED_TAGS.has( tag.tag );

			const mappedTag = {
				title: tag.tag,
				description: tag.description,
			};

			if ( isNamedTag ) {
				mappedTag.name = tag.name;
			} else {
				mappedTag.description = [
					tag.name.trim(),
					mappedTag.description,
				].filter( Boolean ).join( ' ' );
			}

			if ( ! mappedTag.description ) {
				mappedTag.description = null;
			}

			if ( isTypedTag ) {
				mappedTag.type = tag.type;
			}

			return mappedTag;
		} ),
	};
};

Curious to align to TypeScript's JSDoc tooling, given that we want to embrace some of this typing syntax (#18838), I discovered theirs is an entirely home-grown solution (some of the source). It makes me wonder though, seeing as we already have to parse the source to find comments and export statements, whether it might be worth using TypeScript both for the parsing of the source and for the parsing of the JSDoc. I haven't yet followed this idea to see how feasible it would be.

@aduth
Copy link
Member

aduth commented Dec 3, 2019

It makes me wonder though, seeing as we already have to parse the source to find comments and export statements, whether it might be worth using TypeScript both for the parsing of the source and for the parsing of the JSDoc.

Relevant documentation: https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API

Notable here is to feed a source string into ts.createSourceFile:

var ts = require( 'typescript' );
ts.createSourceFile( '', '/**\n * Function description.\n *\n * @param {string} myVariable My variable description.\n */', ts.ScriptTarget.ES2015 );

ASTExplorer also has a TypeScript mode for exploring the AST: https://astexplorer.net/#/gist/34e05c635ea8a8904a4d557afc9dccc8/04f2263c5748d52b7a523809ace596e7f6e31e5e

It's includes full parsing of the JSDoc, including types.

@aduth
Copy link
Member

aduth commented Jan 9, 2020

Noting: While the TypeScript parser does parse the complex types, we will need to make sure we can print those in a meaningful way. Rolling something custom here could become very hairy very quickly, given the full scope of supported types.

Consider: https://astexplorer.net/#/gist/4a7169d71b3f44e840f7f57bfbaee5e6/1d0dbdd8e0d27568388e9a5a980402e6d4f1a5ba

VSCode actually evaluates this very nicely:

image

Ideally we can reuse this for our own documented output.

I've found a little bit of a lead in the TypeScript Compiler API documentation, though the documentation is far from exhaustive.

Specifically, we may be able to use parts of the "Using the Type Checker" section:

https://github.com/microsoft/TypeScript/wiki/Using-the-Compiler-API#using-the-type-checker

let program = ts.createProgram(fileNames, options);
// ...
let checker = program.getTypeChecker();
// ...
checker.typeToString( checker.getTypeOfSymbolAtLocation( /* ... */ ) );

@aduth
Copy link
Member

aduth commented Jan 9, 2020

Even without a complete substitution of Doctrine, we probably need to find a way to address some of the current problems with the types printing sooner than later. Specifically, we are seeing many instances that docgen will generate a type of null for TypeScript-specific complex types. To me, this is worse than if it simply didn't document the type at all, since it is both incorrect and misleading.

For example, see also: #19520

@aduth
Copy link
Member

aduth commented Jan 10, 2020

Even without a complete substitution of Doctrine, we probably need to find a way to address some of the current problems with the types printing sooner than later. Specifically, we are seeing many instances that docgen will generate a type of null for TypeScript-specific complex types. To me, this is worse than if it simply didn't document the type at all, since it is both incorrect and misleading.

Interim fix proposed at #19571

@aduth
Copy link
Member

aduth commented Jan 24, 2020

One other possible advantage of using the TypeScript compiler is more in how we already use a different parser (Espree) than that which runs over our own code (Babel). This hasn't been much of an issue thus far, but I just encountered my first issue where a particularly new syntax that we we support through Babel (ES2019 optional catch binding) will throw an error when running npm run docs:build:

export function isURL( url ) {
	try {
		new URL( url );
		return true;
	} catch {
		return false;
	}
}
 ⇒ npm run docs:build

> gutenberg@7.3.0 docs:build /Users/andrew/Documents/Code/gutenberg
> node ./docs/tool/index.js && node ./bin/api-docs/update-readmes.js

url 
SyntaxError: Unexpected token {

Using TypeScript doesn't really address the underlying issue that we have different parsers being used, but we could perhaps assume the TypeScript would be more up-to-date. Alternatively (and as an interim solution), it might just be a matter of keeping the espree dependency up to date, assuming that it's been updated for newer versions of the specification.

@aduth
Copy link
Member

aduth commented Jan 27, 2020

Alternatively (and as an interim solution), it might just be a matter of keeping the espree dependency up to date, assuming that it's been updated for newer versions of the specification.

This is unfortunately not as simple as it might seem, since one of the breaking changes introduced in espree@5.0.0 is to remove the attachComment feature, which is pretty critical to how docgen works (or at least is currently implemented). Thus, upgrading to the latest version would require a pretty significant refactor to the code, one where we'd likely emulate how comment attachment worked previously, using the comment (and perhaps tokens) option(s) to infer where a comment occurs in relation to an exported member. I started down this path on Friday, but it seems significant enough to warrant whether it's worth the effort, considering if the alternative to Doctrine would bring with it a different parser implementation as well (e.g. TypeScript).

One thing I noted in the process of this is that we configure the ECMAScript version here:

ecmaVersion: 2018,

This is particularly relevant for my previous comment where I mention being unable to use ES2019 features. It's pretty clear by this code why this wouldn't be expected to work 😄 I haven't tested yet, but it's possible that Espree versions prior to the breaking 5.0 version might have support for newer versions of ECMAScript, so that we could still use those language features without bumping to the latest Espree version.

@sainthkh
Copy link
Contributor

I'm working on this with #19952.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Dev Ready for, and needs developer efforts [Tool] Docgen /packages/docgen
Projects
None yet
4 participants