Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to check links #173

Merged
merged 7 commits into from
Oct 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 32 additions & 15 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -44,5 +44,8 @@
"unist-util-visit": "^4.0.0",
"yargs": "^17.7.2",
"zx": "^7.2.3"
},
"dependencies": {
"markdown-link-extractor": "^3.1.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better as a dev dependency.

}
}
140 changes: 140 additions & 0 deletions scripts/commands/checkLinks.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
// This code is a Qiskit project.
//
// (C) Copyright IBM 2023.
//
// This code is licensed under the Apache License, Version 2.0. You may
// obtain a copy of this license in the LICENSE file in the root directory
// of this source tree or at http://www.apache.org/licenses/LICENSE-2.0.
//
// Any modifications or derivative works of this code must retain this
// copyright notice, and modified files need to carry a notice indicating
// that they have been altered from the originals.

import { globby } from "globby";
import { existsSync, readFileSync } from 'fs';
import path from 'node:path';
import markdownLinkExtractor from 'markdown-link-extractor';

const DOCS_ROOT = "./docs"
const CONTENT_FILE_EXTENSIONS = [".md", ".mdx", ".ipynb"]

class Link {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be super valuable to have some unit tests for Link.resolve() and Link.check(). They don't have any side effects anymore and they're now pure functions, so it should be easy to test.

To do so, you'd add a file scripts/commands/linkChecker.ts or something like that. Copy this code over. Then import it from your commands file. And add a test file similar to dedupeIds.test.ts.

npm test to ensure the test works.

readonly value: string
readonly anchor: string
readonly origin: string
readonly isExternal: boolean

constructor(linkString: string, origin: string) {
/*
* linkString: Link as it appears in source file
* origin: Path to source file containing link
*/

const splitLink = linkString.split('#', 1)
this.value = splitLink[0]
this.anchor = (splitLink.length > 1) ? `#${splitLink[1]}` : ''
this.origin = origin
this.isExternal = linkString.startsWith("http")
}

resolve(): string[] {
/*
* Return list of possible paths link could resolve to
*/
if ( this.isExternal ) { return [ this.value ] }
if ( this.value === '' ) { return [ this.origin ] } // link is just anchor
if ( this.value.startsWith("/images") ) {
return [ path.join("public/", this.value) ]
}

let baseFilePath
if (this.value.startsWith('/')) {
// Path is relative to DOCS_ROOT
baseFilePath = path.join(DOCS_ROOT, this.value)
} else {
// Path is relative to origin file
baseFilePath = path.join(
path.dirname(this.origin),
this.value
)
}
// Remove trailing '/' from path.join
baseFilePath = baseFilePath.replace(/\/$/gm, '')

// File may have one of many extensions (.md, .ipynb etc.), and/or be
// directory with an index file (e.g. `docs/build` should resolve to
// `docs/build/index.mdx`). We return a list of possible filenames.
frankharkins marked this conversation as resolved.
Show resolved Hide resolved
let possibleFilePaths = []
for (let index of ['', '/index']) {
for (let extension of CONTENT_FILE_EXTENSIONS) {
possibleFilePaths.push(
baseFilePath + index + extension
)
}
}
return possibleFilePaths
}

check(filePathCache: string[] = []): boolean {
/*
* True if link points to existing file, otherwise false
* filePathCache: array of known existing files (to reduce disk I/O)
*/
if (this.isExternal) {
// External link checking not supported yet
return true
}

const possiblePaths = this.resolve()
for (let filePath of possiblePaths) {
if (filePathCache.includes(filePath)) {
return true
}
}
// Check disk for files not in cache (images etc.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I was thinking the images should be in the cache too. The only files that we expect you to use locally are images from public/images and other docs/ files. Otherwise it's an error.

for (let filePath of possiblePaths) {
if (existsSync(filePath)) {
return true
}
frankharkins marked this conversation as resolved.
Show resolved Hide resolved
}

console.log(`❌ ${this.origin}: Could not find link '${this.value}'`)
return false
Comment on lines +101 to +102
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of logging directly in this function, it might be a cleaner user interface to collect all the failing links in the main function or checkLinksInFile, and then bundle them together as a single console error, where you list all the failures.

This is a lot of repeated warnings right now, which can be noisy.

See an example:

const notebookErrors = [];
const notebookFiles = await globby("docs/**/*.ipynb");
for (const file of notebookFiles) {
if (IGNORED_FILES.has(file)) {
continue;
}
const metadata = await readMetadata(file);
if (!isValidMetadata(metadata)) {
notebookErrors.push(file);
}
}

You'd have this function return something like structured data or a string representing what the failure is. Return undefined if it's okay. Return type is string?.

}
}

function markdownFromNotebook(source: string): string {
let markdown = ''
for (let cell of JSON.parse(source).cells) {
if (cell.source === 'markdown') {
markdown += cell.source
}
}
return markdown
}

function checkLinksInFile(filePath: string, filePaths: string[]): boolean {
const source = readFileSync(filePath, {encoding: 'utf8'})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll still want this to be an async read, which will require marking async function checkLinksInFile. This line will become const source = await readFile(...)

Then, lines 130-133 will become:

const results = await Promise.all(filePaths.map(fp => checkLinksInFile(fp, filePaths))
if (results.some(x => !x)) {
}

That will parallelize things.

const markdown = (path.extname(filePath) === '.ipynb') ? markdownFromNotebook(source) : source
const links = markdownLinkExtractor(source).links.map((x: string) => new Link(x, filePath))

let allGood = true
for (let link of links) {
allGood = link.check(filePaths) && allGood
}
return allGood
}

async function main() {
const filePaths = await globby('docs/**/*.{ipynb,md,mdx}')
let allGood = true
for (let sourceFile of filePaths) {
allGood = checkLinksInFile(sourceFile, filePaths) && allGood
}
if (!allGood) {
console.log("\nSome links appear broken 💔\n")
process.exit(1)
}
}

main()
9 changes: 9 additions & 0 deletions scripts/commands/declarations.d.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
type LinkExtractionResult = {
links: string[];
anchors: string[];
}

declare module 'markdown-link-extractor' {
function markdownLinkExtractor(string): LinkExtractionResult;
export = markdownLinkExtractor;
}