Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spellchecker to CI #1943

Open
brillout opened this issue Nov 1, 2024 · 23 comments
Open

Add spellchecker to CI #1943

brillout opened this issue Nov 1, 2024 · 23 comments

Comments

@brillout
Copy link
Member

brillout commented Nov 1, 2024

Description

Maybe we an use the tool that was used at #1835; IIRC the tool has an npm package.

@IanWorley
Copy link

IanWorley commented Nov 20, 2024

I would like to take a shot with this issue. Does not seem to difficult at all. I look into using the suggested spell checker @brillout

@brillout
Copy link
Member Author

@IanWorley Ok! So far I think the best would be to integrate the spell check similar to how the formatting check is integrated: https://github.com/vikejs/vike/blob/main/.github/workflows/formatting.yml.

@IanWorley
Copy link

IanWorley commented Nov 20, 2024

I will submit a draft pr but I still have some issue at this time for right now I have wrote a config to detect typos from mdx files in the docs directory. Do we want this running on the entire project and what is the scope? Anyway I will also reach out in the discussion of typos as it seems there examples of typos are not being pick up in the mdx file.

@brillout
Copy link
Member Author

Do we want this running on the entire project and what is the scope?

Not for now. (Eventually yes once we don't have any pending open PRs.)

For now we can run against the whole docs/ directory, include .tsx files. (No pending PR is modifying it.)

I will also reach out in the discussion of typos as it seems there examples of typos are not being pick up in the mdx file.

👍

@IanWorley
Copy link

IanWorley commented Nov 20, 2024

@IanWorley 👍 Btw. how big is their npm package? It's fine if it's a bit heavy, but it shouldn't be too heavy.
Sorry I did not mean to close the pr I just drop my comments.

To response to this I do not think typos is a npm package. I will look into using cspell at this moment my only problem is that there might be dictionary file if that is fine with you. By the way size of cspell is 203kb

@brillout
Copy link
Member Author

I found https://github.com/dalisoft/typos-rs-npm so maybe we can use that? We can also ask the typos team if there is an official npm package and if not whether it's on their radar.

I will look into using cspell at this moment my only problem is that there might be dictionary file if that is fine with you.

How would that be a problem? Can we implement this workflow with it?

By the way size of cspell is 203kb

That's totally fine.

@brillout
Copy link
Member Author

brillout commented Nov 23, 2024

Only obvious spelling issues

One issue of integrating spell checking in the CI is that it isn't clear how to handle false positives: upon a false positive the CI will be red. So we have to tell the spell checker that it's a false positive. I guess some kind of special comment like with Prettier (// prettier-ignore) and Biome (// biome-ignore) would be a simple and effective solution. Maybe also some kind of dictionary, but only if it stays very small (contributors should almost never have to update the dictionary).

So I'm inclined to go with a spell checker that has only few false positives.

Workflow

The reason I was mentioning using an npm package is because it would allow us to correct spelling mistakes by simply running an npm script.

That's what we're doing with formatting:

  • pnpm run format:check => throws an error if formatting is incorrect, used by the CI.
  • pnpm run format => fixes formatting issues, used by the user (e.g. Vike contributors).

I wonder if we can reproduce the same workflow for spell checking?

@IanWorley
Copy link

Yeah, at first I was going to use cspell, but the amount of manual dictionary entries seemed tedious at best. That’s why I chose typos-rs-npm instead—it seems far more maintainable than managing a massive custom dictionary with cspell.

@cr7pt0gr4ph7
Copy link
Contributor

For reference, here's the cSpell.json I used for #1983 (where unfortunately most changes didn't make it through) together with the Code Spell Checker VS Code extension.

The config should cover everything within vike/. docs/ is also covered, except for all the links containing GitHub usernames, where I gave up manually ignoring all those usernames halfway through (and a better approach would be to ignore those occurences using a general pattern anyway, which is also possible with that VS Code extension, though I didn't bother to in this case).

{
    "version": "0.2",
    "ignorePaths": [
        "**/node_modules/**"
    ],
    "dictionaryDefinitions": [],
    "dictionaries": [],
    "words": [
        "36masync",
        "36mconst",
        "36mexport",
        "36mfrom",
        "36mfunction",
        "36mimport",
        "36mreturn",
        "aaaaaba",
        "AaronBeaudoin",
        "Abramov's",
        "actiongroup",
        "Actix",
        "AFAICT",
        "aheissenberger",
        "Ahrefs",
        "alexrabin",
        "Alignable",
        "antd",
        "Aslemammad",
        "atrule",
        "AurelienLourot",
        "automagically",
        "bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq",
        "Bati",
        "batijs",
        "binedge",
        "birkskyum",
        "Blankeos",
        "brillout",
        "brunsten",
        "btih",
        "Bugsnag",
        "build-f7i251e0iwnw",
        "build-j95xb988fpln",
        "Bulma",
        "Burda",
        "burdaforward",
        "Chakra",
        "changeme",
        "chokidar",
        "chunk-DJBYDrsP",
        "cjsx",
        "crawlable",
        "ctsx",
        "cwabehiwqehqwueh",
        "cyco130",
        "daisyui",
        "dataendpointurl",
        "demonii",
        "disableautorun",
        "docpress",
        "dotenv",
        "ecosia",
        "ELIFECYCLE",
        "enableeagerstreaming",
        "esbuild",
        "evenodd",
        "fabioricali",
        "Fastify",
        "favicons",
        "flac",
        "Flightcontrol",
        "FOUC",
        "frontmatter",
        "gensync",
        "getbody",
        "getpage",
        "gonesurfing",
        "hapi",
        "hattip",
        "hemengke1997",
        "homescreens",
        "Hono",
        "httpresponse",
        "IIRC",
        "imagetools",
        "Immortalin",
        "importee",
        "importees",
        "inlang",
        "ipfs",
        "ipns",
        "jamesladd",
        "Jearce",
        "jeremypress",
        "jfif",
        "keepscrollposition",
        "Kenzo-Wada",
        "KHTML",
        "Laravel",
        "lewis-fidlers",
        "libvpx",
        "libx",
        "liefern",
        "llqijrlvr",
        "lolipop",
        "lourot",
        "luisfloat",
        "Macbook",
        "magne4000",
        "mantine",
        "marko",
        "mdast",
        "mdxeditor",
        "metafile",
        "mikew",
        "minifiers",
        "mjsx",
        "modulepreload",
        "msvideo",
        "mtsx",
        "navigations",
        "nextauth",
        "nextjs",
        "nextui",
        "nitedani",
        "nitedani's",
        "noexternal",
        "nojekyll",
        "Nprogress",
        "Nuxt",
        "onbeforeprerender",
        "onbeforerender",
        "onbeforeroute",
        "onclick",
        "onwarn",
        "openbittorrent",
        "opentrackr",
        "opral",
        "osseonews",
        "outro",
        "pagecontext",
        "pdanpdan",
        "phiberber",
        "phonzammi",
        "pinia",
        "Pixelatex",
        "pjpeg",
        "popstate",
        "Preact",
        "prerender",
        "prerendered",
        "primereact",
        "Prio",
        "pullstate",
        "quicktime",
        "quotepath",
        "Readables",
        "redirections",
        "renderpage",
        "Rollbar",
        "rollup",
        "royalswe",
        "ryanweal",
        "scaffolder",
        "shadcn",
        "signup",
        "solidjs",
        "sourcegraph",
        "stamppipe",
        "Strydom",
        "styleregistry",
        "stylesheet",
        "suppor",
        "Symfony",
        "tabspace",
        "tanstack",
        "tauri",
        "telefunc",
        "ThimoDEV",
        "transpiles",
        "transpiling",
        "trpc",
        "typesafe",
        "ultimateshadsform",
        "unpic",
        "unplugin",
        "untab",
        "Unternehmen",
        "urql",
        "useclientrouter",
        "useconfig",
        "usedata",
        "usepagecontext",
        "USEPOLLING",
        "vchirikov",
        "Vercel",
        "veryslow",
        "vike",
        "vike-react-antd",
        "vikejs",
        "Vilay",
        "vite",
        "vitejs",
        "vuetify",
        "vueuse",
        "Vuex",
        "Weltall",
        "wobsoriano",
        "yuva",
        "Zustand"
    ],
    "ignoreWords": [],
    "import": [],
    "enabled": true
}

@cr7pt0gr4ph7
Copy link
Contributor

@brillout What's your IDE of choice? It would probably make sense to use something in CI whose dictionaries/ignore comments/etc. are compatible with the IDE plugin.

@cr7pt0gr4ph7
Copy link
Contributor

cr7pt0gr4ph7 commented Nov 23, 2024

The reason I was mentioning using their an package is because it would allow us to correct spelling mistakes by simply running an npm script.

Based on the spelling errors I have seen in the source code, I don't now if a non contextual word-only spelling correction would work as intended, because it's not always the most similar "correct" word that was intended (which can often only be determined by taking context into account).

My recommendation would be an IDE extension to catch & fix errors as they appear, and a CI action as a safety net.

@brillout
Copy link
Member Author

@brillout What's your IDE of choice? It would probably make sense to use something in CI whose dictionaries/ignore comments/etc. are compatible with the IDE plugin.

Maybe I'm wrong but I feel like this would add too many restrictions on what spellchecker we can use. Especially considering that we already have a couple of restrictions.

Based on the spelling errors I have seen in the source code, I don't now if a non contextual word-only spelling correction would work as intended, because it's not always the most similar "correct" word that was intended (which can often only be determined by taking context into account).

Yea, the more typos the spellchecker catches the better. But I think we can already catch more than 90% of typos with a dumb spellcheck which I think it's good enough.

Having a good workflow is more important, the spellchecker shouldn't be too disruptive for contributors. That would be a no-go. And so far I think having to maintain a whitelist of unknown words is a no-go for that reason.

If we can find a very good spellchecker as well as a good workflow, then that'd be great but I ain't sure we'll find something perfect here.

My recommendation would be an IDE extension to catch & fix errors as they appear, and a CI action as a safety net.

Makes sense.

@IanWorley @cr7pt0gr4ph7 Btw. I was thinking one thing we could do is to implement our own npm package with typos-rs-npm as peer dependency. So we can add custom logic (e.g. for implementing // @typo-ignore) without having to maintain the spellchecking part. But let's see maybe we don't need this.

@IanWorley
Copy link

IanWorley commented Nov 26, 2024

@brillout Just a quick heads up just point out a bug with the typos package told by the maintainer should be up by sometime tomorrow

@dalisoft
Copy link

@IanWorley @brillout Try remove lockfile, node_modules and cache and re-install last version. It should work on Arch Linux (and other linux too). And all credits goes to @crate-ci for typos. I just made npm port for typos

@IanWorley
Copy link

IanWorley commented Dec 1, 2024

I will be firing out a pr soon I just had a build fail but that is due to ratelimit with typos so I will rerun the workflow and get this out to you @brillout
Minor Update I will Release something tomorrow might not fight this merge conflict tonight

@brillout
Copy link
Member Author

brillout commented Dec 1, 2024

👍 Looking forward to it.

ratelimit with typos

With crate-ci/typos? So it isn't offline / open source? Could the rate limit be an issue for us or do they give out free license keys for Open Source tools?

@dalisoft
Copy link

dalisoft commented Dec 1, 2024

@brillout Both crate-ci/typos and typos-rs-npm is free to use and license are permissive. Limit only for GitHub API which used by typos-rs-npm for pulling selected binary from GitHub Release

https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#primary-rate-limit-for-unauthenticated-users

@brillout
Copy link
Member Author

brillout commented Dec 1, 2024

Thank you for the prompt response! (FYI this is the workflow we aim for. I can see many other open source tools being interested in that.)

@dalisoft
Copy link

dalisoft commented Dec 1, 2024

I think it is achievable, not with all package managers (for typos-rs-npm, not yet) but it is possible

@brillout
Copy link
Member Author

brillout commented Dec 3, 2024

@dalisoft Have you considered publishing the package to npm? This would make using it a lot easier.

@brillout
Copy link
Member Author

brillout commented Dec 3, 2024

@dalisoft Have you considered publishing the package to npm? This would make using it a lot easier.

@epage @szepeviktor Is that something @crate-ci would be up for? That would be quite convenient, see rationale here.

See https://github.com/dalisoft/typos-rs-npm.

@epage
Copy link

epage commented Dec 3, 2024

We publish source and pre-built binaries to PyPI for easy integration with pre-commit. I'm not for abusing different package systems as a general distribution mechanism but your use case is more like a one-off pre-commit, so I'm open to it. I am unfamiliar with the npm ecosystem so I'm unlikely to set it up myself but if someone wants to contribute what is needed and tell my what account setup is needed, I'll take it. Without knowing more, my main requirement is it fits within our automated release process.

@dalisoft
Copy link

dalisoft commented Dec 4, 2024

@brillout Sorry for late reply and thanks again to interest. But i don't want publish to npm because it's too restrictive for me. The GITHUB_TOKEN limit comes not because of GitHub installation, it's because of fetching releases from GitHub. My library does not come with bundled typos package, it's fetched at postinstall

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants