Spell Checking #5305

Open
Tracked by #5346
rkusa opened this issue Jul 10, 2022 · 65 comments
Labels
enhancement [core label] potential extension Functionality that could be implemented as an extension (consider moving to community extensions) priority request A request from a stakeholder or influential user

Comments

@rkusa

rkusa commented Jul 10, 2022

Is your feature request related to a problem? Please describe.
I regularly make a lot of typos in code comments, docs, but also in variable names. It is always annoying when pull requests get postponed just because of typos that reviewers found.

Describe the solution you'd like
I would find it tremendously helpful if Zed could spell check my code. I personally rely a lot on this in, e.g.:

Things the spell checker might check:

  • comments,
  • doc comments,
  • strings,
  • segments of a variable name (e.g. in get_name/GetName/getName, it would check get and name).
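Splitting identifiers into words is what makes the last bullet possible. A minimal sketch in Rust, assuming a hypothetical `split_identifier` helper that lowercases its output and treats underscores, hyphens, digits and case changes as word boundaries:

```rust
/// Hypothetical helper: splits an identifier into lowercase words so each
/// word can be looked up in an ordinary dictionary.
/// Handles snake_case, kebab-case, camelCase, PascalCase and acronym runs
/// such as "HTTPServer" -> ["http", "server"].
pub fn split_identifier(ident: &str) -> Vec<String> {
    let chars: Vec<char> = ident.chars().collect();
    let mut words = Vec::new();
    let mut current = String::new();

    for i in 0..chars.len() {
        let c = chars[i];
        if !c.is_alphabetic() {
            // Underscores, hyphens and digits end the current word.
            if !current.is_empty() {
                words.push(current.to_lowercase());
                current.clear();
            }
            continue;
        }
        if c.is_uppercase() && !current.is_empty() {
            // `current` is non-empty, so i >= 1 and chars[i - 1] is a letter.
            let prev = chars[i - 1];
            let next_is_lower = chars.get(i + 1).map_or(false, |n| n.is_lowercase());
            // Split before an uppercase letter that follows a lowercase one
            // ("getName"), or that starts a new word after an acronym ("HTTPServer").
            if prev.is_lowercase() || (prev.is_uppercase() && next_is_lower) {
                words.push(current.to_lowercase());
                current.clear();
            }
        }
        current.push(c);
    }
    if !current.is_empty() {
        words.push(current.to_lowercase());
    }
    words
}
```

With this, `get_name`, `GetName` and `getName` all yield `["get", "name"]`. Real code spell checkers need more heuristics than this (e.g. for identifiers that mix words and numbers), so treat it as a starting point only.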

Additional useful features:

  • The possibility to add words to a system-wide dictionary.
  • The possibility to add words to a project-specific dictionary.
  • The possibility to check multiple languages at the same time (e.g. for non-English code, variables are often in English, while strings and comments might contain text in another language).
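One way to support several dictionaries at once is to layer them: a word counts as correct if any enabled layer contains it. The sketch below is hypothetical (`LayeredDictionary` is not an existing API) and assumes plain word lists with one word per line, matched case-insensitively:

```rust
use std::collections::HashSet;

/// Hypothetical layered dictionary: a word is accepted if any layer has it.
/// Layers could be language dictionaries (e.g. English and French at the
/// same time), a system-wide user dictionary and a project word list.
pub struct LayeredDictionary {
    layers: Vec<(String, HashSet<String>)>,
}

impl LayeredDictionary {
    pub fn new() -> Self {
        Self { layers: Vec::new() }
    }

    /// Adds a named layer from a plain word list (one word per line).
    pub fn add_layer(&mut self, name: &str, word_list: &str) {
        let words: HashSet<String> = word_list
            .lines()
            .map(|line| line.trim().to_lowercase())
            .filter(|w| !w.is_empty())
            .collect();
        self.layers.push((name.to_string(), words));
    }

    /// Returns the name of the first layer that knows the word, if any.
    pub fn lookup(&self, word: &str) -> Option<&str> {
        let word = word.to_lowercase();
        self.layers
            .iter()
            .find(|(_, words)| words.contains(&word))
            .map(|(name, _)| name.as_str())
    }

    pub fn is_correct(&self, word: &str) -> bool {
        self.lookup(word).is_some()
    }
}
```

Reporting which layer matched could help decide where an "add word" action should write (system-wide vs. project-specific).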

I'd find it especially neat if the spell check would use native system APIs so that I can expect consistent behaviour between the editor and other apps on my system.

Screenshots

Sublime Text:


VSCode Code Spell Checker extension:


@rkusa rkusa added enhancement [core label] triage Maintainer needs to classify the issue labels Jul 10, 2022
@iamnbutler
Member

My dream would be to embed something like Grammarly right in Zed :)

@rkusa
Author

rkusa commented Dec 10, 2022

Just came along the following code spelling library written in Rust. Just dropping the link in case it would help: https://github.com/crate-ci/typos

@JosephTLyons JosephTLyons transferred this issue from zed-industries/community Jan 24, 2024
@jaydenseric
Contributor

I just tried setting up Zed today and this was the showstopper issue that made me go back to VS Code. Spell check is a critical concern of an editor; it's generally too complex to enforce correct spelling cross-platform via CLI and CI with any degree of confidence and you may be contributing spelling errors to projects with tooling and CI not in your control. I am hopelessly dependent on spellcheck when writing hundreds of words worth of comments and docs per day, or when naming things like types, functions, and variables.

VS Code spell check is quite poor because it's via an extension (https://marketplace.visualstudio.com/items?itemName=streetsidesoftware.code-spell-checker) and not the native macOS spellcheck system, so the spell check UI is inconsistent with other apps and the learned words are not in sync with every other app on your system. But at least it has a solution.

There are non-VS Code editors out there that have really nice native OS-based spell check, so it's possible.

@SpyrosMourelatos

SpyrosMourelatos commented Mar 4, 2024

I would prefer LanguageTool to Grammarly as it is FOSS and supports more languages.

@JosephTLyons JosephTLyons added the priority request A request from a stakeholder or influential user label Mar 10, 2024
@JosephTLyons
Collaborator

JosephTLyons commented Mar 25, 2024

This feature feels like a rather large one to add to Zed, when considering all of the things you should be able to do with spell check. What does a good first pass look like? What would be the bare minimum needed to ship something useable?

Some unknowns:

  • Is there a Rust crate out there suitable for spell checking in the context of code?
  • Would we have some sort of setting for which file types to spell check, or maybe a setting for which file types not to spell check, or both? I think VS Code's spell check has these settings.
  • How do we want to surface the spelling errors? Is it just another red squiggly line and do we have a way to suggest a fix? I feel like for a first pass, we could simply display the potential matches in the hover, and a future follow-up could add the possibility to accept a suggestion for a typo.
  • VS Code provides spell checking for multiple languages, do we just ship English first? Maybe we'll be lucky and have a crate that can handle multiple languages.
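For the "potential matches in the hover" idea, the simplest approach is to rank dictionary words by edit distance. This is a naive sketch, not how Hunspell-style checkers generate suggestions; `suggest` and its parameters are hypothetical, and the dictionary is assumed to be lowercase:

```rust
/// Classic Levenshtein edit distance (insertions, deletions, substitutions),
/// computed with a single rolling row.
pub fn edit_distance(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // row[j] holds the distance between a[..i] and b[..j].
    let mut row: Vec<usize> = (0..=b.len()).collect();
    for i in 1..=a.len() {
        let mut prev_diag = row[0];
        row[0] = i;
        for j in 1..=b.len() {
            let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
            let next = (row[j] + 1) // deletion
                .min(row[j - 1] + 1) // insertion
                .min(prev_diag + cost); // substitution
            prev_diag = row[j];
            row[j] = next;
        }
    }
    row[b.len()]
}

/// Hypothetical helper: returns up to `max` dictionary words within
/// `max_distance` edits of `word`, closest first (ties alphabetical).
pub fn suggest<'a>(word: &str, dictionary: &[&'a str], max_distance: usize, max: usize) -> Vec<&'a str> {
    let word = word.to_lowercase();
    let mut candidates: Vec<(usize, &'a str)> = dictionary
        .iter()
        .map(|&w| (edit_distance(&word, w), w))
        .filter(|&(d, _)| d <= max_distance)
        .collect();
    candidates.sort();
    candidates.into_iter().take(max).map(|(_, w)| w).collect()
}
```

For example, `suggest("mispelled", &["misspelled", "dispelled", "spelled", "apple"], 2, 3)` returns `["dispelled", "misspelled", "spelled"]`. Scanning the whole dictionary on every lookup does not scale to large word lists, which is one reason real spell checkers use smarter candidate generation.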

Bonus points

Spell checking in Zed's chat editor.

Future AI ideas

A distant future goal might be to somehow leverage the supported AI models in Zed to fill in suggestions for misspelled words, if the suggestions provided by some crate aren't the greatest.

@bajtos

bajtos commented Mar 26, 2024

How do we want to surface the spelling errors? Is it just another red squiggly line and do we have a way to suggest a fix?

In VS Code, the Grammarly extension integrates with the "suggested fix" feature provided by language servers. I can use the same keyboard shortcut to fix ESLint violations and spelling/grammar mistakes.

@jansol
Contributor

jansol commented Mar 26, 2024

Vale is an offline rule-based "prose linter" (spelling & style checker) with an official LSP implementation. It is also code-aware so it can check code comments and won't get confused by markdown or HTML. Seems like a great fit?

@arthur-st

Testing Vale this week, and it works quite well on the CLI level. The configuration might be a bit tricky to set up, but it should be smooth sailing afterwards.

@phaynes

This comment was marked as off-topic.

@brandondrew

This comment was marked as off-topic.

@levlaz

levlaz commented Jul 11, 2024

I'm trying to use Zed to write docs in MD and MDX. The lack of a spell checker is really painful.

@albassort

This comment was marked as off-topic.

@phaynes

phaynes commented Jul 23, 2024

Hi,

I have taken a first stab at creating a configurable full spell checker / grammar checker and proofing engine that integrates with Zed (example key bindings included) and uses the OpenAI and Anthropic APIs.

The markdown proofing engine is here, and I am finalising an initial baseline as part of a general text-to-publication engine, although I am starting with research papers.

Any and all feedback would be greatly appreciated. This is genuinely a first drop of the approach.

Philip

@florinpatrascu

I truly appreciate the effort the team and contributors put into bringing this feature to life, but I have to ask: is there any chance of getting a spellchecker for Zed that doesn't rely on a remote service or require a local GPU?

@maxdeviant maxdeviant added the potential extension Functionality that could be implemented as an extension (consider moving to community extensions) label Jul 27, 2024
@jansol
Contributor

jansol commented Jul 27, 2024

I truly appreciate the effort the team and contributors put into bringing this feature to life, but I have to ask: is there any chance of getting a spellchecker for Zed that doesn't rely on a remote service or require a local GPU?

The Vale extension does exactly this. It relies on a local dictionary (defined with plain text files) and matches words and phrases against it with plain old regular expressions. Unfortunately, the extension currently only advertises support for Markdown files (Vale itself also supports spellchecking comments in programming languages), and the language server was crashing a lot when used from Zed the last time I tried. Nobody really knew why, though.

maxdeviant added a commit to zed-industries/extensions that referenced this issue Sep 9, 2024
Add support for the [Typos Language
Server](https://github.com/tekumara/typos-lsp).

Typos is a spell checker, this extension will help having a bit of spell
checking available while waiting for
zed-industries/zed#5305

---------

Co-authored-by: Marshall Bowers <elliott.codes@gmail.com>
@camstuart

+1 cspell would be my vote!

@notpeter
Member

notpeter commented Sep 15, 2024

+1 cspell would be my vote!

I've created a dedicated issue on the extensions repo for cspell via LSP:

@JosephTLyons
Collaborator

JosephTLyons commented Sep 20, 2024

Just popping in to say that someone has contributed a typos extension. I think we should leave this open though, as the underlying typos crate is intentionally designed for few false positives, so it doesn't catch as much as a normal spell checker.
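The trade-off can be illustrated with two toy checks. This sketch assumes that typos works from a curated list of known misspellings, whereas a conventional spell checker flags every word missing from its dictionary; both function names are hypothetical:

```rust
use std::collections::{HashMap, HashSet};

/// Typos-style check (assumed): only words on a curated list of known
/// misspellings are flagged, so valid jargon never produces a warning.
pub fn known_typo<'a>(word: &str, corrections: &HashMap<&str, &'a str>) -> Option<&'a str> {
    corrections.get(word).copied()
}

/// Dictionary-style check: any word the dictionary lacks is flagged,
/// which catches more errors but also flags valid jargon.
pub fn not_in_dictionary(word: &str, dictionary: &HashSet<&str>) -> bool {
    !dictionary.contains(word)
}
```

A typo like `thw` that is not on the curated list slips through the first check, while a valid library name like `serde` is flagged by the second. That is the precision/recall trade-off described above.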


See: zed-typos extension for more.

@mocenigo

This comment was marked as off-topic.

@rucoder

rucoder commented Oct 18, 2024

Just popping in to say that someone has contributed a typos extension. I think we should leave this open though, as the underlying typos crate is intentionally low-false positive, so it doesn't catch as much as a normal spell checker.

See: zed-typos extension for more.

It works pretty well for me, but I cannot find a way to 'ignore' a word. Is there any configuration file for the typos extension?

@bbb651
Contributor

bbb651 commented Oct 18, 2024

Yes, see the typos reference and the README.
typos-lsp has an open issue to add an lsp action for it, and there's a draft PR to implement it.

@kakalot0008

I installed the typos extension, but it's not working. How can I configure it?

@scorphus

scorphus commented Nov 5, 2024

Hey, @kakalot0008. I got it working. Here's the snip of my settings.json:

// Zed settings
//
// For information on how to configure Zed, see the Zed
// documentation: https://zed.dev/docs/configuring-zed
//
// To see all of Zed's default settings without changing your
// custom settings, run `zed: open default settings` from the
// command palette (cmd-shift-p / ctrl-shift-p)
{
  // ...
  "lsp": {
    "typos": {
      "initialization_options": {
        // Diagnostic severity within Zed. "Error" by default, can be:
        // "Error", "Hint", "Information", "Warning"
        "diagnosticSeverity": "Hint",
      }
    }
  },
  // ...
}

Please let me know if that works for you!

@kakalot0008

Hi, @scorphus. It worked. Thanks for your settings.

@blopker

blopker commented Dec 30, 2024

Hey all! I took some time to review all the solutions presented so far and wrote up a survey doc: https://gist.github.com/blopker/2a56b205eaaeb4f10ed7a4c2729c27c2 Would love any feedback people have.

The TLDR is that I think there are two ways to go here:

  1. Create a language server around CSpell. This has downsides though (in the above doc).
  2. Or create a code-aware language server around Helix's Spellbook. This would be more work, but I think a better outcome.

Thoughts?

@mocenigo

mocenigo commented Dec 30, 2024

I would prefer LanguageTool to Grammarly as it FOSS and supports more languages

No, please. It is slow and uses a LOT of resources: I tried using ltex (which uses LanguageTool as a backend) with Zed, and the experience was extremely frustrating.

@mocenigo

I installed the typos extension, but it's not working. How can I configure it?

It catches very few errors, because the developers want very few false positives, and only in some file types.

@mocenigo

Hey all! I took some time to review all the solutions presented so far and wrote up a survey doc: https://gist.github.com/blopker/2a56b205eaaeb4f10ed7a4c2729c27c2 Would love any feedback people have.

The TLDR is that I think there are two ways to go here:

  1. Create a language server around CSpell. This has downsides though (in the above doc).
  2. Or create a code-aware language server around Helix's Spellbook. This would be more work, but I think a better outcome.

Thoughts?

I believe that Zed should have built-in support for spell checking, not grammar. I think grammar, as well as style, should be a matter for more specialised tools, like the inline AI assistants.

This support should be integrated because the system must first recognise whether some character triggers the operation of a language server, and not interfere with the latter as long as it has control.

Since “spell checking” is part of the old 2024 timeline set by @nathansobo, I believe the plan looks something along these lines.

@ygingras

I tried the ltex extension for Zed and it does a really good job of figuring out that comments should be grammatically correct text while variable names only need to be spelled mostly correctly. It also does this for a bunch of programming languages. I'm not sure how it works, but I like that part a lot. I can also supply a custom dictionary on a per-programming-language basis. There is a pop-up to add words to my dictionary when I right-click on an underlined word in Zed, but that does not work for me, and the word stays underlined until I add it to my dictionary file manually.

I would not want my code to be sent to a third party. Having the option to run spell checking completely local should be a pretty high priority.

I regularly write in French and in English, sometimes in the same document when I write a bilingual document, say a draft of a Meetup event, which does not support language tagging (example).

Being able to quickly switch the spell checker from one language to the other is pretty important to me. Even better is Firefox, which allows me to enable a bunch of dictionaries at the same time. This way I don't have to switch, at the very small risk that I will introduce French words in the English parts of my document. I'm fine with that.

@phaynes

phaynes commented Dec 31, 2024

Hey all! I took some time to review all the solutions presented so far and wrote up a survey doc: https://gist.github.com/blopker/2a56b205eaaeb4f10ed7a4c2729c27c2 Would love any feedback people have.

The TLDR is that I think there are two ways to go here:

  1. Create a language server around CSpell. This has downsides though (in the above doc).
  2. Or create a code-aware language server around Helix's Spellbook. This would be more work, but I think a better outcome.

Hi Bo,

Thank you for your considered analysis.

I think an underlying issue when it comes to code spell-checking is that there are probably several use cases that all fall under the banner of "code spell-checking".

Spell Checking as Auto Correct: There is definitely the use case where people want a fast inline spell-checker; for some, it is as an aid to typing.

However, as I touch type, any automatic inline spell-checking gets in the way, and thus performing spell check as a discrete step is perfectly fine.

Furthermore, I see the narrow scoping of spell check as problematic, since spelling is often associated with grammar (e.g., their/there, principle/principal). Being Australian, I am quite fussy about International English spelling (e.g., colour not color). While I produce all my written work in Zed, when proofing I find the ability to "choose your style" in Claude.ai very helpful.

Zed already deals with the issue of AI keys and is able to use both local and remote LLMs. So while I accept your analysis of locality for "spellchecking as autocorrect", I don't see that it holds generally.

Furthermore, given that the coding ability of AIs is rapidly approaching science fiction levels (prompting, specification, and other high-level language-based tasks), I am struggling to find use cases where my development work is not connected to AIs, particularly in larger software engineering teams.

But on the other hand, this is not how everyone will work.

Philip

@jansol
Contributor

jansol commented Dec 31, 2024

Daydreaming about the capabilities of a fictional AI is nice and all, but many organizations do in fact care about their data and explicitly prohibit sending it overseas for black box processing, at the very least in order to comply with local regulations such as the GDPR.

This means that if Zed has non-local spell checking, it effectively does not have spell checking. And if said spell checking can't be completely disabled, Zed can't be used at such orgs at all.

@brandondrew

The TLDR is that I think there are two ways to go here:

  1. Create a language server around CSpell. This has downsides though (in the above doc).
  2. Or create a code-aware language server around Helix's Spellbook. This would be more work, but I think a better outcome.

Thoughts?

Wow, thanks for all the work you put into researching that!

Here's a summary that is slightly more detailed than your TLDR, but less detailed than the full gist:

  • There doesn't currently seem to be a portable code spell checker that would work for most people. All the current solutions either are editor-specific, do too much, or do too little.

  • Different people have different goals for the spell-checking, so there are open questions around scope, locality, languages, use of native APIs, libraries, configuration, and UI.

  • It seems best that a code spell checker not try to fix grammar issues, and that it not try to also cover prose.

  • Both for privacy and performance it seems best for the spell checker to keep all data local, and therefore to run locally.

  • It seems best to prioritize flexibility in programming languages, although supporting multiple natural languages should also be a goal.

  • Native spell checkers in operating systems are not necessarily well-suited for programming languages.

  • There don't seem to be any current solutions that match our needs.

    • Code Spell Checker for VS Code: no language server, thus not usable by Zed; written in TypeScript, thus not as performant as we might want;
    • Harper: explicitly focused on English prose, not code in multiple languages;
    • LanguageTool: written in Java and memory-hungry;
    • Vale: more of a linter than a spell checker; the complicated configuration it requires creates an unnecessary barrier to entry;
    • Typos: has a ton of false negatives, catching only a few errors because it's meant to be run in CI; uses a hardcoded 'typos' word list, creating a maintenance burden around supporting multiple natural languages;
    • Spellbook: not a complete solution, but a Hunspell-compatible spell checking library written in Rust that could be a powerful building block if someone writes a language server for it.

Best options:

  1. write a language server for CSpell: fastest route to a solution, but it comes with the JavaScript performance overhead;
  2. build a code spell checker around Spellbook. This would involve a bit more work since a language server needs to be created, along with working out the complexities around parsing code into words to feed into it.

@jansol
Contributor

jansol commented Dec 31, 2024

  2. build a code spell checker around Spellbook. This would involve a bit more work since a language server needs to be created, along with working out the complexities around parsing code into words to feed into it.

I like this option -- I'd argue that it should not be done via a language server (and thus not as a plugin, at least not without adding some more API for plugins).

A plain dictionary-based spell checker could easily be hooked into the existing tree-sitter integration, which gives it knowledge about keywords etc. that need to be excluded, essentially for free. This also means it can check keywords against an English or other programming language-specific dictionary regardless of what natural language is used otherwise. And adding support for shipping dictionaries as plugins should be quite uncontroversial, especially if they use a well-known format to begin with.
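A minimal sketch of the filtering step, assuming node kinds are already available. `NodeKind` and `Node` are hypothetical stand-ins for what tree-sitter query captures would provide; no real tree-sitter API is used, and text is only split on non-letters here (camelCase splitting is omitted):

```rust
use std::collections::HashSet;

/// Hypothetical node classification, as tree-sitter captures might supply it.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum NodeKind {
    Keyword,
    Identifier,
    Comment,
    StringLiteral,
}

pub struct Node<'a> {
    pub kind: NodeKind,
    pub text: &'a str,
}

/// Returns the words to report as misspelled. Keywords are skipped
/// entirely; other nodes are split on non-alphabetic characters and each
/// word is looked up case-insensitively in the dictionary.
pub fn misspelled_words<'a>(nodes: &[Node<'a>], dictionary: &HashSet<String>) -> Vec<&'a str> {
    nodes
        .iter()
        .filter(|node| node.kind != NodeKind::Keyword)
        .flat_map(|node| node.text.split(|c: char| !c.is_alphabetic()))
        .filter(|word| !word.is_empty())
        .filter(|word| !dictionary.contains(&word.to_lowercase()))
        .collect()
}
```

Keywords such as `fn` never reach the dictionary, while comments, string literals and identifiers are checked word by word.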

I think a major requirement for this path would be having a context menu item for adding a new word to a local user dictionary as well as a view for managing said user dictionary (add/remove/edit words). The requirement for custom UI means again that this can't be done as an extension for the foreseeable time.

@brandondrew

I think a major requirement for this path would be having a context menu item for adding a new word to a local user dictionary as well as a view for managing said user dictionary (add/remove/edit words). The requirement for custom UI means again that this can't be done as an extension for the foreseeable time.

While I agree a context menu would be good to have, the other parts of this work could proceed without waiting until we can create a context menu. We're already accustomed to editing config files for Zed without any UI around our preferences and settings. We can do the same thing with word lists until the time when it is possible to add a context menu.
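A plain text word list is easy to manage without any UI. The sketch below assumes a hypothetical format of one word per line with `#` comments; keeping the words in a `BTreeSet` means the saved file stays sorted and free of duplicates, which keeps diffs small if the list is committed to the project:

```rust
use std::collections::BTreeSet;

/// Hypothetical user word list: one word per line, '#' lines are comments.
pub struct UserWordList {
    words: BTreeSet<String>,
}

impl UserWordList {
    pub fn parse(contents: &str) -> Self {
        let words: BTreeSet<String> = contents
            .lines()
            .map(str::trim)
            .filter(|line| !line.is_empty() && !line.starts_with('#'))
            .map(String::from)
            .collect();
        Self { words }
    }

    /// Returns true if the word was newly added.
    pub fn add(&mut self, word: &str) -> bool {
        self.words.insert(word.trim().to_string())
    }

    /// Returns true if the word was present and has been removed.
    pub fn remove(&mut self, word: &str) -> bool {
        self.words.remove(word.trim())
    }

    /// Serializes back to the one-word-per-line format (sorted).
    pub fn to_file_contents(&self) -> String {
        let mut out = String::new();
        for word in &self.words {
            out.push_str(word);
            out.push('\n');
        }
        out
    }
}
```

Comments are dropped when the list is written back; a real implementation would probably want to preserve them.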

@phaynes

This comment was marked as off-topic.

@mocenigo

mocenigo commented Jan 2, 2025

@phaynes I am not sure most people would be comfortable with an editor that uses a few GB of RAM to spell check. Let us focus on the actual problem of spell checking, lightweight and using little memory. Fancier AI-based systems can always be implemented as add-ons, i.e., extensions, and be optional.

@jansol
Contributor

jansol commented Jan 3, 2025

While I agree a context menu would be good to have, the other parts of this work could proceed without waiting until we can create a context menu.

Yes.

  • loading dictionaries
  • checking non-keyword tree-sitter nodes against the dictionary (or only specific captures? i.e. identifiers and/or string literals)
  • attaching "diagnostics" to misspelled nodes/words

are things that could be worked on right away by anyone. Extensions don't have access to the tree-sitter data so it would have to happen in the main Zed code base though.
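For the "attaching diagnostics" step, each misspelled word needs a position. A sketch assuming diagnostics follow the LSP convention of zero-based lines and UTF-16 code-unit columns (the LSP default position encoding); `Diagnostic` and `spelling_diagnostics` are hypothetical names:

```rust
use std::collections::HashSet;

/// Hypothetical diagnostic, loosely modeled on an LSP diagnostic:
/// zero-based line, start/end columns in UTF-16 code units.
#[derive(Debug, PartialEq)]
pub struct Diagnostic {
    pub line: usize,
    pub start: usize,
    pub end: usize,
    pub message: String,
}

pub fn spelling_diagnostics(text: &str, dictionary: &HashSet<&str>) -> Vec<Diagnostic> {
    let mut diagnostics = Vec::new();
    for (line_no, line) in text.lines().enumerate() {
        let mut col = 0; // UTF-16 column of the current character
        let mut word_start = 0;
        let mut word = String::new();
        // A trailing sentinel space flushes the last word on the line.
        for c in line.chars().chain(std::iter::once(' ')) {
            if c.is_alphabetic() {
                if word.is_empty() {
                    word_start = col;
                }
                word.push(c);
            } else if !word.is_empty() {
                if !dictionary.contains(word.to_lowercase().as_str()) {
                    diagnostics.push(Diagnostic {
                        line: line_no,
                        start: word_start,
                        end: col,
                        message: format!("Unknown word: {}", word),
                    });
                }
                word.clear();
            }
            col += c.len_utf16();
        }
    }
    diagnostics
}
```

The UTF-16 detail matters for non-ASCII text: in `"😀 teh"` the emoji counts as two columns, so `teh` is reported at columns 3 to 6.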

Then once the basic functionality is there we can look into the UI side. And the Zed team will probably want to do their own UI design pass on it at that point, to keep things cohesive and in line with their vision.

@blopker

blopker commented Jan 3, 2025

I would like to see a spell checker built into Zed. However, I'm also in favor of an extension, especially if it helps us get something working sooner. I'm often having to copy text around for spell checking, and it's a productivity drain. An extension is also something I could bring into any LS-capable editor, which I think would be huge for a lot of people.

I put a rough POC together that attempts to mix tree-sitter + word splitting heuristics + Spellbook into the beginnings of an LS: https://github.com/blopker/codebook. If someone is up to help me with this, let me know, but I'll need a lot of help with my Rust 😄

I found that tree-sitter is fast. Reusing Zed's data may not be that beneficial. Plus, having control over the tree-sitter queries is nice. Example here.

@brandondrew

I'm often having to copy text around for spell checking, it's a productivity drain.

I'm not trying to dispute any of the points you're making, but while we wait for better support in Zed for checking spelling, you can have a file open in multiple editors at the same time. While I (obviously) haven't tried every single pair of editors, I've tried several and rarely (if ever) had problems. So you can do spell checking in XYZ Editor while you do most of your work in Zed. Not ideal, but much better than copying and pasting between apps.

@mocenigo

mocenigo commented Jan 8, 2025

I found that tree-sitter is fast. Reusing Zed's data may not be that beneficial. Plus, having control over the tree-sitter queries is nice. Example here.

About this "reusing" — so there could be more copies of the tree-sitter data and tree-sitter internal structures for the same file? Since in the past for some cases the data structures of tree-sitter were exploding, I am not sure this is a good idea.

@blopker

blopker commented Jan 10, 2025

I've been hacking away on the language server/extension path, and I'm at the stage now where I was able to fix a misspelling in my own README for this extension 😅.


Oops.

It's still a ways away from being usable though. The extension doesn't have any quick actions, just the message with suggestions in it. It also is a bit too aggressive with URLs. I think we'll need a dictionary of common jargon terms to ignore as well.
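One simple way to be less aggressive with URLs is to drop URL-like tokens before splitting text into words. This hypothetical pre-filter is deliberately crude (for example, it would also skip `and/or` because of the slash):

```rust
/// Hypothetical pre-filter: whitespace-separated tokens that look like
/// URLs, email addresses or file paths are not spell checked.
pub fn should_skip_token(token: &str) -> bool {
    // Ignore surrounding punctuation such as "(", ")" or a trailing ".".
    let token = token.trim_matches(|c: char| "()[]<>\"'.,;:".contains(c));
    let looks_like_url = token.contains("://") || token.starts_with("www.");
    let looks_like_email = token.contains('@') && token.contains('.');
    let looks_like_path = token.contains('/') || token.contains('\\');
    looks_like_url || looks_like_email || looks_like_path
}

/// Returns the tokens that should still be spell checked.
pub fn checkable_tokens(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .filter(|t| !should_skip_token(t))
        .collect()
}
```

For example, `checkable_tokens("See https://github.com/blopker/codebook for details.")` keeps only `["See", "for", "details."]`, so the URL never gets split into words like `blopker`.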

Since in the past for some cases the data structures of tree-sitter were exploding, I am not sure this is a good idea.

I'd guess many language servers are doing their own parsing? I'm not sure what the alternative is besides using Zed's structures, but I don't think there's a way for an extension to do that. I do think there are ways to minimize the impact in this case, like only spell checking on open and save.

Edit: Use more precise language - language servers, not extensions do the parsing.

@jansol
Contributor

jansol commented Jan 10, 2025

Extensions don't do any parsing themselves, they merely ship a tree-sitter grammar for Zed to load. Extensions can also download and run language servers, and those of course parse the code... but they typically rely on a full-on compiler/interpreter for the language in question for that, not tree-sitter.
