spellchecking - how to give JS devs more control #166
Comments
I think that there are two cases here:
We've been avoiding the first scenario because it's really tricky. Given today's limitations, the spell checking engine needs to be placed on the server side (Grammarly, WebSpellChecker) and then it needs to be integrated very well with the client side. That last part is especially problematic because one service needs to be integrated with various editors. It also requires custom UI as we're not able to control the native context menu well. So it's getting costly and messy. E.g. when I checked it in the past, Grammarly was crashing Draft, Medium, ProseMirror (I think) and to some extent CKEditor 5 too. So, resolving this situation could be one thing. I guess that what everyone would like to see is a non-intrusive way to underline (with a squiggle) ranges of text. If that was possible without changing the DOM, a lot of things would become much easier (e.g. custom IME implementations? ;)). The second scenario is what most of us choose: to integrate best with the native spell checker. For us (CKEditor 5) it means two things:
|
Talking about spell checking – I've just stumbled upon a case where Chrome's spell checker replaces |
so are you looking to be able to do something like |
We may also need a spellCheckOn function that behaves similarly. |
Just wanted to second @Reinmar's issue with spellcheck replacing The other big issue we have with spellcheck is that the little red squiggles re-render with a visible flashing if we update the DOM via React while typing. And the flashing is visible to the end user which is not a great experience. |
@ianstormtaylor "I'd think that spellcheck should only operate on the text node level, and not mess with the hierarchy." We discussed this earlier, and it turns out it's problematic for some browsers as they want to look at partially styled words as one word. So for example "fissh" ( |
@gked What would As I wrote in #166 (comment), there are two scenarios that we need to handle:
|
I disagree on that. In combination with a cancelable beforeinput event, this could make for something useful. For example, in a collaborative editor, one may want to disable spell checking for ranges of text that a collaborator already has checked and accepted. Or one may want to mark a range as dirty because it has not been checked before.
For Fidus Writer I implemented it using the open source LanguageTool, which simply bundles most of the open source dictionaries/grammar checkers there are out there. The size of that package is around 160MB for a set of most major languages. Maybe in 10 years' time that is something we can just embed in a webpage, but for now it still needs to be done server side. But the integration of that has to take place through a specialized connector in the JS editor that is aware of what is what -- for example, in the implementation I needed to make sure it ignores citations as well as text that has been tracked as deleted. If one wanted to create something that works on all JS editors without specialized plugins, one would have to have each of the JS editors implement some kind of serialization/deserialization method that ignores/hides content that should not be taken into consideration when doing the check.
That does not look good. Have you tried this on all browsers on all OSes? Last I checked on mobile, Android would let words go across such boundaries. |
An example of how complex it can get with a grammar checker: Citations in FW can basically be either shown as "(Einstein, 1932: p.34)" or "Einstein (1932: p.34)". The second form is used when the citation is the subject/object of a sentence, such as "Even Einstein (1932: p. 34) thought this was a huge problem." But the "Einstein" needs to be part of the "citation" object, because depending on the citation style, it may be rendered as "Einstein", "A. Einstein", "Albert E.", etc. So now the grammar checker sees this and, knowing nothing about the JS editor and how it renders things, it just sees "Even [CITATION] thought this was a huge problem." and it will complain about a missing subject in that sentence. |
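To make the citation case concrete, here is a minimal sketch of the kind of serialization step discussed above. It is not Fidus Writer's actual connector: the `InlineNode` model and the names `serializeForChecker` and `mapMatchToSource` are assumptions made up for illustration. The idea is to drop tracked deletions, keep the rendered citation text so the sentence stays grammatical, and remember where every character came from so that checker matches can be mapped back or discarded.

```typescript
// Hypothetical editor model: a flat list of inline nodes (an assumption,
// not the schema of any real editor).
type InlineNode =
  | { kind: "text"; text: string; deleted?: boolean }
  | { kind: "citation"; rendered: string };

// Where one serialized character came from, and whether the checker may touch it.
interface Origin {
  node: number;
  offset: number;
  locked: boolean;
}

interface CheckableText {
  text: string;
  origins: Origin[]; // origins[i] describes text[i]
}

// Build the plain text that is sent to the (server-side) checker.
function serializeForChecker(nodes: InlineNode[]): CheckableText {
  let text = "";
  const origins: Origin[] = [];
  nodes.forEach((node, index) => {
    // Text tracked as deleted is hidden from the checker entirely.
    if (node.kind === "text" && node.deleted) return;
    // Citations contribute their rendered form, but are locked against edits.
    const content = node.kind === "text" ? node.text : node.rendered;
    const locked = node.kind === "citation";
    for (let offset = 0; offset < content.length; offset++) {
      origins.push({ node: index, offset, locked });
    }
    text += content;
  });
  return { text, origins };
}

// Map a checker match (start + length in the serialized text) back to the
// editor model. Returns null when the match touches locked content.
function mapMatchToSource(
  checkable: CheckableText,
  start: number,
  length: number
): { first: Origin; last: Origin } | null {
  const covered = checkable.origins.slice(start, start + length);
  if (covered.length === 0 || covered.some((origin) => origin.locked)) {
    return null;
  }
  return { first: covered[0], last: covered[covered.length - 1] };
}
```

With this shape, a match on a misspelled word in ordinary text maps back to a node index and offsets, while a complaint about the citation itself is silently dropped. A real connector would also have to deal with nested structures and with the document changing while the request is in flight.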
👍 I haven't thought about such scenarios but that makes a lot of sense. So yes, 👍 for such a method.
Totally agree. Today (and I think that won't change) this should be done on the server side. But that itself doesn't change the picture of what happens on the client. It just happens slower.
I agree that in more specialised use cases the spell checker must be tightly integrated with the text editor itself. But:
|
The problem is that if it's 99% simple text but there is just one slightly more complex element that gets messed up on just one of the browser/OS combinations, that already makes it quite unusable. Or if the end user with her/his Grammarly subscription is able to edit social media comments on most platforms correctly, but then one day he/she needs to hand in a school essay and it messes up the entire paper 5 minutes before the deadline, it's highly annoying. I just tried Grammarly again for the first time in a few years. It seems to work in this issue, but in Gmail it just managed to delete an entire email I had already typed. That's not really helpful. What it does is create an element that overlays the existing textarea/CE with the same content but with marked ranges, and then it hopes that this doesn't interfere with the JS app it's working on. This works in GitHub comments, but apparently doesn't always work in Gmail. However, server-based spell checking doesn't always make sense. Some pages may need to work while offline. And it doesn't really make sense to spend a lot of server resources on something that the browser can do already, given that browsers and OSes come bundled with spellcheckers. So a combination of these things should be helpful in all cases:
|
I still think if you constrain the problem to only operating on text nodes you end up with a better solution here that editors are more likely to be able to work with. For example, given this starting point: <strong>fi</strong>ssh If the browser decides it really needs to combine the letters under a single element, it can perform what amounts to a "remove" to erase <strong>fish</strong> (But it still hasn't modified or created any non text nodes.) But I think that's still not great. I think browsers should also strive not to mess with which letters are nested in which elements. Instead, in that case they should just do a diff to figure out they need to remove the extra <strong>fi</strong>sh This would leave the DOM in the most initial-like state, and would be what editors expect a spellcheck to do. |
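A rough sketch of the "just do a diff" idea from the comment above, under some loud assumptions: the styled word is modelled as a flat list of `TextRun` objects (a made-up stand-in for text nodes inside inline elements), and the diff is a simple common prefix/suffix comparison rather than whatever a browser would actually use. `applyFixToRuns` is a hypothetical name.

```typescript
// Hypothetical stand-in for a text node and the inline element wrapping it.
interface TextRun {
  mark: string | null; // e.g. "strong", or null for unstyled text
  text: string;
}

function clampIndex(value: number, max: number): number {
  return Math.max(0, Math.min(value, max));
}

// Apply a spelling correction by editing only the text inside the runs.
// The number of runs and their marks are never changed.
function applyFixToRuns(runs: TextRun[], corrected: string): TextRun[] {
  const original = runs.map((run) => run.text).join("");
  const shorter = Math.min(original.length, corrected.length);

  // Longest common prefix of the old and new word.
  let prefix = 0;
  while (prefix < shorter && original[prefix] === corrected[prefix]) prefix++;

  // Longest common suffix that does not overlap the prefix.
  let suffix = 0;
  while (
    suffix < shorter - prefix &&
    original[original.length - 1 - suffix] === corrected[corrected.length - 1 - suffix]
  ) {
    suffix++;
  }

  // Only original[removeFrom, removeTo) is replaced by `inserted`.
  const removeFrom = prefix;
  const removeTo = original.length - suffix;
  const inserted = corrected.slice(prefix, corrected.length - suffix);

  let position = 0;
  let placed = false;
  return runs.map((run, index) => {
    const start = position;
    const end = start + run.text.length;
    position = end;
    const head = run.text.slice(0, clampIndex(removeFrom - start, run.text.length));
    const tail = run.text.slice(clampIndex(removeTo - start, run.text.length));
    // The new characters go into the run where the change starts
    // (or into the last run if the change is at the very end).
    let middle = "";
    if (!placed && (removeFrom < end || index === runs.length - 1)) {
      middle = inserted;
      placed = true;
    }
    return { mark: run.mark, text: head + middle + tail };
  });
}
```

For the runs `strong:"fi"` followed by unstyled `"ssh"` and the correction `"fish"`, this removes one `s` from the second run and leaves the `strong` run untouched. Which run receives inserted characters at a boundary is an arbitrary choice in this sketch; an editor would probably want its own rule there.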
FWIW too, the |
I think most (all?) of us on the JS side of things tend to think that way, but at least last we checked, browser people thought differently about this and felt like they needed to check "partially styled words" like that. So we came up with, as a working compromise, a cancelable beforeinput event and a way to enable/disable spell checking for specific ranges and/or a way to communicate to the browser that certain element boundaries are to be regarded as word boundaries. For example in Things may have changed now. It's interesting to hear @Reinmar say that they do only check within text nodes. I wonder if that also is the case in IMEs on mobile, for example.
I think that's not really the best solution, as it doesn't tell the JS what the semantic purpose of those operations is. Maybe the JS decides it wants to ignore the spell checking entirely. Or it wants to show an animation in the case of spell checking fixes. Yet those operations don't really tell it what the user asked for. Instead, it should create an atomic operation with a beforeinput event with the inputType set to "insertReplacementText". The target range should cover the entire range that the spell checking covers. That way the JS can cancel this beforeinput event and handle it exactly as it intended to do itself, figuring out how to deal with the strong element in there. |
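A minimal sketch of what handling such an event could look like on the JS side. The interfaces below are deliberately tiny structural stand-ins so the example runs without a DOM; a real `InputEvent` should satisfy `ReplacementEventLike`. The `PlainTextModel` with flat character offsets is an assumption for illustration only, since a real editor would map the target range's containers and offsets to its own model positions. The replacement text is read from `data` first and then from `dataTransfer`, because which of the two carries it depends on whether the target is a plain text control or a contenteditable element.

```typescript
// Minimal structural stand-ins for the parts of InputEvent used here.
interface RangeLike {
  startOffset: number;
  endOffset: number;
}

interface ReplacementEventLike {
  inputType: string;
  data: string | null;
  dataTransfer: { getData(format: string): string } | null;
  getTargetRanges(): RangeLike[];
  preventDefault(): void;
}

// Hypothetical editor model: one string, addressed by character offsets.
interface PlainTextModel {
  text: string;
}

// Returns true when the editor took over the replacement itself.
function handleSpellingReplacement(
  event: ReplacementEventLike,
  model: PlainTextModel
): boolean {
  // Only spell check / autocorrect replacements are of interest here.
  if (event.inputType !== "insertReplacementText") return false;

  const replacement =
    event.data ?? event.dataTransfer?.getData("text/plain") ?? null;
  const [range] = event.getTargetRanges();
  if (replacement === null || range === undefined) return false;

  // Cancel the browser's own DOM modification...
  event.preventDefault();
  // ...and apply the change to the model; the editor re-renders from it and
  // decides by itself what happens to elements such as <strong>.
  model.text =
    model.text.slice(0, range.startOffset) +
    replacement +
    model.text.slice(range.endOffset);
  return true;
}

// In a browser this would be wired up roughly as:
//   editable.addEventListener("beforeinput", (e) =>
//     handleSpellingReplacement(e as InputEvent, model));
```

Because the whole replacement arrives as one atomic event, the editor could just as well ignore it or animate the fix instead of applying it, which is the flexibility argued for above.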
@johanneswilm that makes sense about word boundaries. But I think the difference for me is that there's a distinction between how the spellcheck recognizes words, and how it treats existing markup when "fixing" the words. In my mind, there's nothing stopping them from recognizing words across inline elements, but still "fixing" the errors without changing the structure of those inline elements.
Right, right. I'm not talking about the actual events that fire. What I'm trying to point out is that it's ridiculous that browsers are editing the actual markup, instead of just changing the text nodes in response to these I don't understand why: <strong>fi</strong>ssh Would ever turn into: <b>fish</b> Or even: <strong>fish</strong> When it could just as easily be kept intact as: <strong>fi</strong>sh While still applying the "fix" for the spelling. I'm not arguing against the existing spec for |
I think the explanation here is that they serialize it all into plaintext before applying the spell checking. And then when applying the spell checking they don't really have an easy and effortless way of reapplying the styling. I think a lot of this is about cultural differences. And I'm not talking cultural differences between countries, religions or socio-economic differences. Browser devs seem to have a different experience of how it's all used than JS devs who are into this area. The browser devs seem to mostly have in mind cases where very little or no JavaScript is being used and where elements use the semantic meaning they have and where browsers can by default do DOM modifications that are specific to the platform they are on. Say on an iPhone, the browser may move the entire word into the So the compromise we came up with was kind of that the browsers do whatever DOM modifications they need to do, but that those can be replaced by defining alternative behavior using the But opinions change and we can try to debate this again. And yes, the |
Edge and Firefox do not replace |
I stumbled upon a visual markdown editor today that renders everything to actual DOM elements. The tradeoff of not using a regular textarea was the loss of spellchecking 😞 There was a solution though, which was to embed a large dictionary, and I didn't want to do that. |
Please we need a way to check if an element has misspelled words with JavaScript 😞 |
@erickponce could you please expand on scenarios you are looking to accomplish with onSpellError? |
Hello there @gked 😀 Thanks for the reply. Basically, the company I work for has a chat app with a spellcheck feature that needs to block the user's ability to send the message if there's a misspelled word. We had to implement a whole spellcheck system using the backend and a lot of frontend hacks to be able to do that. On a side note, I personally don't like the idea of blocking the user's ability to send the message, but... that's a whole other story, right 😅? |
I think there are two use cases:
I agree with you but I want to include another difference:
Notes:
For info, there is a French open-source grammar checker (grammalecte) that performs both spell AND grammar checking on the client side. The grammar checker runs in a Worker launched from the background script. (But it only checks French.) |
We have previously discussed the possibility of letting JS have more direct access to the browser internal spell checker. We refrained from that due to security concerns because users may have added secret terms to their dictionaries.
Because JS editors do at least some of the DOM changes manually, the browser cannot know exactly which parts are still in need of spell checking and which parts have been spell checked already.
Also, the language of the spell checker is set by the browser and not the editor, and this does not always make sense. For example, a JS editor may know that the language of a specific text is French, but the browser still applies the user's standard dictionary (say, English) to the text, and on a small mobile device it may be difficult even for the user to change the language used by the spell checker.
What is needed:
Some way for JavaScript to either execute spell checking itself using the dictionary resources present in the browser (possibly without any user customizations), OR
a way for JavaScript to communicate to the browser that a specific range of text either has been spell checked already OR that it is not in need of spell checking.
Additionally, it would be good if the JavaScript could tell the browser which spellchecker language to use on a specific range of text. Lower priority, but still nice to have, would be for the JavaScript to be able to tell the browser that certain terms are field-related terms and therefore not in need of spell checking.
Whatever we come up with should be extendable to also be able to cover grammar checkers some time in the future.