Ignore markdown regions with code hints #53

supernovae · 2021-08-04T14:34:31Z

Is it possible to tell spellcheck to ignore the markdown hinted areas for language?

For example, if I have

```bash
kubectl get nodes
```

Can it just not spellcheck that bash statement? I have lots of awscli, kubectl, or curl commands, and right now that means 800+ misspelled words.

facelessuser · 2021-08-04T15:10:00Z

Are you spell checking raw Markdown or are you converting it to HTML and then checking the HTML?

Often, I convert Markdown to HTML and then use CSS selectors to ignore code blocks and such; I find it easier to filter things in HTML. That is my general recommendation, though there are potentially other ways. PySpelling, which is used to filter the content, has a Markdown filter that converts the Markdown content to HTML, and its HTML filter lets you filter out tags with selectors. Granted, you must enable an extension to handle fenced code properly, as fenced code is not part of the old school spec (I'm not sure about CommonMark).
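To make the "ignore code in the HTML" idea concrete, here is a minimal sketch using only Python's standard library. It is not how PySpelling's HTML filter is implemented; `ProseExtractor` and `prose_only` are hypothetical names, and the sample HTML is made up for illustration. It only shows the principle: once the Markdown is HTML, skipping `code` and `pre` elements leaves just the prose for the spell checker.

```python
from html.parser import HTMLParser


class ProseExtractor(HTMLParser):
    """Collect text from HTML, skipping anything inside ignored tags."""

    def __init__(self, ignored=("code", "pre")):
        super().__init__()
        self.ignored = set(ignored)
        self.depth = 0      # number of ignored elements we are currently inside
        self.chunks = []    # prose fragments that would be spell checked

    def handle_starttag(self, tag, attrs):
        if tag in self.ignored:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.ignored and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every ignored element.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def prose_only(html):
    """Return the text of `html` with code/pre content removed."""
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample: one inline code span and one block of code.
sample = (
    "<p>List the nodes with <code>kubectl get nodes</code> first.</p>"
    "<pre><code>aws sts get-caller-identity\n</code></pre>"
)
print(prose_only(sample))
```

Only the sentence around the commands survives, so words like `kubectl` never reach the dictionary lookup.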

The shipped extension that handles Markdown uses Python Markdown, which is an old school Markdown parser (not a CommonMark parser). For me, that is more than sufficient, as the documentation I am often parsing also uses Python Markdown when I publish it. If CommonMark is a requirement, a 3rd party extension can surely be created.

I can't really answer further without knowing how you are attempting to spell check your Markdown.

supernovae · 2021-08-04T20:11:17Z

Thanks for the quick reply!

I'm just diving into this extension and trying to get it to work - my repo is here: https://github.com/supernovae/documentation and the config is here: https://github.com/supernovae/documentation/blob/main/.github/config/.spellcheck.yml

I noticed when the action is run, all of the code looks like it's in HTML. Poking around to see what I can do... basically I just want to ignore the bash code in Markdown, but obviously it's no longer Markdown when it checks the HTML.

facelessuser · 2021-08-04T20:35:44Z

@supernovae Okay, so a couple of things. I checked out the repo you pointed me at and attempted to run PySpelling. Just as an FYI, while I do follow this repo because it relies on PySpelling, I do not actually use this action or the Docker image wrapper it relies on. As I am very familiar with Python, I set up my own action directly using Python in my CI environments. I am also the author of PySpelling and am comfortable running it directly on my local machine to test and debug, which is what you are going to see below.

The config doesn't quite seem right. You can see the error below. My immediate thought is that if the action is running this and not throwing an error, maybe something is occurring in the action that suppresses this? Maybe the action is somehow preprocessing the config, losing the malformed content, and passing it to pyspelling. In this case, the malformed content is the ignore options and such.

➜  documentation git:(main) python3 -m pyspelling --config .github/config/.spellcheck.yml
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/Cellar/python@3.9/3.9.4/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 83, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 26, in main
    return run(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 51, in run
    for results in spellcheck(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 673, in spellcheck
    for result in spellchecker.run_task(task, source_patterns=sources):
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 311, in run_task
    self._build_pipeline(task)
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 255, in _build_pipeline
    raise ValueError(STEP_ERROR.format(str(step)))
ValueError: Pipline step in unexpected format: {'pyspelling.filters.html': None, 'comments': False, 'ignores': ['code', 'pre']}

So I fixed it by adding the appropriate indentation to the HTML filter's options:

matrix:
- name: Markdown
  aspell:
    ignore-case: true
    lang: en
  dictionary:
    wordlists:
    - .github/config/.wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8
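To see why the indentation matters, here is a small sketch of the shape check that the error message implies. `check_step` is a hypothetical helper written for illustration, not PySpelling's actual code. The `malformed` dict is the one printed in the traceback, and `fixed` is what the corrected YAML should load as, assuming a standard YAML loader.

```python
def check_step(step):
    """Return (name, options) for a well-formed pipeline step, else raise.

    Hypothetical helper: it mirrors the shape the error complains about,
    not PySpelling's real implementation.
    """
    if isinstance(step, str):
        return step, {}
    if isinstance(step, dict) and len(step) == 1:
        name, options = next(iter(step.items()))
        if options is None:
            return name, {}
        if isinstance(options, dict):
            return name, options
    raise ValueError("Pipeline step in unexpected format: {}".format(step))


# Options at the same indentation as the filter name become sibling keys,
# so the step is one dict with three keys (as shown in the traceback).
malformed = {
    "pyspelling.filters.html": None,
    "comments": False,
    "ignores": ["code", "pre"],
}

# Options indented under the filter name become that filter's own mapping.
fixed = {
    "pyspelling.filters.html": {"comments": False, "ignores": ["code", "pre"]},
}

print(check_step(fixed))
try:
    check_step(malformed)
except ValueError as exc:
    print(exc)
```

The point is that a step needs exactly one key (the filter name) whose value holds the options; the under-indented version gives the filter no options at all and adds two stray keys beside it.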

Now I got a proper run, and I see no misspelling in code blocks:

➜  documentation git:(main) ✗ python3 -m pyspelling --config .github/config/.spellcheck.yml
Misspelled words:
<htmlcontent> docs/misc/references.md: html>body>ul>li
--------------------------------------------------------------------------------
Responsiblity
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
AzureFirewallSubnet
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
VirtualAppliance
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>ol>li>p
--------------------------------------------------------------------------------
apiserverProfile
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/private-cluster.md: html>body>h2
--------------------------------------------------------------------------------
Adendum
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/egress-ipam-operator.md: html>body>ol>li>p
--------------------------------------------------------------------------------
kubeadmin
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>h1
--------------------------------------------------------------------------------
Diasaster
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
--------------------------------------------------------------------------------
failback
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>ol>li
--------------------------------------------------------------------------------
Avoidence
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
Availabily
focussed
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
resiliance
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
excercise
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/disaster-recovery/README.md: html>body>p
--------------------------------------------------------------------------------
eachother's
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Prequsites
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Preperation
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Queryier
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/federated-metrics/user-defined.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
--------------------------------------------------------------------------------
HNuZ
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>h2
--------------------------------------------------------------------------------
SCCs
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README-public.md: html>body>ol>li>p
--------------------------------------------------------------------------------
securityContext
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
dropdown
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
HNuZ
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>h2
--------------------------------------------------------------------------------
SCCs
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>p
--------------------------------------------------------------------------------
securityContext
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aro/astronomer/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
hacky
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/quickstart-rosa.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/o11y/az-log-analytics.md: html>body>h1
--------------------------------------------------------------------------------
Analytics
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/o11y/az-log-analytics.md: html>body>p
--------------------------------------------------------------------------------
analytics
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/o11y/openshift-logging.md: html>body>ol>li>p
--------------------------------------------------------------------------------
recieving
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/quickstart-aro.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/quickstart-aro.md: html>body>h2
--------------------------------------------------------------------------------
Adendum
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
CNAME's
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
enviroment
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
homev
wafv
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>p
--------------------------------------------------------------------------------
importwizard
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/cloud-front.md: html>body>ol>li>ul>li
--------------------------------------------------------------------------------
referer
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>blockquote>p
--------------------------------------------------------------------------------
premiumsupport
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
sigs
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
homev
wafv
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/alb.md: html>body>ol>li>p
--------------------------------------------------------------------------------
uncomment
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/README-complex.md: html>body>p
--------------------------------------------------------------------------------
CTONET
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/aws/waf/README-complex.md: html>body>blockquote>p
--------------------------------------------------------------------------------
premiumsupport
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>h2
--------------------------------------------------------------------------------
Walkthrough
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ul>li
--------------------------------------------------------------------------------
Kustomize
kam
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Prequsites
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Preperation
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>h2
--------------------------------------------------------------------------------
Queryier
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/README.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>blockquote>p
--------------------------------------------------------------------------------
indepth
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/federated-metrics/user-defined.md: html>body>ol>li>p
--------------------------------------------------------------------------------
datasource
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>p
--------------------------------------------------------------------------------
plublic
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/sts-with-private-link/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
ARNs
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/rosa/sts/README.md: html>body>blockquote>p
--------------------------------------------------------------------------------
ARNs
--------------------------------------------------------------------------------

Misspelled words:
<htmlcontent> docs/ocp/common-images-namespace/README.md: html>body>p
--------------------------------------------------------------------------------
combersome
--------------------------------------------------------------------------------

!!!Spelling check failed!!!

facelessuser · 2021-08-04T20:37:16Z

So, that is how to fix your issue with PySpelling. Why is the action not straight-up failing with the malformed YAML? That I do not know, and it may be something that this action repo needs to look into.

I assume that fixing the config will fix your issue, but I have not run your repo through the action in this repo.

facelessuser · 2021-08-04T20:49:05Z

Let me rephrase, I'm not seeing any misspellings in HTML code blocks. I see that you have not enabled any fenced code extensions, so give me a minute.

facelessuser · 2021-08-04T20:55:37Z

It must be converting the fenced code block into inline code blocks as you have no fenced extension enabled. But since inline code is ignored as well, it works just fine, so 🤷🏻 .

supernovae · 2021-08-04T23:39:01Z

I fixed the YAML formatting and 0'd out the wordlist and now I'm back up to 6000+ lines of output on spell-check. I had a fairly large wordlist in my branch I shared, if you 0 out the .github/config/.wordlist.txt you may replicate the output.

supernovae · 2021-08-04T23:40:01Z

The majority of the words failing are in a Markdown fenced (```) code block.

facelessuser · 2021-08-05T04:01:03Z

@supernovae Okay, there are a couple of things here. I zeroed out the wordlist, just as you said you were doing. I found no actual pre block getting spell checked.

Now, I do not think I am running what you are running as your results mentioned docs/aro/clf-to-zure/README.md which is not found on the master branch of the repo you pointed me at, so I cannot test exactly what you are. This also means, I'm not confident the config I'm debugging is even the same as yours. So I cannot debug further without knowing I'm testing exactly what you are.

I got no misspellings in pre blocks until I enabled pymdownx.superfences. This makes sense as I imagine Python Markdown was parsing all of your fenced code blocks as inline code instead of fenced blocks. Once I did, I did see some, but the content was not actually in the code blocks even though the context reported it as such.

<li>
<p>Use <code>kubectl</code> to apply the <code>bgd-app.yaml</code> file
    <div class="highlight"><pre><span></span><code>kubectl apply -f documentation/modules/ROOT/examples/bgd-app/bgd-app.yaml
</code></pre></div>
    &gt;The bgd-app.yaml file defines several things, including the repo location for the <code>gitops-bgd-app</code> application<br>
    <img alt="screenshot of bgd-app-yaml" src="./bgd-app-yaml.png" /></p>
</li>

results

b'>The bgd-app.yaml file defines several things, including the repo location for the application '
Context: docs/demos/gitops/README.md: html>body>ol>li>div>pre
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ol>li>div>pre
--------------------------------------------------------------------------------
bgd
repo
yaml
--------------------------------------------------------------------------------

So the context reporting had a bug, which I fixed locally.

b'>The bgd-app.yaml file defines several things, including the repo location for the application '
Context: docs/demos/gitops/README.md: html>body>ol>li
Misspelled words:
<htmlcontent> docs/demos/gitops/README.md: html>body>ol>li
--------------------------------------------------------------------------------
bgd
repo
yaml
--------------------------------------------------------------------------------

In short, I am not seeing it incorrectly parsing content within code blocks, though I did see some false reporting, which I will have fixed in the next PySpelling release. If yours is truly showing words in code blocks after you specify that PySpelling should ignore them, there is something off in your config, but I cannot verify this as I don't think I'm even testing the same branch you are.
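The difference between the two reported contexts can be illustrated with a short standard-library sketch. This is not PySpelling's code; `ContextTracker` and `text_contexts` are hypothetical names, and the sample is a simplified version of the HTML above. It tracks the open-element stack and records the ancestor path for each piece of text, so text that follows a closed `pre` block is attributed to the enclosing `li` rather than to the block that just ended.

```python
from html.parser import HTMLParser


class ContextTracker(HTMLParser):
    """Record each non-empty text node together with its ancestor path."""

    # Elements that never have a closing tag, so they are not pushed.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.found = []   # (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag so that later text belongs
        # to the parent, not to the element that was just closed.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip():
            self.found.append((">".join(self.stack), data.strip()))


def text_contexts(html):
    """Return a list of (ancestor path, text) pairs for `html`."""
    tracker = ContextTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.found


# Simplified stand-in for the list item shown above.
sample = (
    "<html><body><ol><li>"
    "<div><pre><code>kubectl apply -f bgd-app.yaml\n</code></pre></div>"
    "The bgd-app.yaml file defines several things"
    "</li></ol></body></html>"
)
for path, text in text_contexts(sample):
    print(path, "|", text)
```

With the stack unwound correctly, the trailing sentence is reported under `html>body>ol>li`, which matches the corrected context rather than the `div>pre` one.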

Additionally, while I think Python Markdown may parse your Markdown "good enough" to spell check (after enabling a fenced code extension), I do see that it doesn't quite parse everything exactly right, as it has some expectations regarding formatting that some parsers (like CommonMark parsers) do not. I won't comment further on this as it appears to do well enough to avoid code blocks and such, which appears to be the main concern.

Based on your results vs. mine, I still think you have a config issue. I see HTML comments being targeted, which makes no sense if you turned off comments. And comments appearing has nothing to do with the earlier mentioned bug. I think your config is still not right, but I cannot confirm as I don't even know what branch you are actually testing.

facelessuser · 2021-08-07T04:29:33Z

The reporting bug has been fixed and deployed.

facelessuser · 2021-08-11T19:24:28Z

So, it's been about a week. I figure you've solved this as I haven't seen any new info to help debug this further. I know this action is still using an older PySpelling, so once updated, it shouldn't see the wrong context for HTML elements.

jonasbn · 2021-08-12T17:19:42Z

This is planned to be included in the upcoming 0.15.0 release.

byronmiller · 2021-08-13T01:06:55Z

Thanks for the updates! Sorry, been busy with work. I look forward to trying this out and again, I really appreciate your help here!

facelessuser · 2021-08-13T12:57:39Z

@byronmiller No worries. I was just following up to make sure there are no known bugs in PySpelling. These issues stay fresh in my mind for maybe a week (especially when they are filed on 3rd party repositories), then I'll forget about them unless I am pinged again 🙃. So, take your time.

Hopefully, when you get back to this, the issue turns out to just be a config use issue. I've fixed the only issue I could find in PySpelling, and it was mainly cosmetic.

jonasbn · 2021-08-15T10:34:40Z

Version 0.15.0 has just been uploaded to the marketplace

jonasbn · 2021-08-15T11:21:52Z

Hi @supernovae

I experienced a weird issue with release 0.15.0, so I made a hotfix to patch it. You should have a look at release 0.16.0 instead.

facelessuser mentioned this issue Aug 5, 2021

Fix XML context reporting facelessuser/pyspelling#144

Merged

jonasbn self-assigned this Aug 12, 2021

jonasbn added the bug Something isn't working label Aug 12, 2021

jonasbn added this to the 0.15.0 milestone Aug 12, 2021

jonasbn mentioned this issue Aug 15, 2021

Dependencies and docker multi stage #54

Merged

jonasbn closed this as completed Aug 15, 2021
