Commit 4ed5233

style(lint): go away linter
Signed-off-by: Laura Santamaria <nimbinatus@users.noreply.github.com>
1 parent c4b8e31 commit 4ed5233

2 files changed: +16 -14 lines changed

.spellcheck-en-custom.txt

+2
@@ -15,6 +15,7 @@ backend
1515
backends
1616
benchmarking
1717
Bhandwaldar
18+
bikeshedding
1819
brainer
1920
Cappi
2021
checkpointing
@@ -214,6 +215,7 @@ SaaS
214215
safetensor
215216
safetensors
216217
Salawu
218+
Santamaria
217219
scalable
218220
SDG
219221
sdg

docs/taxonomy-revamp-2025.md

+14 -14
@@ -8,22 +8,22 @@ status: proposed
88

99
Our taxonomy tree structure and knowledge/skill file structure was designed with upstream taxonomy submissions in mind. An end user working with a taxonomy locally using InstructLab has to follow all of those requirements, increasing complexity of their work.
1010

11-
The end user typically gets hung up on where to place the file in a massive file tree where sub-branches are not defined. For someone not working in the upstream taxonomy, this is basically bikeshedding[^bikeshed]. The only requirement for the SDG process is sorting things into `knowledge` and `skills`.
11+
The end user typically gets hung up on where to place the file in a massive file tree where sub-branches are not defined. For someone not working in the upstream taxonomy, this is basically bikeshedding[^bike shed]. The only requirement for the SDG process is sorting things into `knowledge` and `skills`.
1212

1313
The user experience of working with the `qna.yaml` file is poor for a handful of reasons:
1414

1515
- Most of the fields in the `qna.yaml` file are unnecessary save for use in the upstream taxonomy.
1616
- YAML is a notoriously complex, loose format with a lot of potholes.
1717
- YAML files of different specifications parse completely differently (e.g., 1.2 vs 1.1).
1818
- Note that PyYAML, our base tool [^tooling], parses YAML 1.1, not 1.2. There is a long way to go[^PyYAML] to support 1.2, which has been the latest spec since 2009. As such, even if someone were to search the Internet for a solution because they are not familiar with YAML, they likely will stumble across 1.2 solutions that don't work for 1.1.
19-
- There are at least 9 different ways to indicate a multi-line string in YAML[^9ways], depending on which block scalar indicator[^blockscalar] is used and which block chomping indicator[^blockchomping] is used (this does **not** count the indentation indicator[^blockindentation]!). Then there are double-quoted flow scalar multilines[^doublequotedflowscalar] and single-quoted flow scalar multilines[^singlequotedflowscalar], which can cause more problems.
19+
- There are at least 9 different ways to indicate a multi-line string in YAML[^9 ways], depending on which block scalar indicator[^block scalar] is used and which block chomping indicator[^block chomping] is used (this does **not** count the indentation indicator[^block indentation]!). Then there are double-quoted flow scalar multilines[^double quoted flow scalar] and single-quoted flow scalar multilines[^single quoted flow scalar], which can cause more problems.
2020
- The linting system, intended to ensure the YAML file is readable by the SDG process, adds more burden on the non-technical user.
2121
- The linter for YAML enforces an 80-character line length by default. That makes sense if you're working on code read from a terminal, but not to a typical end user used to working with rich text editors for a reading comprehension experience working with paragraphs.
2222
- The linter also complains about trailing whitespace, another common thing that the typical end user won't understand why everything is failing.
2323

2424
From a code perspective,
2525

26-
- We are already using JSON in the datamixing process in SDG[^datamixing].
26+
- We are already using JSON in the data mixing process in SDG[^data mixing].
2727
- Docling also exports JSON as input and output[^docling].
2828
- JSON is also much more friendly to UI work, which is a primary path we would like people to use.
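
As a rough illustration of the JSON points in this list, the sketch below round-trips a multi-line answer through Python's standard `json` module. The record shape (`question`, `answer`) and the sample text are hypothetical and are not the schema this proposal defines. The example shows only that a JSON string has a single encoding for line breaks (an escaped `\n`), so long lines and trailing whitespace survive unchanged.

```python
import json

# Hypothetical answer text, for illustration only. It contains trailing
# spaces and a line well over 80 characters, both of which the YAML
# linter described above would flag.
ANSWER = (
    "Contributions are sorted into knowledge and skills.   \n"
    "This second line is deliberately long so that it runs past the 80-character limit a YAML linter enforces by default.\n"
)


def round_trip(text: str) -> str:
    """Encode a hypothetical Q&A record as JSON and decode it again."""
    # Assumed record shape; the real schema may differ.
    record = {"question": "How are contributions sorted?", "answer": text}
    encoded = json.dumps(record)
    return json.loads(encoded)["answer"]


if __name__ == "__main__":
    # Line breaks are written as the two-character escape \n, so the
    # encoded string never contains a raw newline.
    print(json.dumps(ANSWER))
    print(round_trip(ANSWER) == ANSWER)
```

Because the writer never chooses between block styles or chomping indicators, a UI can collect the paragraph in a plain text box and serialize it directly.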
2929

@@ -45,13 +45,13 @@ The process of writing question and answer sets also is more like writing readin
4545
- Write documentation and tutorials based on existing tutorials on writing reading comprehension questions and example answers for standardized exams.
4646
- Most people can understand reading to learn versus learning to read type questions. The new, streamlined schema that matches the most simple needs could help here along with a solid set of docs and tutorials on how to write reading comprehension sets. We could borrow heavily from the standard tutorials for writing standardized exams that are out there for free and already battle-tested.
4747

48-
[^bikeshed]: The story of the bikeshed is a common metaphor. The story goes that a group that is working on the approvals for the construction plan of a nuclear power plant gets stuck on what color to paint the bike shed at one of the entrances to the plant. Mutliple meetings are scheduled to hash out the issue of the color of the bike shed, with heated arguments. However, the rest of the plan for the power plant is not examined in detail or critiqued. People have an easier time evaluating and having an opinion on something that is as trivial as a bike shed's color when faced with complex decisions on other systems. https://en.wiktionary.org/wiki/bikeshedding
49-
[^9ways]: You can experience this issue in action with the interactive experience on https://yaml-multiline.info/.
50-
[^blockscalar]: https://yaml.org/spec/1.2.2/#81-block-scalar-styles
48+
[^bike shed]: The story of the bikeshed is a common metaphor. The story goes that a group that is working on the approvals for the construction plan of a nuclear power plant gets stuck on what color to paint the bike shed at one of the entrances to the plant. Multiple meetings are scheduled to hash out the issue of the color of the bike shed, with heated arguments. However, the rest of the plan for the power plant is not examined in detail or critiqued. People have an easier time evaluating and having an opinion on something that is as trivial as a bike shed's color when faced with complex decisions on other systems. [Wiktionary entry](https://en.wiktionary.org/wiki/bikeshedding)
49+
[^9 ways]: You can experience this issue in action with the interactive experience on [yaml-multiline.info](https://yaml-multiline.info/).
50+
[^block scalar]: [YAML Spec v1.2.2 on block scalar styles](https://yaml.org/spec/1.2.2/#81-block-scalar-styles)
5151
> YAML provides two block scalar styles, literal and folded. Each provides a different trade-off between readability and expressive power.
52-
[^blockchomping]: https://yaml.org/spec/1.2.2/#8112-block-chomping-indicator
52+
[^block chomping]: [YAML Spec v1.2.2 on block chomping indicators](https://yaml.org/spec/1.2.2/#8112-block-chomping-indicator)
5353
> Chomping controls how final line breaks and trailing empty lines are interpreted. YAML provides three chomping methods:
54-
[^blockindentation]: https://yaml.org/spec/1.2.2/#8111-block-indentation-indicator
54+
[^block indentation]: [YAML Spec v1.2.2 on block indentation indicators](https://yaml.org/spec/1.2.2/#8111-block-indentation-indicator)
5555
> Every block scalar has a content indentation level. The content of the block scalar excludes a number of leading spaces on each line up to the content indentation level.
5656
>
5757
> If a block scalar has an indentation indicator, then the content indentation level of the block scalar is equal to the indentation level of the block scalar plus the integer value of the indentation indicator character.
@@ -63,11 +63,11 @@ The process of writing question and answer sets also is more like writing readin
6363
>It is an error for any of the leading empty lines to contain more spaces than the first non-empty line.
6464
>
6565
>A YAML processor should only emit an explicit indentation indicator for cases where detection will fail.
66-
[^doublequotedflowscalar]: https://yaml.org/spec/1.2.2/#double-quoted-style
66+
[^double quoted flow scalar]: [YAML Spec v1.2.2 on the double-quoted flow scalar](https://yaml.org/spec/1.2.2/#double-quoted-style)
6767
> In a multi-line double-quoted scalar, line breaks are subject to flow line folding, which discards any trailing white space characters. It is also possible to escape the line break character. In this case, the escaped line break is excluded from the content and any trailing white space characters that precede the escaped line break are preserved. Combined with the ability to escape white space characters, this allows double-quoted lines to be broken at arbitrary positions.
68-
[^singlequotedflowscalar]: https://yaml.org/spec/1.2.2/#single-quoted-style
68+
[^single quoted flow scalar]: [YAML Spec v1.2.2 on the single-quoted flow scalar](https://yaml.org/spec/1.2.2/#single-quoted-style)
6969
> In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces. [...] All leading and trailing white space characters are excluded from the content. Each continuation line must therefore contain at least one non-space character. Empty lines, if any, are consumed as part of the line folding.
70-
[^datamixing]: stuff
71-
[^docling]: https://ds4sd.github.io/docling/supported_formats/ notes docling supports JSON-serialized Docling Documents and Markdown as input and JSON and Markdown as outputs.
72-
[^PyYAML]: https://github.com/yaml/pyyaml/issues/486
73-
[^tooling]: https://github.com/instructlab/schema/blob/main/pyproject.toml#L27-L30 and
70+
[^data mixing]: stuff
71+
[^docling]: [The Docling documentation](https://ds4sd.github.io/docling/supported_formats/) notes docling supports JSON-serialized Docling Documents and Markdown as input and JSON and Markdown as outputs.
72+
[^PyYAML]: [yaml/pyyaml#486](https://github.com/yaml/pyyaml/issues/486)
73+
[^tooling]: [Our tooling dependencies](https://github.com/instructlab/schema/blob/main/pyproject.toml#L27-L30)
