- Most of the fields in the `qna.yaml` file are unnecessary save for use in the upstream taxonomy.
- YAML is a notoriously complex, loose format with a lot of potholes.
- The same YAML file can parse differently depending on which version of the specification the parser implements (e.g., 1.1 vs 1.2).
- Note that PyYAML, our base tool[^tooling], parses YAML 1.1, not 1.2. There is a long way to go[^PyYAML] before it supports 1.2, which has been the latest spec since 2009. As a result, someone unfamiliar with YAML who searches the Internet for a solution is likely to stumble across 1.2 solutions that don't work for 1.1.
- There are at least 9 different ways to indicate a multi-line string in YAML[^9ways], depending on which block scalar indicator[^blockscalar] and which block chomping indicator[^blockchomping] are used (this does **not** count the indentation indicator[^blockindentation]!). Then there are multi-line double-quoted flow scalars[^doublequotedflowscalar] and single-quoted flow scalars[^singlequotedflowscalar], each with its own line-folding rules, which cause further problems.
- The linting system, intended to ensure the YAML file is readable by the SDG process, adds more burden on the non-technical user.
- The YAML linter enforces an 80-character line length by default. That makes sense for code read in a terminal, but not for a typical end user who is used to writing paragraphs for a reading comprehension exercise in a rich text editor.
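To make the 1.1 vs 1.2 difference concrete, here is a minimal, stdlib-only Python sketch of how an unquoted scalar might be resolved under each version. The function `resolve_plain_scalar` is hypothetical and models only booleans and integers: its rules follow the YAML 1.1 type repository and the YAML 1.2 core schema, and real parsers (PyYAML included) may resolve a slightly different set of values.

```python
import re

# YAML 1.1 type repository booleans vs. YAML 1.2 core schema booleans.
TRUE_11 = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_11 = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
TRUE_12 = {"true", "True", "TRUE"}
FALSE_12 = {"false", "False", "FALSE"}


def resolve_plain_scalar(text: str, version: str):
    """Resolve an unquoted scalar to bool, int, or str (simplified sketch).

    Ignores floats, nulls, hex, underscores, and sexagesimal numbers.
    """
    if version == "1.1":
        if text in TRUE_11:
            return True
        if text in FALSE_11:
            return False
        # 1.1: a leading zero means octal, so "010" is 8.
        if re.fullmatch(r"[-+]?0[0-7]+", text):
            return int(text, 8)
        # 1.1 decimals may not have a leading zero (except "0" itself).
        if re.fullmatch(r"[-+]?(0|[1-9][0-9]*)", text):
            return int(text)
    elif version == "1.2":
        if text in TRUE_12:
            return True
        if text in FALSE_12:
            return False
        # 1.2 core schema: octal needs an explicit "0o" prefix.
        if re.fullmatch(r"0o[0-7]+", text):
            return int(text[2:], 8)
        # 1.2 core schema: any run of digits is a base-10 integer.
        if re.fullmatch(r"[-+]?[0-9]+", text):
            return int(text)
    else:
        raise ValueError(f"unsupported YAML version: {version}")
    # Anything else stays a plain string.
    return text


if __name__ == "__main__":
    for scalar in ["no", "on", "010"]:
        print(scalar, resolve_plain_scalar(scalar, "1.1"), resolve_plain_scalar(scalar, "1.2"))
    # no False no
    # on True on
    # 010 8 10
```

An answer such as `no` or a code such as `010` therefore changes type or value depending on the spec version, and a fix found online for one version can silently misbehave under the other.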
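The multi-line string combinations can be illustrated with a simplified Python model of literal (`|`) and folded (`>`) block scalars under each chomping indicator. `block_scalar_value` is a hypothetical helper, not part of any YAML library; it assumes indentation has already been stripped and ignores more-indented lines and the indentation indicator, both of which change the folding rules.

```python
def block_scalar_value(lines: list[str], style: str, chomping: str) -> str:
    """Simplified model of a YAML block scalar's value.

    lines:    content lines with indentation already stripped ("" = empty line).
    style:    "|" (literal) or ">" (folded).
    chomping: "-" (strip), "" (clip, the default), or "+" (keep).
    """
    if style not in ("|", ">") or chomping not in ("-", "", "+"):
        raise ValueError("unknown block scalar indicator")
    # Split off trailing empty lines; the chomping indicator decides their fate.
    end = len(lines)
    while end > 0 and lines[end - 1] == "":
        end -= 1
    body, trailing = lines[:end], lines[end:]
    if not body or body[0] == "":
        raise ValueError("sketch assumes the block starts with a non-empty line")
    if style == "|":
        # Literal: every line break inside the body is preserved.
        text = "\n".join(body)
    else:
        # Folded: a single line break becomes a space; a line break
        # followed by n empty lines becomes n line breaks.
        text, empty_run = body[0], 0
        for line in body[1:]:
            if line == "":
                empty_run += 1
            else:
                text += ("\n" * empty_run if empty_run else " ") + line
                empty_run = 0
    if chomping == "-":
        return text  # strip: no final line break, no trailing empty lines
    if chomping == "":
        return text + "\n"  # clip: exactly one final line break
    return text + "\n" * (1 + len(trailing))  # keep: final break plus trailing empty lines


if __name__ == "__main__":
    answer = ["Several lines", "of text", "", "with a gap", ""]
    for style in ("|", ">"):
        for chomping in ("-", "", "+"):
            print(f"{style}{chomping}", repr(block_scalar_value(answer, style, chomping)))
```

Run on a short answer that ends in one blank line, the six style and chomping combinations produce six different strings, and a non-technical contributor has to pick the right one before even considering flow scalars.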
> In addition, it is only possible to break a long single-quoted line where a space character is surrounded by non-spaces. [...] All leading and trailing white space characters are excluded from the content. Each continuation line must therefore contain at least one non-space character. Empty lines, if any, are consumed as part of the line folding.
[^datamixing]: stuff
[^docling]: https://ds4sd.github.io/docling/supported_formats/ notes that Docling supports JSON-serialized Docling Documents and Markdown as inputs, and JSON and Markdown as outputs.