
proposal: x/tools/cmd/godoc: GORDO enriched Go documentation format. #35947

Proposal: GORDO enriched Go documentation format.

Author: Ohir Ripe [Wojciech S. Czarnecki]

Last updated: 2020/01/24

Discussion at https://golang.org/issue/35947

Related to: #7873, #16666, #35896, #18342, #25444 and other "rich format please" issues.

Abstract

GORDO (dʒɔrˈdo) stands for GO Rich DOcs

This proposal is an attempt to make the godoc ecosystem robust enough to be a single documentation method that can also serve end-user programs and production services.

Background

The current state of Go's source documentation processing is good enough for documenting single implemented things, i.e. functions, variables, constants. It falls short if one must convey a new idea, an unobvious implementation of an algorithm, or even just describe a sequence of events (no lists, sadly).

The godoc heuristic does not allow overall (package) docs to be kept close to the source, as parts of docs from different files are merged in the lexical order of the source filenames. This makes it almost impossible to document a chunk of API in the very file that defines it. (This proposal tackles this with "refid" identifiers that can be put on documentation parts and then used to provide merging order and in-text references.)

Proposal

I propose using lightweight annotations that allow plain-text documentation to have styling and structure hints added by the author. Gordo annotations use 11 non-ascii characters that can be entered as ascii digraphs led by a semicolon:

 ┌───────────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬───┐
  character:    ˘    ´    ¨    ˉ    °    «    »    þ    ¶    §    •  esc
    digraph:   ;b   ;/   ;'   ;-   ;.   ;[   ;]   ;t   ;p   ;s   ;l   ;;
 └───────────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴───┘

(Users accustomed to chords may configure translation via a GORDOIC environment variable. See previous revisions for an elaborate description of the available entry methods.)

Translation is done by gofmt; godoc then recognizes and interprets these 11 characters according to the specification laid out below.
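
The translation step can be sketched in a few lines of Go. This is only an illustration under stated assumptions: Translate is a hypothetical helper, not an existing gofmt function, and the handling of the doubled semicolon follows the two examples given in the "escapes" section (;;;; => °;; and ;;;. => °;.).

```go
package main

import (
	"fmt"
	"strings"
)

// digraphs maps the second byte of a ";x" digraph to its gordo character,
// following the table in the proposal.
var digraphs = map[byte]string{
	'b': "˘", '/': "´", '\'': "¨", '-': "ˉ", '.': "°",
	'[': "«", ']': "»", 't': "þ", 'p': "¶", 's': "§", 'l': "•",
}

// Translate replaces ascii digraphs with gordo characters.
// Hypothetical helper for illustration; not part of gofmt.
func Translate(src string) string {
	var out strings.Builder
	for i := 0; i < len(src); i++ {
		if src[i] != ';' || i+1 >= len(src) {
			out.WriteByte(src[i])
			continue
		}
		next := src[i+1]
		if next == ';' {
			// Doubled semicolon: emit the dismiss character, then copy a
			// directly following digraph verbatim (untranslated).
			out.WriteString("°")
			i++
			if i+2 < len(src) && src[i+1] == ';' {
				out.WriteString(src[i+1 : i+3])
				i += 2
			}
			continue
		}
		if ch, ok := digraphs[next]; ok {
			out.WriteString(ch)
			i++
			continue
		}
		// A semicolon that starts no known digraph is ordinary text.
		out.WriteByte(src[i])
	}
	return out.String()
}

func main() {
	fmt.Println(Translate("see ;'Close;' and ;/io.EOF;/"))
	fmt.Println(Translate("literal digraph: ;;;."))
}
```

The loop works on bytes, which is safe here because every digraph consists of two ascii bytes and no UTF-8 continuation byte can equal a semicolon.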

styling

 °  degree         °escape || back to normal   aka "dismiss" char
 ´  acute          ´italics´       ´italics°   𝑖𝑡𝑎𝑙𝑖𝑐𝑠
 ¨  diaeresis      ¨bold¨             ¨bold°   𝐛𝐨𝐥𝐝
 ˘  breve          ˘ibold˘    ˘bold+italics°   𝒃𝒐𝒍𝒅-𝒊𝒕𝒂𝒍𝒊𝒄𝒔
 ˉ  macron         ˉfixedˉ           ˉfixedˉ   fixed width span
 «» guillemets     «notable or related text»   p͜a͜y͜ ͜a͜t͜t͜e͜n͜t͜i͜o͜n͜ span

An emphasis (styled text) begins after an acute, diaeresis, or breve character that is not followed by a degree. It ends at the stop character of the same emphasis, or at the acute, diaeresis, or breve that starts another emphasis. It also ends at a macron, at a left guillemet, or at a degree "dismiss" character. The 'fixed' and 'notable' spans begin and end only with their respective special characters, so the other three emphases can be used inside them. An empty line ends all running emphases and spans.

Editing software may apply styles while keeping the syntax visible. In the final form a style is applied and syntax characters are hidden.
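
A minimal sketch of how a renderer might apply the three emphases, assuming html output and the simplified rules above. EmphasisHTML and its tag choices are hypothetical; the sketch ignores escapes, blank-line termination, and html escaping, and passes macrons and guillemets through untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// tags holds the assumed html open/close pair for each emphasis character.
var tags = map[rune][2]string{
	'´': {"<i>", "</i>"},
	'¨': {"<b>", "</b>"},
	'˘': {"<b><i>", "</i></b>"},
}

// EmphasisHTML converts acute, diaeresis, and breve emphases to html tags.
// Hypothetical helper for illustration only.
func EmphasisHTML(src string) string {
	var out strings.Builder
	var cur rune // currently open emphasis, 0 if none
	closeCur := func() {
		if cur != 0 {
			out.WriteString(tags[cur][1])
			cur = 0
		}
	}
	for _, r := range src {
		switch r {
		case '´', '¨', '˘':
			same := r == cur
			closeCur() // a stop, or the start of another emphasis
			if !same {
				out.WriteString(tags[r][0])
				cur = r
			}
		case '°':
			if cur == 0 {
				out.WriteRune(r) // nothing to dismiss: ordinary degree
			}
			closeCur()
		case 'ˉ', '«':
			closeCur() // a fixed or notable span ends a running emphasis
			out.WriteRune(r)
		default:
			out.WriteRune(r)
		}
	}
	closeCur()
	return out.String()
}

func main() {
	fmt.Println(EmphasisHTML("´italics´ and ¨bold° at 20°C"))
}
```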

accessibility

For screen-reader use, the document author can make a style convey a semantic hint.
Aria labels are introduced in the form of a short list whose items start with bullet-style digraphs.

In this document styles mean:

   •´ cited from other text´
   •¨ endpoint name¨
   •˘ call parameter˘
   •ˉ codeˉ

Sighted users will see this rendered as a bulleted list with styled items; blind users will hear either the label text or an audible hint when the reader enters a labelled region. Note that regions are marked in the source, hence accessibility tools will be more useful at the terminal, too.

the refid

A short string identifier that can be attached to a section, paragraph, or quotable span:

 §  section        quotable section head   §(refid)
 ¶  pilcrow        quotable paragraph lead ¶(refid)
 »  rguillemet             « quotable span »(refid)

Refid strings identify parts of the main documentation that can then be referenced elsewhere. A refid-tagged part can be quoted, linked to (in html output), and searched for with the go doc tool. Refids should not resemble godoc-searchable identifiers of the package's code, as the go doc tool should be able to display the part of the documentation that a refid points to. Refids should be short but informative.
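
To make the three attachment forms concrete, here is a small sketch that collects refid declarations from a piece of documentation text. The Anchor type and FindAnchors function are hypothetical names chosen for this example, and the regular expression assumes a refid never contains a closing parenthesis.

```go
package main

import (
	"fmt"
	"regexp"
)

// refidRe matches a section, pilcrow, or right guillemet followed by a
// parenthesized refid, e.g. §(Sect 2).
var refidRe = regexp.MustCompile(`([§¶»])\(([^)]+)\)`)

// Anchor is a hypothetical record of one refid declaration.
type Anchor struct {
	Kind  string // "§" section, "¶" paragraph, "»" quotable span
	Refid string
}

// FindAnchors returns every refid declared in text, in order of appearance.
func FindAnchors(text string) []Anchor {
	var out []Anchor
	for _, m := range refidRe.FindAllStringSubmatch(text, -1) {
		out = append(out, Anchor{Kind: m[1], Refid: m[2]})
	}
	return out
}

func main() {
	for _, a := range FindAnchors("Annolex Editor  §(Sect 2)\nsee « the primer »(primer)") {
		fmt.Printf("%s %q\n", a.Kind, a.Refid)
	}
}
```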

structure

 «' lguillemet     quote here a text span, heading or item:
                       «'refid'    'quote in apostrophes'
                       «"refid"    "quote in double quotes"
                       «(refid)     use no quote characters

The «"refid" token quotes an internal link target: it always outputs the target's text between the quotation marks that follow the «, or without quotation marks if the parenthesized «(refid) form is used. Console output always prints the refid in parentheses after the quotation; the html version instead outputs the quoted text as a link to its place of origin. E.g. the source of:

    Annolex Editor  §(Sect 2)
    ... Please read «"Sect 2" for the primer.

 should output on the console:

    Annolex Editor (Sect 2)
    ... Please read "Annolex Editor" (Sect 2) for the primer.

 but in html it is expected to output a link:

    ✻ Annolex Editor
    ... Please read "͟A͟n͟n͟o͟l͟e͟x͟ ͟E͟d͟i͟t͟o͟r" for the primer.
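
The console behaviour above can be sketched as a simple substitution. ResolveQuotes is a hypothetical function, and the targets map (refid to quoted text) is assumed to have been collected beforehand; unknown refids are left untouched in this sketch, a choice the proposal does not specify.

```go
package main

import (
	"fmt"
	"regexp"
)

// quoteRe matches the three quote forms: «"refid", «'refid', and «(refid).
var quoteRe = regexp.MustCompile(`«(?:"([^"]+)"|'([^']+)'|\(([^)]+)\))`)

// ResolveQuotes renders quote tokens for the console: the target text,
// wrapped in the requested quotation marks, followed by the refid in
// parentheses. Hypothetical helper for illustration only.
func ResolveQuotes(line string, targets map[string]string) string {
	return quoteRe.ReplaceAllStringFunc(line, func(m string) string {
		sub := quoteRe.FindStringSubmatch(m)
		var refid, q string
		switch {
		case sub[1] != "":
			refid, q = sub[1], `"`
		case sub[2] != "":
			refid, q = sub[2], "'"
		default:
			refid = sub[3] // parenthesized form: no quotation marks
		}
		text, ok := targets[refid]
		if !ok {
			return m // unknown refid: keep the token as written
		}
		return q + text + q + " (" + refid + ")"
	})
}

func main() {
	targets := map[string]string{"Sect 2": "Annolex Editor"}
	fmt.Println(ResolveQuotes(`... Please read «"Sect 2" for the primer.`, targets))
}
```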

lists

 •  bullet         •  bulleted list item
 •a                a) lettered list item
 •1                1. numbered list item
 þ  thorn        see: link/url list item
  • List items need to be given without blank lines in between.
  • A list ends at an empty line, like any other gordo-introduced styling.
  • List items are recognized as such even if user-indented.
  • Console output imposes uniform indentation of lists.
  • Gofmt may impose uniform indentation of consecutive list items in the source.
    (Other gordo processors may allow for nesting though).
  • List item start (bullet or thorn) is recognized as such only if placed as the first printable in a line and followed by a space.
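
The recognition rule in the last item can be sketched as a line classifier. ListKind is a hypothetical function; it assumes that the •a and •1 markers must also be followed by a space, which the proposal does not state explicitly, and it deliberately leaves the accessibility and TOC digraphs (such as •´ or •§) unclassified.

```go
package main

import (
	"fmt"
	"strings"
)

// ListKind reports which kind of list item a line starts, or "" if the line
// is not a list item. The marker must be the first printable text in the
// line and must be followed by a space. Hypothetical helper.
func ListKind(line string) string {
	s := strings.TrimLeft(line, " \t") // user indentation is allowed
	switch {
	case strings.HasPrefix(s, "• "):
		return "bullet"
	case strings.HasPrefix(s, "•a "):
		return "lettered"
	case strings.HasPrefix(s, "•1 "):
		return "numbered"
	case strings.HasPrefix(s, "þ "):
		return "link"
	}
	return ""
}

func main() {
	for _, line := range []string{"  • first", "•1 second", "þ example.org/doc", "text • not an item"} {
		fmt.Printf("%q -> %q\n", line, ListKind(line))
	}
}
```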

external links

 »þ        « link description »þ          // text description of
             þ somesite.tld/path/tolink   // an url listed below

External links are introduced via a « note ending in the »þ digraph. The url path (without protocol) must be given as a url list item (þ) in the last line of the paragraph. This line can be indented. Up to three »þ references can be present in a single paragraph; their respective url paths are then given on separate lines below:

  in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.
     þ www.ieee-itss.org/oj-its
     þ www.ivt.ethz.ch

The final form of the output, including the hypertext protocol used, is defined by the gordo processor. This specification only mandates that the plain text renderer, if used at all, removes gordo special characters and any superfluous space left after this removal, including spaces following the « of a notable or link description span. Also, links rendered under the sentence should be given a numerical index and be prefixed with the protocol:

  in our IEEE-ITSS Open Journal¹ and also on our faculty² site.
     ¹ https://www.ieee-itss.org/oj-its
     ² https://www.ivt.ethz.ch
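
A plain text renderer for such a paragraph could look roughly like the sketch below. RenderLinksPlain is a hypothetical function, and the "https://" prefix is an assumption made for the example, since the proposal leaves the protocol to the gordo processor.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkSpan matches « link description »þ, trimming the spaces inside.
var linkSpan = regexp.MustCompile(`«\s*(.*?)\s*»þ`)

// superiors are the indices for the (at most three) links of a paragraph.
var superiors = []string{"¹", "²", "³"}

// RenderLinksPlain rewrites one paragraph for plain text output: link spans
// lose their gordo characters and gain an index, and the url list items
// below get the same index plus an assumed "https://" prefix.
func RenderLinksPlain(paragraph string) string {
	var out []string
	refs, urls := 0, 0
	for _, line := range strings.Split(paragraph, "\n") {
		trimmed := strings.TrimLeft(line, " \t")
		if strings.HasPrefix(trimmed, "þ ") && urls < len(superiors) {
			indent := line[:len(line)-len(trimmed)]
			url := strings.TrimSpace(strings.TrimPrefix(trimmed, "þ "))
			out = append(out, indent+superiors[urls]+" https://"+url)
			urls++
			continue
		}
		line = linkSpan.ReplaceAllStringFunc(line, func(m string) string {
			if refs >= len(superiors) {
				return m // more than three references: left as written
			}
			text := linkSpan.FindStringSubmatch(m)[1]
			refs++
			return text + superiors[refs-1]
		})
		out = append(out, line)
	}
	return strings.Join(out, "\n")
}

func main() {
	src := "in our «IEEE-ITSS Open Journal »þ and also on « our faculty »þ site.\n" +
		"   þ www.ieee-itss.org/oj-its\n" +
		"   þ www.ivt.ethz.ch"
	fmt.Println(RenderLinksPlain(src))
}
```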

A gordo processor on a public website can be configured to render external links as indexed plain text urls to prevent link-spam.

table of contents, in order

A manual TOC is introduced either by a heading that starts with the "TOC" string, or by one that has the "toc" refid set:

TOC — Table of Contents
Sisällysluettelo §(toc)

Manual TOC entries, in the form of •§ or •¶ digraphs followed by a refid, provide the display order. This allows documentation parts to be written close to the relevant code. Any section or paragraph not listed in a manual TOC is added at the end of the generated TOC under the "Misc" top-level heading.

   •§ refid         // a section head,    at the main level
   •¶ refid         // a paragraph lead,  at a subsection level
   •¶ "with spaces" //   use quotes if refid contains space

The rest of the line after refid is reserved for documentation housekeeping.
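
The ordering rule can be sketched as follows. TOCEntry, ParseTOCLine, and OrderByTOC are hypothetical names; the sketch reads only the digraph and the refid from each toc line and ignores the reserved remainder of the line.

```go
package main

import (
	"fmt"
	"strings"
)

// TOCEntry is a hypothetical parsed manual TOC line.
type TOCEntry struct {
	Level int // 1 for •§ (section head), 2 for •¶ (paragraph lead)
	Refid string
}

// ParseTOCLine extracts the level and refid from a manual TOC line.
// Anything after the refid is ignored.
func ParseTOCLine(line string) (TOCEntry, bool) {
	s := strings.TrimLeft(line, " \t")
	var level int
	switch {
	case strings.HasPrefix(s, "•§"):
		level, s = 1, strings.TrimPrefix(s, "•§")
	case strings.HasPrefix(s, "•¶"):
		level, s = 2, strings.TrimPrefix(s, "•¶")
	default:
		return TOCEntry{}, false
	}
	s = strings.TrimLeft(s, " \t")
	if strings.HasPrefix(s, `"`) { // quoted refid, may contain spaces
		end := strings.Index(s[1:], `"`)
		if end < 0 {
			return TOCEntry{}, false
		}
		return TOCEntry{Level: level, Refid: s[1 : 1+end]}, true
	}
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return TOCEntry{}, false
	}
	return TOCEntry{Level: level, Refid: fields[0]}, true
}

// OrderByTOC returns declared refids in TOC order, plus the unlisted ones
// that would go under the "Misc" heading (kept in declaration order).
func OrderByTOC(toc []TOCEntry, declared []string) (ordered, misc []string) {
	known := make(map[string]bool, len(declared))
	for _, id := range declared {
		known[id] = true
	}
	listed := make(map[string]bool, len(toc))
	for _, e := range toc {
		if known[e.Refid] && !listed[e.Refid] {
			ordered = append(ordered, e.Refid)
			listed[e.Refid] = true
		}
	}
	for _, id := range declared {
		if !listed[id] {
			misc = append(misc, id)
		}
	}
	return ordered, misc
}

func main() {
	var toc []TOCEntry
	for _, line := range []string{"•§ intro", `•¶ "with spaces"`, "plain text"} {
		if e, ok := ParseTOCLine(line); ok {
			toc = append(toc, e)
		}
	}
	ordered, misc := OrderByTOC(toc, []string{"with spaces", "extras", "intro"})
	fmt.Println(ordered, misc)
}
```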

The TOC list need not be contiguous. It is fine to have subheadings or even paragraphs of text between parts of the list (e.g. to have the TOC divided by "experimental", "staged", "stable", and "deprecated" headings; the docs maintainer may then simply move a toc line between sections to mark its current status).

The TOC imposing order on dispersed chunks of documentation is the crux of this proposal.

With this implemented, documentation maintainer can be a separate role whose edits go to a single file, while individual developers write docs for their own code only. Structure, distinguished spans, and refids are all means to that ultimate goal. Styling is just a useful byproduct, one that completes the professional documentation process.

docs housekeeping

This should be the subject of a separate proposal, but it is outlined here to explain the reserved space of the toc line.

During gofmt processing of the file that contains the TOC, toc lines are amended with a relative path to the file where refid was declared, a hash of code, and hash of related doc-comment. These hashes and paths are then checked by the local godoc instance. If (computed now) hash of code does not match one in the toc, and (computed now) hash of the doc-comment still matches, it is a strong signal that documentation diverged from the code (code was edited but its documentation was not). Generated output may then inform reader that documentation is possibly outdated.
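
The divergence check described above reduces to comparing two pairs of hashes. The sketch below assumes SHA-256 and hex encoding, neither of which the proposal specifies; TOCRecord, Record, and PossiblyOutdated are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// TOCRecord is a hypothetical form of the hashes stored on a toc line.
type TOCRecord struct {
	CodeHash string
	DocHash  string
}

// hashOf returns the hex-encoded SHA-256 of text (an assumed hash choice).
func hashOf(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// Record computes the hashes that would be written when the toc is formatted.
func Record(code, doc string) TOCRecord {
	return TOCRecord{CodeHash: hashOf(code), DocHash: hashOf(doc)}
}

// PossiblyOutdated reports the case singled out in the text: the code has
// changed since the toc line was written, but its doc comment has not.
func PossiblyOutdated(rec TOCRecord, code, doc string) bool {
	return hashOf(code) != rec.CodeHash && hashOf(doc) == rec.DocHash
}

func main() {
	rec := Record("func Add(a, b int) int { return a + b }", "Add returns a+b.")
	edited := "func Add(a, b int) int { return a - b }"
	fmt.Println(PossiblyOutdated(rec, edited, "Add returns a+b."))
}
```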

toc-bar

A lone section heading with a refid of "toc-bar C" will output the (html) TOC as a block whose items are separated by the character C. E.g. §(toc-bar ⬩) for this document would produce:

Abstract ⬩ Background ⬩ Proposal ⬩ Rationale ⬩ Compatibility ⬩ Implementation ⬩ Open issues ⬩ post scriptum

Order of the bar items is set by the §(toc) section.

console -toc

TOC and "toc-bar" sections are elided from the go doc -all output. A separate -toc flag lists all refids, and these refids can be used to select the part of the main documentation to show. On the console, refids are normally printed in parentheses, so the user can follow them in the next invocation of the go doc tool. Where the output format allows hypertext (linking), however, the manual TOC entries should be displayed.

escapes

  • Doubled semicolon lead is always translated to a single dismiss that
    immediately disables translation of a next digraph:
    ;;;; => °;;, ;;;. => °;.
  • Any special character doubled is ordinary: As bolded ¨under 20°°C¨
  • One or more special characters following a dismiss character are ordinary:
    single macron: °ˉ, a digraph °»þ, or superiors °¹²³.
  • The "escape" function of dismiss character has higher priority than "end of style":
    ¨bolded °«¶ digraph¨
  • Degree character that has nothing to dismiss or escape is ordinary.
  • Degree character does not output if it has already been used to dismiss or escape.

Of all possible gordo "specials":

   °    ´    ¨    ˘    ˉ    «    »    •    þ    ¶    §   ´   ¨   ˘   ˉ   •   þ
  ;.   ;/   ;'   ;b   ;-   ;[   ;]   ;l   ;t   ;p   ;s   ¹   ²   ³   ¦   ¤   …
  •1   •a   •¶   •§   «'   «(   «"   «.   »þ   ¶(   §(  •´  •¨  •˘  •ˉ  »(

only guillemets and superior numbers must be escaped, and the degree if it is styled. Other escapes are unlikely to be needed except in gordo-related docs.

The items • ¤ … þ need an escape only if they come first in a line and are followed by a space. Section and paragraph characters outside of their digraphs are ordinary. The Icelandic þ may never come before a space, and Old English script is not common in technical docs. Gordo digraphs are not used in natural languages, and none of the ascii digraphs is valid Go code either. That leaves the styled degree, guillemets, and superior numbers ¹²³.

The «. digraph is itself an escape for a notable span that must start with one of "'(. Use two dots for a span that should begin with a dot: «.. dot-led notable span».

Rationale

Documentation that can be styled even with only bold and italics, and that can be structured to fit the domain, may help package authors to be more precise and unambiguous, and help documentation consumers to avoid misunderstandings. Today, Go packages of even moderate complexity often resort to external descriptions of their algorithms and api.

This is not because their authors love to use yet other doc tools and are eager to do the chore of keeping them synchronized. It is the lack of godoc capabilities that restricts godoc use to the standard library, or at best to general-purpose Go libraries consumed by other Go code. For lack of even rudimentary emphasis, godoc-compliant documentation sources cannot be used to create user-facing documentation if the user is not expected to be a Go programmer.

This needs to change, as Go now is used to build really huge systems. End-users — admins and api-consuming developers — need documentation that is easy to browse and reflects all changes made to the just staged product.

Gordo allows package-level documentation to be kept close to the code it describes and gives the author more control over its shape and the placement of its parts. This should make it easier to maintain well-structured documentation that lives in the most relevant file and is updated as the related code changes.

Compatibility

Gordo uses no semantic constructs that can be mistaken for technical text written in any language, natural or formal. Out of all gordo "specials", only a few seldom-used non-ascii characters (degree, guillemets, and three superscript numbers) may need to be escaped.

Nonetheless, as this proposal extends the documentation source syntax and the methods for parsing it, there is a minuscule but non-zero possibility that the gordo translation step may alter the visible html output of some existing documentation.

Even if this happened, such a change would likely affect only font decoration or size and would not affect the meaning.

Implementation

Enabling gordo annotations would need support from both gofmt and godoc. While implementing basic formatting could be trivial, the real power of the proposed format and methods lies in the ability to make documentation both easy to skim at the console and usable as an interactive manual in the browser. The latter also needs working internal links between "quotable" and "quote" places. Implementing this might need more resources, as might implementing the toc-based documentation checks. But this work may benefit the Go ecosystem as a whole and allow us to keep a single source of truth for both an external (e.g. grpc) api and the code implementing it.

post scriptum

Someone whom I respect confessed recently:

I remember thinking that changing fmt.printf to fmt.Printf in my code was ugly, or at least jarring: to me, fmt.Printf didn’t look like Go, at least not the Go I had been writing. [...] I got used to it, and now it is fmt.printf that doesn’t look like Go to me.

Gordo may look unusual at first sight, but I hope its syntax will soon be regarded as comfortable. Unlike the styling syntax of markdown and other markups used only to generate html, gordo stylings are barely noticeable in the source unless the reader is wilfully scanning for formatting hints. Structure annotations, conversely, are concise but stand out on the console.


Revisions

  • r2 [16 December 2019]
    • make ¹²³ as styling surrogates with default GORDOIC=us map enabling many
      national layouts' users to type gordo styling without learning new chords.
    • fix section/paragraph swap (US/EU differences kicked in)
    • explain that authors need almost no characters escaping
    • escape by prefix, so parser need not to look back
    • add unix xmodmap for us-ansi layout users
    • explain functionality of a toc section
    • degree is a dismiss by itself now
    • more elaborate Rationale
    • concise chords table
    • post scriptum added
  • r3 [23 January 2020]
  • r4 [24 January 2020]
    • Promote ascii digraphs to be a main entry method.
    • Remove most of the text related to entry methods and keyboard.
    • Add stress to the "ordering by toc" importance
