proposal: go/doc: add support for sections #44447

Open
dsnet opened this issue Feb 20, 2021 · 11 comments
@dsnet
Member

dsnet commented Feb 20, 2021

(This is a re-proposal of #18342)

Problem Statement

The Go ecosystem has a set of tools broadly called "godoc" that produces human-readable documentation for Go packages. Today, godoc implementations typically render all global declarations for constants, variables, functions, methods, and types in sorted order. While there is some effort to correlate related declarations (e.g., a function that looks like a constructor with the type it produces, and methods with the receiver type), this minimal amount of grouping is often insufficient to adequately explain the functionality of a package at a quick glance.

Many other programming languages allow nesting of declarations (e.g., by declaring a class within another class, a static function within a class, or creating a namespace). These declarations give other languages the ability to express a form of grouping with finer granularity that their respective godoc-like tools can make use of. Go has no such nesting mechanism, and so there is no language-specific way to express the grouping of related functionality.

The inability to specify grouping of functionality leads to godoc pages that are relatively unreadable. We do not propose changing the Go language in any way, but do propose that godoc provide support for user-defined sections for documentation purposes. This would allow package authors to group declarations that are related in functionality and to control the ordering of the sections themselves.

See the Examples below for how some packages become more readable with the use of sections.

Proposed Solution

Code is often written such that declarations that would be grouped together under a section for documentation are already located near each other in the source code itself. We propose that a special Section: marker in a top-level comment be used to signify the start of a section. All const, var, func, and type declarations below that marker will be considered part of that section.

The scope of a section extends until either:

  • the end of the source file, or
  • the next occurrence of a Section: marker.

The section marker syntax is designed to be lightweight and read naturally in source code. The syntax provides for a required heading and an optional description. For example:

// Section: My glorious heading
//
// Let me tell you more details about my awesome section
// in multiple lines of verbose text.

The heading is a string that immediately follows the Section: marker and must be one line. The description is optional, consists of zero or more paragraphs (similar to package documentation), and must be preceded by a blank line. A godoc implementation may create an HTML anchor for the section heading. Authors are therefore discouraged from changing a heading, lest they break URLs to their sections on godoc. Note that this is already the case for "heading" lines supported by godoc today.
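
As a rough illustration of the rules above, the following sketch splits the text of a comment into a heading and a description. The function name parseSection is hypothetical, and rejecting a marker whose description is not preceded by a blank line is an assumption; the proposal does not say how malformed markers would be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// sectionMarker is the marker text proposed in this issue.
const sectionMarker = "Section:"

// parseSection (a hypothetical helper) takes comment text with the "//"
// prefixes already stripped, as ast.CommentGroup.Text returns it, and splits
// it into the required heading and the optional description.
func parseSection(text string) (heading, doc string, ok bool) {
	if !strings.HasPrefix(text, sectionMarker) {
		return "", "", false
	}
	rest := strings.TrimPrefix(text, sectionMarker)
	first, remainder, _ := strings.Cut(rest, "\n")
	heading = strings.TrimSpace(first)
	if heading == "" {
		return "", "", false // the heading is required
	}
	if strings.TrimSpace(remainder) == "" {
		return heading, "", true // heading only, no description
	}
	// Assumption: a description that is not preceded by a blank line
	// makes the whole marker malformed.
	if !strings.HasPrefix(remainder, "\n") {
		return "", "", false
	}
	return heading, strings.TrimSpace(remainder), true
}

func main() {
	text := "Section: My glorious heading\n\n" +
		"Let me tell you more details about my awesome section\n" +
		"in multiple lines of verbose text.\n"
	heading, doc, ok := parseSection(text)
	fmt.Printf("ok=%v\nheading=%q\ndoc=%q\n", ok, heading, doc)
}
```

A real implementation would presumably live inside go/doc and work on the AST comment groups directly; this only shows the heading and description split.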

Sections with the exact same heading are treated as the same section. If multiple sections with the same heading each have a description, then the resulting section description in godoc will contain the concatenation of all paragraphs from each section (in the order they appear in the file and according to the lexicographical sorting of the source files). This practice is discouraged, but matches the behavior when multiple source files each possess a package description (also a discouraged practice).

When rendering a godoc page, declarations that do not fall under any explicit section are listed first, followed by all sections ordered lexicographically by the heading. There is no support for sub-sections, but a similar effect can be achieved by prefixing each heading with a section number to enforce a specific ordering (e.g., Section: 1. Main section, Section: 1.1. Sub-section, Section: 1.2. Sub-section, etc.).
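
The merging and ordering rules can be sketched as follows. The rawSection and section types and the mergeSections function are hypothetical names used only for illustration; they are not part of the proposed API.

```go
package main

import (
	"fmt"
	"sort"
)

// rawSection is a hypothetical record of one Section: marker as it was
// found in a source file.
type rawSection struct {
	File    string
	Heading string
	Doc     string
}

// section is the merged result for one distinct heading.
type section struct {
	Heading string
	Doc     string
}

// mergeSections combines markers that share a heading and returns the
// sections ordered lexicographically by heading. Descriptions are
// concatenated by file name first, then by order of appearance in the file.
func mergeSections(raw []rawSection) []section {
	sorted := append([]rawSection(nil), raw...)
	// A stable sort by file name keeps the within-file order intact.
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].File < sorted[j].File })

	byHeading := map[string]*section{}
	var merged []*section
	for _, r := range sorted {
		s, ok := byHeading[r.Heading]
		if !ok {
			s = &section{Heading: r.Heading}
			byHeading[r.Heading] = s
			merged = append(merged, s)
		}
		if r.Doc != "" {
			if s.Doc != "" {
				s.Doc += "\n\n" // separate paragraphs from different markers
			}
			s.Doc += r.Doc
		}
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].Heading < merged[j].Heading })

	out := make([]section, 0, len(merged))
	for _, s := range merged {
		out = append(out, *s)
	}
	return out
}

func main() {
	sections := mergeSections([]rawSection{
		{File: "b.go", Heading: "1. Main section", Doc: "More details from b.go."},
		{File: "a.go", Heading: "1.1. Sub-section"},
		{File: "a.go", Heading: "1. Main section", Doc: "Details from a.go."},
	})
	for _, s := range sections {
		fmt.Printf("%s\n%s\n\n", s.Heading, s.Doc)
	}
}
```

One consequence of plain lexicographic ordering is that a heading starting with "10." sorts before one starting with "2.", so authors who number more than nine sections would likely need zero-padded prefixes.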

Changes to the go/doc package

Most godoc implementations rely on the go/doc package to collect source code documentation from the Go AST for a given package. Adding support for sections to this package allows different godoc implementations to share logic for how to identify each section and only leaves each implementation responsible for how they decide to render the sections (or not).

These are the proposed changes to the doc package's external API:

 type Package struct {
+	Sections []*Section
 }

 type Value struct {
+	Section *Section
 }

 type Type struct {
+	Section *Section
 }

 type Func struct {
+	Section *Section
 }

 type Example struct {
+	Section *Section
 }

+type Section struct {
+	Heading string
+	Doc     string
+
+	Consts   []*Value
+	Types    []*Type
+	Vars     []*Value
+	Funcs    []*Func
+	Examples []*Example
+}

The Package type has a new Sections field which is a list of all sections found in the package, sorted according to the section heading. For backwards compatibility, the Consts, Types, Vars, Funcs, and Examples fields remain unaffected by the presence of sections (lest the use of sections cause declarations to mysteriously disappear on godoc implementations that don't support sections).

The Value, Type, Func, and Example types each have a new Section field which is a pointer to the section that the declaration belongs to (or nil if it doesn't fall under any section).

The Section type is new and contains the required heading (in the Heading field) and the optional description (in the Doc field). Similar to the Package type, it contains a list of Consts, Types, Vars, Funcs, and Examples that belong within that section.
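
To make the intended use of these fields concrete, here is a minimal sketch of how a renderer might walk them. The types below are local stand-ins that mirror only a few of the proposed fields; the real go/doc package has no Section type or Sections field, and the outline function and section heading are hypothetical.

```go
package main

import "fmt"

// Local stand-ins mirroring a subset of the fields proposed above.
type Section struct {
	Heading string
	Doc     string
	Funcs   []*Func
}

type Func struct {
	Name    string
	Section *Section // nil if the function is outside every section
}

type Package struct {
	Funcs    []*Func    // all functions, regardless of section
	Sections []*Section // sorted by heading
}

// outline lists functions outside any section first, then each section in
// order with the functions it contains.
func outline(p *Package) []string {
	var lines []string
	for _, f := range p.Funcs {
		if f.Section == nil {
			lines = append(lines, "func "+f.Name)
		}
	}
	for _, s := range p.Sections {
		lines = append(lines, "SECTION "+s.Heading)
		for _, f := range s.Funcs {
			lines = append(lines, "  func "+f.Name)
		}
	}
	return lines
}

// examplePackage builds a small package value by hand.
func examplePackage() *Package {
	ser := &Section{Heading: "Serialization"}
	marshal := &Func{Name: "Marshal", Section: ser}
	unmarshal := &Func{Name: "Unmarshal", Section: ser}
	clone := &Func{Name: "Clone"}
	ser.Funcs = []*Func{marshal, unmarshal}
	return &Package{
		Funcs:    []*Func{clone, marshal, unmarshal},
		Sections: []*Section{ser},
	}
}

func main() {
	for _, line := range outline(examplePackage()) {
		fmt.Println(line)
	}
}
```

A renderer that ignores sections could still range over Package.Funcs and see every function, which is the backwards-compatibility property described above.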

Changes to the go doc tool

The go doc tool is the primary way users view Go documentation on the command line. The implementation would be modified to make use of the new features provided by the go/doc package. The only effect of sections would be when the user prints documentation for the entire package. All other features of go doc would remain unchanged. Since the go doc tool and the go/doc package are released together, the tool can make use of the new package features in the same release.

Changes to the pkg.go.dev website

The pkg.go.dev website is increasingly becoming the de facto portal to view Go documentation for modules and packages. We propose that the site be updated to support sections. It is unclear whether the backend implementation would wait until a release of the Go toolchain with the relevant go/doc package changes, or whether the implementation would vendor a pre-release version of the package.

Examples

encoding/binary

This shows how even small packages benefit from sections. In this situation, the ByteOrder type serves as documentation for what methods exist on the types for the LittleEndian and BigEndian variables. Unfortunately, the default ordering of godoc places these related declarations on opposite sides of the godoc page, which greatly obscures the fact that ByteOrder, LittleEndian, and BigEndian are related. Notably, these declarations are all co-located in the source code.
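
The sketch below shows what a complete source file using section markers might look like and how a tool could assign declarations to sections by position. The embedded file is a made-up miniature that borrows a few names from encoding/binary, not the real package source; the byteorder package name, the Version constant, the section headings, and the sectionsOf and declNames helpers are all hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical source file that uses the proposed markers.
const src = `package byteorder

// Version is declared before any marker, so it is in no section.
const Version = 1

// Section: Byte order
//
// Types and variables for selecting a byte order.

// ByteOrder specifies how to convert byte sequences into integers.
type ByteOrder interface {
	Uint16([]byte) uint16
}

// LittleEndian is the little-endian implementation of ByteOrder.
var LittleEndian littleEndian

// BigEndian is the big-endian implementation of ByteOrder.
var BigEndian bigEndian

type littleEndian struct{}
type bigEndian struct{}

// Section: Varint

// PutUvarint encodes a uint64 into buf.
func PutUvarint(buf []byte, x uint64) int { return 0 }
`

// sectionsOf maps each exported top-level name to the heading of the nearest
// preceding Section: marker, or "" if no marker precedes it.
func sectionsOf(source string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "example.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	type marker struct {
		pos     token.Pos
		heading string
	}
	var markers []marker
	for _, cg := range file.Comments {
		text := cg.Text()
		if !strings.HasPrefix(text, "Section:") {
			continue
		}
		first, _, _ := strings.Cut(strings.TrimPrefix(text, "Section:"), "\n")
		markers = append(markers, marker{cg.Pos(), strings.TrimSpace(first)})
	}
	result := map[string]string{}
	for _, decl := range file.Decls {
		heading := ""
		for _, m := range markers { // markers are in source order
			if m.pos < decl.Pos() {
				heading = m.heading
			}
		}
		for _, name := range declNames(decl) {
			if ast.IsExported(name) {
				result[name] = heading
			}
		}
	}
	return result, nil
}

// declNames returns the names introduced by a top-level declaration.
func declNames(decl ast.Decl) []string {
	var names []string
	switch d := decl.(type) {
	case *ast.FuncDecl:
		names = append(names, d.Name.Name)
	case *ast.GenDecl:
		for _, spec := range d.Specs {
			switch s := spec.(type) {
			case *ast.TypeSpec:
				names = append(names, s.Name.Name)
			case *ast.ValueSpec:
				for _, id := range s.Names {
					names = append(names, id.Name)
				}
			}
		}
	}
	return names
}

func main() {
	sections, err := sectionsOf(src)
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"Version", "ByteOrder", "LittleEndian", "BigEndian", "PutUvarint"} {
		fmt.Printf("%-12s -> %q\n", name, sections[name])
	}
}
```

Because the markers follow the order of the source, ByteOrder, LittleEndian, and BigEndian end up in the same section without any of the declarations being moved.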

google.golang.org/protobuf/proto

This is a recently released package, which had the opportunity to choose the best portions from the older proto package. Even though it avoids the cruft of the old proto package that grew organically over time, the new proto package can still benefit significantly from sections.

The lexicographical sorting of declarations does not make it clear what the primary functionality is or how the declarations relate to one another.

  • The Size, Marshal, and Unmarshal functions are the primary serialization functionality and should appear together early on.
  • The Clone, Merge, Equal, Reset, and CheckInitialized functions are auxiliary functionality that should occur after Marshal and Unmarshal. Note that the default grouping of godoc unfortunately places Clone as a constructor of Message, when it is better grouped with Merge.
  • The Bool, Int32, Int64, Uint32, Uint64, Float32, Float64, and String functions are constructors for optional scalar types. Due to the lexicographical sorting of declarations, these are unfortunately interspersed among the other function declarations, greatly hindering readability.
  • The HasExtension, GetExtension, SetExtension, ClearExtension, and RangeExtensions functions are related to proto2 extensions. Due to the different prefixes, these are also unfortunately interspersed among the other function declarations. I was tempted to swap the prefix and suffix (e.g., ExtensionHas, ExtensionGet, etc.) so that the functions appear together in godoc. Authors really shouldn't have to play such word games.

All of the above groupings of declarations are already co-located in the source code. This gives further evidence that the documentation sections that best describe a package very often match the layout of the implementation.

Related proposals

#25444: add support for hotlinks

When "sections" was first proposed in 2016, Russ counter-proposed with the idea of automatically turning exported names into links, which was subsequently accepted. I built a few prototypes for that feature, but never integrated it into godoc since the godoc ecosystem at the time was too fractured (i.e., needing to implement the same logic in multiple different godoc implementations) and also because modules did not exist (which is necessary to improve the accuracy of hotlinks).

In the years since, I've become increasingly convinced that "hotlinks" is troublesome:

  • False positives:

    • In English grammar, sentences usually start with an uppercase character. This has the unfortunate side effect of sometimes hotlinking the first word of a sentence as if it were referencing an exported identifier (when that's not the intent of the author).
    • Hotlinking is unable to distinguish between a reference to an exported declaration and a reference to the thing the declaration happens to be named after (or worse, something entirely unrelated). For example, if the word IP occurs, is the referent the IP Go type or the IP protocol suite? While related, they are still fairly different things: the former represents just a single IP address, while the latter refers to an entire protocol suite.
    • In English grammar, the plural form of something often ends with an "s" suffix, where documentation might refer to Things and there is no Things declaration because the author is referring to a collection of Thing objects. In some cases a package might even have both Thing and Things declared. However, depending on the context, the word Things may sometimes refer to a collection of individual Thing objects or a single Things object. Hotlinking is unable to distinguish between these cases. This is further complicated by the fact that English does not consistently form plurals with an "s" suffix; there are many exceptions (e.g., the plural form of Tomato is Tomatoes).
    • Documentation often makes assumptions about the context to elide certain identifiers. For example, in documentation for Buffer.Read, it may simply reference the word Write, with the implicit assumption that Write references the Buffer.Write method and not some top-level Write function. Alternatively, the author may really have wanted it to reference the top-level Write function. The reference is ambiguous from the name alone, but context in the sentence often makes it clear to humans which is intended. In some cases, references to both a Write function and a Write method may occur together when the documentation tries to explain which to use in a given situation.
  • False negatives:

    • Hotlinking would ideally be able to detect references to all accessible methods on a type (e.g., Pipe.Read) and fields in a struct (e.g., Header.Name). However, most godoc implementations do not have the full type information available, and so may not have knowledge about certain possible references (e.g., a method or field promoted by embedding a type from another package). An example situation is embedding io.Reader in an interface declaration and referring to the Read method in the documentation. Even embedding of a type from the same package is challenging since it requires partially implementing portions of the Go type system in godoc.
    • Hotlinking would ideally be able to detect references to declarations in other packages. For example, io.EOF should be linked to the io.EOF variable. One heuristic to detect such cases is to check whether the identifier that looks like a package (e.g., io) is imported by the current package and whether that package really does have that declaration (e.g., EOF in the io package). However, implementing this would require the ability to look up a declaration in a different package. The existence of modules makes this possible, but it will be difficult for the go/doc package to provide this feature without substantial changes to its API and implementation.
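
To illustrate the false-positive cases listed above, here is a deliberately naive heuristic that links any capitalized word matching an exported name. It is a made-up stand-in for illustration only, not the algorithm from #25444 or from any godoc implementation; the function name naiveHotlinks and the sample sentence are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// capitalized matches whole words that start with an uppercase ASCII letter.
var capitalized = regexp.MustCompile(`\b[A-Z][A-Za-z0-9]*\b`)

// naiveHotlinks returns every capitalized word in doc that is also the name
// of an exported declaration, in order of appearance.
func naiveHotlinks(doc string, exported map[string]bool) []string {
	var linked []string
	for _, w := range capitalized.FindAllString(doc, -1) {
		if exported[w] {
			linked = append(linked, w)
		}
	}
	return linked
}

func main() {
	exported := map[string]bool{"Buffer": true, "IP": true, "Thing": true}
	doc := "Buffer the output before sending it over IP. Things are queued."
	// "Buffer" is a verb that merely starts the sentence and "IP" means the
	// protocol here, yet both match exported names. "Things" matches nothing.
	fmt.Println(naiveHotlinks(doc, exported))
}
```

Tightening the heuristic (for example, skipping the first word of a sentence) would trade these false positives for false negatives, which is the tension the list above describes.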

In summary, I believe hotlinking goes against the philosophy of Go that "clear is better than clever". Hotlinking is very clever, but it cannot provide the right results all the time and in some cases may even actively mislead users. It is built on a set of heuristics (not rules) which may change over time, which further leads to a poor user experience where a godoc page is rendered as intended today, but renders differently (and maybe incorrectly) in the future due to changes to the hotlinking heuristics. Lastly, it places too much mental burden on package authors to think about whether an "exported" word they write will be hotlinked correctly or not. On the other hand, sections are simple, explicit, clear, and stable (relative to changes to godoc).

@rsc
Contributor

rsc commented Feb 24, 2021

I am confused about what is being proposed here exactly.
Are you saying that new comments in the middle of a Go file control the order used in typeset documentation?
Can you give a short example of a full Go source file making use of this change?

@josharian
Contributor

This might help with #44301.

@rsc
Contributor

rsc commented Mar 10, 2021

Adding to minutes.
@dsnet, can you give a short example of a full Go source file making use of this change?

@dsnet
Member Author

dsnet commented Apr 13, 2021

I filed #45533 as an improvement on the shortcomings of #25444. I think it's reasonable to put this proposal on hold if #45533 is accepted.

@rsc
Contributor

rsc commented Apr 21, 2021

On hold for #45533.

@rsc
Contributor

rsc commented Apr 21, 2021

Placed on hold.
— rsc for the proposal review group

@jsshapiro

jsshapiro commented Feb 5, 2023

This is a stray thought, but it seems to me that the issue may be broader than documentation.

At the moment, a package is both a unit of import and a namespace. A package more or less needs to be a namespace, but it would sometimes be useful for it to have sub-namespaces. As an example drawn from something I'm playing with, the package "color" might usefully have name spaces

color // the top-level name space of the package
color.Model
color.Space
color.WhitePoint

If this were true, then it would become natural for godoc to organize documentation by namespace.

Though it could have been thought out better, Issue #20467 proposed something along these lines, and was rejected by @ianlancetaylor, who said that Go prefers to decompose concepts rather than nest them, and said words to the effect that replacing . with _ seems fine. In my opinion, his opinion confuses concept decomposition with (human) hierarchical organization of concepts, and consequently confuses units of import with units of naming.

One can divide packages into package hierarchies, but within commonly related sub-concepts this often leads to circular package import dependencies, while sub-namespaces within a package can be viewed as simultaneously co-defined and tolerant of circular reference. A sub-namespace is syntactic sugar. Its value lies in organizing the names for human consumption, which is an important criterion in programming language design.

What I am trying to express, I think, is that there seems to be a disconnect between human organization patterns and code organization patterns. Go's sparsity is something I admire, but there seem to be areas where Go has initially rejected approaches whose value has been well established in other contexts. While doxygen has serious flaws and a dubiously ambiguous specification, it also provides critical and useful expressiveness that seems to be creeping glacially into godoc one feature at a time. This feels somewhat similar.

@bcmills
Contributor

bcmills commented Jul 17, 2023

On hold for #45533.

#45533 was merged into #51082, which was accepted and implemented.

@bcmills moved this from Hold to Incoming in Proposals on Jul 17, 2023
@ianlancetaylor
Member

We have headings now, so I'm not sure there is anything to do here. dsnet, is there something that should still be added? Thanks.

@dsnet
Member Author

dsnet commented Jul 18, 2023

The main utility of sections isn't the ability to provide headings, but rather a way for the package author to explicitly group related Go declarations together.

@ianlancetaylor
Member

Ah, thanks, I misunderstood the proposal.

Projects
Status: Incoming

7 participants