associating data with functions, modules, and globals #3988

Closed
StefanKarpinski opened this issue Aug 8, 2013 · 104 comments
Labels
docs This change adds or pertains to documentation

@StefanKarpinski
Member

There are a number of issues discussing documentation for Julia code (#762, #1619, #3407), but I'd like to separate this problem into two very distinct issues:

  1. Associating text from source files – both comments and source code – with functions, methods, modules, and global bindings.
  2. Interpreting and presenting this data to the world.

We keep getting bogged down in the combination of these two issues, but they can be tackled separately, and should, imo, remain decoupled – that is, the infrastructure for (1) should be reusable with different approaches to interpreting comments and different mechanisms for presenting documentation (help, sphinx, dexy, jocco, etc.).

This issue is for discussion of (1):

  • What do we want to be able to associate with run-time objects like functions, methods, modules, and global bindings? It would be nice to have easy, queryable access to source code for things as well as inline comments associated with that source code.
  • How to associate that data with run-time objects? While it may be reasonable to have this kind of overhead in interactive situations, we also must be able to run programs non-interactively without paying that price.

Let's solve this first and then figure out how to interpret and present things.

@stevengj
Member

stevengj commented Aug 8, 2013

cc: @johnmyleswhite, @lindahua, @ViralBShah

@staticfloat
Member

I'm pretty excited about this; I think it'll make documentation easier as a first pass, but being able to attach data to functions in general can be used for quite a few neat things. The first time I wanted something like this was when I was developing the codespeed infrastructure; I wanted to annotate functions with metadata stating the name of the test that function ran, what units the resultant metric of that test would be (time, FLOPS, bytes/clock cycle, etc.), whether "less is better" for that particular unit, and so on. So I think whatever we come up with has the opportunity to be somewhat more than the only analogue I can think of right now (Python's docstrings), which is just a single string of data. We have the chance to make the data we attach highly structured, in the sense that it can be manipulated by other Julia code.

@JeffreySarnoff
Contributor

regards to all .. imho ..

@StefanKarpinski writes this right

What we want to be able to associate with run-time objects like functions, methods, modules, and global bindings? It would be nice to have easy, queryable access to source code for things as well as inline comments associated with that source code.
How to associate that data with run-time objects? While it may be reasonable to have this kind of overhead in interactive situations, we also must be able to run programs non-interactively without paying that price.

In any creatively powerful software paradigm, and certainly in Julia, there is a dynamism that serves two modes. In one, a finished design runs well, goes fast, and is reliably accurate. In the other, development, investigation, and playfulness get robust power, and perspective, conception, and insight become readily accessible, leading to a newly realized design that in turn runs well, goes fast, and is reliably accurate.

As Stefan notes, it is entirely reasonable and sound that Julia offer the language user each modality's respective advantage; that is more compelling than a requirement that they operate in mutual simultaneity.

@mlubin
Member

mlubin commented Aug 9, 2013

+1

@Carreau

Carreau commented Aug 9, 2013

To comment on 2) Interpreting and presenting this data to the world, and more specifically in relation to the IPython notebook/qtconsole/console that can now be used with IJulia: I want to point out that we had a discussion in IPython about enabling "rich" docstrings. Since you integrated multimedia I/O (#3932) into IJulia core, maybe you could have help() return different MIME types for different frontends.

You are probably much more flexible in what you can do than us in IPython, and we will happily see what you come up with.

Be careful, though: with rich MIME type representations of documentation, docs may become a security issue (JavaScript injected into the notebook can execute code in the kernel). But it can also be an advantage, as you could have executable or dynamic docs, like runnable sample code. One thing we were not able to fully solve is how to have working cross-links in the live documentation in the notebook.

@JeffBezanson
Member

similar to #2508

@stevengj
Member

This issue has been neglected for too long. Let me make a concrete proposal to get the ball rolling. A basic starting point could be:

  • Julia should include a global dictionary-like object DOC::DocDict <: Association{Any,Any}.
    • The keys are any Julia object (typically of type Function or Method, although we'll also want to document other Julia objects.)
    • The values are any Julia type, and we will use the writemime machinery to convert this to various formats, e.g. reprmime("text/plain", DOC[x]) to get the text/plain documentation of x.

On top of this machinery, various pieces could be added:

  • DOC[f::Function] would look up the general documentation for f, analogous to our help now (we would still have a help function, it will just use DOC). DOC[m::Method] would look up the documentation for a specific method signature. To get all of the documentation for a function f, you would call [DOC[f], [DOC[m] for m in methods(f)]].
  • Some kind of macro could be defined to make it easier to add documentation for functions of a given signature. e.g.
@doc foo: f =   # equivalent to DOC[f] = foo, i.e. documentation for f independent of any method signature
@doc bar: function f(....)  # equivalent to DOC[method signature for f(....)] = bar
     ....
end
  • Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.
  • We could easily implement a "noninteractive" mode in which @doc and DOC[foo] = bar do nothing, to eliminate any overhead of storing/updating DOC in production code.
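
As a rough illustration of the bullets above, here is a minimal sketch written against current Julia. The names `DOC`, `DOC_ENABLED`, and `setdoc` are assumptions made up for this example, not an existing API, and an `IdDict` stands in for the proposed `DocDict` so that keys are compared by object identity.

```julia
# Hypothetical sketch: DOC, DOC_ENABLED and setdoc are illustrative names,
# not an existing Julia API.
const DOC = IdDict{Any,Any}()      # keys are the documented objects themselves
const DOC_ENABLED = Ref(true)      # set to false to emulate a "noninteractive" mode

function setdoc(obj, doc)
    DOC_ENABLED[] || return nothing   # no-op when documentation is disabled
    DOC[obj] = doc
    return nothing
end

f(x) = 2x
f(x, y) = x + y

setdoc(f, "f: general documentation")        # documentation for the generic function
for m in methods(f)
    setdoc(m, "documentation for $(m.sig)")  # documentation per method signature
end

# All documentation for f: the general entry followed by the per-method entries.
alldocs = vcat(Any[DOC[f]], Any[DOC[m] for m in methods(f)])
```

Setting `DOC_ENABLED[] = false` before loading code would make every `setdoc` call a no-op, which is one possible reading of the "noninteractive" mode described above.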

The simplest documentation would be in the form of strings, for which only the text/plain representation is available. However, we could define types to encapsulate higher-level information and formatted text. For example:

  • DocDefinition(doc::Any, file::String, line::Integer, source::String, ....timestamp?....other?....) to store a documentation value doc along with metadata for a definition in a source file. The @doc macro could automatically use this wrapper type. One could define writemime(m::MIME, d::DocDefinition) = writemime(m, d.doc) to make this wrapper transparent.
  • Various formatted-text or other container types. e.g. Markdown(s::String) which interprets its argument as markdown with embedded LaTeX equations, and defines writemime(::MIME"text/x-markdown", x::Markdown) along with other output formats. So one would do e.g.
@doc Markdown("""
.....
"""): foo(...) = ....

or there could be a @docmd shortcut for this.
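
For readers on current Julia, the wrapper idea can be sketched as follows. In later Julia versions `writemime` and `reprmime` were superseded by `show(io, mime, x)` and `repr(mime, x)`, so the sketch uses those. The types `MarkdownDoc` and `DocDefinition` and the file name are hypothetical, chosen only for illustration, and the sketch uses the `text/markdown` MIME type rather than `text/x-markdown`.

```julia
# Hypothetical sketch: MarkdownDoc and DocDefinition are illustrative types.
struct MarkdownDoc
    text::String
end

struct DocDefinition
    doc::Any        # the documentation value itself
    file::String    # where the definition lives
    line::Int
end

# A MarkdownDoc can render itself as plain text or as Markdown source.
Base.show(io::IO, ::MIME"text/plain", d::MarkdownDoc) = print(io, d.text)
Base.show(io::IO, ::MIME"text/markdown", d::MarkdownDoc) = print(io, d.text)

# The wrapper is transparent: it forwards rendering to the wrapped value.
Base.show(io::IO, m::MIME"text/plain", d::DocDefinition) = show(io, m, d.doc)
Base.show(io::IO, m::MIME"text/markdown", d::DocDefinition) = show(io, m, d.doc)

entry = DocDefinition(MarkdownDoc("`frob(x)` frobs `x`."), "frob.jl", 12)
plain = repr(MIME("text/plain"), entry)
```

The forwarding methods are written per MIME type rather than for a generic `m::MIME`, because a generic method would be ambiguous with Base's fallback `show(io, ::MIME"text/plain", x)`.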

@JeffBezanson
Member

I like the simplicity of this approach.

@velicanu might be interested.

@johnmyleswhite
Member

This is a really great idea.

@velicanu

This is interesting, I'll try to do it.

@stevengj
Member

We also need some way of associating documentation with manual sections in a hierarchy (e.g. "Mathematical functions / Special functions / Bessel functions"). And in general we want a way to associate metadata with objects. One option, in line with the above proposal, would be to:

  • Define our own "MIME types" for any desired metadata. e.g. metadata/author for author string, or metadata/section for an @ delimited string of section names in descending order of specificity, e.g. "Bessel functions@Special functions@Mathematical functions".
  • Any DOC[x] value type that wants to provide any metadata could define the appropriate writemime function.
  • The @doc macro could accept metadata as keyword-like arguments:
@doc section="Documentation@Awesomeness" author="Alyssa P. Hacker" """ ..... docs .... """: somefunction(...) = ...

and would store them in a "metadata" Dict inside DocDefinition. mimewritable for DocDefinition would then return true for metadata MIME types corresponding to keys in the metadata Dict.
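
A small sketch may help make the metadata idea concrete. It deliberately sidesteps the MIME-type dispatch described above and just stores the keyword-like arguments in a dictionary. `DocEntry` and `section_path` are hypothetical names introduced for this example.

```julia
# Hypothetical sketch: DocEntry and section_path are illustrative names.
struct DocEntry
    doc::String
    metadata::Dict{Symbol,String}
end

# Keyword arguments become metadata entries, mirroring `@doc section=... author=...`.
DocEntry(doc::AbstractString; kwargs...) =
    DocEntry(String(doc), Dict{Symbol,String}(k => String(v) for (k, v) in kwargs))

# The section string lists names from most to least specific, separated by '@';
# reversing it yields a path from the top of the manual down to the section.
section_path(e::DocEntry) = reverse(String.(split(e.metadata[:section], '@')))

entry = DocEntry("Bessel function of the first kind.";
                 section = "Bessel functions@Special functions@Mathematical functions",
                 author = "Alyssa P. Hacker")
path = section_path(entry)
```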

@stevengj
Member

Some thought should go into the @doc macro syntax to make the resulting code as human-readable as possible. One annoyance with using a macro for this is that you can't simply insert linebreaks wherever you want without breaking the parsing. But if this seems to be a problem I suppose that we could add a new keyword/syntax to Julia that parses as @doc or some kind of document(expr, ...) function call.

@stevengj
Member

@loladiro, is there any missing functionality in the above proposal compared to what is needed to implement the REPL help?

@Carreau

Carreau commented Nov 24, 2013

Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.

Have you considered doing so only at install time for libraries? I'm thinking especially of libraries: one would probably like to build all the HTML docs at once when the library is installed, because cross-links and the like may require building the docs for the whole library in one pass. Also, in the notebook, we can probably have a link in the pager that opens file://path/to/julia/doc/module/function.html, which is browsable (runnable?).

@stevengj
Member

@Carreau, on top of this one can build various tools, e.g. a tool to import a module and build documentation in some format. As @StefanKarpinski said at the top of this thread, however, that is conceptually separate from the task of associating the data with the objects in the first place.

@Carreau

Carreau commented Nov 24, 2013

@stevengj Sorry I wasn't clear, I was not worried about the external tool to build the doc, I was wondering about associating externally this back to the objects. Like an external way to add value to DOC::DocDict <: Association{Any,Any} but I guess you are right, this can be a layer on top of DOC.

@stevengj
Member

I'm not sure what you mean by an "external way to add a value to DOC" ... any Julia program will be able to mutate the DOC contents.

@Carreau

Carreau commented Nov 24, 2013

I might have misunderstood something, and will re-read, but "global dictionary-like object" made me think of a per-session object that dies with the interpreter, which can make sense in an interactive environment. This impression was reinforced by:

Note that importing a module would execute all of its embedded DOC[foo] = bar statements, appending to the documentation.

I was thinking more of a persistent database of this information (for example, built at package installation time).
At some point I could, for example, run a local HTML doc build of JuMP that "registers" with this database, so that when I do help(some-function-of-jump) it knows how to access it.

@mauro3
Contributor

mauro3 commented Dec 6, 2013

The global dict DOC::DocDict proposed by @stevengj doesn't seem quite right to me. Shouldn't globals be avoided if possible? Why not put that info directly into the modules, methods and functions themselves? For instance, add a field data to the Function type and similarly to the Method type. Let data be a dict or contain a field data.doc. That way help(fn) could get to it and the hierarchy information is easily available too. Other data could be put into that dict as well, e.g. the source code or the annotations @staticfloat mentioned above.

What's missing is the possibility to associate data with globals. Either make all globals containers with a data field too, or resort to a global dict for those.

@stevengj
Member

stevengj commented Dec 6, 2013

What's wrong with a global in this context?

  • It's much easier to look up (and update) information in a single dictionary than in many. And a simpler implementation is easier to write, debug, and maintain.
  • One module can extend a method defined e.g. in Base, so it's not obvious that segregating the documentation for that method is desirable.
  • The hierarchy information is still easily available, because given any Method signature m, m.func.code.module gives the corresponding module (and given a module one can find the parent with module_parent.) It would be easy to add module information to the DOC dict for constants too if desired.
  • You want to be able to document things other than Function or Method, e.g. constants, types, and perhaps macros. So adding fields to Function and Method is not sufficient, as you point out. And if you have a "global dict" for constants, it only adds complexity to have a completely separate data structure for function and method documentation.
  • Adding fields and dictionaries to common types like Function adds runtime overhead. In something like Python this doesn't matter, but in Julia it is a big deal. You certainly don't want to slow down running code, and there should be a way to avoid storing the documentation entirely in production code.
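
The hierarchy lookup mentioned in the third bullet can be sketched in current Julia, where (as far as this sketch assumes) a `Method` carries its defining module in the `module` field and `parentmodule` plays the role of `module_parent`. The helper `module_chain` is a hypothetical name used only here.

```julia
# Hypothetical helper: list a module followed by its ancestors.
function module_chain(mod::Module)
    chain = Module[mod]
    while parentmodule(mod) !== mod   # the walk stops at a module that is its own parent
        mod = parentmodule(mod)
        push!(chain, mod)
    end
    return chain
end

h(x) = x
m = first(methods(h))
owner = m.module                  # module in which this method was defined
chain = module_chain(Base.Math)   # starts with Base.Math, then Base, and so on
```

Nothing beyond the object itself is needed to recover the hierarchy, so a single dictionary keyed by functions or methods does not lose module information.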

What is the concrete disadvantage of a global dictionary that overcomes its advantages in simplicity and functionality? Blanket prejudice against globals is not persuasive.

@mauro3
Contributor

mauro3 commented Dec 6, 2013

To me segregating the function/method metadata from the function is odd. There is plenty of (meta-)data already associated with methods/functions/modules (e.g. signature, module...), why treat the additional metadata differently? (See point 3 for the most important argument)

Say for instance, I have a method which is 'private' to my module, i.e. I don't export it. But I may still want to document it (for my own purpose) or I want to add other metadata like @staticfloat mentioned. Why should this metadata, which is private to a module, live in a global variable?

Comments to your points:

  1. There is not much difference in looking up/modifying things using fn.data.doc or DOC[fn]. Also usually one would use help(fn) or some other function which would work with either. Also, as I mentioned above, there is plenty data already associated with functions/modules/... so it must be possible to maintain such machinery.
  2. When extending a function with another method, that method is still contained in the generic function, so there is no segregation. Also, documentation writing will have to take multiple dispatch into account. I imagine that there should be some generic doc for the function, like + adds numbers; and specialized docs refining that, like +(a::Integer, b::Rational) adds a + b and returns a Rational (a somewhat silly example).
  3. I think it would be awkward to get the namespacing right with a dict. Examples: DOC[:sin] should work and DOC[:(Base.sin)] should work too. Do you define it twice, or is only one valid? What if I do sinalias = sin; DOC[:sinalias]? What if two modules have a function of the same name? How is DOC updated after a using imports some names into the top level? All these namespace issues would come for free if the data was tacked onto the functions/methods/... This seems to me the most important argument against the dict.
  4. I think for type-metadata it would also be fine to add another field to the DataType datatype. To annotate instances of a type, say pi a convention could be to define a field like _doc and put the documentation there. That leaves us with macros, not sure about those. Are they a type themselves? What are they?
  5. I can't comment too much on performance. But I think in either approach it should be possible to tell the parser to fill the dict/field, if in the REPL, or not, if not interactive. Also, one could make the metadata immutable, that should help.

Well, either way, it will be good to have a way to associate metadata with functions etc., especially for docs.

@toivoh
Contributor

toivoh commented Dec 6, 2013

Number 3 is actually not a problem, since you would use the function object itself etc. as the key: DOC[sin]. This works just the same way with namespacing as storing metadata inside the objects. Either approach will have trouble with macros however, since there doesn't seem to be any actual macro object to use as a key, or store metadata in.
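
To illustrate the point about keying on the object rather than on a symbol, here is a minimal sketch. `OBJDOC` is a hypothetical stand-in for the proposed dictionary, implemented with an `IdDict` so that lookup is by object identity.

```julia
# Hypothetical sketch: OBJDOC stands in for the proposed DOC dictionary.
const OBJDOC = IdDict{Any,String}()

OBJDOC[sin] = "sin(x): sine of x, with x in radians"

sinalias = sin                    # an alias binds the same function object
viaalias = OBJDOC[sinalias]       # found through the alias
viaqualified = OBJDOC[Base.sin]   # found through the qualified name
```

Because every name that refers to the function resolves to the same object, aliases and qualified names need no special handling.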

@toivoh
Contributor

toivoh commented Dec 6, 2013

Argh, GitHub thought that my 3. was a 1. I was talking about number 3, anyway.

@mauro3
Contributor

mauro3 commented Sep 9, 2014

There was a lengthy discussion about help/documentation/etc on the mailing list recently, worth referencing here:
https://groups.google.com/d/msg/julia-users/aw--nSvNZR4/6MzxC9yuZe4J

Of course, no consensus was reached but a few interesting things were discussed:

  • string-based vs comment-based documentation
  • how complex or simple should it be
  • whether the documentation should become part of the AST

@StefanKarpinski
Member Author

Jeff and I just talked about this today and a bare string literal in void context followed by a definition seems like the way to go. This should be lowered by the parser something like this:

"`frob(x)` frobs the heck out of `x`."

function frob(x)
  # commence frobbing
end

becomes the moral equivalent of this:

let doc = "`frob(x)` frobs the heck out of `x`."
  if haskey(__DOC__, :frob)
    __DOC__[:frob] *= doc
  else
    __DOC__[:frob] = doc
  end
end

function frob(x)
  # commence frobbing
end

Important points about this approach:

  1. parsing has no side-effects – the construction of the documentation structure still occurs when the code is actually evaluated, not when it is parsed.
  2. each module has its own const __DOC__ = Dict{Symbol,UTF8String}() dictionary; this is important for reloading modules.
  3. This ends up just appending all the docs for a given name, including separate doc strings for a single generic function.

An open issue is how to handle adding methods to functions from other modules. Does the definition go into the current module's __DOC__ dict? What symbol is used for the doc key then?
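
The appending behaviour from point 3 can be sketched with a plain function in current Julia. `MODULE_DOCS` stands in for one module's `__DOC__` dictionary and `appenddoc!` is a hypothetical helper, so this is only an approximation of what the parser-generated code would do.

```julia
# Hypothetical sketch: MODULE_DOCS stands in for one module's `__DOC__` dictionary.
const MODULE_DOCS = Dict{Symbol,String}()

# Append `doc` to whatever is already stored under `name`.
function appenddoc!(docs::Dict{Symbol,String}, name::Symbol, doc::AbstractString)
    docs[name] = get(docs, name, "") * doc
    return docs
end

# Two doc strings for two methods of the same generic function.
appenddoc!(MODULE_DOCS, :frob, "`frob(x)` frobs the heck out of `x`.\n")
frob(x) = x
appenddoc!(MODULE_DOCS, :frob, "`frob(x, n)` frobs `x` a total of `n` times.\n")
frob(x, n) = x
```

Both strings end up under the single key `:frob`, in definition order, and nothing is recorded until the code is actually evaluated.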

[cross-posted from here]

@Carreau

Carreau commented Sep 30, 2014

I don't like the fact that docs come before the function, but that's probably from being used to Python. It does raise a side question for me, though.

When printing function source code, will it show docstring above function?
That is to say : Is the beginning of the function considered to be start of the docstring, or function keyword? (Like in links to github definition and others)

I'm asking because it is one of the things that annoys me in JS: the inability to print docstrings in the REPL when they come before the function definition.

I know it's a detail but just want to bring it up.

@jasongrout

I don't like the fact that docs come before the function, but that's probably from being used to Python.

I agree. The thing I most look at documentation for is to see the signature and the one-line summary. Having those right next to each other (as in Python) is really nice. In this proposal, will they often be separated by a lot of detailed documentation?

One way around this is to reproduce the signature, as in the above example, which seems a bit silly given that the perfectly good signature (often with great type information) is available right at the start of the function. I guess another way around this is to put a one-line summary at the end of the docs, which seems weird to me.

@mauro3
Contributor

mauro3 commented Sep 30, 2014

@StefanKarpinski, two things:

  • I think it would be better to use the actual object as the dict key and not a symbol. For instance, when using a module, with just symbols it would be hard to figure out where the binding came from. This would also solve adding docs to other modules: it just adds to the __DOC__ of the module where the entity is defined (but macros would need to be treated separately).
  • why not work with a function, say setdoc! which does all of the parser-inserted code?

Your example would then look like:

"`frob(x)` frobs the heck out of `x`.":
function frob(x)
  # commence frobbing
end

becomes the moral equivalent of this:

function frob(x)
  # commence frobbing
end
setdoc!(frob,  "`frob(x)` frobs the heck out of `x`.")

Two more things:

  • the documentation for a module itself, would that go into its own __DOC__ or into the __DOC__ of the parent module?
  • I like the : (as I inserted into above example) as a way to bind the docstring to the thing following it, but that really is just another bikeshed.

@Carreau

Carreau commented Sep 30, 2014

I guess that in generated docs the issue of signature/summary separation does not exist as things can be reordered. The issues arise only when reading source code.

@JeffBezanson
Member

One motivation for this design is that it extends to simple variables, e.g.

"doc for X"
const X = ...

Supporting that also seems to preclude attaching the doc string to the object.
I imagine the actual lowering would produce something like setdoc!(current_module(), frob, :frob, string) so that setdoc! has enough information to do whatever might be necessary.
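
One way such a `setdoc!` could use all four arguments is sketched below for current Julia, where `@__MODULE__` takes the place of `current_module()`. The record type and the two lookup tables are hypothetical names invented for this example.

```julia
# Hypothetical sketch: DocRecord, BY_NAME and BY_OBJECT are illustrative names.
struct DocRecord
    mod::Module
    name::Symbol
    text::String
end

const BY_NAME = Dict{Tuple{Module,Symbol},DocRecord}()
const BY_OBJECT = IdDict{Any,DocRecord}()

function setdoc!(mod::Module, obj, name::Symbol, text::AbstractString)
    rec = DocRecord(mod, name, String(text))
    BY_NAME[(mod, name)] = rec            # always reachable through the binding
    # Index by identity only where the object itself is a meaningful key;
    # a plain value such as 42 is not unique to the binding that holds it.
    if obj isa Function || obj isa Type || obj isa Module
        BY_OBJECT[obj] = rec
    end
    return rec
end

const X = 42
setdoc!(@__MODULE__, X, :X, "doc for X")

g(x) = 2x
setdoc!(@__MODULE__, g, :g, "doc for g")
```

Passing the module and symbol alongside the object lets a documented constant be found by name even though its value cannot serve as a key.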

@StefanKarpinski
Member Author

If you play around with this syntax in the presence of multiple dispatch, you can see immediately that the doc string inside approach just doesn't work: you want to have the doc string for a generic function before a series of method definitions for the same generic function, not inside one of the method definitions. We're also going to use this for things like globals, which don't have an "inside".

@JeffBezanson
Member

Some other examples:

"doc for overall function f"
function f;    # possible syntax for defining a generic function without adding a method yet

"doc for g"
g(x) = 2x

# add docs to a function without defining anything
"doc for h"
h

@johnmyleswhite
Member

What would this do?

"doc for a,b"
a,b = 1,2

@JeffBezanson
Member

At first that might have to be a parse error, or we just ignore the doc string if we don't know how to attach it to the following expression.

@ssagaert

It would be nice if one could refer to args by name in the doc. I find this really useful for more elaborate explanations. Like

"blabla @arg blabla"
function f(arg)
...
end

If you think @ clashes too much with macros then just use another character.
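
A tool consuming such doc strings could, for instance, extract the `@name` references and check them against the argument list. The following is only a sketch of that idea; `argrefs` and `undefined_argrefs` are hypothetical helpers, and the `@` marker is the one suggested above.

```julia
# Hypothetical helpers for finding `@name` references in a doc string.
argrefs(doc::AbstractString) = [String(m[1]) for m in eachmatch(r"@(\w+)", doc)]

# References that do not correspond to any of the given argument names.
undefined_argrefs(doc::AbstractString, argnames) =
    [r for r in argrefs(doc) if Symbol(r) ∉ argnames]

refs = argrefs("blabla @arg blabla")
missingrefs = undefined_argrefs("uses @arg and @other", [:arg])
```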

@kmsquire
Member

Since these are presumably also going to be collected into external
documentation, will ordering matter? Dicts are unordered. Of course, that
implies that the order in files has meaning, which isn't necessarily the
case.

@stevengj
Member

See #8514; @StefanKarpinski's preference is that the documentation be inserted into external docs by some kind of {{myfunction}} manual template, so that they can be mixed with proper narrative documentation. That obviates the question of automatic ordering.
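
The template idea can be sketched as a simple substitution pass over the manual text. `render_manual` and the `docs` table below are hypothetical, and the actual template syntax was still under discussion in #8514.

```julia
# Hypothetical sketch: render_manual and the docs table are illustrative.
function render_manual(template::AbstractString, docs::AbstractDict)
    # Called with each matched placeholder, e.g. "{{frob}}".
    function lookup(placeholder)
        name = String(chop(placeholder; head = 2, tail = 2))  # strip "{{" and "}}"
        return get(docs, name, placeholder)   # leave unknown placeholders untouched
    end
    return replace(template, r"\{\{\w+\}\}" => lookup)
end

docs = Dict("frob" => "`frob(x)` frobs the heck out of `x`.")
page = render_manual("Frobbing\n\n{{frob}}\n\nSee also {{unfrob}}.", docs)
```

Since the narrative text decides where each entry appears, the order in which docs were stored never matters.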

@ssagaert

ssagaert commented Oct 7, 2014

It would be nice to have the possibility to refer to args by name in the doc string/comment like

"this is the doc for f. @x specifies the input"
function f(x)
...
end

I’ve used @x here because that’s what’s used in javadoc but if you think this clashes too much with macros then just take another character.

This can be really handy when you have longer doc with more detailed explanation.

@ViralBShah
Member

Is it fair to close this with the recent work on @doc and discuss specific details in separate issues?

Cc: @one-more-minute @MichaelHatherly

@ViralBShah
Member

For reference: #8514

@MikeInnes
Member

Yes, this issue seems to be basically concerned with the core doc system (storing + displaying metadata), which we have now. We do still want things like syntax, but now that those have their own issues this doesn't seem that relevant.

@MichaelHatherly
Member

Yeah, the general concerns in this issue seem to be covered by @one-more-minute's recent work.

@mauro3
Contributor

mauro3 commented Mar 11, 2015

close this

@pao
Member

pao commented Mar 11, 2015

@ViralBShah looks like you performed a drive-by tag removal, but you didn't close--what is the remaining work here?
