Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build documentation into cylc.flow.cfgspec modules #3253

Closed
oliver-sanders opened this issue Jul 29, 2019 · 14 comments · Fixed by #3559
Closed

build documentation into cylc.flow.cfgspec modules #3253

oliver-sanders opened this issue Jul 29, 2019 · 14 comments · Fixed by #3559
Assignees
Labels
doc Documentation
Milestone

Comments

@oliver-sanders
Copy link
Member

oliver-sanders commented Jul 29, 2019

Describe exactly what you would like to see in an upcoming release

Take the configuration documentation out of cylc-doc and write it into the source code so that the documentation can be auto-generated.

  • Ensures all configurations are documented and that no old configurations are accidentally left documented
  • Prevents docs getting out-of-sync with cylc-flow
  • Easier to work on configuration documentation in-place with cylc-flow commits

See "Auto-Documentation" section of cylc/cylc-doc#11

Tentatively added to the Cylc8 milestone to facilitate suite interface changes. Feel free to bump it back.

Additional context

The global spec must be loaded before pretty much every Cylc command so it is important not to make the SPECs too heavy.

Having said that would a little extra text cause problems?

  • If not do it!
  • If so investigate
    • Putting documentation in comments?
    • Putting documentation in another file?

Question/Decision

How should we encorporate documentation?

  1. In the SPEC e.g.
SPEC = {
    'meta': {
        'description': [
            VDR.V_STRING,
            '',
            '''
                A multi-line description of the suite. It can be retrieved
                at run time with the cylc show command.

                :type: multi-line string
                :default: (none)
            '''
        ],
  1. In comments (yuck, and good luck parsing that) e.g.
SPEC = {
    'meta': {
        'description': [VDR.V_STRING, '']
        #: A multi-line description of the suite. It can be retrieved
        #: at run time with the cylc show command.
        #:
        #: :type: multi-line string
        #: :default: (none)
  1. In a separate documentation file?
DOCSPEC = {
    'meta': (
        '''
        Section containing metadata items for this suite.
        Several items (title, description, URL) are pre-defined and
        are used by the GUI. Others can be user-defined and passed to
        suite event handlers to be interpreted according to your needs.
        For example, the value of a “suite-priority” item could determine
        how an event handler responds to failure events.
        ''',
        {
            'description': '''
                A multi-line description of the suite. It can be retrieved
                at run time with the cylc show command.

                :type: multi-line string
                :default: (none)
            ''',
  1. Something alltogether different e.g:
[meta]
    [[description]]
        type = VDR.V_STRING
        default = ''
        description = """
            A multi-line description of the suite. It can be retrieved at
            run time with the cylc show command.

            type: multi-line string
            default: (none)

Pull requests welcome!

@oliver-sanders oliver-sanders added this to the cylc-8.0.0 milestone Jul 29, 2019
@matthewrmshin
Copy link
Contributor

We can have a single SPEC, but reduce it to the essential Python data structure (or generate the data structure) at deployment/installation time? So the commands only loads the essential stuffs?

@hjoliver
Copy link
Member

hjoliver commented Aug 1, 2019

My two cents worth:

  1. 👍 - Python data structure already, no parsing required (except perhaps of string contents during doc generation, which doesn't really count)
  2. 👎 - parsing nightmare, as you suggest
  3. 👎 - OK but easy to get out of sync with the SPEC?
  4. 👎 - why would we have a human-readable config file, requiring parsing, for the SPEC?

If the extra text - quite a lot of it I suppose - does cause a problem, maybe @matthewrmshin's idea is good: at install time, dump a text-stripped version of the SPEC for use at run time.

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Aug 2, 2019

I might just have a quick experiment and see what kind of performance hit we get from method (1), if it's too small to care about, just do it? If it becomes a problem later, compile as Matt suggests.

@wxtim
Copy link
Member

wxtim commented Aug 16, 2019

Might we be able to use a @dataclass (possibly with sub-dataclasses) to replace this structure - that way each item can be given it's own docstring?

@dataclass
class Specification():
    """Description
         Blah blah blady blah blah bla!
    """
   meta: CylcMeta()

@dataclass
class CylcMeta():
    """
    A multi-line description of the suite. It can be retrieved
    at run time with the cylc show command.
    """
    description: [VDR.V_STRING, '']
    group: [VDR.V_STRING, '']
    title: [VDR.V_STRING, '']
    URL: [VDR.V_STRING, '']

@hjoliver
Copy link
Member

Sounds plausible @wxtim

@hjoliver
Copy link
Member

hjoliver commented Aug 18, 2019

Another option might be Python "properties" (a bit ungainly though?):

class Config(object):
    def __init__(self, thing=None):
        self._thing = thing

    @property
    def thing(self):
        """It's a Thing!"""
        return self._thing

if __name__ == "__main__":
    print(Config.thing.__doc__)

result:

$ python ./foo.py
It's a Thing!

@hjoliver
Copy link
Member

Attribute docstrings rejected years ago: https://www.python.org/dev/peps/pep-0224/

@hjoliver
Copy link
Member

PEP 257 Docstring Conventions says:

String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to doc), but two types of extra docstrings may be extracted by software tools:

  1. String literals occurring immediately after a simple assignment at the top level of a module, class, or init method are called "attribute docstrings".

Does Sphinx support these?

@oliver-sanders
Copy link
Member Author

Does Sphinx support these?

Sphinx along with some other Python auto-doc systems do support attribute docstrings, but, they do so by implementing their own lightweight Python parsers which is somewhat messy.

@oliver-sanders
Copy link
Member Author

Might we be able to use a @DataClass (possibly with sub-dataclasses) to replace this structure

It is a possibility but it would change the way Cylc's config system works.

At present the cfgspec modules define the specification, the data storage is handled by the Config object.

dataclasses represent both the specification and the storage so would be a bit of a curveball. Perhaps the sort of thing we should look at with the Python API?

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Aug 19, 2019

OK so here's a really simple test to see what the impact of those docstrings might be.

I took the content of the suite config reference material as plain text and put it in a python file

        $ /usr/bin/time -v python file.py
	User time (seconds): 0.04
	System time (seconds): 0.01
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.11
	Maximum resident set size (kbytes): 6724

Then compared that to an empty file

        $ /usr/bin/time -v python blankfile.py
	Command being timed: "python blank.py"
	User time (seconds): 0.04
	System time (seconds): 0.01
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.10
	Maximum resident set size (kbytes): 6340

There is a 6724 - 6340 = 384kb increase in memory which isn't great, though this memory does not need to persist for the life of a suite.

However, the overheads are pretty small really.

@hjoliver
Copy link
Member

hjoliver commented Jan 30, 2020

@oliver-sanders - reading back through this issue we seemed to have agreed on your suggestion number-1., which was adding documentation in the spec dictionary. In light of that, can you explain the rationale for your non-dict POC/suggestion above, here in the issue?

@oliver-sanders
Copy link
Member Author

oliver-sanders commented Jan 30, 2020

can you explain the rationale for your non-dict POC/suggestion above

You managed to reply to a post after I had sneakily deleted it whilst I went back to think some more.

Short answer, we need to document both settings and sections and that will break the current dictionary approach.

Not quite sure which way I lean ATM, option 3 is actually pretty easy (I've already written the tree node stuff for another purpose, I tried it out with parsec, working) so it's about which is the best solution not which is the easiest.

BTW if you fancy messing around with context nodes try this gist: https://gist.github.com/oliver-sanders/6959c41341ed3e4ffca3061ef84b6b9c

Option 1 - Dictionaries

  • Settings and Sections are both dictionaries.
  • Sections get children attributes.
  • Keys (e.g. name, type, help, ...) don't get type checked.
SPEC = {
    'name': 'suite',
    'help': 'Configure a workflow',
    'children': [
        {
             'name': 'scheduling',
             'help': 'What and when',
             'chilren': [ ... ]
             # ^ whoops this typo will cause a hard-to-debug error
        }
    ]
}

# getting configs becomes a total mess
>>> SPEC['children']['scheduling']['children']['initial cycle point']['type']
'V_STRING'

# but we can use a function to get stuff
>>>> getc(SPEC, ['scheduling', 'initial cycle point'])['type']
'V_STRING'

Option 2 - Named Tuples

  • Sections and Settings become named tuples.
  • Sections get children attributes.
  • Arguments get checked (can't misspell)
  • Memory improvements vs dict.
  • Easy to tell between a section and setting by type
  • Can still serialise to JSON
SPEC = {
    'scheduling': Section(
        help: 'What and When',
        children={
            'initial cycle point': Setting(type=V_CYCLE_POINT, default=None),
        }
    )
}

# Still have the children mess
>>> SPEC['scheduling'].children['initial cycle point']['type']

Option 3 - Context Managers

  • Sections and settings become tree nodes.
  • Tree nodes are just named tuples with a couple of methods for iterating and formatting.
  • Memory improvements vs dict.
  • Easy to tell between a section and setting by type.
  • Can still serialise to JSON.
  • Could potentially parse from it.
  • Nodes are objects you can pass around in your code.
with Conf('suite', help='Configure a workflow') as SPEC:
    with Conf('scheduling', help='What and when'):
        Conf('initial cycle point', type=V_STRING, default=None)
        Conf('final cycle point', type=V_STRING, default=None)

>>> SPEC['scheduling']['dependencies']['initial cycle point'].type
'V_STRING'
>>> SPEC['a']
KeyError
>>> str(SPEC['scheduling'])
'[scheduling]'
>>> str(SPEC['scheduling'].children()])
['[scheduling]initial cycle point', '[scheduling]final cycle point']
>>> SPEC.tree()
[scheduling]  # What and when
    initial cycle point  # V_STRING
    final cycle point  # V_STRING
    [[dependencies]]  # Whatever

# Can easily serialise
>>> SPEC_DICT = dict(SPEC)
>>> SPEC_DICT
{
    'name': 'suite',
    'help': 'Configure a workflow',
    'children': [
        {
             'name': 'scheduling',
             'help': 'What and when',
             'children': [ ... ]
        }
    ]
}

# That said could easily go the other way too
>>> Conf(SPEC_DICT).tree()
[scheduling]  # What and when
    initial cycle point  # V_STRING
    final cycle point  # V_STRING
    [[dependencies]]  # Whatever

# Tree nodes are objects which know their position in the tree so can be passed around the program which could have bonuses.
>>> config.get(SPEC['scheduling']['initial cycle point'])
1234
>>> SPEC['scheduling']['some deprecated setting']
WARNING: making use of deprecated setting
>>> SPEC['scheduling']['some removed setting']
KeyError

Other Options

  • A parsec configuration.
  • Dataclasses (basically the same as dictionaries but with argument checking).

@hjoliver
Copy link
Member

Option 3 is pretty damned cool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants