Revamping the Documentation #1083

malefice · 2019-12-17T16:01:38Z

With the current way the provider method documentation is generated, the result lacks details and examples that will showcase the flexibility and capability of each provider method. I mentioned in #1074 about the idea of using sphinx.ext.autodoc to generate docs, and I already have some ideas on how to implement this, but I want to hear some thoughts and get some feedback.

Since build_docs already traverse through each Provider class and its methods, we can change build_docs to write something like this for each provider class. This will autogenerate docs from the individual docstrings (class level and method level). We can also use sphinx.ext.viewcode to put links to the source code.

.. autoclass:: {{provider_class}}
   :members: {{comma_separated_list_of_provider_methods}}
   :inherited-members:
   :undoc-members:

As for sample code, we can probably use the autodoc-process-docstring hook to parse through examples in the docstring, and basically do what build_docs currently already does when generating dummy samples. The huge difference is that in this case, we should be able to write as many examples as we want, and let the docs building process evaluate and prettify the output. Of course, it is easier said than done, but I think it is well worth the effort that will solve all documentation woes.

Of course if we go ahead with this style, we will need to write good docstrings for all existing provider methods, and we will also require contributors to write good docstrings moving forward. As for docstrings, we will also have to set a certain standard on how they should be written, e.g. reST vs Google vs numPy, and the format of the examples so we can write the parser.

Anyway thoughts and feedback?

The text was updated successfully, but these errors were encountered:

fcurella · 2019-12-17T16:21:24Z

Thank you so much for looking into this @malefice !

If we're going to require contributors to write docstrings, I'd like to have some kind of CI checking the PRs. Maybe we can add pydocstyle to our linting check on travis.

malefice · 2019-12-17T16:43:31Z

Yeah sure, I will look into that. Here is a screenshot of what each provider class can look like (without sample invocations for now).

malefice · 2019-12-19T09:46:40Z

So I already have some dirty proof of concept code for this one. In this PoC, I used the :sample: keyword(?) followed by the arguments to be passed as the format. If the random_int docstring has something like this:

    :sample: default
    :sample: min=123
    :sample: max='152='
    :sample: step=1
    :sample: min=124, max=125,
    aaaaa=123456,
    b=12345,
    :sample: min=1, max=100

Our custom docstring preprocessor will remove those lines from the docstring, generate samples based on those lines, and convert them into this which will be appended at the end of the docstring:

:examples:
>>> fake.random_int()
2092
>>> fake.random_int(min=123)
8189
>>> fake.random_int(step=1)
3098
>>> fake.random_int(min=1, max=100)
49

Notice that the third and fifth samples were automatically omitted because there are issues in the arguments lists. And the actual docs will look like this:

The PoC generates samples using eval under the hood after the argument list has passed some code element validation using the ast module. Combined with good code review practices, the impact of malicious code injection should be greatly neutered, if not completely gone, and there should not be crap like fake.random_int(max=100**100**100*100) DOS'ing CI systems and the RTD site.

Anyway, your thoughts on this @fcurella?

fcurella · 2019-12-19T15:51:59Z

I'd rather avoid using eval. I'm not against putting the expected result in the docstring as well. Maybe something like this:

.. example::
   :arguments:
   :result: 2092

.. example::
   :arguments: min=123
   :result: 8189

fcurella · 2019-12-19T15:56:52Z

An alternative syntax could be:

:sample: 2092
:sample min=123: 8189
:sample min=1 max=100: 49

malefice · 2019-12-19T17:01:58Z

I chose the eval route to ease the burden of both contributors and code reviewers. Not to mention, there are some methods that spew out output that is fairly long, and including the results in the docstrings will make things uglier than necessary. Personally, I also like the examples to be reproducible for a given seed, so if we do things automagically, we can ensure that the examples are seeded properly with the same documented seed.

I know eval, exec, and compile are potentially dangerous, but if done properly for a given use case, those functions can be pretty amazing. Before dismissing eval, please first hear how my PoC is roughly implemented.

The calls to eval are only made when the sphinx docs are being created, and every call to eval is always in a situation like this:

method = 'the provider method name'
kwargs = 'stuff parsed from :sample:'

fake = Faker()
command = 'fake.{method}({kwargs})'.format(method=method, kwargs=kwargs)
validator = CodeValidator(command)
if not validator.errors:
    eval(command, {'__builtins__': None}, {})
else:
    ignore_command_string()

The CodeValidator class here is a subclass of ast.NodeVisitor that calls ast.parse on the supplied command, and then perform some validation. The most important part of this validation is that it ensures that the command string can only have one instance of variable access, one instance of attribute access, and one instance of a function/method call. Since the command string is always of the form fake.{method}(), all those instances are automatically used up, and effectively, kwargs that contain function calls of any kind will fail the validation.

To beef the validation up further, you cannot even do any other operation. You cannot perform arithmetic, slice a list, or even evaluate True is True. The arguments can only strictly be literals, and even the allowed literals have been limited to numeric types, strings, bytes, lists, tuples, sets, dicts, True, False, None. With the way eval was called, even builtins are disabled.

Anyway, in summary, given that (a) eval will never be called by code that users will use, (b) there is a strict whitelist and usage quotas of allowed code elements in the command string, and (c) there is code review that protects the sole entry point to all eval calls, I think it is reasonably safe to use eval for this purpose.

fcurella · 2019-12-23T15:04:11Z

@malefice I'm ok with eval if we're pretty sure there's no way to break RTD, even accidentally.

malefice · 2019-12-26T06:07:04Z

Yeah, the validator class will definitely have tests, and I am actually already done with that. I will let you be the judge once I make the PR, though this will probably be after the Python 2.7 EOL cleanup. With the EOL just around 5 days away, it does not make sense for me to write this with Python 2.7 compatibility in mind.

malefice · 2019-12-26T15:25:29Z

In the current docs, each locale has links to every "provider category" module even if one has not been localized. Personal experience, I found it a bit confusing at first, since the current setup gives an impression that every "provider category" module has been localized for the listed locales.

@fcurella would you prefer the localized "provider category" modules to be presented this way? In my opinion at least, it is a lot clearer which modules have been localized.

fcurella · 2019-12-27T17:46:59Z

@malefice I find your solution much cleaner too. Go for it!

malefice · 2019-12-28T10:14:33Z

@fcurella I would like some feedback and suggestions regarding my work thus far before I proceed further. You can view the sample docs here, and if you want to see the preliminary (but not PoC dirty) code for this, you can view it here.

There are a lot of methods that still do not have docstrings, and that is really the bulk of the work for this initiative, so to give you an idea of what every page should eventually look like, I finished the docstrings for the barcode provider methods which you can view here.

fcurella · 2019-12-30T15:15:14Z

@malefice I've checked the code on your branch, it looks really nice ✨!

I don't really have anything I would change. LGTM 👍

malefice · 2019-12-31T11:05:52Z

Alright, so I will probably start writing some docstrings, but like I said earlier, I do not plan to make this Python 2.7 compatible since EOL is tomorrow. Shall we start tackling #1063?

EDIT after 16 days:
Nevermind. The 2 to 3 transition is finally finished! I will get this out as soon as I can.

malefice mentioned this issue Dec 20, 2019

Test Suite Housekeeping (Part 2) #1084

Merged

malefice mentioned this issue Mar 12, 2020

Provider method autodoc and sample autogeneration #1128

Merged

malefice mentioned this issue Mar 23, 2020

'Generator' object has no attribute ... #1139

Closed

fcurella closed this as completed in #1128 Mar 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamping the Documentation #1083

Revamping the Documentation #1083

malefice commented Dec 17, 2019

fcurella commented Dec 17, 2019

malefice commented Dec 17, 2019

malefice commented Dec 19, 2019 •

edited

Loading

fcurella commented Dec 19, 2019

fcurella commented Dec 19, 2019 •

edited

Loading

malefice commented Dec 19, 2019 •

edited

Loading

fcurella commented Dec 23, 2019

malefice commented Dec 26, 2019

malefice commented Dec 26, 2019

fcurella commented Dec 27, 2019

malefice commented Dec 28, 2019 •

edited

Loading

fcurella commented Dec 30, 2019

malefice commented Dec 31, 2019 •

edited

Loading

Revamping the Documentation #1083

Revamping the Documentation #1083

Comments

malefice commented Dec 17, 2019

fcurella commented Dec 17, 2019

malefice commented Dec 17, 2019

malefice commented Dec 19, 2019 • edited Loading

fcurella commented Dec 19, 2019

fcurella commented Dec 19, 2019 • edited Loading

malefice commented Dec 19, 2019 • edited Loading

fcurella commented Dec 23, 2019

malefice commented Dec 26, 2019

malefice commented Dec 26, 2019

fcurella commented Dec 27, 2019

malefice commented Dec 28, 2019 • edited Loading

fcurella commented Dec 30, 2019

malefice commented Dec 31, 2019 • edited Loading

malefice commented Dec 19, 2019 •

edited

Loading

fcurella commented Dec 19, 2019 •

edited

Loading

malefice commented Dec 19, 2019 •

edited

Loading

malefice commented Dec 28, 2019 •

edited

Loading

malefice commented Dec 31, 2019 •

edited

Loading