Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamping the Documentation #1083

Closed
malefice opened this issue Dec 17, 2019 · 13 comments · Fixed by #1128
Closed

Revamping the Documentation #1083

malefice opened this issue Dec 17, 2019 · 13 comments · Fixed by #1128

Comments

@malefice
Copy link
Contributor

With the current way the provider method documentation is generated, the result lacks details and examples that will showcase the flexibility and capability of each provider method. I mentioned in #1074 about the idea of using sphinx.ext.autodoc to generate docs, and I already have some ideas on how to implement this, but I want to hear some thoughts and get some feedback.

Since build_docs already traverse through each Provider class and its methods, we can change build_docs to write something like this for each provider class. This will autogenerate docs from the individual docstrings (class level and method level). We can also use sphinx.ext.viewcode to put links to the source code.

.. autoclass:: {{provider_class}}
   :members: {{comma_separated_list_of_provider_methods}}
   :inherited-members:
   :undoc-members:

As for sample code, we can probably use the autodoc-process-docstring hook to parse through examples in the docstring, and basically do what build_docs currently already does when generating dummy samples. The huge difference is that in this case, we should be able to write as many examples as we want, and let the docs building process evaluate and prettify the output. Of course, it is easier said than done, but I think it is well worth the effort that will solve all documentation woes.

Of course if we go ahead with this style, we will need to write good docstrings for all existing provider methods, and we will also require contributors to write good docstrings moving forward. As for docstrings, we will also have to set a certain standard on how they should be written, e.g. reST vs Google vs numPy, and the format of the examples so we can write the parser.

Anyway thoughts and feedback?

@fcurella
Copy link
Collaborator

Thank you so much for looking into this @malefice !

If we're going to require contributors to write docstrings, I'd like to have some kind of CI checking the PRs. Maybe we can add pydocstyle to our linting check on travis.

@malefice
Copy link
Contributor Author

Yeah sure, I will look into that. Here is a screenshot of what each provider class can look like (without sample invocations for now).

image

@malefice
Copy link
Contributor Author

malefice commented Dec 19, 2019

So I already have some dirty proof of concept code for this one. In this PoC, I used the :sample: keyword(?) followed by the arguments to be passed as the format. If the random_int docstring has something like this:

    :sample: default
    :sample: min=123
    :sample: max='152='
    :sample: step=1
    :sample: min=124, max=125,
    aaaaa=123456,
    b=12345,
    :sample: min=1, max=100

Our custom docstring preprocessor will remove those lines from the docstring, generate samples based on those lines, and convert them into this which will be appended at the end of the docstring:

:examples:
>>> fake.random_int()
2092
>>> fake.random_int(min=123)
8189
>>> fake.random_int(step=1)
3098
>>> fake.random_int(min=1, max=100)
49

Notice that the third and fifth samples were automatically omitted because there are issues in the arguments lists. And the actual docs will look like this:

image

The PoC generates samples using eval under the hood after the argument list has passed some code element validation using the ast module. Combined with good code review practices, the impact of malicious code injection should be greatly neutered, if not completely gone, and there should not be crap like fake.random_int(max=100**100**100*100) DOS'ing CI systems and the RTD site.

Anyway, your thoughts on this @fcurella?

@fcurella
Copy link
Collaborator

I'd rather avoid using eval. I'm not against putting the expected result in the docstring as well. Maybe something like this:

.. example::
   :arguments:
   :result: 2092

.. example::
   :arguments: min=123
   :result: 8189

@fcurella
Copy link
Collaborator

fcurella commented Dec 19, 2019

An alternative syntax could be:

:sample: 2092
:sample min=123: 8189
:sample min=1 max=100: 49

@malefice
Copy link
Contributor Author

malefice commented Dec 19, 2019

I chose the eval route to ease the burden of both contributors and code reviewers. Not to mention, there are some methods that spew out output that is fairly long, and including the results in the docstrings will make things uglier than necessary. Personally, I also like the examples to be reproducible for a given seed, so if we do things automagically, we can ensure that the examples are seeded properly with the same documented seed.

I know eval, exec, and compile are potentially dangerous, but if done properly for a given use case, those functions can be pretty amazing. Before dismissing eval, please first hear how my PoC is roughly implemented.

The calls to eval are only made when the sphinx docs are being created, and every call to eval is always in a situation like this:

method = 'the provider method name'
kwargs = 'stuff parsed from :sample:'

fake = Faker()
command = 'fake.{method}({kwargs})'.format(method=method, kwargs=kwargs)
validator = CodeValidator(command)
if not validator.errors:
    eval(command, {'__builtins__': None}, {})
else:
    ignore_command_string()

The CodeValidator class here is a subclass of ast.NodeVisitor that calls ast.parse on the supplied command, and then perform some validation. The most important part of this validation is that it ensures that the command string can only have one instance of variable access, one instance of attribute access, and one instance of a function/method call. Since the command string is always of the form fake.{method}(), all those instances are automatically used up, and effectively, kwargs that contain function calls of any kind will fail the validation.

To beef the validation up further, you cannot even do any other operation. You cannot perform arithmetic, slice a list, or even evaluate True is True. The arguments can only strictly be literals, and even the allowed literals have been limited to numeric types, strings, bytes, lists, tuples, sets, dicts, True, False, None. With the way eval was called, even builtins are disabled.

Anyway, in summary, given that (a) eval will never be called by code that users will use, (b) there is a strict whitelist and usage quotas of allowed code elements in the command string, and (c) there is code review that protects the sole entry point to all eval calls, I think it is reasonably safe to use eval for this purpose.

@fcurella
Copy link
Collaborator

@malefice I'm ok with eval if we're pretty sure there's no way to break RTD, even accidentally.

@malefice
Copy link
Contributor Author

Yeah, the validator class will definitely have tests, and I am actually already done with that. I will let you be the judge once I make the PR, though this will probably be after the Python 2.7 EOL cleanup. With the EOL just around 5 days away, it does not make sense for me to write this with Python 2.7 compatibility in mind.

@malefice
Copy link
Contributor Author

In the current docs, each locale has links to every "provider category" module even if one has not been localized. Personal experience, I found it a bit confusing at first, since the current setup gives an impression that every "provider category" module has been localized for the listed locales.

@fcurella would you prefer the localized "provider category" modules to be presented this way? In my opinion at least, it is a lot clearer which modules have been localized.

image

image

@fcurella
Copy link
Collaborator

@malefice I find your solution much cleaner too. Go for it!

@malefice
Copy link
Contributor Author

malefice commented Dec 28, 2019

@fcurella I would like some feedback and suggestions regarding my work thus far before I proceed further. You can view the sample docs here, and if you want to see the preliminary (but not PoC dirty) code for this, you can view it here.

There are a lot of methods that still do not have docstrings, and that is really the bulk of the work for this initiative, so to give you an idea of what every page should eventually look like, I finished the docstrings for the barcode provider methods which you can view here.

@fcurella
Copy link
Collaborator

@malefice I've checked the code on your branch, it looks really nice ✨!

I don't really have anything I would change. LGTM 👍

@malefice
Copy link
Contributor Author

malefice commented Dec 31, 2019

Alright, so I will probably start writing some docstrings, but like I said earlier, I do not plan to make this Python 2.7 compatible since EOL is tomorrow. Shall we start tackling #1063?

EDIT after 16 days:
Nevermind. The 2 to 3 transition is finally finished! I will get this out as soon as I can.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants