-
-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cross-reference links to parameter types #150
Conversation
0e6a90f
to
53cf0ff
Compare
Tokens of the type description that are determined to be "link-worthy" are enclosed in a new role called `xref_param_type`. This role when when processed adds a `pending_xref` node to the DOM. If these types cross-references are not resolved when the build ends, sphinx does not complain. This forgives errors made when deciding whether tokens are "link-worthy". And provided text from the type description is not lost in the processing, the only unwanted outcome is a type link (due to coincidence) when none was desired. Added two options: 1. numpydoc_xref_param_type 2. numpydoc_xref_aliases
e93218e
to
616c4b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few queries on unclear things.
Have you tried to find similar functionality in other doc builders? How do they specify this kind of thing?
numpydoc/xref.py
Outdated
QUALIFIED_NAME_RE = re.compile( | ||
# e.g int, numpy.array, ~numpy.array | ||
r'^' | ||
r'[~\.]?' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does beginning with . mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.class_in_current_module
numpydoc/xref.py
Outdated
r'(``.+?``)' | ||
) | ||
|
||
IGNORE = {'of', ' of ', 'either', 'or', 'with', 'in', 'default'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these settings be wrapped in a class, e.g. NumpyDocXrefSettings
so that the user can do the following in config:
numpydoc_xref_param_type = NumpyDocXrefSettings()
numpydoc_xref_param_type.ignore.update(['hello'])
or perhaps
numpydoc_xref_param_type.add_to_ignore(['hello'])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be configurable, however that set really doesn't affect the final outcome. It is just to minimise useless markup from being inserted.
numpydoc/xref.py
Outdated
# - join the results with the pattern | ||
|
||
# endswith ', optional' | ||
if param_type.endswith(', optional'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this strictly need to be at the end? is there a reason not to just include optional
in ignore?
numpydoc/xref.py
Outdated
param_type[:-10], | ||
xref_aliases) | ||
|
||
# Any sort of bracket '[](){}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need special casing aside from other punctuation? The only thing I see it doing is ignoring the first token (dict, list, tuple)...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to be precise in targeting the common patterns. I think loosening up would reduce the code.
numpydoc/xref.py
Outdated
return param_type | ||
|
||
|
||
def xref_param_type_role(role, rawtext, text, lineno, inliner, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we adding a new role, when we could just allow the user to specify a role? How does this differ from existing default_role
candidates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The role is an implementation detail (internal use). I'll add a detailed comment about it in the file.
numpydoc/xref.py
Outdated
return _split_and_apply_re(param_type, DOUBLE_QUOTE_SPLIT_RE) | ||
|
||
# Is splittable | ||
for splitter in [' or ', ', ', ' ']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change it to a regex (to accommodate the comment below), but I do not think it needs to be configurable. I think if type information used in the scientific python ecosystem gets more complicated, then other parts of this function will be found wanting too.
numpydoc/xref.py
Outdated
# Is splittable | ||
for splitter in [' or ', ', ', ' ']: | ||
if splitter in param_type: | ||
return _split_and_apply_str(param_type, splitter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we are doing this recursively for each delimiter rather than a complete tokenization step followed by handling each token?
numpydoc/xref.py
Outdated
end) | ||
|
||
# May have an unsplittable literal | ||
if '``' in param_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not similarly need to handle strings with spaces in them? I don't know where such are used, but I could certainly imagine it.
Also existing markup may include things like :ref:`blah blah <foo>`
which I don't think are currently handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
616c4b9
to
9b5b5f9
Compare
Yes, see comment |
Though I think the implementation catches nearly every bit of type information in numpy and scipy docstrings, it would help if someone can build any of the documentation to see it in action. I'm not capable of doing so. That would highlight any performance issues. |
By performance do you mean computational performance? I much prefer you not amending commits, so we can see the changes from review to review. We can squash upon merge. |
Yes. I saw a concern about it by someone from pandas. Sorry about the squash. Let me correct it. |
Regarding a test bed, I'll get Scikit-learn docs rendered at scikit-learn/scikit-learn#10421 |
- Also changed the role used to create links from `obj` to `class`.
9b5b5f9
to
0f5c33c
Compare
So here is rendered scikit-learn API reference: https://16406-843222-gh.circle-artifacts.com/0/doc/modules/classes.html Comparing these two runs in time: 11:40 became 12:50 (~8% increase). Not negligible, but not egregious if the links are helpful. Rendered docs are also larger: 26.3KB became 28.1KB (7% increase) for DBSCAN as a single example. Note that using the custom role places an I'm not sure that the numpydoc_xref_aliases is quite working as intended either: aliasing 'string' to 'str' means that 'str' will appear; 'sparse matrix' becomes 'sparse numpy.matrix' with the example aliases. I think you need some |
Btw, feel free to copy my PR to scikit-learn so you can play with it, and I'll close mine. |
[fixed some typos above] |
|
||
ROLE_SPLIT_RE = re.compile( | ||
# splits to preserve ReST roles | ||
r'(:\w+:`.+?(?<!\\)`)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't checked: is ` allowed in anchor text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But that regex will not catch pathological use use of backslashes and ticks.
numpydoc/xref.py
Outdated
``xref_param_type`` role. | ||
""" | ||
if param_type in xref_aliases: | ||
param_type = xref_aliases[param_type] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should perhaps have:
if QUALIFIED_NAME_RE.match(xref_aliases[param_type]):
return ':xref_param_type:`%s <%s>`' % (xref_aliases[param_type], param_type)
... or preprocess xref_aliases
Looking at some of the test times, there seems to be some variance. numpydoc changed output html markup, so the |
- No tildes in aliases, the keys are the titles. - Do not emphasize text - Fix bug, open brackets cannot be to the left of the quote that ends a role.
`nodes.inline` adds a `span` tag. `nodes.Text` adds no tag.
- Do not split singly quoted expressions The avoid edgecases that lead to bad rst markup. - Split only when there is a space after a comma. - Do not split on close brackets if they are followed by a linkable token.
I've created a demo to show how the cross-references generated for all type strings from some common packages. For each page, the top row shows the original string and the bottom row shows the cross-referenced result. |
Neat!
What's the best way for me to configure an ignored word (e.g. object)? is
there any way to try linking to terms as well?
…On 10 Jan 2018 2:13 am, "Hassan Kibirige" ***@***.***> wrote:
I've created a demo <https://has2k1.github.io/param-type-demo/index.html>
to show how the cross-references generated for all type strings from some
common packages.
For each page, the top row shows the original string and the bottom row
shows the cross-referenced result.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#150 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6yNkzZOPrZeRaYTeye6307UIealyks5tI4IvgaJpZM4RPfj_>
.
|
Maybe expose the
The aliases dict can do terms. |
yes, I mean to link to any terms matched by default. I suppose we could
read in our glossary if needed.
|
That would not be fool proof, terms can have spaces and we don't deal well with spaces. |
btw the regex you appear to be using in your demo is over-generating at
least on a line with ': ' in it; it should require ' : ' at a minimum
…On 10 Jan 2018 11:50 am, "Hassan Kibirige" ***@***.***> wrote:
I suppose we could read in our glossary if needed
That would not be fool proof, terms can have spaces and we don't deal well
with spaces.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#150 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEz649lWFpIvJRj45YV7tJaabVgyqAFks5tJAk8gaJpZM4RPfj_>
.
|
The need to prefix module-local names with . is a bit annoying. Is there a
way to make this more automagical?
|
I think trailing punctuation is currently inhibiting rexognition. e.g.
"array."
…On 10 Jan 2018 12:05 pm, "Joel Nothman" ***@***.***> wrote:
The need to prefix module-local names with . is a bit annoying. Is there a
way to make this more automagical?
|
The dot-names do look weird when not enclosed in a class role, but I don't think there is room for cleverness. I saw the trialling punctuation thing, it may not be worth the trouble. |
I would like to be able to refer to neighbors.NearestNeighbors, but it
seems that it won't be linked unless I use sklearn.neighbors.NearestNeighbors.
How would you solve this?
There are certainly cases where I would like to be able to link multi-word
expressions... What solutions might be feasible?
I've taken a look at all of scikit-learn's type specs in your demo. It
looks pretty good, though most of the links currently produced are not very
informative (do I really want to link for str, list and dict?? for int and
float and double?). There are a few terms I would like unlinked (object,
type, matrix), and there are many that I would need to manually produce
links for: estimator, estimators, classifiers, Bunch, callable, generator,
cross-validation generator and similar, joblib.Memory, csc_matrix, sparse
matrix, booleans, DecisionTreeRegressor, number, scipy.sparse, RandomState,
pair, record array, seq, indexable, ?function, various specific class names
prefixed by module or not. You've also alerted me to some omissions in our
glossary, and non-standard uses of type specs in docstrings.
|
With thanks to the demo of numpy/numpydoc#150
With thanks to the demo of numpy/numpydoc#150
To cross-reference strings with spaces like Initially I thought the aliases dictionary would be small, but it now looks like we want it to be as instrumental as the Rosetta Stone. For a package like sklearn may be you can generate the On the whole, the parameter types across the whole scipython ecosystem could do with some attempt at standardisation. For example, entries like
Luckily those are the only complaints in the whole demo, and I think the related docs are not part of the user API. |
I don't think standardisation improves quality enough to justify
maintenance costs. Yes, were those docs being rendered, we would likely
have caught that issue and escaped or enclosed the offending terms.
Are you sure generating the aliases is the best way to go?
…On 10 January 2018 at 15:13, Hassan Kibirige ***@***.***> wrote:
To cross-reference strings with spaces like sparse matrix, it may be
useful to change the documentation to sparse-matrix.
Initially I thought the aliases dictionary would be small, but it now
looks like we want it to be as instrumental as the Rosetta Stone. For a
package like sklearn may be you can generate the
UpperCaseClassName alias entries from __all__ in the init files, you have
the benefit of good class naming conventions.
On the whole, the parameter types across the whole scipython ecosystem
could do with some attempt at standardisation. For example, entries like list,
length = len(coefs_) + len(intercepts_) where the code is not
double-quoted are not handled with nicely; those underscores make sphinx
complain
param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "coefs".
param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "intercepts".
Luckily those are the only complaints in the whole demo, and I think the
related docs are not part of the user API.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#150 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEz68Z7wKnXX9-205oWmfeSlBYeO1Ijks5tJDjjgaJpZM4RPfj_>
.
|
For sklearn, yes. I think it approaches the auto-magic that you wished for, i.e leaving out the dot. The naming conventions seem to be strictly followed and the classes are imported for the user API in a uniform manner. |
With thanks to the demo of numpy/numpydoc#150
The set of words to ignore is now configurable; the variation in building time conceals any performance impact; and that checks off my boxes. As the default is set to enable this feature, maybe a notice to other projects! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether I'll be able to give this another full review soon
``True`` by default. | ||
numpydoc_xref_aliases : dict | ||
Mappings to fully qualified paths (or correct ReST references) for the | ||
aliases/shortcuts used when specifying the types of parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention no spaces allowed
I am 👍 in principle to the functionality, but don't have the bandwidth to do a code review. |
I'm not certain it should be enabled by default, so that's something else
for reviewers to comment on
…On 17 Jan 2018 2:24 am, "Thomas A Caswell" ***@***.***> wrote:
I am 👍 in principle to the functionality, but don't have the bandwidth
to do a code review.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#150 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6yd2FkY19GxFeF9_y1M7rKVE6e_Hks5tLL8egaJpZM4RPfj_>
.
|
I have wanted something like this for a while. It would be great to resurrect this.
The way that the And ideally a Does that sound tractable? Any interest in coming back to this? I have recently been working on the SciPy docs, and maintain docs for MNE-Python, which follows a lot of the same conventions as |
I agree this would be safer as opt-in. |
Okay, let me try it with MNE and SciPy and see what happens |
@has2k1 can you rebase on latest |
Whether to create cross-references for the parameter types in the | ||
``Parameters``, ``Other Parameters``, ``Returns`` and ``Yields`` | ||
sections of the docstring. | ||
``True`` by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's opt-in, then this docstring should be changed to say "False
by default"
@@ -146,6 +152,9 @@ def setup(app, get_doc_object_=get_doc_object): | |||
app.add_config_value('numpydoc_show_inherited_class_members', True, True) | |||
app.add_config_value('numpydoc_class_members_toctree', True, True) | |||
app.add_config_value('numpydoc_citation_re', '[a-z0-9_.-]+', True) | |||
app.add_config_value('numpydoc_xref_param_type', True, True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the default is still True, so it's opt-out currently.
Tokens of the type description that are determined to be "link-worthy"
are enclosed in a new role called
xref_param_type
. This role whenwhen processed adds a
pending_xref
node to the DOM. If these typescross-references are not resolved when the build ends, sphinx does
not complain. This forgives errors made when deciding whether tokens
are "link-worthy". And provided text from the type description is not
lost in the processing, the only unwanted outcome is a type link (due
to coincidence) when none was desired.
Added two options:
numpydoc_xref_param_type
numpydoc_xref_aliases
closes #57