Skip to content

ENH: Add configuration option for parameter cross-referencing #295

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 1, 2021

Conversation

rossbar
Copy link
Contributor

@rossbar rossbar commented Aug 18, 2020

The automatic cross-referencing in the param_type is a nice feature. However, activating this feature in a large project with many docstring (e.g. numpy) introduces many new broken links as all terms in the parameter-type lines are wrapped in the default :obj: role if they are not explicitly mentioned in either the numpydoc_xref_aliases dict or the numpydoc_xref_ignore set. Here's an example from the test suite:

>>> from numpydoc.xref import make_xref, DEFAULT_LINKS
>>> param_type = '(..., M, N) array_like'  # Line to be parsed
>>> terms_to_ignore = {'of', 'with', 'default', 'optional'}
>>> make_xref(param_type, DEFAULT_LINKS, terms_to_ignore)
'(..., :obj:`M`, :obj:`N`) :term:`numpy:array_like`'

Notice how M and N are wrapped in an :obj: role. If the user wants to prevent this, they have to add 'M' and 'N' to numpydoc_xref_ignore (terms_to_ignore in the above example). For large libraries with many different parameter type descriptions, adding all of the terms that are to be ignored can be a big undertaking that prevents the use of this feature.

This PR introduces a new config parameter (tentatively called numpydoc_xref_wrap_all - open to suggestions) that toggles the wrapping of "unrecognized" terms in the :obj: role. If the user sets this value to False, then only values that are explicitly named in numpydoc_xref_aliases (and not in numpydoc_xref_ignore) are replaced. This allows for incremental adoption of the cross-referencing feature without requiring that users have fully-specified mappings/sets with terms to replace/ignore to prevent the introduction of broken links. The default value of numpydoc_xref_wrap_all=True keeps the current behavior so that users who are already using this feature are unaffected.

Add a kwarg to make_xref to toggle the automatic wrapping of every term
not in xref_ignore in an :obj: role.
@codecov-commenter
Copy link

codecov-commenter commented Aug 18, 2020

Codecov Report

Merging #295 into master will increase coverage by 0.46%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #295      +/-   ##
==========================================
+ Coverage   93.11%   93.57%   +0.46%     
==========================================
  Files           7        7              
  Lines        1263     1261       -2     
==========================================
+ Hits         1176     1180       +4     
+ Misses         87       81       -6     

@larsoner
Copy link
Collaborator

This PR introduces a new config parameter (tentatively called numpydoc_xref_wrap_all - open to suggestions)

Another option would be to allow numpydoc_xref_ignore = True to mean "ignore anything not in numpydoc_xref_aliases". Or if that's not clear enough, something like numpydoc_xref_ignore = 'all' or something could work, too.

Overall +1 on this idea/PR

@rossbar rossbar changed the title ENH: Add configuration parameter for parameter cross-referencing ENH: Add configuration option for parameter cross-referencing Aug 19, 2020
@rossbar
Copy link
Contributor Author

rossbar commented Aug 21, 2020

Thanks for the feedback! I gave the numpydoc_xref_ignore = set or 'all' solution a try in f2f1867. On the plus side there's one fewer configuration value to keep track of, though describing the overloading of the xref_ignore make the parameter description a bit more verbose.

I also had to add some input checking for numpydoc_xref_ignore to make sure the parameter is handled correctly based on when it's a set vs. string. For now I put that checking in make_xref, but this necessitated a few new variable names due to the fact that make_xref is called recursively. A better solution (I think) would be to do this check further upstream (i.e. when the config is first read) but I wasn't 100% sure which sphinx signal was appropriate to hook into.

LMK what you think of this alternative. Also, if it makes review easier, I'm happy to submit f2f1867 in a separate PR so that it's easier to compare the two alternatives.

@larsoner
Copy link
Collaborator

A better solution (I think) would be to do this check further upstream (i.e. when the config is first read) but I wasn't 100% sure which sphinx signal was appropriate to hook into.

The two would be config-inited or builder-inited, but we already connect to those:

https://github.com/numpy/numpydoc/blob/master/numpydoc/numpydoc.py#L237-L240

Maybe we don't need the recursive stuff, not sure. In any case, I don't see any earlier event we could connect to. So feel free to try to make it work with update_config (which we connect to) already somehow. If you can't, the changes here already aren't too bad / seem acceptable

@rossbar
Copy link
Contributor Author

rossbar commented Sep 29, 2020

So feel free to try to make it work with update_config (which we connect to) already somehow.

I've revisited this a bit and haven't been able to come up with a good way to keep from modifying make_xref directly. I had hoped that the configuration information would be enough to avoid calling make_xref altogether for some configurations, but the recursive re parsing in make_xref was a blocker to this approach.

I'd like to submit this for review in its current form. I did some very crude profiling to make sure the modifications to make_xref didn't slow it down too much:

from numpydoc.tests.test_xref import data, xref_aliases, xref_ignore
from numpydoc.xref import make_xref
param_descrs = [s.split('\n')[0] for s in data.strip().split('\n\n')]
%%timeit
for param in param_descrs:
    make_xref(param_descrs[0], xref_aliases, xref_ignore)

Which gave:

676a8d4:
529 µs ± 8.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

f2f1867:
545 µs ± 1.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

I also built the numpy documentation with both versions of numpydoc and saw no appreciable difference in build time between the two.

Copy link
Collaborator

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, +1 for merge

If nobody else looks in the next couple of days feel free to ping me for a merge!

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me too, in it goes. Thanks @rossbar and @larsoner

@rgommers rgommers merged commit 8275f4e into numpy:master Jan 1, 2021
@jarrodmillman jarrodmillman added this to the 1.2.0 milestone Jan 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants