Add cross-reference links to parameter types #150

has2k1 · 2017-12-30T02:54:39Z

Tokens of the type description that are determined to be "link-worthy"
are enclosed in a new role called xref_param_type. When processed, this
role adds a pending_xref node to the DOM. If these type
cross-references are not resolved when the build ends, Sphinx does
not complain. This forgives errors made when deciding whether tokens
are "link-worthy". And provided text from the type description is not
lost in the processing, the only unwanted outcome is a type link (due
to coincidence) when none was desired.

Added two options:

numpydoc_xref_param_type
numpydoc_xref_aliases

closes #57
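
For illustration, a minimal `conf.py` sketch that turns on the two options might look like the following. The option names come from this PR; the alias entries are made-up examples, not defaults shipped with numpydoc.

```python
# conf.py (sketch): the alias entries below are illustrative assumptions.
extensions = ['numpydoc']

# Turn on cross-referencing of parameter types.
numpydoc_xref_param_type = True

# Map shorthand used in docstrings to fully qualified targets.
numpydoc_xref_aliases = {
    'ndarray': 'numpy.ndarray',
    'DataFrame': 'pandas.DataFrame',
}
```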

jnothman

A few queries on unclear things.

Have you tried to find similar functionality in other doc builders? How do they specify this kind of thing?

jnothman · 2018-01-08T01:39:37Z

numpydoc/xref.py

+QUALIFIED_NAME_RE = re.compile(
+    # e.g int, numpy.array, ~numpy.array
+    r'^'
+    r'[~\.]?'


what does beginning with . mean?

.class_in_current_module
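
As a rough sketch of the kind of token this pattern is meant to accept, the following uses a hypothetical regex (`NAME_RE`, not the PR's `QUALIFIED_NAME_RE`, which is only partly visible in the diff above). It assumes a name is an optional leading `~` or `.` followed by dot-separated identifiers.

```python
import re

# Hypothetical pattern: optional leading '~' or '.', then dotted identifiers.
# The PR's QUALIFIED_NAME_RE is only partly visible in the diff above.
NAME_RE = re.compile(r'^[~.]?[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*$')


def is_qualified_name(token):
    """Return True if the token looks like a (possibly prefixed) dotted name."""
    return NAME_RE.match(token) is not None
```

Under these assumptions, `int`, `numpy.array`, `~numpy.array` and `.class_in_current_module` are accepted, while a phrase containing a space is not.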

jnothman · 2018-01-08T01:42:37Z

numpydoc/xref.py

+    r'(``.+?``)'
+)
+
+IGNORE = {'of', ' of ', 'either', 'or', 'with', 'in', 'default'}


Should these settings be wrapped in a class, e.g. NumpyDocXrefSettings so that the user can do the following in config:

numpydoc_xref_param_type = NumpyDocXrefSettings()
numpydoc_xref_param_type.ignore.update(['hello'])

or perhaps

numpydoc_xref_param_type.add_to_ignore(['hello'])

It can be made configurable; however, that set doesn't really affect the final outcome. It only keeps useless markup from being inserted.

jnothman · 2018-01-08T01:46:19Z

numpydoc/xref.py

+    #   - join the results with the pattern
+
+    # endswith ', optional'
+    if param_type.endswith(', optional'):


Does this strictly need to be at the end? is there a reason not to just include optional in ignore?

jnothman · 2018-01-08T01:52:32Z

numpydoc/xref.py

+            param_type[:-10],
+            xref_aliases)
+
+    # Any sort of bracket '[](){}'


Why does this need special casing aside from other punctuation? The only thing I see it doing is ignoring the first token (dict, list, tuple)...?

I tried to be precise in targeting the common patterns. I think loosening up would reduce the code.

jnothman · 2018-01-08T01:53:46Z

numpydoc/xref.py

+    return param_type
+
+
+def xref_param_type_role(role, rawtext, text, lineno, inliner,


why are we adding a new role, when we could just allow the user to specify a role? How does this differ from existing default_role candidates?

The role is an implementation detail (internal use). I'll add a detailed comment about it in the file.

jnothman · 2018-01-08T01:56:24Z

numpydoc/xref.py

+        return _split_and_apply_re(param_type, DOUBLE_QUOTE_SPLIT_RE)
+
+    # Is splittable
+    for splitter in [' or ', ', ', ' ']:


This should be configurable.

I'll change it to a regex (to accommodate the comment below), but I do not think it needs to be configurable. I think if type information used in the scientific python ecosystem gets more complicated, then other parts of this function will be found wanting too.

jnothman · 2018-01-08T01:58:30Z

numpydoc/xref.py

+    # Is splittable
+    for splitter in [' or ', ', ', ' ']:
+        if splitter in param_type:
+            return _split_and_apply_str(param_type, splitter)


Is there a reason we are doing this recursively for each delimiter rather than a complete tokenization step followed by handling each token?
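
To make the alternative concrete, here is a minimal sketch of a single tokenization pass followed by per-token handling. It is not the PR's code: the delimiter pattern, the function name `link_tokens`, and the addition of `optional` to the ignore set are assumptions made for illustration.

```python
import re

# Delimiters assumed for illustration: ' or ', ', ' and single spaces.
TOKEN_RE = re.compile(r'( or |, | )')
# Ignore set loosely based on the one in the diff; 'optional' is an addition.
IGNORE = {'of', 'either', 'or', 'with', 'in', 'default', 'optional'}


def link_tokens(param_type):
    """Tokenize once, then wrap each link-worthy token in the role."""
    parts = TOKEN_RE.split(param_type)
    out = []
    for i, part in enumerate(parts):
        # Odd indices are the captured delimiters; keep them verbatim.
        if i % 2 == 1 or not part or part in IGNORE:
            out.append(part)
        else:
            out.append(':xref_param_type:`%s`' % part)
    return ''.join(out)
```

Because the delimiters are captured by the split, joining the pieces reproduces the original text exactly, with only the link-worthy tokens wrapped.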

jnothman · 2018-01-08T02:00:46Z

numpydoc/xref.py

+                        end)
+
+    # May have an unsplittable literal
+    if '``' in param_type:


Do we not similarly need to handle strings with spaces in them? I don't know where such are used, but I could certainly imagine it.

Also existing markup may include things like :ref:`blah blah <foo>` which I don't think are currently handled.

Nice catch.

has2k1 · 2018-01-08T10:28:16Z

Have you tried to find similar functionality in other doc builders? How do they specify this kind of thing?

Yes, see comment

has2k1 · 2018-01-08T10:36:16Z

Though I think the implementation catches nearly every bit of type information in numpy and scipy docstrings, it would help if someone can build any of the documentation to see it in action. I'm not capable of doing so. That would highlight any performance issues.

jnothman · 2018-01-08T10:48:16Z

That would highlight any performance issues.

By performance do you mean computational performance?

I much prefer you not amending commits, so we can see the changes from review to review. We can squash upon merge.

has2k1 · 2018-01-08T10:52:20Z

By performance do you mean computational performance?

Yes. I saw a concern about it by someone from pandas.

Sorry about the squash. Let me correct it.

jnothman · 2018-01-08T10:53:58Z

Regarding a test bed, I'll get Scikit-learn docs rendered at scikit-learn/scikit-learn#10421

jnothman · 2018-01-08T12:25:02Z

So here is rendered scikit-learn API reference: https://16406-843222-gh.circle-artifacts.com/0/doc/modules/classes.html
Here's an example before this PR: https://16387-843222-gh.circle-artifacts.com/0/doc/modules/classes.html.

Comparing these two runs in time: 11:40 became 12:50 (a 10% increase). Not negligible, but not egregious if the links are helpful.

Rendered docs are also larger: 26.3KB became 28.1KB (7% increase) for DBSCAN as a single example. Note that using the custom role places an <em> around every term that is not linked. This increases page size, and may affect rendering if use_blockquotes=True, or if stylesheets are not default.

I'm not sure that the numpydoc_xref_aliases is quite working as intended either: aliasing 'string' to 'str' means that 'str' will appear; 'sparse matrix' becomes 'sparse numpy.matrix' with the example aliases. I think you need some <display text> in your :xref...: markup.

jnothman · 2018-01-08T12:25:38Z

Btw, feel free to copy my PR to scikit-learn so you can play with it, and I'll close mine.

jnothman · 2018-01-08T12:27:07Z

[fixed some typos above]

jnothman · 2018-01-08T12:38:25Z

numpydoc/xref.py

+
+ROLE_SPLIT_RE = re.compile(
+    # splits to preserve ReST roles
+    r'(:\w+:`.+?(?<!\\)`)'


I haven't checked: is ` allowed in anchor text?

Yes. But that regex will not catch pathological use of backslashes and ticks.
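
A small sketch of how a capturing split keeps existing roles intact, using only the pattern fragment visible in the diff above (the full pattern in the PR may contain more alternatives). The helper name `split_roles` is hypothetical.

```python
import re

# Pattern fragment copied from the diff above: captures :role:`...` chunks.
ROLE_SPLIT_RE = re.compile(r'(:\w+:`.+?(?<!\\)`)')


def split_roles(param_type):
    """Split so that existing ReST roles come out as separate, intact pieces."""
    return [piece for piece in ROLE_SPLIT_RE.split(param_type) if piece]
```

Pieces that match the pattern can then be passed through untouched, while the remaining text goes on to further splitting.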

jnothman · 2018-01-08T12:38:55Z

numpydoc/xref.py

+        ``xref_param_type`` role.
+    """
+    if param_type in xref_aliases:
+        param_type = xref_aliases[param_type]


This should perhaps have:

if QUALIFIED_NAME_RE.match(xref_aliases[param_type]):
    return ':xref_param_type:`%s <%s>`' % (xref_aliases[param_type], param_type)

... or preprocess xref_aliases
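
One possible reading of this suggestion, sketched under the assumption that the role accepts the usual `title <target>` syntax of Sphinx cross-reference roles: the word written in the docstring stays as the display text and the alias value becomes the link target. The function name `xref_markup` is hypothetical.

```python
def xref_markup(token, xref_aliases):
    """Wrap a token in the role, keeping the docstring's wording as the title."""
    if token in xref_aliases:
        # Explicit title: the reader still sees the word written in the docstring.
        return ':xref_param_type:`%s <%s>`' % (token, xref_aliases[token])
    return ':xref_param_type:`%s`' % token
```

With an alias of `string` to `str`, this would render the text "string" while linking to `str`.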

has2k1 · 2018-01-08T16:03:03Z

Looking at some of the test times, there seems to be some variance. numpydoc changed output html markup, so the <em> is redundant.

has2k1 · 2018-01-09T15:13:51Z

I've created a demo (https://has2k1.github.io/param-type-demo/index.html) to show the cross-references generated for all type strings from some common packages.

For each page, the top row shows the original string and the bottom row shows the cross-referenced result.

jnothman · 2018-01-09T21:43:54Z

Neat! What's the best way for me to configure an ignored word (e.g. object)? Is there any way to try linking to terms as well?

has2k1 · 2018-01-10T00:42:34Z

What's the best way for me to configure an ignored word (e.g. object)?

Maybe expose the IGNORED set as an option. As the final html no longer has markup for tokens that do not get linked, the original motivation for the set is not as strong.

is there any way to try linking to terms as well?

The aliases dict can do terms.

jnothman · 2018-01-10T00:47:00Z

yes, I mean to link to any terms matched by default. I suppose we could read in our glossary if needed.

has2k1 · 2018-01-10T00:50:03Z

I suppose we could read in our glossary if needed

That would not be foolproof; terms can have spaces and we don't deal well with spaces.

jnothman · 2018-01-10T01:00:41Z

btw the regex you appear to be using in your demo is over-generating at least on a line with ': ' in it; it should require ' : ' at a minimum

jnothman · 2018-01-10T01:06:01Z

The need to prefix module-local names with . is a bit annoying. Is there a way to make this more automagical?

jnothman · 2018-01-10T01:21:33Z

I think trailing punctuation is currently inhibiting recognition, e.g. "array."

has2k1 · 2018-01-10T01:44:50Z

The dot-names do look weird when not enclosed in a class role, but I don't think there is room for cleverness.

I saw the trailing punctuation thing; it may not be worth the trouble.

jnothman · 2018-01-10T01:50:33Z

I would like to be able to refer to neighbors.NearestNeighbors, but it seems that it won't be linked unless I use sklearn.neighbors.NearestNeighbors. How would you solve this? There are certainly cases where I would like to be able to link multi-word expressions... What solutions might be feasible?

I've taken a look at all of scikit-learn's type specs in your demo. It looks pretty good, though most of the links currently produced are not very informative (do I really want to link for str, list and dict?? for int and float and double?). There are a few terms I would like unlinked (object, type, matrix), and there are many that I would need to manually produce links for: estimator, estimators, classifiers, Bunch, callable, generator, cross-validation generator and similar, joblib.Memory, csc_matrix, sparse matrix, booleans, DecisionTreeRegressor, number, scipy.sparse, RandomState, pair, record array, seq, indexable, ?function, various specific class names prefixed by module or not.

You've also alerted me to some omissions in our glossary, and non-standard uses of type specs in docstrings.

has2k1 · 2018-01-10T04:13:23Z

To cross-reference strings with spaces like sparse matrix, it may be useful to change the documentation to sparse-matrix.

Initially I thought the aliases dictionary would be small, but it now looks like we want it to be as instrumental as the Rosetta Stone. For a package like sklearn, maybe you can generate the UpperCaseClassName alias entries from __all__ in the init files; you have the benefit of good class naming conventions.

On the whole, the parameter types across the whole scipython ecosystem could do with some attempt at standardisation. For example, entries like list, length = len(coefs_) + len(intercepts_) where the code is not double-quoted are not handled nicely; those underscores make sphinx complain:

param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "coefs".
param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "intercepts".

Luckily those are the only complaints in the whole demo, and I think the related docs are not part of the user API.
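
The idea of generating alias entries from `__all__` could be sketched roughly as below. This is an illustration, not something from the PR: the helper name `aliases_from_all` is hypothetical, and a standard-library module stands in for sklearn so the example is self-contained.

```python
import collections


def aliases_from_all(module):
    """Build alias entries for the CamelCase names a module exports."""
    return {
        name: '%s.%s' % (module.__name__, name)
        for name in getattr(module, '__all__', [])
        # Keep only names that start with an upper-case letter (class-like).
        if name[:1].isupper()
    }


# Demonstrated on a standard-library module rather than sklearn.
aliases = aliases_from_all(collections)
```

The resulting dict could then be merged into `numpydoc_xref_aliases` in `conf.py`; lower-case exports such as functions are skipped by the naming-convention filter.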

jnothman · 2018-01-10T05:04:28Z

I don't think standardisation improves quality enough to justify maintenance costs. Yes, were those docs being rendered, we would likely have caught that issue and escaped or enclosed the offending terms. Are you sure generating the aliases is the best way to go?

has2k1 · 2018-01-10T07:03:31Z

Are you sure generating the aliases is the best way to go?

For sklearn, yes. I think it approaches the auto-magic that you wished for, i.e. leaving out the dot. The naming conventions seem to be strictly followed and the classes are imported for the user API in a uniform manner.

has2k1 · 2018-01-12T07:01:54Z

The set of words to ignore is now configurable; the variation in building time conceals any performance impact; and that checks off my boxes.

As the default is set to enable this feature, maybe a notice to other projects!

CC: @charris, @rgommers, @tacaswell, @jorisvandenbossche

jnothman

Not sure whether I'll be able to give this another full review soon

jnothman · 2018-01-16T07:36:05Z

doc/install.rst

+  ``True`` by default.
+numpydoc_xref_aliases : dict
+  Mappings to fully qualified paths (or correct ReST references) for the
+  aliases/shortcuts used when specifying the types of parameters.


Mention no spaces allowed

tacaswell · 2018-01-16T15:24:13Z

I am 👍 in principle to the functionality, but don't have the bandwidth to do a code review.

jnothman · 2018-01-16T21:46:33Z

I'm not certain it should be enabled by default, so that's something else for reviewers to comment on

larsoner · 2019-01-12T04:54:48Z

I have wanted something like this for a while. It would be great to resurrect this.

I would like to be able to refer to neighbors.NearestNeighbors, but it seems that it won't be linked unless I use sklearn.neighbors.NearestNeighbors. How would you solve this?

The way that the autolink role in Sphinx handles this is that it uses the currentmodule, and moves up from there. So if you were documenting something in sklearn.linear_model, you could call it neighbors.NearestNeighbors, as it would search first sklearn.linear_model.neighbors.NearestNeighbors, which does not exist, and then move on to sklearn.neighbors.NearestNeighbors, which does.

And ideally a ~neighbors.NearestNeighbors would also work, stripping out everything before the last . so rendering as NearestNeighbors in the doc.
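
The lookup order described above can be sketched as follows. This is not Sphinx's implementation: `resolve` is a hypothetical helper, and `known_targets` is an assumed stand-in for whatever inventory of documented objects the build has.

```python
def resolve(name, current_module, known_targets):
    """Try `name` relative to the current module, then each parent package."""
    short = name.startswith('~')
    name = name.lstrip('~')
    parts = current_module.split('.')
    # Candidates go from the most specific prefix down to the bare name.
    for depth in range(len(parts), -1, -1):
        candidate = '.'.join(parts[:depth] + [name])
        if candidate in known_targets:
            # With '~', display only the last component of the name.
            title = candidate.rsplit('.', 1)[-1] if short else name
            return candidate, title
    return None, name
```

The function returns a `(target, title)` pair, with `target` set to `None` when no candidate is found so the caller can leave the text unlinked.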

Does that sound tractable? Any interest in coming back to this? I have recently been working on the SciPy docs, and maintain docs for MNE-Python, which follows a lot of the same conventions as sklearn, so I'd be happy to test this out and review.

larsoner · 2019-01-12T05:09:34Z

I'm not certain it should be enabled by default

I agree this would be safer as opt-in.

has2k1 · 2019-01-12T14:00:40Z

@larsoner, it is already opt-in and the ~ works as expected. I have had this feature working without any issues in plotnine for about a year.

larsoner · 2019-01-12T18:35:36Z

Okay, let me try it with MNE and SciPy and see what happens

larsoner · 2019-01-12T20:11:47Z

@has2k1 can you rebase on latest master to get rid of the conflicts?

larsoner · 2019-01-12T20:14:12Z

doc/install.rst

+  Whether to create cross-references for the parameter types in the
+  ``Parameters``, ``Other Parameters``, ``Returns`` and ``Yields``
+  sections of the docstring.
+  ``True`` by default.


If it's opt-in, then this docstring should be changed to say "False by default"

larsoner · 2019-01-12T20:24:01Z

numpydoc/numpydoc.py

    app.add_config_value('numpydoc_show_inherited_class_members', True, True)
    app.add_config_value('numpydoc_class_members_toctree', True, True)
    app.add_config_value('numpydoc_citation_re', '[a-z0-9_.-]+', True)
+    app.add_config_value('numpydoc_xref_param_type', True, True)


Looks like the default is still True, so it's opt-out currently.

has2k1 force-pushed the xref-param-type branch 2 times, most recently from 0e6a90f to 53cf0ff Compare January 1, 2018 06:05

has2k1 force-pushed the xref-param-type branch 2 times, most recently from e93218e to 616c4b9 Compare January 1, 2018 06:40

jnothman reviewed Jan 8, 2018

has2k1 force-pushed the xref-param-type branch from 616c4b9 to 9b5b5f9 Compare January 8, 2018 10:26

jnothman mentioned this pull request Jan 8, 2018

WIP try out numpydoc#150 (cross-reference links in param types) scikit-learn/scikit-learn#10421

Closed

Changes after review

0f5c33c

- Also changed the role used to create links from `obj` to `class`.

has2k1 force-pushed the xref-param-type branch from 9b5b5f9 to 0f5c33c Compare January 8, 2018 11:01

jnothman reviewed Jan 8, 2018

Fix issues with generated markup for param types

f926c46

- No tildes in aliases, the keys are the titles.
- Do not emphasize text
- Fix bug, open brackets cannot be to the left of the quote that ends a role.

has2k1 mentioned this pull request Jan 8, 2018

WIP try out numpydoc#150 (cross-reference links in param types) scikit-learn/scikit-learn#10426

Closed

has2k1 added 2 commits January 8, 2018 11:24

Use nodes.Text instead of nodes.inline

a6ddef6

`nodes.inline` adds a `span` tag. `nodes.Text` adds no tag.

Split on comma-space, care with brackets & quotes

aaa48d7

- Do not split singly quoted expressions, to avoid edge cases that lead to bad rst markup.
- Split only when there is a space after a comma.
- Do not split on close brackets if they are followed by a linkable token.

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jan 10, 2018

DOC clean up assorted type specifications

b435786

With thanks to the demo of numpy/numpydoc#150

jnothman mentioned this pull request Jan 10, 2018

[MRG + 1] DOC clean up assorted type specifications scikit-learn/scikit-learn#10441

Merged

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jan 10, 2018

DOC add entries missing from glossary

aa27619

With thanks to the demo of numpy/numpydoc#150

jnothman mentioned this pull request Jan 10, 2018

[MRG] DOC add entries missing from glossary scikit-learn/scikit-learn#10442

Merged

jnothman added a commit to scikit-learn/scikit-learn that referenced this pull request Jan 10, 2018

[MRG] DOC add entries missing from glossary (#10442)

c7c14be

With thanks to the demo of numpy/numpydoc#150

Make the ignore set for xrefs an option

1843f5e

jnothman reviewed Jan 16, 2018

DOC: No spaces allowed in xref_aliases keys

4dd6657

FHaase mentioned this pull request Dec 11, 2018

DOC: Use official numpydoc extension pandas-dev/pandas#24098

Merged

larsoner mentioned this pull request Jan 12, 2019

WIP: Add parameter class links #57

Closed

larsoner reviewed Jan 12, 2019

larsoner mentioned this pull request Jan 14, 2019

ENH: Xref param type #197

Merged

rgommers added the type: Enhancement label Apr 21, 2019

rgommers closed this in #197 Apr 21, 2019
