Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cross-reference links to parameter types #150

Closed
wants to merge 7 commits into from

Conversation

has2k1
Copy link
Contributor

@has2k1 has2k1 commented Dec 30, 2017

Tokens of the type description that are determined to be "link-worthy"
are enclosed in a new role called xref_param_type. This role when
when processed adds a pending_xref node to the DOM. If these types
cross-references are not resolved when the build ends, sphinx does
not complain. This forgives errors made when deciding whether tokens
are "link-worthy". And provided text from the type description is not
lost in the processing, the only unwanted outcome is a type link (due
to coincidence) when none was desired.

Added two options:

  1. numpydoc_xref_param_type
  2. numpydoc_xref_aliases

closes #57

@has2k1 has2k1 force-pushed the xref-param-type branch 2 times, most recently from 0e6a90f to 53cf0ff Compare January 1, 2018 06:05
Tokens of the type description that are determined to be "link-worthy"
are enclosed in a new role called `xref_param_type`. This role when
when processed adds a `pending_xref` node to the DOM. If these types
cross-references are not resolved when the build ends, sphinx does
not complain. This forgives errors made when deciding whether tokens
are "link-worthy". And provided text from the type description is not
lost in the processing, the only unwanted outcome is a type link (due
to coincidence) when none was desired.

Added two options:

   1. numpydoc_xref_param_type
   2. numpydoc_xref_aliases
@has2k1 has2k1 force-pushed the xref-param-type branch 2 times, most recently from e93218e to 616c4b9 Compare January 1, 2018 06:40
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few queries on unclear things.

Have you tried to find similar functionality in other doc builders? How do they specify this kind of thing?

numpydoc/xref.py Outdated
QUALIFIED_NAME_RE = re.compile(
# e.g int, numpy.array, ~numpy.array
r'^'
r'[~\.]?'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does beginning with . mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.class_in_current_module

numpydoc/xref.py Outdated
r'(``.+?``)'
)

IGNORE = {'of', ' of ', 'either', 'or', 'with', 'in', 'default'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these settings be wrapped in a class, e.g. NumpyDocXrefSettings so that the user can do the following in config:

numpydoc_xref_param_type = NumpyDocXrefSettings()
numpydoc_xref_param_type.ignore.update(['hello'])

or perhaps

numpydoc_xref_param_type.add_to_ignore(['hello'])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be configurable, however that set really doesn't affect the final outcome. It is just to minimise useless markup from being inserted.

numpydoc/xref.py Outdated
# - join the results with the pattern

# endswith ', optional'
if param_type.endswith(', optional'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this strictly need to be at the end? is there a reason not to just include optional in ignore?

numpydoc/xref.py Outdated
param_type[:-10],
xref_aliases)

# Any sort of bracket '[](){}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need special casing aside from other punctuation? The only thing I see it doing is ignoring the first token (dict, list, tuple)...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to be precise in targeting the common patterns. I think loosening up would reduce the code.

numpydoc/xref.py Outdated
return param_type


def xref_param_type_role(role, rawtext, text, lineno, inliner,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we adding a new role, when we could just allow the user to specify a role? How does this differ from existing default_role candidates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The role is an implementation detail (internal use). I'll add a detailed comment about it in the file.

numpydoc/xref.py Outdated
return _split_and_apply_re(param_type, DOUBLE_QUOTE_SPLIT_RE)

# Is splittable
for splitter in [' or ', ', ', ' ']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it to a regex (to accommodate the comment below), but I do not think it needs to be configurable. I think if type information used in the scientific python ecosystem gets more complicated, then other parts of this function will be found wanting too.

numpydoc/xref.py Outdated
# Is splittable
for splitter in [' or ', ', ', ' ']:
if splitter in param_type:
return _split_and_apply_str(param_type, splitter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we are doing this recursively for each delimiter rather than a complete tokenization step followed by handling each token?

numpydoc/xref.py Outdated
end)

# May have an unsplittable literal
if '``' in param_type:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not similarly need to handle strings with spaces in them? I don't know where such are used, but I could certainly imagine it.

Also existing markup may include things like :ref:`blah blah <foo>` which I don't think are currently handled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 8, 2018

Have you tried to find similar functionality in other doc builders? How do they specify this kind of thing?

Yes, see comment

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 8, 2018

Though I think the implementation catches nearly every bit of type information in numpy and scipy docstrings, it would help if someone can build any of the documentation to see it in action. I'm not capable of doing so. That would highlight any performance issues.

@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

That would highlight any performance issues.

By performance do you mean computational performance?

I much prefer you not amending commits, so we can see the changes from review to review. We can squash upon merge.

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 8, 2018

By performance do you mean computational performance?

Yes. I saw a concern about it by someone from pandas.

Sorry about the squash. Let me correct it.

@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

Regarding a test bed, I'll get Scikit-learn docs rendered at scikit-learn/scikit-learn#10421

- Also changed the role used to create links from `obj` to `class`.
@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

So here is rendered scikit-learn API reference: https://16406-843222-gh.circle-artifacts.com/0/doc/modules/classes.html
Here's an example before this PR: https://16387-843222-gh.circle-artifacts.com/0/doc/modules/classes.html.

Comparing these two runs in time: 11:40 became 12:50 (~8% increase). Not negligible, but not egregious if the links are helpful.

Rendered docs are also larger: 26.3KB became 28.1KB (7% increase) for DBSCAN as a single example. Note that using the custom role places an <em> around every term that is not linked. This increases page size, and may affect rendering if use_blockquotes=True, or if stylesheets are not default.

I'm not sure that the numpydoc_xref_aliases is quite working as intended either: aliasing 'string' to 'str' means that 'str' will appear; 'sparse matrix' becomes 'sparse numpy.matrix' with the example aliases. I think you need some <display text> in your :xref...: markup.

@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

Btw, feel free to copy my PR to scikit-learn so you can play with it, and I'll close mine.

@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

[fixed some typos above]


ROLE_SPLIT_RE = re.compile(
# splits to preserve ReST roles
r'(:\w+:`.+?(?<!\\)`)'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked: is ` allowed in anchor text?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But that regex will not catch pathological use use of backslashes and ticks.

numpydoc/xref.py Outdated
``xref_param_type`` role.
"""
if param_type in xref_aliases:
param_type = xref_aliases[param_type]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should perhaps have:

if QUALIFIED_NAME_RE.match(xref_aliases[param_type]):
    return ':xref_param_type:`%s <%s>`' % (xref_aliases[param_type], param_type)

... or preprocess xref_aliases

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 8, 2018

Looking at some of the test times, there seems to be some variance. numpydoc changed output html markup, so the <em> is redundant.

- No tildes in aliases, the keys are the titles.
- Do not emphasize text
- Fix bug, open brackets cannot be to the left of the quote that
  ends a role.
`nodes.inline` adds a `span` tag. `nodes.Text` adds no
tag.
- Do not split singly quoted expressions

The avoid edgecases that lead to bad rst markup.

- Split only when there is a space after a comma.
- Do not split on close brackets if they are followed by
  a linkable token.
@has2k1
Copy link
Contributor Author

has2k1 commented Jan 9, 2018

I've created a demo to show how the cross-references generated for all type strings from some common packages.

For each page, the top row shows the original string and the bottom row shows the cross-referenced result.

@jnothman
Copy link
Member

jnothman commented Jan 9, 2018 via email

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 10, 2018

What's the best way for me to configure an ignored word (e.g. object)?

Maybe expose the IGNORED set as an option. As the final html no longer has markup for tokens that do not get linked, the original motivation for set is not as strong.

is there any way to try linking to terms as well?

The aliases dict can do terms.

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 10, 2018

I suppose we could read in our glossary if needed

That would not be fool proof, terms can have spaces and we don't deal well with spaces.

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 10, 2018

The dot-names do look weird when not enclosed in a class role, but I don't think there is room for cleverness.

I saw the trialling punctuation thing, it may not be worth the trouble.

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jan 10, 2018
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jan 10, 2018
@has2k1
Copy link
Contributor Author

has2k1 commented Jan 10, 2018

To cross-reference strings with spaces like sparse matrix, it may be useful to change the documentation to sparse-matrix.

Initially I thought the aliases dictionary would be small, but it now looks like we want it to be as instrumental as the Rosetta Stone. For a package like sklearn may be you can generate the
UpperCaseClassName alias entries from __all__ in the init files, you have the benefit of good class naming conventions.

On the whole, the parameter types across the whole scipython ecosystem could do with some attempt at standardisation. For example, entries like list, length = len(coefs_) + len(intercepts_) where the code is not double-quoted are not handled with nicely; those underscores make sphinx complain

param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "coefs".
param-type-demo/source/sklearn-param-types.rst:3601: WARNING: Unknown target name: "intercepts".

Luckily those are the only complaints in the whole demo, and I think the related docs are not part of the user API.

@jnothman
Copy link
Member

jnothman commented Jan 10, 2018 via email

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 10, 2018

Are you sure generating the aliases is the best way to go?

For sklearn, yes. I think it approaches the auto-magic that you wished for, i.e leaving out the dot. The naming conventions seem to be strictly followed and the classes are imported for the user API in a uniform manner.

jnothman added a commit to scikit-learn/scikit-learn that referenced this pull request Jan 10, 2018
@has2k1
Copy link
Contributor Author

has2k1 commented Jan 12, 2018

The set of words to ignore is now configurable; the variation in building time conceals any performance impact; and that checks off my boxes.

As the default is set to enable this feature, maybe a notice to other projects!

CC: @charris, @rgommers, @tacaswell, @jorisvandenbossche

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether I'll be able to give this another full review soon

``True`` by default.
numpydoc_xref_aliases : dict
Mappings to fully qualified paths (or correct ReST references) for the
aliases/shortcuts used when specifying the types of parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention no spaces allowed

@tacaswell
Copy link
Contributor

I am 👍 in principle to the functionality, but don't have the bandwidth to do a code review.

@jnothman
Copy link
Member

jnothman commented Jan 16, 2018 via email

@larsoner
Copy link
Collaborator

I have wanted something like this for a while. It would be great to resurrect this.

I would like to be able to refer to neighbors.NearestNeighbors, but it seems that it won't be linked unless I use sklearn.neighbors.NearestNeighbors. How would you solve this?

The way that the autolink role in Sphinx handles this is that it uses the currentmodule, and moves up from there. So if you were documenting something in sklearn.linear_model, you could call it neighbors.NearestNeighbors, as it would search first sklearn.linear_model.neigbors.NearestNeighbors, which does not exist, and then move on to sklearn.neighbors.NearestNeighbors, which does.

And ideally a ~neighbors.NearestNeighbors would also work, stripping out everything before the last . so rendering as NearestNeighbors in the doc.

Does that sound tractable? Any interest in coming back to this? I have recently been working on the SciPy docs, and maintain docs for MNE-Python, which follows a lot of the same conventions as sklearn, so I'd be happy to test this out and review.

@larsoner
Copy link
Collaborator

I'm not certain it should be enabled by default

I agree this would be safer as opt-in.

@has2k1
Copy link
Contributor Author

has2k1 commented Jan 12, 2019

@larsoner, it is already opt-in and the ~ works as expected. I have had this feature working without any issues in plotnine for about a year.

@larsoner
Copy link
Collaborator

Okay, let me try it with MNE and SciPy and see what happens

@larsoner
Copy link
Collaborator

@has2k1 can you rebase on latest master to get rid of the conflicts?

Whether to create cross-references for the parameter types in the
``Parameters``, ``Other Parameters``, ``Returns`` and ``Yields``
sections of the docstring.
``True`` by default.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's opt-in, then this docstring should be changed to say "False by default"

@@ -146,6 +152,9 @@ def setup(app, get_doc_object_=get_doc_object):
app.add_config_value('numpydoc_show_inherited_class_members', True, True)
app.add_config_value('numpydoc_class_members_toctree', True, True)
app.add_config_value('numpydoc_citation_re', '[a-z0-9_.-]+', True)
app.add_config_value('numpydoc_xref_param_type', True, True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the default is still True, so it's opt-out currently.

@larsoner larsoner mentioned this pull request Jan 14, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants