Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intersphinx delegation to domains #8929

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

jakobandersen
Copy link
Contributor

@jakobandersen jakobandersen commented Feb 24, 2021

Feature or Bugfix

  • Feature
  • Bugfix
  • Refactoring

Purpose

Currently, Intersphinx is performing lookups in inventories purely by string comparison. Unfortunately this does not work in general. For C and especially C++ more complex processing needs to be done.

As an overall goal for Intersphinx: provide cross-referencing to external projects as if those projects were merely a part of the current project. In order to do this I believe Intersphinx must delegate most processing of inventories and lookup to the domains. This means both a new inventory format is needed and changes to lookup. This PR tries to address the latter. See the bottom for notes on the inventory format.

Detail

  • When loading inventories, group entries by domain and object type, and package data only related to Intersphinx in a new class sphinx.util.inventory.InventoryItemSet. This class is private for Intersphinx.
  • Intersphinx maintains domain storage of arbitrary type, initialized by a clone of a new domain variable initial_intersphinx_inventory. This mirrors the ordinary domain storage through initial_data.
  • Data is then given to the domains through a new method intersphinx_add_entries_v2(), where the domain stores in the given domain storage. The v2 in the name refers to the current version of objects.inv (the v1 is converted to v2).
  • When Intersphinx gets the missing-reference event it will use two new domain methods intersphinx_resolve_xref. The previously filled domain storage is given here as well.
  • The domain must perform lookup and will end up with a (possibly empty) sphinx.util.inventory.InventoryItemSet which must be returned (the domains doesn't know that it is this class, just some arbitrary data that it got from Intersphinx earlier). Intersphinx will then perform disambiguation between duplicates across the inventories, and create the resulting nodes.

The old lookup code in intersphinx.py is now the default implementation in the Domain base class for backwards compatibility. The domain-specific things for std and py have been spliced out (though in a slightly haxy manner).

As a kind of proof-of-concept I have implemented custom handling in the C domain, where it reuses the ordinary symbol tables classes and lookup procedures. A test for lookup in relation to anonymous types illustrates that more lookups will work with this implementation.

Relates

Next steps

We should make a new inventory format where domains can store symbols in whichever format they like: https://github.com/orgs/sphinx-doc/discussions/12204

@jakobandersen
Copy link
Contributor Author

Rebased to resolve conflicts. @tk0miya, do you by chance have time to review this before we get too close to the v4 release?

@tk0miya
Copy link
Member

tk0miya commented Mar 13, 2021

Oh, sorry. I overlooked your PR at all.

I completely agree with you. It's difficult to resolve cross-references only on the intersphinx module and common inventory data structure. So the resolution process should be passed to the domains.

In my thought, it is better to export the whole of the env object (or the whole of the domain object) as an inventory database and import it as a real env (or domain) object on intersphinx. I think it's similar to this PR, but no new methods are needed. What do you think?

# image code
inventory_v3 = fetch_remote_inventory(url)
env = BuildEnvironment.from_inventory(inventory_v3)
cpp_domain = env.get_domain('cpp')
cpp_domain.resolve_xref(pending_xref)

@jakobandersen
Copy link
Contributor Author

That would be delightfully simply, though I think we would still need domain-specific methods for the loading. The inventory may be generated by an old version, so BuildEnvironment.from_inventory() would need to delegate to the domains so they can do the migration of the data into the current environment. And we would of course need to support the current inventory formats as well.

As for lookup: the current intersphinx lookup does some extra tricks, like splitting of an inventory name and perform targeted lookup, or when multiple inventories have the same entry, then disambiguate it in a deterministic way.
I know the C and C++ domains, and I assume the rest, does not directly support this kind of lookup. So in this way each inventory must be kept separate and intersphinx must in turn perform lookup. In a sense this is an extension of what is already going on: try projects in some predefined order, with the current project being the first.
Though, I'm a bit unsure if this is a good idea, as best candidate lookup will not be possible. In C++ a target T<int> may be resolved to a class template specialization template<> class T<int>, but if no such specialization has been defined, it will resolve to the primary template template<typename T> class T (if it exists). If the inventories are not merged, with built-in disambiguation, this will not work in general as the first inventory might have the primary template, while a latter the specialization. However, this is probably a quite niche use case.

@jakobandersen jakobandersen force-pushed the intersphinx_delegation branch 6 times, most recently from 1754c28 to 991f63d Compare April 14, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants