Standardize and refactor doc extensions #352
Merged
Description
- Added a `spacier.extensions` module, which includes functionality for "registering" functions that return sets of doc extensions, using explosion's `catalogue` pkg
- Updated the `extract` and `text_stats` subpkgs to leverage the new setup
- Added `textacy.set_doc_extensions(NAME)`, where `NAME` is a string like `"extract"` or, optionally, a specific subset like `"extract.basics"`, which registers the corresponding subset of extensions
- Removed the `extensions` module in favor of `spacier.extensions`; core extensions (`Doc._.preview` and `Doc._.meta`) moved to `spacier.core`, and a couple of bag-oriented functions moved to `extract.bags`, with corresponding extensions settable via `textacy.set_doc_extensions("extract")` or `textacy.set_doc_extensions("extract.bags")`
- Removed the `to_tokenized_text()` function + method extension, which was a bit of an odd-ball and was originally meant for interoperability with `gensim`
Motivation and Context
This aspect of `textacy` was inconsistent and awkward. It had been developed in pieces over multiple major versions of `spacy` and never really reconsidered. The new setup is more explicit and configurable from a user's perspective, rather than implicit and one-size-fits-all.
How Has This Been Tested?
Types of changes
Checklist: