Releases: ContextLab/hypertools
v0.8.0 (February, 2022)
updates to `.geo` file format
Hypertools now saves `DataGeometry` objects using the `pickle` file format internally, rather than HDF5. With the improvements made to the built-in `pickle` module since Hypertools's initial release, this generally results in smaller files that save and load more quickly. It also allows us to drop the dependency on `deepdish`, which has compatibility issues with various `pandas` objects, doesn't offer pre-built wheels for recent Python versions, and is largely no longer maintained.
If you need to load `.geo` files saved in the old format, `hypertools.load` now accepts a keyword-only argument, `legacy`. Install `deepdish` if necessary, and pass `legacy=True` to load older `DataGeometry` objects. You can then `.save()` them to convert them to the new format.
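For example, a minimal sketch of converting an old-format file (the file name is illustrative):

```python
import hypertools as hyp

# loading a legacy file requires deepdish and the keyword-only legacy flag
geo = hyp.load('my_old_data.geo', legacy=True)

# re-saving converts the file to the new pickle-based format
geo.save('my_old_data.geo')
```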
improvements to example datasets
All example data files have been upgraded to the new file format. Additionally, the three pre-trained scikit-learn `Pipeline`s Hypertools provides (`wiki_model`, `nips_model`, and `sotus_model`) have been recreated from scratch using a newer scikit-learn version, better text preprocessing, and updated `CountVectorizer` and `LatentDirichletAllocation` parameters that result in overall better models.
The example `DataGeometry` objects associated with these three models (`wiki`, `nips`, and `sotus`) have been updated accordingly, and additionally now use `IncrementalPCA` as their default reducer, resulting in faster, deterministic transform outputs.
To use the new models and datasets, upgrade Hypertools to v0.8.0 (`pip install -U hypertools`) and remove the local cache of old versions (`[[ -d ~/hypertools_data ]] && rm ~/hypertools_data/*`). Older versions of Hypertools will continue to use the old example data.
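After clearing the cache, loading an example dataset re-downloads the updated files. A minimal sketch:

```python
import hypertools as hyp

# with the cache cleared, the first load fetches the updated dataset
geo = hyp.load('wiki')
geo.plot()
```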
Other improvements
- Hypertools is now compatible with Python 3.9! This release is also compatible in principle with Python 3.10, but `numba` does not yet support Python 3.10, so certain dependencies will fail to install.
- Hypertools now works with newer scikit-learn versions! The updates above to the example datasets make Hypertools fully compatible with recent scikit-learn releases (`>=0.24`). This should make Hypertools easier to use in Colaboratory notebooks and more flexible in general. If you need to use an older scikit-learn version, pip-install `hypertools<0.8.0`.
- Hypertools now works with newer Matplotlib versions! Recent updates to `matplotlib`'s plotting backends were causing Hypertools's plotting interface to fail on import. We've fixed these bugs and maintained backwards compatibility with older (deprecated) interactive plotting backends as well.
Other assorted changes
- failures when loading example datasets and `.geo` files now raise `HypertoolsIOError` with clearer error messages
- specifying a `compression` when saving a `DataGeometry` object raises a `FutureWarning`
- CI tests now run with Python 3.6–3.9, use `mamba` for faster environment setup, and generate more verbose output
- dependencies and code required for Python 2/3 compatibility have been removed
- various code causing `RuntimeWarning`s has been fixed
v0.7.0 (June 2021)
Control over matplotlib backend & various bug fixes
New features:
- Create animated plots in an environment with a non-interactive `matplotlib` plotting backend set, without disrupting the global plotting backend
- Create non-animated, interactive plots for easy inspection of data using the new `interactive` keyword argument
- Set the plotting backend for a single plot using the new `mpl_backend` keyword argument, and easily switch between backends within a single Python interpreter session, IPython kernel, or even Jupyter notebook cell
- Use the new `hypertools.set_interactive_backend` function to change the backend for all future plots, or use it as a context manager to temporarily switch to a different backend (see the sketch below). You can also use this to create multiple animated/interactive plots simultaneously.
- Use `hypertools`'s backend adjustments to control the behavior of other plotting libraries
- Set the `$HYPERTOOLS_BACKEND` environment variable to permanently set your preferred plotting backend for non-IPython environments
NB: Currently supported backends include TkInter, GTK, wxPython, Qt4, Qt5, Cocoa (aka MacOSX; MacOS only), notebook/nbAgg (Jupyter notebooks only), and ipympl/widget (Jupyter notebooks only). 3D and interactive plots may not render properly in Colab notebooks due to security restrictions imposed by the Colaboratory platform.
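A minimal sketch of the new controls (the backend names are illustrative; use any supported backend from the list above):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# set the backend for this plot only, leaving the global backend untouched
hyp.plot(data, animate=True, mpl_backend='TkAgg')

# create a non-animated but interactive plot
hyp.plot(data, interactive=True)

# temporarily switch backends for everything inside the block
with hyp.set_interactive_backend('Qt5Agg'):
    hyp.plot(data, animate=True)
```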
Bug fixes
- importing `hypertools` in a notebook no longer creates phantom Python processes, issues warnings when TkInter isn't installed, fails if `matplotlib.pyplot` was imported first, or silently changes the plotting backend (fixes #242)
- creating 3D plots with `hypertools` no longer alters the global `matplotlib.rcParams` object (fixes #243)
- `hypertools` can now be imported for non-plotting-related uses in environments without a compatible GUI without throwing an error
- IPython's TAB-completion no longer triggers a full import of `hypertools` or improperly sets the plotting backend based on the subprocess's environment
- require `scikit-learn<0.24` (full spec: `scikit-learn>=0.19.1,!=0.22,<0.24`) to avoid a bug when loading pre-trained `DataGeometry` objects due to a renamed sklearn module
v0.6.3 (October 2020)
dependency-related updates
- allow `scikit-learn>0.22`. `scikit-learn==0.22.0` contains a bug that affects the `CountVectorizer` vocabulary; this has been fixed in `0.23.0`.
- require `umap-learn>=0.4.6`. We previously avoided a bug in `umap-learn<=0.4.5` by installing a pre-release version from GitHub; this has now been fixed in `umap-learn==0.4.6`.
- Beginning with `seaborn==0.11.0`, "dark" color palettes are returned in reverse order from how they were previously. This difference in behavior will be reflected in `hypertools`, but we've changed the default `cmap` in `hypertools._shared.helpers.vals2colors` to a non-dark palette for consistent default behavior.
- Added tests for Python 3.8
v0.6.2 (December 2019)
minor patch that enables dependencies not hosted on PyPI to install properly
- `setup.py`'s setup command is now a custom class that inherits from `setuptools.command.install.install`, runs the regular installation process, then pip-installs UMAP from its GitHub URL at a pre-release commit hash (sketched below). This is completely equivalent to manually running `pip install git+<URL>`, but takes the burden of having to do so off of end-users.
- removed URL from `requirements.txt`, added a comment in its place
- added `MANIFEST.IN` file to include `requirements.txt`
- updated minimum Python version listed on PyPI page to 3.5 to reflect that Python 3.4 support was dropped in v0.5.1 (August 2018)
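A minimal sketch of this pattern (the class name is illustrative, and the commit hash is omitted):

```python
# setup.py (sketch)
import subprocess
import sys

from setuptools import setup
from setuptools.command.install import install


class InstallWithUMAP(install):
    """Run the standard install, then pip-install UMAP from GitHub."""

    def run(self):
        install.run(self)
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install',
            'git+https://github.com/lmcinnes/umap.git@<commit-hash>',
        ])


setup(
    name='hypertools',
    # ...regular package metadata...
    cmdclass={'install': InstallWithUMAP},
)
```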
This version is tagged as `0.6.2` to keep the versioning here and on PyPI consistent. The fix intended to be `0.6.1` was unsuccessful on TestPyPI, and PyPI does not allow removing and re-uploading an existing version.
v0.6.0 (December 2019)
Updates to `hypertools.reduce`
- fixed a bug where passing a dictionary of parameters to the `reduce` argument would result in those parameters being overwritten (see the sketch below)
- added some basic support for passing custom embedding models
- added a warning when resolving conflicts between `hypertools` arguments and model-specific arguments
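A minimal sketch of passing model parameters as a dictionary (the specific parameters are illustrative):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# parameters in the dict are forwarded to the model and, as of this
# release, no longer overwritten
reduced = hyp.reduce(data, reduce={'model': 'PCA',
                                   'params': {'n_components': 3}})
```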
Other changes
- dropped support for Python 2.7
- fixed bug in Travis tests
- replaced a deprecated `pandas.DataFrame` method in `hypertools.tools.df2mat`
- require installing UMAP from the GitHub repository due to a bug fix that has not yet been released
- updated `setup.py` to comply with PEP 508 guidelines for installing external dependencies
- added a unit test for the `hypertools.reduce` bug fix
- removed some unused imports and commented-out code
- removed outdated pages from readthedocs
- readthedocs build is now Python 3-based
- build folder is ignored by default when installing from GitHub repository in editable mode
v0.5.1 (August 2018)
- added flake8 to travis tests
- refactored some of the procrustes function code
- removed support for python 3.4
- removed hdbscan from dependencies (still can be used if installed manually)
Code cleanup (thanks @dwillmer!):
- Changed string comparisons from `if x is 'str'` to `if x == 'str'`; the former is an identity comparison, not equality. It happens to be true for some strings because of string interning, but `==` should always be used for normal comparisons (see the sketch after this list).
- Removed unused arguments from the `_draw` function; `return_data` and others weren't used in the function body.
- Removed unreachable code in the `normalize` function (the branch criteria could never be `True`).
- Separated out the multiply-nested function calls in the `DataGeometry` class for clarity.
- Changed comparisons of the form `if type(x) is list` to `if isinstance(x, list)`; the former doesn't return `True` for subclasses, so `isinstance` should always be used.
- Set unused loop variables to `_`.
- Removed unused imports.
- Ensured all imports are at the top of the file (except lazy/circular ones).
- Ensured 2 blank lines above functions/classes (PEP 8); the code looks a bit weird without this.
- Fixed typo `repect` -> `respect`, which was copy-pasted in multiple docstrings.
- Removed redundant `pass` before error raise.
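A standalone sketch illustrating the reasoning behind the comparison fixes:

```python
# strings: `is` tests identity, `==` tests equality
a = 'hypertools'
b = ''.join(['hyper', 'tools'])
print(a == b)  # True  -- same contents
print(a is b)  # False -- distinct objects; interning is not guaranteed

# type checks: `type(x) is list` ignores subclasses; isinstance() does not
class TrajectoryList(list):
    pass

x = TrajectoryList()
print(type(x) is list)      # False
print(isinstance(x, list))  # True
```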
v0.5.0 (April 2018)
Enhancements:
Plotting and transforming text data
- `hyp.plot` now supports plotting text data. Simply pass a string, a list of strings, or a list of lists of strings, and the text will be transformed using a semantic model and plotted. By default, the text is transformed with a topic model (LDA) fit to a selection of Wikipedia pages.
- A new `vectorizer` argument in `hyp.plot` to specify a text vectorizer. Currently supports `CountVectorizer`, `TfidfVectorizer`, or class instances (fit or unfit) of these models.
- A new `semantic` argument in `hyp.plot` that specifies the semantic model used to transform text. Currently supports `LatentDirichletAllocation`, `NMF`, or class instances (fit or unfit) of these models (see the example below).
- A new `corpus` argument in `hyp.plot` that allows the user to specify the text used to fit the semantic model. Can be 'wiki', 'nips', 'sotus', or a custom list of text.
- An enhanced `hyp.format_data` function that takes data in various forms (numpy array, dataframe, str, list of str, or mixed list) and returns it in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.
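A minimal example of the new text-plotting interface (the documents are illustrative):

```python
import hypertools as hyp

docs = ['the stock market fell sharply today',
        'investors are worried about inflation',
        'my dog loves going to the park',
        'the cat chased a laser pointer']

# the text is vectorized, transformed by a topic model fit to the 'wiki'
# corpus, then reduced and plotted
hyp.plot(docs, vectorizer='CountVectorizer',
         semantic='LatentDirichletAllocation', corpus='wiki')
```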
New algorithms
- A new clustering algorithm, HDBSCAN (thanks @lmcinnes!), e.g. `hyp.plot(data, cluster='HDBSCAN')`
- A new dimensionality reduction algorithm, UMAP (thanks @lmcinnes!), e.g. `hyp.plot(data, reduce='UMAP')`
New parameters
- A new `size` param to resize the figure, e.g. `hyp.plot(data, size=[10, 8])`
- A new `ax` param to add the figure to an existing axis, e.g. `hyp.plot(data, ax=ax)`
New text examples
- A new dataset of NIPS papers (from Kaggle), e.g. `hyp.load('nips')`
- A new dataset of selected Wikipedia pages, e.g. `hyp.load('wiki')`
- A new dataset of State of the Union text from 1989-2017 (from Kaggle), e.g. `hyp.load('sotus')`
API changes
- In `hyp.plot`, changed the `group` arg to `hue` (`group` will still be supported, but deprecated in a coming release)
- Removed the deprecated `describe_pca` function. Please use the more general `describe` function instead.
Bugs fixed
- When using `chemtrails` in `hyp.plot`, the entire timeseries would appear for the first few seconds of an animation and then disappear.
- The legend colors did not align with the data when using the `fmt` or `color` args.
- When grouping with the `group`/`hue` arg, labels were not reshuffled.
- Fixed a bug in the `describe` function where correlations between the data and reduced data would asymptote below 1.
NOTE: If you have been using the development version of 0.5.0, please clear your data cache (`/Users/yourusername/hypertools_data`).
v0.4.2 (December 2017)
- fixed a bug in the plot function where the software would crash if `reduce` was specified as a `dict`
- added tutorials to readthedocs
v0.4.1 (November 2017)
- exposed `format_data`, which formats a numpy array, pandas df, or mixed list into a list of numpy arrays (`hypertools.tools.format_data`)
- added tests for `format_data`
- added documentation for `format_data`
v0.4.0 (October 2017)
Enhancements -
- A new class: `DataGeometry`, with methods for plotting, transforming new data, and saving
- Support for loading `*.geo` objects
- A new function: `analyze`, to perform combinations of transformations
- A new function: `describe`, for characterizing the loss of information due to dimensionality reduction algorithms
- In-memory caching of time-intensive `reduce`, `align`, and `describe` operations
- New syntax for the `reduce` function: `model` and `model_params` are now passed as a dictionary using the `reduce` arg (see the sketch after this list)
- New clustering models added to the `cluster` function: `MiniBatchKMeans`, `AgglomerativeClustering`, `Birch`, `FeatureAgglomeration`, and `SpectralClustering`
- Moved major functions (`normalize`, `align`, `reduce`, `cluster`, `load`) to the main level (i.e. `hyp.load` instead of `hyp.tools.load`, but the latter will still work)
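A minimal sketch of a few of these additions (the file name is illustrative):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# plotting now returns a DataGeometry object
geo = hyp.plot(data, show=False)

# save the geometry to disk, then load the resulting *.geo file back in
geo.save('example')
geo2 = hyp.load('example.geo')

# characterize information lost during dimensionality reduction
hyp.describe(data)
```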
Deprecations -
- A deprecation warning is thrown for the following `align` arguments: `normalize`, `ndims`, and `method`
- A deprecation warning is thrown for the following `reduce` arguments: `model`, `model_params`, `align`, and `normalize`
- A deprecation warning is thrown for the following `cluster` argument: `ndims`
- A deprecation warning is thrown for the `describe_pca` function (replaced by `describe`)
Bugs -
- fixed #148, a bug in `hyp.plot` where the figure would be rendered despite setting `show=False` (thanks @chaseWilliams!)
- fixed a bug where `n_clusters` would not override `group`, even though a warning message said it would
- fixed a bug where `hyp.plot` would quit if any kwargs were not the same length as the number of arrays in the list of input data
Minor -
- added BrainIAK toolbox citation and GitHub link to the `align.py` docstring
- added additional details and fixed typos in the `align.py` docstring
- upgraded seaborn requirement to 0.8.1
- updated all examples/docs with new syntax changes
- added new tests for new features