Releases: ContextLab/hypertools
v0.8.0 (February, 2022)
updates to `.geo` file format
Hypertools now saves `DataGeometry` objects using the `pickle` file format internally, rather than HDF5. With the improvements made to the built-in `pickle` module since Hypertools's initial release, this generally results in smaller files that save and load more quickly. It also allows us to drop the dependency on `deepdish`, which has compatibility issues with various `pandas` objects, doesn't offer pre-built wheels for recent Python versions, and is largely no longer maintained.
If you need to load `.geo` files saved in the old format, `hypertools.load` now accepts a keyword-only argument, `legacy`. Install `deepdish` if necessary, and pass `legacy=True` to load older `DataGeometry` objects. You can then `.save()` them to convert them to the new format.
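For example, a minimal sketch of converting an old-format file (the file name is illustrative):

```python
import hypertools as hyp

# loading a legacy file requires deepdish and the keyword-only legacy flag
geo = hyp.load('my_old_data.geo', legacy=True)

# re-saving converts the file to the new pickle-based format
geo.save('my_old_data.geo')
```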
improvements to example datasets
All example data files have been upgraded to the new file format. Additionally, the three pre-trained scikit-learn `Pipeline`s Hypertools provides (`wiki_model`, `nips_model`, and `sotus_model`) have been recreated from scratch using a newer scikit-learn version, better text preprocessing, and updated `CountVectorizer` and `LatentDirichletAllocation` parameters that result in overall better models.
The example `DataGeometry` objects associated with these three models (`wiki`, `nips`, and `sotus`) have been updated accordingly, and additionally now use `IncrementalPCA` as their default reducer, resulting in faster, deterministic transform outputs.
To use the new models and datasets, upgrade Hypertools to v0.8.0 (`pip install -U hypertools`) and remove the local cache of old versions (`[[ -d ~/hypertools_data ]] && rm ~/hypertools_data/*`). Older versions of Hypertools will continue to use the old example data.
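After clearing the cache, loading an example dataset re-downloads the updated files. A minimal sketch:

```python
import hypertools as hyp

# with the cache cleared, the first load fetches the updated dataset
geo = hyp.load('wiki')
geo.plot()
```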
Other improvements
- Hypertools is now compatible with Python 3.9! This release is also compatible in principle with Python 3.10, but `numba` does not yet support Python 3.10, so certain dependencies will fail to install.
- Hypertools now works with newer scikit-learn versions! The updates above to the example datasets make Hypertools fully compatible with recent scikit-learn releases (`>=0.24`). This should make Hypertools easier to use in Colaboratory notebooks and more flexible in general. If you need to use an older scikit-learn version, pip-install `hypertools<0.8.0`.
- Hypertools now works with newer Matplotlib versions! Recent updates to `matplotlib`'s plotting backends were causing Hypertools's plotting interface to fail on import. We've fixed these bugs and maintained backwards compatibility with older (deprecated) interactive plotting backends as well.
Other assorted changes
- failures when loading example datasets and `.geo` files now raise `HypertoolsIOError` with clearer error messages
- specifying a `compression` when saving a `DataGeometry` object raises a `FutureWarning`
- CI tests now run with Python 3.6–3.9, use `mamba` for faster environment setup, and generate more verbose output
- dependencies and code required for Python 2/3 compatibility have been removed
- various code causing `RuntimeWarning`s has been fixed
v0.7.0 (June 2021)
Control over matplotlib backend & various bug fixes
New features:
- Create animated plots in an environment with a non-interactive `matplotlib` plotting backend set, without disrupting the global plotting backend
- Create non-animated, interactive plots for easy inspection of data using the new `interactive` keyword argument
- Set the plotting backend for a single plot using the new `mpl_backend` keyword argument, and easily switch between backends within a single Python interpreter session, IPython kernel, or even Jupyter notebook cell
- Use the new `hypertools.set_interactive_backend` function to change the backend for all future plots, or use it as a context manager to temporarily switch to a different backend (see the sketch below). You can also use this to create multiple animated/interactive plots simultaneously.
- Use `hypertools`'s backend adjustments to control the behavior of other plotting libraries
- Set the `$HYPERTOOLS_BACKEND` environment variable to permanently set your preferred plotting backend for non-IPython environments
NB: Currently supported backends include TkInter, GTK, wxPython, Qt4, Qt5, Cocoa (aka MacOSX; MacOS only), notebook/nbAgg (Jupyter notebooks only), and ipympl/widget (Jupyter notebooks only). 3D and interactive plots may not render properly in Colab notebooks due to security restrictions imposed by the Colaboratory platform.
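A minimal sketch of the new controls (the backend names are illustrative; use any supported backend from the list above):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# set the backend for this plot only, leaving the global backend untouched
hyp.plot(data, animate=True, mpl_backend='TkAgg')

# create a non-animated but interactive plot
hyp.plot(data, interactive=True)

# temporarily switch backends for everything inside the block
with hyp.set_interactive_backend('Qt5Agg'):
    hyp.plot(data, animate=True)
```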
Bug fixes
- importing `hypertools` in a notebook no longer creates phantom Python processes, issues warnings when TkInter isn't installed, fails if `matplotlib.pyplot` was imported first, or silently changes the plotting backend (fixes #242)
- creating 3D plots with `hypertools` no longer alters the global `matplotlib.rcParams` object (fixes #243)
- `hypertools` can now be imported for non-plotting-related uses in environments without a compatible GUI without throwing an error
- IPython's TAB-completion no longer triggers a full import of `hypertools` or improperly sets the plotting backend based on the subprocess's environment
- require `scikit-learn<0.24` (full spec: `scikit-learn>=0.19.1,!=0.22,<0.24`) to avoid a bug when loading pre-trained `DataGeometry` objects due to a renamed sklearn module
v0.6.3 (October 2020)
dependency-related updates
- allow `scikit-learn>0.22`. `scikit-learn==0.22.0` contains a bug that affects the `CountVectorizer` vocabulary; this has been fixed in `0.23.0`.
- require `umap-learn>=0.4.6`. We previously avoided a bug in `umap-learn<=0.4.5` by installing a pre-release version from GitHub; this has now been fixed in `umap-learn==0.4.6`.
- Beginning with `seaborn==0.11.0`, "dark" color palettes are returned in reverse order from how they were previously. This difference in behavior will be reflected in `hypertools`, but we've changed the default `cmap` in `hypertools._shared.helpers.vals2colors` to a non-dark palette for consistent default behavior.
- Added tests for Python 3.8
v0.6.2 (December 2019)
minor patch that enables dependencies not hosted on PyPI to install properly
- `setup.py`'s setup command is now a custom class that inherits from `setuptools.command.install.install`, runs the regular installation process, then pip-installs UMAP from its GitHub URL at a pre-release commit hash (sketched below). This is completely equivalent to manually running `pip install git+<URL>`, but takes the burden of having to do so off of end-users.
- removed URL from `requirements.txt`, added a comment in its place
- added `MANIFEST.IN` file to include `requirements.txt`
- updated minimum Python version listed on PyPI page to 3.5 to reflect that Python 3.4 support was dropped in v0.5.1 (August 2018)
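A minimal sketch of this pattern (the class name is illustrative, and the commit hash is omitted):

```python
# setup.py (sketch)
import subprocess
import sys

from setuptools import setup
from setuptools.command.install import install


class InstallWithUMAP(install):
    """Run the standard install, then pip-install UMAP from GitHub."""

    def run(self):
        install.run(self)
        subprocess.check_call([
            sys.executable, '-m', 'pip', 'install',
            'git+https://github.com/lmcinnes/umap.git@<commit-hash>',
        ])


setup(
    name='hypertools',
    # ...regular package metadata...
    cmdclass={'install': InstallWithUMAP},
)
```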
This version is tagged as `0.6.2` to keep the versioning here and on PyPI consistent. The fix intended to be `0.6.1` was unsuccessful on TestPyPI, and PyPI does not allow removing and re-uploading an existing version.
v0.6.0 (December 2019)
Updates to `hypertools.reduce`
- fixed a bug where passing a dictionary of parameters to the `reduce` argument would result in those parameters being overwritten (see the sketch below)
- added some basic support for passing custom embedding models
- added a warning when resolving conflicts between `hypertools` arguments and model-specific arguments
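A minimal sketch of passing model parameters as a dictionary (the specific parameters are illustrative):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# parameters in the dict are forwarded to the model and, as of this
# release, no longer overwritten
reduced = hyp.reduce(data, reduce={'model': 'PCA',
                                   'params': {'n_components': 3}})
```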
Other changes
- dropped support for Python 2.7
- fixed bug in Travis tests
- replaced a deprecated `pandas.DataFrame` method in `hypertools.tools.df2mat`
- require installing UMAP from the GitHub repository due to a bug fix that has not yet been released
- updated `setup.py` to comply with PEP 508 guidelines for installing external dependencies
- added a unit test for the `hypertools.reduce` bug fix
- removed some unused imports and commented-out code
- removed outdated pages from readthedocs
- readthedocs build is now Python 3-based
- build folder is ignored by default when installing from GitHub repository in editable mode
v0.5.1 (August 2018)
- added flake8 to travis tests
- refactored some of the procrustes function code
- removed support for python 3.4
- removed hdbscan from dependencies (still can be used if installed manually)
Code cleanup (thanks @dwillmer!):
- Changed string comparisons from `if x is 'str'` to `if x == 'str'`; the former is an identity comparison, not equality. It happens to be true for some strings because of string interning, but `==` should always be used for normal comparisons (see the sketch after this list).
- Removed unused arguments from the `_draw` function; `return_data` and others weren't used in the function body.
- Removed unreachable code in the `normalize` function (the branch criteria could never be `True`).
- Separated out the multiply-nested function calls in the `DataGeometry` class for clarity.
- Changed comparisons of the form `if type(x) is list` to `if isinstance(x, list)`; the former doesn't return `True` for subclasses, so `isinstance` should always be used.
- Set unused loop variables to `_`.
- Removed unused imports.
- Ensured all imports are at the top of the file (except lazy/circular ones).
- Ensured 2 blank lines above functions/classes (PEP 8); the code looks a bit weird without this.
- Fixed typo `repect` -> `respect`, which was copy-pasted in multiple docstrings.
- Removed redundant `pass` before error raise.
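A standalone sketch illustrating the reasoning behind the comparison fixes:

```python
# strings: `is` tests identity, `==` tests equality
a = 'hypertools'
b = ''.join(['hyper', 'tools'])
print(a == b)  # True  -- same contents
print(a is b)  # False -- distinct objects; interning is not guaranteed

# type checks: `type(x) is list` ignores subclasses; isinstance() does not
class TrajectoryList(list):
    pass

x = TrajectoryList()
print(type(x) is list)      # False
print(isinstance(x, list))  # True
```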
v0.5.0 (April 2018)
Enhancements:
Plotting and transforming text data
- `hyp.plot` now supports plotting text data. Simply pass a string, a list of strings, or a list of lists of strings, and the text will be transformed using a semantic model and plotted. By default, the text is transformed with a topic model (LDA) fit to a selection of Wikipedia pages.
- A new `vectorizer` argument in `hyp.plot` to specify a text vectorizer. Currently supports `CountVectorizer`, `TfidfVectorizer`, or class instances (fit or unfit) of these models.
- A new `semantic` argument in `hyp.plot` that specifies the semantic model used to transform text. Currently supports `LatentDirichletAllocation`, `NMF`, or class instances (fit or unfit) of these models (see the example below).
- A new `corpus` argument in `hyp.plot` that allows the user to specify the text used to fit the semantic model. Can be 'wiki', 'nips', 'sotus', or a custom list of text.
- An enhanced `hyp.format_data` function that takes data in various forms (numpy array, dataframe, str, list of str, or mixed list) and returns it in a standard format (a list of numpy arrays). This function can be used to transform text data using a semantic model.
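A minimal example of the new text-plotting interface (the documents are illustrative):

```python
import hypertools as hyp

docs = ['the stock market fell sharply today',
        'investors are worried about inflation',
        'my dog loves going to the park',
        'the cat chased a laser pointer']

# the text is vectorized, transformed by a topic model fit to the 'wiki'
# corpus, then reduced and plotted
hyp.plot(docs, vectorizer='CountVectorizer',
         semantic='LatentDirichletAllocation', corpus='wiki')
```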
New algorithms
- A new clustering algorithm, HDBSCAN (thanks @lmcinnes!), e.g. `hyp.plot(data, cluster='HDBSCAN')`
- A new dimensionality reduction algorithm, UMAP (thanks @lmcinnes!), e.g. `hyp.plot(data, reduce='UMAP')`
New parameters
- A new `size` param to resize the figure, e.g. `hyp.plot(data, size=[10, 8])`
- A new `ax` param to add the figure to an existing axis, e.g. `hyp.plot(data, ax=ax)`
New text examples
- A new dataset of NIPS papers (from Kaggle), e.g. `hyp.load('nips')`
- A new dataset of selected Wikipedia pages, e.g. `hyp.load('wiki')`
- A new dataset of State of the Union text from 1989-2017 (from Kaggle), e.g. `hyp.load('sotus')`
API changes
- In `hyp.plot`, changed the `group` arg to `hue` (`group` will still be supported, but deprecated in a coming release)
- Removed the deprecated `describe_pca` function. Please use the more general `describe` function instead.
Bugs fixed
- When using `chemtrails` in `hyp.plot`, the entire timeseries would appear for the first few seconds of an animation and then disappear.
- The legend colors did not align with the data when using the `fmt` or `color` args.
- When grouping with the `group`/`hue` arg, labels were not reshuffled.
- Fixed a bug in the `describe` function where correlations between the data and reduced data would asymptote below 1.
NOTE: If you have been using the development version of 0.5.0, please clear your data cache (`/Users/yourusername/hypertools_data`).
v0.4.2 (December 2017)
- fixed a bug in the plot function where the software would crash if `reduce` was specified as a `dict`
- added tutorials to readthedocs
v0.4.1 (November 2017)
- exposed `format_data`, which formats a numpy array, pandas df, or mixed list into a list of numpy arrays (`hypertools.tools.format_data`)
- added tests for `format_data`
- added documentation for `format_data`
v0.4.0 (October 2017)
Enhancements -
- A new class: `DataGeometry`, with methods for plotting, transforming new data, and saving
- Support for loading `*.geo` objects
- A new function: `analyze`, to perform combinations of transformations
- A new function: `describe`, for characterizing the loss of information due to dimensionality reduction algorithms
- In-memory caching of time-intensive `reduce`, `align`, and `describe` operations
- New syntax for the `reduce` function: `model` and `model_params` are now passed as a dictionary using the `reduce` arg (see the sketch after this list)
- New clustering models added to the `cluster` function: `MiniBatchKMeans`, `AgglomerativeClustering`, `Birch`, `FeatureAgglomeration`, and `SpectralClustering`
- Moved major functions (`normalize`, `align`, `reduce`, `cluster`, `load`) to the main level (i.e. `hyp.load` instead of `hyp.tools.load`, but the latter will still work)
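A minimal sketch of a few of these additions (the file name is illustrative):

```python
import numpy as np
import hypertools as hyp

data = np.random.rand(100, 10)

# plotting now returns a DataGeometry object
geo = hyp.plot(data, show=False)

# save the geometry to disk, then load the resulting *.geo file back in
geo.save('example')
geo2 = hyp.load('example.geo')

# characterize information lost during dimensionality reduction
hyp.describe(data)
```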
Deprecations -
- A deprecation warning is thrown for the following `align` arguments: `normalize`, `ndims`, and `method`
- A deprecation warning is thrown for the following `reduce` arguments: `model`, `model_params`, `align`, and `normalize`
- A deprecation warning is thrown for the following `cluster` argument: `ndims`
- A deprecation warning is thrown for the `describe_pca` function (replaced by `describe`)
Bugs -
- fixed #148, a bug in `hyp.plot` where the figure would be rendered despite setting `show=False` (thanks @chaseWilliams!)
- fixed a bug where `n_clusters` would not override `group`, even though a warning message said it would
- fixed a bug where `hyp.plot` would quit if any kwargs were not the same length as the number of arrays in the list of input data
Minor -
- added BrainIAK toolbox citation and GitHub link to the `align.py` docstring
- added additional details and fixed typos in the `align.py` docstring
- upgraded seaborn requirement to 0.8.1
- updated all examples/docs with new syntax changes
- added new tests for new features