All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Contributor-facing features:
  - Added an analytical test for SteinThinning, and associated documentation in `tests.unit.test_solvers`. (#842)
  - Added an analytical test for `KernelHerding.refine` on an existing coreset. (#870)
- Added benchmarking scripts:
  - MNIST (train a classifier on coreset of training data, test on testing data) (#802)
  - Blobs (generate synthetic data using `sklearn.datasets.make_blobs` and compare MMD and KSD metrics) (#802)
  - David (extract pixel locations and values from an image and plot coresets side by side for visual benchmarking) (#880)
  - Pounce (extract frames from a video and use coreset algorithms to select the best frames) (#892)
- Benchmarking results added to the documentation. (#803)
- `benchmark` dependency group for benchmarking dependencies. (#888)
- `example` dependency group for running example scripts. (#909)
- Added a method `SquaredExponentialKernel.get_sqrt_kernel` which returns a square root kernel for the squared exponential kernel. (#883)
- Added a new coreset algorithm Kernel Thinning. (#915)
- Added (loose) lower bounds to all direct dependencies. (#920)
- `MMD.compute` no longer returns `nan`. (#855)
- Corrected an implementation error in `coreax.solvers.CaratheodoryRecombination`, which caused numerical instability when using either `CaratheodoryRecombination` or `TreeRecombination` on GPU machines. (#874, see also #852 and #853)
- `KernelHerding.refine` correctly computes a refinement of an existing coreset. (#870)
- Pylint pre-commit hook is now configured as the Pylint docs recommend. (#899)
- Type annotations so that core coreax package passes Pyright. (#906)
- Type annotations so that the example scripts pass Pyright. (#921)
- Moved coverage and performance data from GitHub gist to coreax-metadata repo. (#887)
- [BREAKING CHANGE] Equinox dependency version is changed from `<0.11.8` to `>=0.11.5`. (#898)
- [BREAKING CHANGE] The `jaxtyping` version is now lower bounded at `v0.2.31` to enable `coreax.data.Data` jaxtyping compatibility.
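
To see why the Equinox bound change is breaking, it can help to compare which versions each bound accepts. The following is a minimal sketch using only the standard library; the helper names `parse`, `satisfies_old_bound` and `satisfies_new_bound` are hypothetical, and the parsing only handles plain `X.Y.Z` release numbers rather than the full version specifier grammar.

```python
def parse(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def satisfies_old_bound(version: str) -> bool:
    """Old constraint from the entry above: equinox < 0.11.8."""
    return parse(version) < (0, 11, 8)


def satisfies_new_bound(version: str) -> bool:
    """New constraint from the entry above: equinox >= 0.11.5."""
    return parse(version) >= (0, 11, 5)


if __name__ == "__main__":
    for candidate in ("0.11.4", "0.11.5", "0.11.7", "0.11.8"):
        print(candidate, satisfies_old_bound(candidate), satisfies_new_bound(candidate))
```

Under these assumptions, an environment pinned below 0.11.5 satisfied the old bound but not the new one, which is the case that needs an upgrade.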
- Added an analytical test for RPCholesky, and associated documentation in `tests.unit.test_solvers`. (#822)
- Added a unit test for RPCholesky to check whether the coreset has duplicates. (#836)
- Enabled `jaxtyping` compatible type hinting for `coreax.data.Data`, to indicate the expected type and shape of a `Data` object's `Data.data` array attribute. For example, `Bool[Data, "n d"]` indicates `Data.data` should be an `n d` array of bools.
- `RPCholesky.reduce` in `coreax.solvers.coresubset` now computes the iteration step correctly. (#825)
- `RPCholesky.reduce` in `coreax.solvers.coresubset` now does not produce duplicate points in the coreset. (#836)
- Fixed the example `examples.david_map_reduce_weighted` to prevent errors when downsampling is enabled, and to make it run faster. (#821)
- Build includes sub-packages. (#845)
- Test dependency changed from `opencv-python` to `opencv-python-headless`. (#848)
- Updated installation instructions in README. (#848)
0.3.0 - [YANKED]
Yanked due to build failure.
- Added Kernel Stein Discrepancy divergence in `coreax.metrics.KSD`. (#659)
- Added the `coreax.solvers.recombination` module, which provides the following new solvers:
  - `RecombinationSolver`: an abstract base class for recombination solvers.
  - `CaratheodoryRecombination`: a simple deterministic approach to solving recombination problems.
  - `TreeRecombination`: an advanced deterministic approach that utilises `CaratheodoryRecombination`, but is faster for solving all but the smallest recombination problems. (#504)
- Added supervised coreset construction algorithm in `coreax.solvers.GreedyKernelPoints`. (#686)
- Added `coreax.kernels.PowerKernel` to replace repeated calls of `coreax.kernels.ProductKernel` within the `**` magic method of `coreax.kernel.ScalarValuedKernel`. (#708)
- Added scalar-valued kernel functions `coreax.kernels.PoissonKernel` and `coreax.kernels.MaternKernel`. (#742)
- Added `progress_bar` attribute to `coreax.score_matching.SlicedScoreMatching` to enable or disable tqdm progress bar terminal output. Defaults to disabled (`False`). (#761)
- Added analytical tests for kernel herding, and associated documentation in `tests.unit.test_solvers`. (#794)
- Added CI workflow for performance testing.
- Added array dimensions to type annotations using jaxtyping. (#746)
- Added integration test for `coreax.solver.recombination.TreeRecombination`. (#798)
- Fixed `MapReduce` in `coreax.solvers.composite.py` to keep track of the indices. (#779)
- Fixed negative weights on `coreax.weights.qp`. (#698)
- Refactored `coreax.inverses.py` functionality into `coreax.least_squares.py`:
  - `coreax.inverses.RegularisedInverseApproximator` replaced by `coreax.least_squares.RegularisedLeastSquaresSolver`.
  - `coreax.inverses.LeastSquaresApproximator` replaced by `coreax.least_squares.MinimalEuclideanNormSolver`.
  - `coreax.inverses.RandomisedEigendecompositionApproximator` replaced by `coreax.least_squares.RandomisedEigendecompositionSolver`. (#700)
- Refactoring of `coreax.kernel.py` into `coreax.kernels` sub-package:
  - `kernels.util.py` holds utility functions relating to kernels e.g. `median_heuristic`.
  - `kernels.base.py` holds the base kernel class `ScalarValuedKernel` (renamed from `Kernel`), as well as the base composite classes `UniCompositeKernel` (renamed from `CompositeKernel`), `DuoCompositeKernel` (renamed from `PairedKernel`) and the derived duo-composite kernels `AdditiveKernel` and `ProductKernel`.
  - `coreax.kernels.scalar_valued.py` holds all currently implemented scalar valued kernels e.g. `SquaredExponentialKernel`. (#708)
- Refactored `coreax.weights.py` to make weight solvers generic on data type. (#709)
- `coreax.weights.MMD` - deprecated alias for `coreax.weights.MMDWeightsOptimiser`; deprecated since version 0.2.0. (#784)
- `coreax.weights.SBQ` - deprecated alias for `coreax.weights.SBQWeightsOptimiser`; deprecated since version 0.2.0. (#784)
- `coreax.util.squared_distance_pairwise` - deprecated alias for `coreax.util.pairwise(squared_distance)`; deprecated since version 0.2.0. (#784)
- `coreax.util.pairwise_difference` - deprecated alias for `coreax.util.pairwise(difference)`; deprecated since version 0.2.0. (#784)
- All uses of `coreax.kernel.Kernel` should be replaced with `coreax.kernels.base.ScalarValuedKernel`. (#708)
- All uses of `coreax.kernel.CompositeKernel` should be replaced with `coreax.kernels.base.UniCompositeKernel`. (#708)
- All uses of `coreax.kernel.PairedKernel` should be replaced with `coreax.kernels.base.DuoCompositeKernel`. (#708)
- All uses of `coreax.kernel.AdditiveKernel` should be replaced with `coreax.kernels.base.AdditiveKernel`. (#708)
- All uses of `coreax.kernel.ProductKernel` should be replaced with `coreax.kernels.base.ProductKernel`. (#708)
- All uses of `coreax.kernel.LinearKernel` should be replaced with `coreax.kernels.scalar_valued.LinearKernel`. (#708)
- All uses of `coreax.kernel.PolynomialKernel` should be replaced with `coreax.kernels.scalar_valued.PolynomialKernel`. (#708)
- All uses of `coreax.kernel.SquaredExponentialKernel` should be replaced with `coreax.kernels.scalar_valued.SquaredExponentialKernel`. (#708)
- All uses of `coreax.kernel.ExponentialKernel` should be replaced with `coreax.kernels.scalar_valued.ExponentialKernel`. (#708)
- All uses of `coreax.kernel.RationalQuadraticKernel` should be replaced with `coreax.kernels.scalar_valued.RationalQuadraticKernel`. (#708)
- All uses of `coreax.kernel.PeriodicKernel` should be replaced with `coreax.kernels.scalar_valued.PeriodicKernel`. (#708)
- All uses of `coreax.kernel.LocallyPeriodicKernel` should be replaced with `coreax.kernels.scalar_valued.LocallyPeriodicKernel`. (#708)
- All uses of `coreax.kernel.LaplacianKernel` should be replaced with `coreax.kernels.scalar_valued.LaplacianKernel`. (#708)
- All uses of `coreax.kernel.SteinKernel` should be replaced with `coreax.kernels.scalar_valued.SteinKernel`. (#708)
- All uses of `coreax.kernel.PCIMQKernel` should be replaced with `coreax.kernels.scalar_valued.PCIMQKernel`. (#708)
- All uses of `coreax.util.median_heuristic` should be replaced with `coreax.kernels.util.median_heuristic`. (#708)
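
The renames above are mechanical, so a small script can apply them to fully qualified names. The following is a minimal sketch and not part of coreax: the `RENAMES` table copies only a few entries from the list above, and `migrate` is a hypothetical helper name. It rewrites dotted names such as `coreax.kernel.Kernel` in source text; it does not handle `from coreax.kernel import ...` statements, which would still need editing by hand.

```python
import re

# Old -> new fully qualified names, copied from a few entries in the list above (#708).
RENAMES = {
    "coreax.kernel.Kernel": "coreax.kernels.base.ScalarValuedKernel",
    "coreax.kernel.PairedKernel": "coreax.kernels.base.DuoCompositeKernel",
    "coreax.kernel.SquaredExponentialKernel": (
        "coreax.kernels.scalar_valued.SquaredExponentialKernel"
    ),
    "coreax.util.median_heuristic": "coreax.kernels.util.median_heuristic",
}

# Try longer names first and require word boundaries, so that a short old name
# never matches inside a longer identifier.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(old) for old in sorted(RENAMES, key=len, reverse=True))
    + r")\b"
)


def migrate(source: str) -> str:
    """Replace each old fully qualified name in ``source`` with its new location."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


if __name__ == "__main__":
    print(migrate("kernel = coreax.kernel.SquaredExponentialKernel()"))
```

Because the substitution is a single pass and the new paths all start with `coreax.kernels.`, already-migrated code is left unchanged when the script is run again.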
- Pyright to development tools (code does not pass yet)
- Nitpicks in documentation build
- Incorrect package version number
- Augmented unroll parameter to be consistent with block size in MMD metric
- Badge to README to show code coverage percentage.
- Support for Python 3.12.
- Added a deterministic, iterative, and greedy coreset algorithm which targets the Kernelised Stein Discrepancy via `coreax.solvers.coresubset.SteinThinning`.
- Added a stochastic, iterative, and greedy coreset algorithm which approximates the Gramian of a given kernel function via `coreax.solvers.coresubset.RPCholesky`.
- Added `coreax.util.sample_batch_indices` that allows one to sample an array of indices for batching.
- Added kernel classes `coreax.kernel.AdditiveKernel` and `coreax.kernel.ProductKernel` that allow for arbitrary composition of positive semi-definite kernels to produce new positive semi-definite kernels.
- Added additional kernel functions: `coreax.kernel.Linear`, `coreax.kernel.Polynomial`, `coreax.kernel.RationalQuadratic`, `coreax.kernel.Periodic`, `coreax.kernel.LocallyPeriodic`.
- Added capability to approximate the inverses of arrays via least-squares (`coreax.inverses.LeastSquaresApproximator`) or randomised eigendecomposition (`coreax.inverses.RandomisedEigendecompositionApproximator`), all inheriting from `coreax.inverses.RegularisedInverseApproximator`.
- Refactor of package to a functional style to allow for JIT-compilation of the codebase in the largest possible scope:
  - Added data classes `coreax.data.Data` and `coreax.data.SupervisedData` that draw distinction between supervised and unsupervised datasets, and handle weighted data. Replaces `coreax.data.DataReader` and `coreax.data.ArrayData`.
  - Added `coreax.solvers.base.Solver` to replace functionality in `coreax.refine.py`, `coreax.coresubset.py` and `coreax.reduction.py`. In particular, `coreax.solvers.base.CoresubsetSolver` parents coresubset algorithms, `coreax.solvers.base.RefinementSolver` parents coresubset algorithms which support refinement post-reduction, `coreax.solvers.base.ExplicitSizeSolver` parents all coreset algorithms which return a coreset of a specific size.
  - `coreax.reduction.MapReduce` functionality moved to `coreax.solvers.composite.MapReduce`, now JIT-compilable via promise described in `coreax.solvers.base.PaddingInvariantSolver`.
  - Moved all coresubset algorithms in `coreax.coresubset.py` to `coreax.solvers.coresubset.py`.
  - All coreset algorithms now return a `coreax.coreset.Coreset` rather than modifying a `coreax.reduction.Coreset` in-place.
- Use Equinox instead of manually constructing pytrees.
- Wording improvements in README.
- Documentation now builds without warnings.
- GitHub workflow runs automatically after Pre-commit autoupdate.
- Documentation has been rearranged.
- Renamed `coreax.weights.MMD` to `coreax.weights.MMDWeightsOptimiser` and added deprecation warning.
- Renamed `coreax.weights.SBQ` to `coreax.weights.SBQWeightsOptimiser` and added deprecation warning.
- `requirements-*.txt` will no longer be updated frequently, thereby providing stable versions.
- Single requirements files covering all supported Python versions.
- All references to `kernel_matrix_row_{sum,mean}` have been replaced with `Gramian row-mean`.
- `coreax.networks.ScoreNetwork` now allows the user to specify number of hidden layers.
- Classes in `weights.py` and `score_matching.py` now inherit from `equinox.Module`.
- Performance tests replaced by `jit_variants` tests, which check whether a function has been compiled for reuse.
- Replace some pygrep-hooks with ruff equivalents.
- Use Pytest fixtures instead of unittest style.
- Bash script to run integration tests has been removed. `pytest tests/integration` should now work as expected.
- Tests for `coreax.kernels.Kernel.{calculate, update}_kernel_matrix_row_sum`.
- `coreax.util.KernelComputeType`; use `Callable[[ArrayLike, ArrayLike], Array]` instead.
- `coreax.kernels.Kernel.calculate_kernel_matrix_row_{sum,mean}`; use `coreax.kernels.Kernel.gramian_row_mean`.
- `coreax.kernels.Kernel.updated_kernel_matrix_row_sum`; use `coreax.kernels.Kernel.gramian_row_mean` if possible.
- `coreax.data.DataReader` and `coreax.data.ArrayData`; use `coreax.data.Data` and `coreax.data.SupervisedData`.
- `coreax.refine.py` and `coreax.coresubset.py` removed; use `coreax.solvers.base.RefinementSolver` or `coreax.solvers.base.CoresubsetSolver` to define coreset algorithms in `coreax.solvers.coresubset`.
- `coreax.reduction` removed; use `coreax.solvers.base.ExplicitSizeSolver` in place of `coreax.reduction.SizeReduce` and `coreax.solvers.composite.MapReduce` in place of `coreax.reduction.MapReduce`. Use `coreax.coreset.Coreset` and `coreax.coreset.Coresubset` in place of `coreax.reduction.Coreset`.
- All uses of `coreax.weights.MMD` should be replaced with `coreax.weights.MMDWeightsOptimiser`.
- All uses of `coreax.weights.SBQ` should be replaced with `coreax.weights.SBQWeightsOptimiser`.
- All uses of `coreax.util.squared_distance_pairwise` should be replaced with `coreax.util.pairwise(squared_distance)`.
- All uses of `coreax.util.pairwise_difference` should be replaced with `coreax.util.pairwise(difference)`.
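
For readers unfamiliar with how a rename can keep the old name working while steering users to the new one, the sketch below shows one common pattern using the standard `warnings` module. It is an illustration only and is not taken from the coreax source: the two classes are stand-ins that borrow the names from the entries above, and the `kernel` constructor argument is a hypothetical placeholder.

```python
import warnings


class MMDWeightsOptimiser:
    """Stand-in for the renamed class (hypothetical constructor, for illustration)."""

    def __init__(self, kernel=None):
        self.kernel = kernel


class MMD(MMDWeightsOptimiser):
    """Deprecated alias: behaves like the new class but warns on construction."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "MMD is deprecated; use MMDWeightsOptimiser instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this class
        )
        super().__init__(*args, **kwargs)


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        optimiser = MMD()
    print(type(optimiser).__name__, [str(w.message) for w in caught])
```

Python hides `DeprecationWarning` by default outside of `__main__` and test runners, so running a test suite with `-W error::DeprecationWarning` is one way to find remaining uses of an old name before the alias is removed.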
- Base Coreax package using Object-Oriented Programming incorporating:
  - coreset methods: kernel herding, random sample
  - reduction strategies: size reduce, map reduce
  - kernels: squared exponential, Laplacian, PCIMQ, Stein
  - refinement: regular, reverse, random
  - metrics: MMD
  - approximations of kernel matrix row sum mean: random, ANNchor, Nystrom
  - weights optimisers: SBQ, MMD
  - score matching: sliced score matching, kernel density estimation
  - I/O: array data not requiring any preprocessing
- Near-complete unit test coverage.
- Example scripts for coreset generation, which may be called as integration tests.
- Bash script to run integration tests in sequence to avoid Jax errors.
- Detailed documentation for the Coreax package published to Read the Docs.
- README.md including an overview of what coresets are, setup instructions, a how-to guide, example applications and an overview of features coming soon.
- Support for Python 3.9-3.11.
- Project configuration and dependencies through pyproject.toml.
- Requirements files providing a pinned set of dependencies that are known to work for each supported Python version.
- Mark Coreax as typed.
- This changelog to make it easier for users and contributors to see precisely what notable changes have been made between each release of the project.
- FAQ.md to address any commonly asked questions.
- Contributor guidelines, code of conduct, license and security policy.
- Git configuration.
- GitHub Actions to run unit tests on Windows, macOS and Ubuntu for supported Python versions.
- Pre-commit checks to run the following, also checked by GitHub Actions:
  - black
  - isort
  - pylint
  - cspell spell check with custom dictionaries for library names, people and miscellaneous
  - pyroma
  - pydocstyle
  - assorted file format and encoding checks
- Look-before-you-leap validation of all input to public functions