Sample entropy #71
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
- Coverage   80.02%   79.65%   -0.38%
==========================================
  Files          32       33       +1
  Lines        706      747      +41
==========================================
+ Hits         565      595      +30
- Misses       141      152      +11
```
Locating and counting neighbors is the performance bottleneck for this algorithm. Using `bulkisearch` allocates a vector of neighbor indices for every query point. The newly added `inrangecount` in NearestNeighbors.jl only counts the neighbors within the radius. For this algorithm, dropping the per-point index allocations gives a substantial speed-up, so I am therefore using `inrangecount` where only the counts are needed:

```julia
using NearestNeighbors, Neighborhood
using DelayEmbeddings   # for `genembed`
using StatsBase         # for `std`

function computeprobs_onlycts(x; k::Int, r, metric = Chebyshev())
    N = length(x)
    pts = genembed(x, 0:(k - 1))
    # For each `k`-dimensional xᵢ ∈ pts, count its within-range-`r` neighbors,
    # excluding the point `xᵢ` as a neighbor to itself.
    tree = KDTree(pts, metric)
    # `inrangecount` includes the point itself, so subtract 1.
    cts = [inrangecount(tree, pᵢ, r) - 1 for pᵢ in pts]
    # Pᵐ := the probability that two sequences will match for `k` points.
    Pᵐ = 0.0
    c = N - k - 1
    for ct in cts
        Pᵐ += ct / c
    end
    Pᵐ /= N - k
    return Pᵐ
end

function computeprobs(x; k::Int, r, metric = Chebyshev())
    N = length(x)
    pts = genembed(x, 0:(k - 1))
    # For each `k`-dimensional xᵢ ∈ pts, locate its within-range-`r` neighbors,
    # excluding the point `xᵢ` as a neighbor to itself.
    tree = KDTree(pts, metric)
    theiler = Theiler(0) # w = 0 in the Theiler window means self-exclusion
    idxs = bulkisearch(tree, pts, WithinRange(r), theiler)
    # Pᵐ := the probability that two sequences will match for `k` points.
    Pᵐ = 0.0
    c = N - k - 1
    for nn_idxsᵢ in idxs
        Pᵐ += length(nn_idxsᵢ) / c
    end
    Pᵐ /= N - k
    return Pᵐ
end

function sample_entropy(x; m = 2, r = StatsBase.std(x), base = MathConstants.e,
        metric = Chebyshev())
    Aᵐ⁺¹ = computeprobs(x; k = m + 1, r = r, metric = metric)
    Bᵐ = computeprobs(x; k = m, r = r, metric = metric)
    return -log(base, Aᵐ⁺¹ / Bᵐ)
end

function sample_entropy_ctsonly(x; m = 2, r = StatsBase.std(x),
        base = MathConstants.e, metric = Chebyshev())
    Aᵐ⁺¹ = computeprobs_onlycts(x; k = m + 1, r = r, metric = metric)
    Bᵐ = computeprobs_onlycts(x; k = m, r = r, metric = metric)
    return -log(base, Aᵐ⁺¹ / Bᵐ)
end
```

Benchmarks:

```julia
using BenchmarkTools
x = rand(10000)

sample_entropy(x, r = 0.25, m = 2)
@btime sample_entropy($x, r = 0.25, m = 2)
# 587.655 ms (139491 allocations: 741.82 MiB)

sample_entropy_ctsonly(x, r = 0.25, m = 2)
@btime sample_entropy_ctsonly($x, r = 0.25, m = 2)
# 283.440 ms (117696 allocations: 3.38 MiB)
```
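For cross-checking the tree-based implementations, sample entropy can also be computed by brute force in pure Julia. This is a minimal O(N²) sketch, not part of the PR; `sample_entropy_bruteforce` and its helpers are hypothetical names, and it uses the common convention of comparing the same N − m windows for both template lengths:

```julia
# Chebyshev distance between the m-length windows of x starting at i and j.
function window_dist(x, i, j, m)
    d = 0.0
    for k in 0:(m - 1)
        d = max(d, abs(x[i + k] - x[j + k]))
    end
    return d
end

# Count ordered pairs (i, j), i ≠ j, of m-length windows within radius r,
# taking the first `nwin` windows.
function count_matches(x, m, r, nwin)
    c = 0
    for i in 1:nwin, j in 1:nwin
        i == j && continue
        window_dist(x, i, j, m) <= r && (c += 1)
    end
    return c
end

function sample_entropy_bruteforce(x; m = 2, r = 0.2)
    nwin = length(x) - m  # same number of windows for lengths m and m + 1
    B = count_matches(x, m, r, nwin)
    A = count_matches(x, m + 1, r, nwin)
    return -log(A / B)
end
```

Because every (m+1)-point match is also an m-point match, A ≤ B, so the result is non-negative whenever A > 0.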
You don't need to add NearestNeighbors to Project.toml. Do instead:
@Datseris This PR has been updated to use the new API.
```math
\begin{aligned}
B(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r) \\
A(r, m, N) = \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^{m+1}, {\bf x}_j^{m+1}) \leq r)
\end{aligned}
```
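For reference, these two counts combine into the reported quantity as follows (assuming the standard definition of sample entropy, which matches the `-log(base, Aᵐ⁺¹ / Bᵐ)` used in the code earlier in this thread):

```math
SampEn(m, r, N) = -\ln\left(\frac{A(r, m, N)}{B(r, m, N)}\right)
```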
Hmmm, this is suspiciously similar to Cao's method for deducing an optimal embedding... Right? https://juliadynamics.github.io/DynamicalSystems.jl/dev/embedding/traditional/#DelayEmbeddings.delay_afnn You don't compute the average distance, but you compute how many points are within distance `r`, and how this changes from embedding `m` to embedding `m+1`.
I'm not intimately familiar with Cao's method, but I just had a quick glance at the source code, and it seems there is some overlap. I see that you use `bulkisearch` in the `_average_a` function. I saw at least a 2x speed-up and 2+ times fewer allocations here by using `inrangecount`, so using it there too could probably improve performance.
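The difference between the two styles can be sketched without any package (pure Julia; `count_in_range` and `collect_in_range` are hypothetical illustrative names, not the NearestNeighbors.jl API):

```julia
# Chebyshev (maximum-coordinate) distance between two points.
chebyshev(a, b) = maximum(abs.(a .- b))

# Counting style (what `inrangecount` enables): one integer per query,
# no intermediate index vector.
count_in_range(pts, p, r) = count(q -> chebyshev(p, q) <= r, pts)

# Index-collecting style (what `bulkisearch`-like APIs return): allocates a
# Vector{Int} per query point, which is wasted work if only the count is used.
collect_in_range(pts, p, r) = [i for (i, q) in pairs(pts) if chebyshev(p, q) <= r]
```

Both agree on the count; when only counts are needed, the first style avoids one heap allocation per query point, which is where the large allocation reduction in the benchmarks earlier in this thread comes from.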
Co-authored-by: George Datseris <datseris.george@gmail.com>
- Add convenience method
- Missing word
- Address review comments
- Improve syntax
I think all comments should be addressed now. The PR also uses the new version of Neighborhood.jl.
No description provided.