
Sample entropy #71

Merged: 30 commits merged from the sample_entropy branch into main on Oct 23, 2022

Conversation

kahaaga (Member) commented Aug 28, 2022

No description provided.

codecov bot commented Aug 28, 2022

Codecov Report

Merging #71 (0694b28) into main (5b98fb6) will decrease coverage by 0.37%.
The diff coverage is 60.46%.

@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
- Coverage   80.02%   79.65%   -0.38%     
==========================================
  Files          32       33       +1     
  Lines         706      747      +41     
==========================================
+ Hits          565      595      +30     
- Misses        141      152      +11     
Impacted Files                                           Coverage Δ
src/complexity.jl                                         0.00% <0.00%> (ø)
src/complexity_measures/sample_entropy.jl                74.28% <74.28%> (ø)
..._estimators/transfer_operator/transfer_operator.jl    73.60% <0.00%> (+3.19%) ⬆️


@kahaaga kahaaga mentioned this pull request Aug 29, 2022
kahaaga (Member, Author) commented Aug 29, 2022

Locating and counting neighbors is the performance bottleneck of this algorithm.

Using a KDTree and finding the indices of within-range neighbors via Neighborhood.jl's bulkisearch gives massive performance improvements over naive loops (the loop approach is not shown here). However, the (unnecessary) allocation of the index vectors still slows things down considerably.

The newly added inrangecount from NearestNeighbors.jl drastically reduces allocations, because it never needs to allocate the index vectors at all.

For this algorithm, dropping bulkisearch in favor of inrangecount roughly halves the runtime and reduces allocated memory by orders of magnitude, depending on the number of input points. For 10000 points, allocated memory was reduced by a factor of roughly 220 (see below).

I am therefore using the inrangecount approach for this PR.

using NearestNeighbors, Neighborhood   # KDTree, inrangecount, bulkisearch, WithinRange, Theiler
using DelayEmbeddings: genembed        # delay embedding of the input timeseries
using Distances: Chebyshev
using StatsBase                        # std, used for the default radius r

function computeprobs_onlycts(x; k::Int, r, metric = Chebyshev())
    N = length(x)
    pts = genembed(x, 0:(k - 1))

    # For each `k`-dimensional xᵢ ∈ pts, locate its within-range-`r` nearest neighbors,
    # excluding the point `xᵢ` itself as a neighbor.
    tree = KDTree(pts, metric)

    # inrangecount includes the point itself, so subtract 1
    cts = [inrangecount(tree, pᵢ, r) - 1 for pᵢ in pts]

    # Pᵐ := The probability that two sequences will match for k points
    Pᵐ = 0
    c = N - k - 1
    for ct in cts
        Pᵐ += ct / c
    end
    Pᵐ /= N - k

    return Pᵐ
end

function computeprobs(x; k::Int, r, metric = Chebyshev())
    N = length(x)
    pts = genembed(x, 0:(k - 1))

    # For each `k`-dimensional xᵢ ∈ pts, locate its within-range-`r` nearest neighbors,
    # excluding the point `xᵢ` itself as a neighbor.
    tree = KDTree(pts, metric)
    theiler = Theiler(0) # w = 0 in the Theiler window means self-exclusion
    idxs = bulkisearch(tree, pts, WithinRange(r), theiler)

    # Pᵐ := The probability that two sequences will match for k points
    Pᵐ = 0
    c = N - k - 1
    for nn_idxsᵢ in idxs
        Pᵐ += length(nn_idxsᵢ) / c
    end
    Pᵐ /= N - k

    return Pᵐ
end

function sample_entropy(x; m = 2, r = StatsBase.std(x), base = MathConstants.e,
        metric = Chebyshev())

    Aᵐ⁺¹ = computeprobs(x; k = m + 1, r = r, metric = metric)
    Bᵐ = computeprobs(x; k = m, r = r, metric = metric)
    return -log(base, Aᵐ⁺¹ / Bᵐ)
end

function sample_entropy_ctsonly(x; m = 2, r = StatsBase.std(x),
        base = MathConstants.e, metric = Chebyshev())

    Aᵐ⁺¹ = computeprobs_onlycts(x; k = m + 1, r = r, metric = metric)
    Bᵐ = computeprobs_onlycts(x; k = m, r = r, metric = metric)
    return -log(base, Aᵐ⁺¹ / Bᵐ)
end

using BenchmarkTools

x = rand(10000)

sample_entropy(x, r = 0.25, m = 2)
@btime sample_entropy($x, r = 0.25, m = 2)
# 587.655 ms (139491 allocations: 741.82 MiB)

sample_entropy_ctsonly(x, r = 0.25, m = 2)
@btime sample_entropy_ctsonly($x, r = 0.25, m = 2)
# 283.440 ms (117696 allocations: 3.38 MiB)

@kahaaga kahaaga marked this pull request as ready for review August 29, 2022 19:28
This was referenced Sep 5, 2022
@kahaaga kahaaga marked this pull request as draft September 30, 2022 18:57
@kahaaga kahaaga changed the title from "Sample entropy" to "WIP: Sample entropy" on Sep 30, 2022
Datseris (Member) commented:

You don't need to add NearestNeighbors to Project.toml. Do instead: using Neighborhood.NearestNeighbors: inrangecount.

kahaaga (Member, Author) commented Oct 1, 2022

@Datseris This PR has been updated to use the new API.

  • Tests: Again, I can't find any good analytical examples in the literature, and due to the complexity of the estimation, I can't easily construct good examples myself. So, as for approximate entropy, I test the method by checking that the resulting values are "close enough" to zero for completely regular signals, and larger than zero for a non-regular signal (a minimal sketch of such a check follows after this list).
  • Documentation: I support the test cases with an example for signals of different regularities, and show a small sensitivity analysis, like you'd find in a summary paper on the method.
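
A minimal sketch of such a check, reusing the standalone sample_entropy helper from the benchmark comment earlier in this thread (not the package API); the signals and tolerances below are illustrative choices, not the actual tests added in this PR:

using Test

# Completely regular signal: (nearly) every length-m match also extends to a
# length-(m+1) match, so the estimated sample entropy should be very close to zero.
x_regular = repeat([1.0, 2.0, 3.0], 2000)

# Non-regular signal: length-m matches rarely extend, so the value should be positive.
x_irregular = randn(6000)

@test abs(sample_entropy(x_regular; m = 2, r = 0.1)) < 0.05
@test sample_entropy(x_irregular; m = 2, r = 0.25) > 0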

@kahaaga kahaaga changed the title from "WIP: Sample entropy" to "Sample entropy" on Oct 12, 2022
kahaaga (Member, Author) commented Oct 22, 2022

@Datseris This has now been updated to use the new complexity from #134. Feel free to merge if it looks okay!
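
For orientation, a rough usage sketch of how the measure would be called through the complexity interface from #134. The measure name SampleEntropy, its keyword arguments, and the package name in the using statement are my assumptions for illustration, not something confirmed in this thread:

# Hypothetical usage sketch; names and keywords are assumptions, not the confirmed API.
using Entropies                         # package name assumed

x = rand(10_000)
est = SampleEntropy(m = 2, r = 0.25)    # assumed constructor for the measure added here
complexity(est, x)                      # assumed call pattern: complexity(measure, x)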

@kahaaga kahaaga requested a review from Datseris October 22, 2022 09:35
@Datseris Datseris marked this pull request as ready for review October 22, 2022 10:05
Review comments (all resolved) were left on docs/src/examples.md and src/complexity_measures/sample_entropy.jl.
Comment on lines +35 to +40
```math
\\begin{aligned}
B(r, m, N) = \\sum_{i = 1}^{N-m\\tau} \\sum_{j = 1, j \\neq i}^{N-m\\tau} \\theta(d({\\bf x}_i^m, {\\bf x}_j^m) \\leq r) \\\\
A(r, m, N) = \\sum_{i = 1}^{N-m\\tau} \\sum_{j = 1, j \\neq i}^{N-m\\tau} \\theta(d({\\bf x}_i^{m+1}, {\\bf x}_j^{m+1}) \\leq r) \\\\
\\end{aligned},
```
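
Stripped of the docstring escaping and completed with the ratio that the benchmark functions above compute as -log(base, Aᵐ⁺¹ / Bᵐ), the quoted definitions amount to (θ is the indicator function, τ the embedding delay):

```math
\begin{aligned}
B(r, m, N) &= \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta\left(d({\bf x}_i^m, {\bf x}_j^m) \leq r\right), \\
A(r, m, N) &= \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta\left(d({\bf x}_i^{m+1}, {\bf x}_j^{m+1}) \leq r\right), \\
\mathrm{SampEn}(m, r, N) &= -\log\frac{A(r, m, N)}{B(r, m, N)}.
\end{aligned}
```
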
Datseris (Member) commented:

Hmmm, this is suspiciously similar to Cao's method for deducing an optimal embedding... Right? https://juliadynamics.github.io/DynamicalSystems.jl/dev/embedding/traditional/#DelayEmbeddings.delay_afnn You don't compute the average distance, but you compute how many points are within distance r, and how this changes from embedding dimension m to m+1.

kahaaga (Member, Author) replied:

I'm not intimately familiar with Cao's method, but I just had a quick glance at the source code, and there does seem to be some overlap. I see that you use bulkisearch in the _average_a function. Here I saw at least a 2x speed-up and more than a 2x reduction in allocations by using inrangecount, so using it there too could probably improve performance (a rough sketch of the swap is below).
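
As a rough illustration of that swap (the helper names are hypothetical, and this is not the actual _average_a implementation), the same within-range counting can be done either way:

using Neighborhood                                 # KDTree, bulkisearch, WithinRange, Theiler
using Neighborhood.NearestNeighbors: inrangecount

# bulkisearch-based: allocates one index vector per query point, only to take its
# length afterwards. Theiler(0) excludes each query point from its own neighborhood.
counts_bulk(tree, pts, r) =
    [length(idxs) for idxs in bulkisearch(tree, pts, WithinRange(r), Theiler(0))]

# inrangecount-based: returns the counts directly, with no index vectors.
# inrangecount includes the query point itself, hence the - 1.
counts_direct(tree, pts, r) = [inrangecount(tree, p, r) - 1 for p in pts]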

kahaaga (Member, Author) commented Oct 22, 2022

I think all comments should be addressed now. The PR also uses the new version of Neighborhood.jl.

@Datseris Datseris merged commit 7ba90f3 into main Oct 23, 2022
@Datseris Datseris deleted the sample_entropy branch October 23, 2022 21:29