Add symmetric GTDW and improve GDTW API #25

ericphanson · 2020-06-15T13:32:38Z

This PR adds a symmetric option and (I hope) improves the GDTW API. Now `gdtw` is called like
```julia
cost, ϕ, ψ = gdtw(x, y)
```
where ϕ and ψ are interpolations such that x ∘ ϕ ≈ y ∘ ψ (or rather, GDTW tries to minimize the difference between them). Either ψ(s) = 2s - ϕ(s), so both signals are warped symmetrically (the default), or ψ(s) = s, so only the signal x is warped.
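To make the quantity being minimized concrete, here is a minimal sketch (not the package's implementation) that measures how far x ∘ ϕ is from y ∘ ψ on a uniform grid over [0, 1]. It assumes the signals and warps are ordinary Julia functions on [0, 1]; the name `warped_mismatch` and the mean-squared form are illustrative choices, and the actual GDTW objective also includes regularization terms on the warp, which are omitted here.

```julia
# Hypothetical helper (not part of DynamicAxisWarping.jl): mean squared gap
# between x ∘ ϕ and y ∘ ψ, sampled at n uniformly spaced times s in [0, 1].
# The real GDTW objective also adds regularization on the warp; omitted here.
function warped_mismatch(x, y, ϕ, ψ; n::Int = 101)
    s = range(0.0, 1.0; length = n)
    return sum(abs2, x.(ϕ.(s)) .- y.(ψ.(s))) / n
end

xsig(t) = t      # toy signal x
ysig(t) = t^2    # toy signal y: same values, reached later in time

warped_mismatch(xsig, ysig, identity, identity)   # > 0: the unwarped signals differ
warped_mismatch(xsig, ysig, s -> s^2, identity)   # 0.0: x ∘ ϕ matches y exactly
```

The second call is the asymmetric case, ψ(s) = s, where warping x alone is enough to line the two toy signals up.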

I think returning interpolations helps usability and clarity. Previously we returned a vector `warp`, but if you didn't pass timepoints `t` explicitly, it might not be clear what its indices mean. For example, if you passed `N = 100`, then `warp` has length N, but I'm not sure it's obvious that `warp[i]` is how much to warp at time i/N. The interpolation takes care of that mapping automatically.
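A minimal sketch of what such an interpolation does, assuming piecewise-linear interpolation between samples (the PR doesn't say which scheme the package uses, and `LinearWarp` is a hypothetical name). Given the sample times `t` and the values in `warp`, the resulting object can be called at any time s, so the caller never has to work out which index corresponds to which time.

```julia
# Hypothetical helper, not part of DynamicAxisWarping.jl: wraps warp values
# sampled at increasing times `t` into a callable piecewise-linear function.
struct LinearWarp{T<:Real}
    t::Vector{T}      # sample times, assumed sorted in ascending order
    vals::Vector{T}   # warp value at each sample time
end

function (w::LinearWarp)(s::Real)
    t, v = w.t, w.vals
    s <= t[1] && return v[1]       # clamp to the first sample below the grid
    s >= t[end] && return v[end]   # clamp to the last sample above the grid
    j = searchsortedlast(t, s)     # now t[j] <= s < t[j+1]
    λ = (s - t[j]) / (t[j+1] - t[j])
    return (1 - λ) * v[j] + λ * v[j+1]
end

t = collect(range(0.0, 1.0; length = 5))   # [0.0, 0.25, 0.5, 0.75, 1.0]
ϕ = LinearWarp(t, t .^ 2)                  # samples of the toy warp s^2
ϕ(0.5)     # 0.25, exactly on a sample
ϕ(0.375)   # 0.15625, halfway between the samples at 0.25 and 0.5
```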

The symmetric distance adds no algorithmic overhead (in practice it may add a little, since some branches might not be hoisted out of loops; that could be improved). I think it makes more sense as the default: the usual DTW distance distorts both signals as well, so the symmetric version matches it better. Moreover, I think a symmetric distance is more useful in many cases. For type stability and consistency, I return the trivial ψ(s) = s when `symmetric = false`.
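In the symmetric case ϕ(s) + ψ(s) = 2s, so the two warps always average to the identity: ψ(s) - s = s - ϕ(s), i.e. each signal is warped by the same amount in opposite directions. The sketch below is a hypothetical helper operating on sampled values rather than the package's interpolation objects. It builds samples of ψ from samples of ϕ in both modes and returns the same concrete type from both branches, which is the type-stability point made above.

```julia
# Hypothetical helper: samples of the companion warp ψ from samples of ϕ.
# symmetric = true:  ψ(s) = 2s - ϕ(s)  (both signals warped, in opposite directions)
# symmetric = false: ψ(s) = s          (only x is warped; ψ is the identity)
# Both branches return a Vector{Float64}, so the return type does not depend
# on the runtime value of `symmetric`.
function companion_warp(s::AbstractVector{Float64}, ϕvals::AbstractVector{Float64};
                        symmetric::Bool = true)
    length(s) == length(ϕvals) ||
        throw(DimensionMismatch("s and ϕvals must have the same length"))
    return symmetric ? 2 .* s .- ϕvals : collect(s)
end

s = [0.0, 0.25, 0.5, 0.75, 1.0]
ϕvals = [0.0, 0.3, 0.5, 0.8, 1.0]
ψ = companion_warp(s, ϕvals)                       # ≈ [0.0, 0.2, 0.5, 0.7, 1.0]
(ϕvals .+ ψ) ./ 2 ≈ s                              # true: the warps average to the identity
companion_warp(s, ϕvals; symmetric = false) == s   # true
```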

Note that to get a truly symmetric distance, one needs to be careful that all the inputs are appropriately symmetric. E.g. if you impose conditions on ϕ', the same conditions should also be imposed on ψ' = 2 - ϕ'. The defaults are chosen this way.
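Since ψ(s) = 2s - ϕ(s), differentiating gives ψ'(s) = 2 - ϕ'(s), so a bound lo ≤ ϕ' ≤ hi translates into 2 - hi ≤ ψ' ≤ 2 - lo. The bound constrains both warps alike only when the interval is centred on 1 (lo + hi = 2). The hypothetical helpers below check that condition and confirm the derivative identity numerically with finite differences; they don't reflect the package's actual keyword names or default values.

```julia
# ψ(s) = 2s - ϕ(s)  ⇒  ψ'(s) = 2 - ϕ'(s).
# A slope bound lo ≤ ϕ' ≤ hi therefore implies 2 - hi ≤ ψ' ≤ 2 - lo.
mirrored_bounds(lo, hi) = (2 - hi, 2 - lo)

# The bound treats ϕ' and ψ' identically only if it maps onto itself,
# i.e. the interval is centred on 1 (lo + hi == 2).
is_symmetric_bound(lo, hi) = mirrored_bounds(lo, hi) == (lo, hi)

# Finite-difference slopes of sampled values f at sample times s.
slopes(s, f) = diff(f) ./ diff(s)

s = collect(range(0.0, 1.0; length = 5))
ϕvals = s .^ 2
ψvals = 2 .* s .- ϕvals
slopes(s, ψvals) ≈ 2 .- slopes(s, ϕvals)   # true
is_symmetric_bound(0.5, 1.5)               # true
is_symmetric_bound(0.5, 2.0)               # false: ψ' would be bounded by [0.0, 1.5]
```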

Changing the number of return values and making them interpolations seems pretty breaking, so I bumped the minor version number. If this is a big problem, let me know and we can figure out a non-breaking way to do it (e.g. a new function name for the new behavior).

However, returning two interpolations means there are some allocations (to construct them), so I tried to organize things to make them easy to avoid. The body of `gdtw` is now

```julia
function gdtw(args...; kwargs...)
    data = prepare_gdtw(args...; kwargs...)
    cost = iterative_gdtw!(data)
    return cost, gdtw_warpings(data)...
end
```

and one can mimic that in their own code, skipping `gdtw_warpings` if the interpolations aren't needed. This should be allocation-free as long as a `GDTWWorkspace` and a `warp` vector are passed in. (Note that `warp` is not part of `GDTWWorkspace` because, unlike the workspace, it must be reused with care: ϕ holds a reference to `warp`, so mutating `warp` changes ϕ. Don't reuse `warp` if you need ϕ to stay consistent.) I also exported and documented these functions so they can safely be used as part of the public API.
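A small illustration of the aliasing issue, using a hypothetical `SampledWarp` type that, like ϕ in this PR, keeps a reference to the vector it wraps rather than copying it. Taking a `copy` is the simple way to get a ϕ that stays fixed if `warp` is later reused as an output buffer.

```julia
# Hypothetical stand-in for an interpolation object: it stores a *reference*
# to the vector it was built from, not a copy.
struct SampledWarp
    vals::Vector{Float64}
end
(w::SampledWarp)(i::Int) = w.vals[i]   # plain index lookup, to keep the demo short

warp = [0.0, 0.3, 1.0]
ϕ = SampledWarp(warp)               # shares memory with `warp`
ϕ_frozen = SampledWarp(copy(warp))  # owns an independent copy

warp[2] = 0.7      # e.g. `warp` reused as the output buffer of a second run
ϕ(2)               # 0.7: ϕ changed along with `warp`
ϕ_frozen(2)        # 0.3: unaffected
```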

I also added a `Ref`'d counter to `data` so that `iterative_gdtw!(data)` can be called multiple times on the same data to refine further. I found that convenient: e.g. if you set `verbose = true` and `max_iters = 5`, you can watch the convergence from iteration to iteration (or use e.g. a TensorBoard logger in the callback), and if it hasn't converged after 5 iterations, you can just call `iterative_gdtw!(data; max_iters = 10)` and resume where you stopped. One downside is that I no longer do the swap trick `l, l_prev = l_prev, l` and instead copy with `l_prev .= l`. I thought the flexibility might be worth the slight loss of efficiency, although I'm not entirely sure.
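A minimal sketch of the resumable pattern, with hypothetical names and a toy update standing in for a GDTW refinement step. It assumes `max_iters` caps the *total* iteration count stored in the counter, which is what makes a second call with a larger `max_iters` resume rather than restart. It also suggests one plausible reason for the copy: the state struct is immutable, so its `l` and `l_prev` fields can't be swapped, and `l_prev .= l` keeps `l` holding the latest iterate between calls.

```julia
# Hypothetical sketch, not DynamicAxisWarping.jl's API. The struct is immutable,
# so the iteration counter lives in a Ref to stay mutable across calls.
struct RefineState
    iter::Base.RefValue{Int}   # total iterations performed so far
    l::Vector{Float64}         # current iterate
    l_prev::Vector{Float64}    # previous iterate
end

RefineState(l0::Vector{Float64}) = RefineState(Ref(0), copy(l0), copy(l0))

# Toy update standing in for one refinement step: move each entry halfway to 1.0.
step!(l) = (l .= (l .+ 1.0) ./ 2; l)

# Assumption: `max_iters` caps the *total* count, so a later call with a larger
# value resumes from where the previous call stopped.
function refine!(st::RefineState; max_iters::Int = 5)
    while st.iter[] < max_iters
        st.l_prev .= st.l   # copy: immutable struct fields can't be swapped
        step!(st.l)
        st.iter[] += 1
    end
    return st.l
end

st = RefineState([0.0])
refine!(st; max_iters = 5)    # st.l == [0.96875] after 5 steps
refine!(st; max_iters = 10)   # resumes: 5 more steps, st.l == [0.9990234375]
```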

codecov · 2020-06-15T13:43:29Z

Codecov Report

Merging #25 into master will decrease coverage by 0.01%.
The diff coverage is 95.34%.

```diff
@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
- Coverage   95.74%   95.72%   -0.02%
==========================================
  Files          14       14
  Lines         658      679      +21
==========================================
+ Hits          630      650      +20
- Misses         28       29       +1
```

| Flag | Coverage Δ |
|---|---|
| #unittests | `95.72% <95.34%> (-0.02%)` ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/DynamicAxisWarping.jl | `100.00% <ø> (ø)` |
| src/gdtw.jl | `96.55% <95.34%> (-0.30%)` ⬇️ |


Legend: `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`

Powered by Codecov. Last update f20c0c6...a1d7ff9.

baggepinnen · 2020-06-15T14:02:29Z

Very nice, thanks a lot :)
No problem with breaking the API; the feature has only been available for a few days, and the 0.x version number should be warning enough that the API is not guaranteed to be stable.

ericphanson · 2020-06-15T14:16:38Z

Sounds good!

dderiso · 2020-06-19T22:04:55Z (edited)

Awesome work!

Adding interpolation such that the resulting vector is also of length N certainly helps with clarity, though in some cases, having the ability to access the M-vector can also be important. For example, in the benchmark work I'm doing right now, I'm comparing the approximate ϕ̂ (M-dimensional) with a ground-truth ϕ sampled at the same M timepoints.

If it's helpful, here are a few of my own thoughts on asymmetry as the default. I find that asymmetric time-warping is better for group alignment, e.g., aligning a set of signals to a common template. I also find asymmetric time-warping to be more useful as a distance metric, since forcing the warping function to be symmetric may not result in the best overall fit (i.e. lowest loss) between the two signals given the regularizers.

ericphanson · 2020-06-21T19:24:13Z

@dderiso thanks for the comments and for taking a look! Regarding

> having the ability to access the M-vector can also be important

which vector are you referring to? There's τ, which is an M × N matrix in our implementation, and `warp`, which is a vector of length N, but no vector of length M (where M is the "spatial" discretization of the warping path and N is the time discretization of the signals, matching the paper).
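To make the shapes concrete: assuming, as in the GDTW paper's discretization, that column j of τ holds the M candidate warp values at the j-th timepoint, a discretized warp is one chosen row index per column, so the result is an N-vector and no M-vector appears. The helper below is hypothetical and only illustrates the shapes, not how the package computes the path.

```julia
# Hypothetical illustration of the shapes only (not the package's code).
# τ is M × N: column j lists M candidate warp values at timepoint j.
# A path picks one row index per column; the warp is then an N-vector.
function warp_from_path(τ::AbstractMatrix{Float64}, path::AbstractVector{Int})
    M, N = size(τ)
    length(path) == N || throw(DimensionMismatch("need one row index per timepoint"))
    all(p -> 1 <= p <= M, path) || throw(ArgumentError("row index out of range"))
    return [τ[path[j], j] for j in 1:N]
end

# M = 3 candidates (rows) at each of N = 4 timepoints (columns).
τ = [0.0 0.1 0.4 1.0
     0.0 0.3 0.6 1.0
     0.0 0.5 0.8 1.0]
warp = warp_from_path(τ, [1, 2, 2, 3])   # [0.0, 0.3, 0.6, 1.0], length N = 4
```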

> I find that asymmetric time-warping is better for group alignment, e.g., aligning a set of signals to a common template.

Ah, makes sense; I haven't tried that yet.

> I also find asymmetric time-warping to be more useful as a distance metric, since forcing the warping function to be symmetric may not result in the best overall fit (i.e. lowest loss) between the two signals given the regularizers.

Interesting, I hadn't thought of that. Do you find big differences in loss between the two? To me, D(x, y) = D(y, x) feels pretty nice, since we get it in a natural way. It might be interesting to compare the loss against the symmetrized asymmetric version, i.e. (D(x, y) + D(y, x)) / 2.
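The symmetrization (D(x, y) + D(y, x)) / 2 works for any two-argument cost. A minimal sketch, using a toy nearest-sample cost as a stand-in for an asymmetric GDTW distance; this is an assumption purely for illustration, and the toy cost does no warping at all.

```julia
# Toy asymmetric cost standing in for an asymmetric GDTW distance (it is NOT GDTW):
# for each sample of x, the squared gap to the nearest sample of y.
directed_cost(x, y) = sum(minimum(abs2.(xi .- y)) for xi in x)

# Symmetrize any two-argument cost D by averaging both argument orders.
symmetrize(D) = (x, y) -> (D(x, y) + D(y, x)) / 2

x = [0.0, 1.0, 2.0]
y = [0.0, 2.0]
directed_cost(x, y)                # 1.0: the sample at 1.0 has no exact match in y
directed_cost(y, x)                # 0.0: every sample of y appears in x
Dsym = symmetrize(directed_cost)
Dsym(x, y) == Dsym(y, x) == 0.5    # true
```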

dderiso · 2020-06-22T23:32:23Z

@ericphanson

> which vector are you referring to?

Ah, sorry, I meant the N-vector. Great catch once again!

> Do you find big differences in loss between the two?

I haven't compared symmetric vs. asymmetric explicitly for classification. I'm working on a standardized test bank for benchmarks, and I imagine that it would be well suited for this sort of question.

ericphanson · 2020-06-23T00:00:13Z

> Ah, sorry, I meant the N-vector.

Ah, got it! I agree, but with this API there is still a way to access that vector: pass an N-vector via the `warp` keyword argument, and it will be populated in place. You can avoid constructing the interpolations at the same time, for example

```julia
warp = zeros(N)
data = prepare_gdtw(x, y; warp = warp)
cost = iterative_gdtw!(data)
```

will populate `warp` without constructing any interpolations. (You may also want to pass a `GDTWWorkspace` object to avoid other memory allocations.) So the idea is that this kind of thing is more advanced usage that requires an extra step (`prepare_gdtw`, then `iterative_gdtw!`), with the advantage that the simple entry point, `gdtw`, is more intuitive.

> I'm working on a standardized test bank for benchmarks, and I imagine that it would be well suited for this sort of question.

Cool! (I saw your reply to my email, by the way; I'll respond soon!)

dderiso · 2020-06-23T01:23:15Z

Nice -- that's a very thoughtful design!

ericphanson added 2 commits June 15, 2020 12:18

- Add symmetric warping (e950e6e)
- Improve API (a1d7ff9)

baggepinnen merged commit 41cb028 into baggepinnen:master Jun 15, 2020

ericphanson deleted the symmetric branch June 21, 2020 19:24
