Add symmetric GDTW and improve GDTW API #25

Merged
merged 2 commits into from
Jun 15, 2020

ericphanson
Contributor

This PR adds a symmetric option and improves the GDTW API (I hope). Now, gdtw is called like
cost, ϕ, ψ = gdtw(x,y)
where ϕ, ψ are interpolations so that x ∘ ϕ ≈ y ∘ ψ (well, the GDTW tries to minimize the difference, at least), where either ψ(s) = 2s - ϕ(s) (both signals warped symmetrically, the default), or ψ(s)=s (only the x signal is warped).
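To make the relationship between the two warps concrete, here is a minimal sketch in plain Julia (no packages). The warp `ϕ_demo` is a made-up example and not something `gdtw` would return; the names `ψ_sym` and `ψ_id` are also just for illustration. The point is only that in the symmetric case ψ is determined by ϕ, and the two warps average to the identity.

```julia
# Hypothetical warp on [0, 1]: increasing, and it fixes both endpoints.
ϕ_demo(s) = s + 0.1 * sin(π * s)^2

# Symmetric case from the description above: ψ(s) = 2s - ϕ(s).
ψ_sym(s) = 2s - ϕ_demo(s)

# Asymmetric case: only x is warped, so ψ is the identity.
ψ_id(s) = s

# In the symmetric case the average of the two warps is the identity map.
s_grid = range(0, stop = 1, length = 11)
midpoint_error = maximum(abs((ϕ_demo(s) + ψ_sym(s)) / 2 - s) for s in s_grid)
```

Here `midpoint_error` should be zero up to floating-point rounding.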

Returning an interpolation helps usability and clarity, I think, because before we returned a vector warp, but it might not be clear what the indices mean if you didn't pass timepoints t explicitly. In other words, if you passed N=100, then warp would be of length N, but I'm not sure it's obvious that warp[i] means how much to warp when time is i/N. The interpolation takes care of that automatically.
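As a rough illustration of why an interpolation is easier to use than a bare vector, the sketch below wraps a vector of samples in a piecewise-linear function of time. It is not the package's implementation: `linear_warp` is a hypothetical helper, and the grid convention (first sample at time 0, last at time 1) is an assumption made for this example.

```julia
# Build a piecewise-linear function from samples taken on a uniform grid
# over [0, 1]. Assumed convention: warp[1] is at s = 0, warp[end] at s = 1.
function linear_warp(warp::AbstractVector{<:Real})
    N = length(warp)
    N >= 2 || throw(ArgumentError("need at least two samples"))
    return function (s::Real)
        0 <= s <= 1 || throw(DomainError(s, "s must lie in [0, 1]"))
        pos = s * (N - 1) + 1            # fractional 1-based index
        i = min(floor(Int, pos), N - 1)  # left neighbour, clamped so i + 1 <= N
        w = pos - i                      # weight of the right neighbour
        return (1 - w) * warp[i] + w * warp[i + 1]
    end
end

warp_samples = [0.0, 0.2, 0.6, 1.0]   # made-up values
ϕ_lin = linear_warp(warp_samples)
ϕ_mid = ϕ_lin(0.5)                    # halfway between the 2nd and 3rd samples
```

The caller only ever passes a time in [0, 1]; the index arithmetic stays inside the closure.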

The symmetric distance adds no algorithmic overhead (some branches may not be hoisted out of loops, so there could be a little overhead in practice until that is improved), and I think it makes more sense as a default: the usual DTW distance distorts both signals as well, so the symmetric version matches it better. Moreover, I think a symmetric distance is more useful in many cases. For type stability and consistency, I return a trivial ψ(s) = s when symmetric=false.

Note that to really get a symmetric distance, one needs to be a bit careful that all the inputs are appropriately symmetric. E.g. if you impose conditions on ϕ', they should also be imposed on ψ' == 2 - ϕ'. The defaults are chosen in this way.
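To spell out what that means for a simple box constraint on the slope: the bound names s_min and s_max below are illustrative and are not taken from the package. Differentiating ψ(s) = 2s - ϕ(s) gives

```latex
\psi'(s) = 2 - \phi'(s),
\qquad
s_{\min} \le \phi'(s) \le s_{\max}
\;\Longleftrightarrow\;
2 - s_{\max} \le \psi'(s) \le 2 - s_{\min}.
```

So ϕ' and ψ' are subject to the same bounds exactly when s_min + s_max = 2, for example s_min = 0.5 and s_max = 1.5. Otherwise, also imposing the bounds on ψ' tightens the admissible range of ϕ' to max(s_min, 2 - s_max) ≤ ϕ' ≤ min(s_max, 2 - s_min).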

Changing the number of returns and making them interpolations seems pretty breaking, so I bumped the minor version number. If this is a big problem let me know and we can figure out another way to do it without breaking (e.g. a new function name for the new behavior).

Returning two interpolations means, however, that there are some allocations (to construct them). I therefore tried to organize things so that it's easy to avoid them. The function body of gdtw is now

```julia
function gdtw(args...; kwargs...)
    data = prepare_gdtw(args...; kwargs...)
    cost = iterative_gdtw!(data)
    return cost, gdtw_warpings(data)...
end
```

and one can mimic that in their own code, skipping the call to gdtw_warpings if the interpolations aren't needed. This should be allocation-free as long as a GDTWWorkspace and a warp vector are passed in. (Note that warp is not part of GDTWWorkspace, since you need to be careful about reusing it between computations, which is not true for the workspace. More specifically, ϕ contains a reference to warp, and mutating warp will change ϕ, so you don't want to re-use warp if you need a consistent ϕ.) I also exported and documented these functions so they can be safely used as part of the public API.
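The aliasing caveat can be shown with a small stand-alone sketch. The closure below is only a stand-in for an interpolation object (`nearest_sample` is a hypothetical helper, not the package's type); what matters is that it holds a reference to the vector rather than a copy.

```julia
# Stand-in for an interpolation: a closure that keeps a reference to
# `samples` and returns the sample nearest to time s in [0, 1].
function nearest_sample(samples::AbstractVector)
    n = length(samples)
    return s -> samples[clamp(round(Int, s * (n - 1)) + 1, 1, n)]
end

warp_buf = [0.0, 0.25, 0.5, 0.75, 1.0]
ϕ_shared = nearest_sample(warp_buf)         # aliases warp_buf
ϕ_frozen = nearest_sample(copy(warp_buf))   # owns an independent copy

before = ϕ_shared(0.5)
warp_buf[3] = 0.9                # e.g. a later computation reuses the buffer
after_shared = ϕ_shared(0.5)     # sees the mutation
after_frozen = ϕ_frozen(0.5)     # unaffected
```

Copying the buffer before building the function costs one allocation, which is the trade-off between keeping a consistent ϕ and reusing memory.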

I also added a Ref'd counter to data so that iterative_gdtw!(data) can be called multiple times on the same data to refine further. I found that convenient; e.g. if you set verbose=true and max_iters = 5, you can see the convergence from iteration to iteration (or e.g. use a TensorBoard logger in the callback), but if it doesn't seem to have converged after 5 iterations, it's nice to be able to just call iterative_gdtw!(data; max_iters = 10) and resume where you stopped. One downside is that I no longer do the trick l, l_prev = l_prev, l and just do l_prev .= l instead. I thought it might be worth the slight loss of efficiency for this flexibility, although I'm not entirely sure.
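The resumable pattern can be sketched independently of GDTW. Everything here is hypothetical (`RefineState`, `refine!`, and the toy update step), and treating `max_iters` as a cap on the total iteration count is an assumption of this sketch, not a statement about how `iterative_gdtw!` interprets it.

```julia
# Hypothetical state for an iterative refinement. `iter` is a Ref so the
# count persists across calls even though the struct is immutable.
struct RefineState
    l::Vector{Float64}         # current estimate
    l_prev::Vector{Float64}    # previous estimate
    iter::Base.RefValue{Int}   # iterations performed so far
end

RefineState(n::Integer) = RefineState(ones(n), ones(n), Ref(0))

# Iterate until `max_iters` iterations have been done in total, resuming
# from wherever a previous call stopped.
function refine!(state::RefineState; max_iters::Integer = 5)
    while state.iter[] < max_iters
        state.l_prev .= state.l   # in-place copy; the fields cannot be rebound
        state.l .*= 0.5           # toy update standing in for a real step
        state.iter[] += 1
    end
    return state
end

state = RefineState(3)
refine!(state; max_iters = 5)    # performs iterations 1 to 5
refine!(state; max_iters = 10)   # resumes and performs iterations 6 to 10
```

Because the vectors live in the state object, swapping local bindings would leave the struct pointing at stale buffers, which is why the sketch copies with `.=` instead.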

@codecov

codecov bot commented Jun 15, 2020

Codecov Report

Merging #25 into master will decrease coverage by 0.01%.
The diff coverage is 95.34%.

@@            Coverage Diff             @@
##           master      #25      +/-   ##
==========================================
- Coverage   95.74%   95.72%   -0.02%     
==========================================
  Files          14       14              
  Lines         658      679      +21     
==========================================
+ Hits          630      650      +20     
- Misses         28       29       +1     

| Flag | Coverage Δ |
|---|---|
| #unittests | 95.72% <95.34%> (-0.02%) ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| src/DynamicAxisWarping.jl | 100.00% <ø> (ø) |
| src/gdtw.jl | 96.55% <95.34%> (-0.30%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f20c0c6...a1d7ff9. Read the comment docs.

@baggepinnen baggepinnen merged commit 41cb028 into baggepinnen:master Jun 15, 2020
@baggepinnen
Owner

Very nice, thanks a lot :)
No problem with breaking the API, the feature has only been available for a few days and the 0.x version numbers should be enough warning that API is not guaranteed to be stable.

@ericphanson
Contributor Author

Sounds good!

@dderiso

dderiso commented Jun 19, 2020

Awesome work!

Adding interpolation such that the resulting vector is also of length N certainly helps with clarity, though in some cases, having the ability to access the M-vector can also be important. For example, in the benchmark work I'm doing right now, I'm comparing the approximate phi hat (M-dimensional) with a ground truth phi that's sampled at the same M timepoints.

If it's helpful, here are a few of my own thoughts on asymmetry as default. I find that asymmetric time-warping is better for group alignment, e.g., aligning a set of signals to a common template. I also find asymmetric time-warping to be more useful as a distance metric, since forcing the warping function to be symmetric may not result in the best overall fit (i.e. lowest loss) between the two signals given the regularizers.

@ericphanson
Contributor Author

@dderiso thanks for the comments and taking a look! Regarding

> having the ability to access the M-vector can also be important

which vector are you referring to? I.e. there's τ, which is an M × N matrix in our implementation, and warp, which is a vector of length N, but no vector of length M (where M is the "spatial" discretization in the warping path and N is the time discretization of the signals, matching the paper).

> I find that asymmetric time-warping is better for group alignment, e.g., aligning a set of signals to a common template.

Ah, makes sense; I haven't tried that yet.

> I also find asymmetric time-warping to be more useful as a distance metric, since forcing the warping function to be symmetric may not result in the best overall fit (i.e. lowest loss) between the two signals given the regularizers.

Interesting, I hadn't thought of that. Do you find big differences in loss between the two? To me D(x,y) = D(y,x) feels pretty nice since we can get it in a natural way. I guess it might be interesting to compare the loss against the symmetrized asymmetric version, i.e. (D(x,y) + D(y,x)) / 2.
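For reference, symmetrizing an asymmetric distance is a one-liner. The sketch below uses a toy asymmetric function `toy_D` that has nothing to do with the GDTW loss; it only shows that averaging the two orderings yields a quantity that no longer depends on argument order.

```julia
# Toy asymmetric "distance": x exceeding y costs twice as much as the
# reverse. Purely illustrative; this is not the GDTW loss.
toy_D(x, y) = sum(2 * max(xi - yi, 0.0) + max(yi - xi, 0.0) for (xi, yi) in zip(x, y))

# Symmetrized version: (D(x, y) + D(y, x)) / 2.
symmetrized(D, x, y) = (D(x, y) + D(y, x)) / 2

x_demo = [0.0, 1.0, 2.0]
y_demo = [0.5, 0.5, 0.5]

d_xy = toy_D(x_demo, y_demo)                  # differs from d_yx
d_yx = toy_D(y_demo, x_demo)
d_sym = symmetrized(toy_D, x_demo, y_demo)    # same for either argument order
```

With GDTW, the comparison of interest would be between this kind of average of two asymmetric runs and the cost of a single symmetric run.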

@ericphanson ericphanson deleted the symmetric branch June 21, 2020 19:24
@dderiso

dderiso commented Jun 22, 2020

@ericphanson

> which vector are you referring to?

Ah, sorry, I meant the N-vector. Great catch once again!

> Do you find big differences in loss between the two?

I haven't compared symmetric vs. asymmetric explicitly for classification. I'm working on a standardized test bank for benchmarks, and I imagine that it would be well suited for this sort of question.

@ericphanson
Contributor Author

> Ah, sorry, I meant the N-vector.

Ah got it! I agree, but with this API there is still a way to get access to that vector: one can pass in a keyword argument warp supplying an N-vector that will be populated appropriately. And you can avoid constructing the interpolations at the same time, for example

```julia
warp = zeros(N)
data = prepare_gdtw(x, y; warp = warp)
cost = iterative_gdtw!(data)
```

will populate warp and not construct any interpolations. (One might also want to pass a GDTWWorkspace object to avoid other memory allocations too.) So the idea is that this kind of thing is more of an advanced-level use, and one has to do the extra step (prepare_gdtw, then iterative_gdtw!), but the advantage is that the simple access point via gdtw is more intuitive.

> I'm working on a standardized test bank for benchmarks, and I imagine that it would be well suited for this sort of question.

Cool! (I saw your reply to my email, by the way-- will respond soon!).

@dderiso

dderiso commented Jun 23, 2020

Nice -- that's a very thoughtful design!
