-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add symmetric GTDW and improve GDTW API #25
Conversation
Codecov Report
@@ Coverage Diff @@
## master #25 +/- ##
==========================================
- Coverage 95.74% 95.72% -0.02%
==========================================
Files 14 14
Lines 658 679 +21
==========================================
+ Hits 630 650 +20
- Misses 28 29 +1
Continue to review full report at Codecov.
|
Very nice, thanks a lot :) |
Sounds good! |
Awesome work! Adding interpolation such that the resulting vector is also of length N certainly helps with clarity, though in some cases, having the ability to access the M-vector can also be important. For example, in the benchmark work I'm doing right now, I'm comparing the approximate phi hat (M-dimensional) with a ground truth phi that's sampled at same the M timepoints. If it's helpful, here are a few of my own thoughts on asymmetry as default. I find that asymmetric time-warping is better for group alignment, e.g., aligning a set of signals to a common template. I also find asymmetric time-warping to be more useful as a distance metric, since forcing the warping function to be symmetric may not result in the best overall fit (ie lowest loss) between the two signals given the regularizers. |
@dderiso thanks for the comments and taking a look! Regarding
which vector are you referring to? I.e. there's
Ah, makes sense; I haven't tried that yet.
Interesting, I hadn't thought of that. Do you find big differences in loss between the two? To me |
Ah, sorry, I meant the N-vector. Great catch once again!
I haven't compared symmetric vs. asymmetric explicitly for classification. I'm working on a standardized test bank for benchmarks, and I imagine that it would be well suited for this sort of question. |
Ah got it! I agree, but with this API there is still a way to get access to that vector: one can pass in a keyword argument warp = zeros(N)
data = prepare_gdtw(x, y; warp = warp)
cost = iterative_gdtw!(data) will populate
Cool! (I saw your reply to my email, by the way-- will respond soon!). |
Nice -- that's a very thoughtful design! |
This PR adds a symmetric option and improves the GDTW API (I hope). Now,
gdtw
is called likecost, ϕ, ψ = gdtw(x,y)
where
ϕ, ψ
are interpolations so thatx ∘ ϕ ≈ y ∘ ψ
(well, the GDTW tries to minimize the difference, at least), where eitherψ(s) = 2s - ϕ(s)
(both signals warped symmetrically, the default), orψ(s)=s
(only thex
signal is warped).Returning an interpolation helps usability and clarity, I think, because before we returned a vector
warp
, but it might not be clear what the indices mean if you didn't pass timepointst
explicitly. In other words, if you passedN=100
, thenwarp
would be of lengthN
, but I'm not sure it's obvious thatwarp[i]
means how much to warp when time isi/N
. The interpolation takes care of that automatically.The symmetric distance adds no algorithmic overhead (I have some branches that maybe are not hoisted out of loops which could be improved, so it could add some overhead in practice) and I think it makes more sense as a default, since the usual DTW distance distorts both signals as well, so it matches that better. Moreover, I think a symmetric distance is more useful in many cases. For type stability and consistency then, I return a trivial
ψ(s) = s
whensymmetric=false
.Note to really get a symmetric distance, one needs to be a bit careful that all the inputs are appropriately symmetric. E.g. if you impose conditions on
ϕ'
, they should also be imposed onψ' == 2 - ϕ'
. The defaults are chosen in this way.Changing the number of returns and making them interpolations seems pretty breaking, so I bumped the minor version number. If this is a big problem let me know and we can figure out another way to do it without breaking (e.g. a new function name for the new behavior).
Returning two interpolations means, however, that there are some allocations (to construct them). I therefore tried to organize things so that it's easy to avoid them. The function body of
gdtw
is nowand one can just mimic that in their own code without calling
gdtw_warpings
if the interpolations aren't needed. This should be allocation free as long as aGDTWWorkspace
andwarp
vector are passed in. (Note thatwarp
is not part ofGDTWWorkspace
since you need to be careful about reusing it between computations, which is not true for the workspace. More specifically,ϕ
contains a reference to warp and mutating warp will changeϕ
, so you don't want to re-use warp if you need a consistentϕ
). I also exported these functions and documented them so they can be safely used as part of the public API.I also added a
Ref
'd counter todata
so thatiterative_gdtw!(data)
can be called multiple times on the samedata
to refine further. I found that convienent; e.g. if you setverbose=true
andmax_iters = 5
, you can see the convergence from iteration to iteration (or eg a tensorboard logger in the callback), but if it hasn't seemed to converge after 5 iterations, it's nice to be able to just calliterative_gdtw!(data; max_iters = 10)
and resume where you stopped. One downside of this is now I don't do the trickl, l_prev = l_prev, l
and just dol_prev .= l
instead. I thought it might be worth the slight loss of efficiency for this flexibility, although I'm not entirely sure.