Design interface for missing data in Kronecker models #8

dfm · 2020-10-15T21:06:03Z

@tagordon, @ericagol: I have a proposal.

I expect that a common use case would be something like Rubin, where you won't ever have multiple bands observed simultaneously. In that case, building the full matrices and then masking introduces a big overhead. So:

For the low rank terms, I think that this could be most efficiently implemented by just allowing the model to have variable kernel amplitudes. I would implement this by allowing some NxJ matrix Alpha: you would multiply it elementwise into U and V, then square it, sum along the 2nd axis, and multiply the result into a. I think that this would be equivalent to the low rank Kronecker model with missing data.
For the dense version, I think that things will be a bit trickier and I'm not sure what the best interface is. I think it would be worth working this through carefully and honestly I think that it might be worth writing a paper. It looks to me like we might be able to come up with a pretty efficient algorithm for this and we'd probably have a lot of users!

tagordon · 2020-10-15T22:35:09Z

@dfm I'm trying to understand your suggestion for the low-rank version. What determines the entries in Alpha?

dfm · 2020-10-16T11:49:28Z

Sorry - I wrote it pretty vaguely!

So if I understand @ericagol's proposal for the missing data algorithm, we just need to remove all the rows of a, U, and V that correspond to the missing data (P is actually trickier in that form - more on that later!). So if you only observe one band at each time, this would be equivalent to just multiplying each row of U and V by the alpha value for the band observed at that time and multiplying that row of a by the square (modulo the diag entries, but you know what I mean!). So in its simplest form, Alpha would be an N-vector where each entry is the alpha parameter for the band at time n. The reason why I said NxJ is that if you add KronTerms that is equivalent to allowing different alphas for different Js and we might want to support that.
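A minimal numerical sketch of the equivalence described above, for the "one band per time" case. It is not celerite2 code: it assumes a single exponential term (J = 1), a time-major ordering in which the full covariance is `np.kron(K_time, outer(alpha, alpha))`, and a simple factorization `U = a0 * exp(-c t)`, `V = exp(c t)` with no P matrix. All names (`alpha`, `band`, `a0`, `c`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 3                           # N times, M bands
t = np.sort(rng.uniform(0, 10, N))    # observation times
alpha = np.array([1.0, 0.5, 2.0])     # one amplitude per band (assumed values)
band = rng.integers(0, M, N)          # band observed at each time

# Dense time kernel for one exponential term: a0 * exp(-c |t_n - t_m|)
a0, c = 1.3, 0.4
K_time = a0 * np.exp(-c * np.abs(t[:, None] - t[None, :]))

# Full Kronecker covariance, then keep only the observed (time, band) rows
K_full = np.kron(K_time, np.outer(alpha, alpha))
idx = np.arange(N) * M + band
K_masked = K_full[np.ix_(idx, idx)]

# Low-rank factors of the time kernel (assumed factorization, J = 1)
U = (a0 * np.exp(-c * t))[:, None]
V = np.exp(c * t)[:, None]

# Row scaling: Alpha is an N-vector holding the alpha of the observed band
alpha_n = alpha[band]
U_s = alpha_n[:, None] * U
V_s = alpha_n[:, None] * V
a_s = a0 * alpha_n**2                 # diagonal scaled by the square

# Rebuild the dense matrix from the scaled factors
K_lr = np.tril(U_s @ V_s.T, -1) + np.triu(V_s @ U_s.T, 1) + np.diag(a_s)

print(np.allclose(K_lr, K_masked))
```

The scaled factors reproduce the masked Kronecker matrix without ever forming the N·M by N·M matrix, which is the overhead the proposal is trying to avoid.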

Is this at all clearer? Hard to explain without a whiteboard perhaps when it's all a little muddled in my brain.

Also: the reason why the P matrix is harder when masking is that most of the rows of P are 1, I think, but we need the row of P corresponding to the first observed band to be evaluated as exp(-c (t_n - t_{n-1})). Currently that's the first band, which will change if that band gets masked. This is trivial in the "one band per time" case as described above.
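The P issue can be illustrated with a small sketch. The layout here is an assumption for illustration, not necessarily how celerite2 stores P: one entry per (time, band) row, each holding the decay factor relative to the previous row, with the first row set to 1. Under that layout, simply deleting masked rows of a precomputed P loses the decay factor whenever the first band is masked, while recomputing P from the times of the retained rows does not.

```python
import numpy as np

c = 0.7
t = np.array([0.0, 0.4, 1.1, 1.5])    # N = 4 observation times
M = 3                                  # bands per time in the full model
band = np.array([1, 0, 2, 1])          # band actually observed at each time

# Full model: every time is repeated once per band, so the time step
# is zero (P = 1) except on the first band of each new time.
t_flat = np.repeat(t, M)
P_full = np.exp(-c * np.diff(t_flat, prepend=t_flat[0]))

# Rows that survive the mask (one band per time)
idx = np.arange(len(t)) * M + band

# Naive masking: drop rows of the precomputed P
P_naive = P_full[idx]

# Recomputing from the retained times puts the decay on whichever
# band is observed first at each time
P_masked = np.exp(-c * np.diff(t_flat[idx], prepend=t_flat[idx][0]))

print(P_naive)
print(P_masked)
```

In the "one band per time" case the recomputed P is just exp(-c (t_n - t_{n-1})) for every retained row after the first, which is why that case is trivial.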

ericagol · 2020-10-22T13:53:17Z

@dfm Perhaps we should meet about this; I haven't thought about this much, and so could use an explainer.

dfm · 2020-10-23T11:51:58Z

Sure, that would be great! Next week is getting a bit overloaded already. Perhaps some time on Nov 3? Y'all vote by mail in Washington, right?

ericagol · 2020-10-23T16:53:02Z

@dfm Voted!

tagordon · 2020-10-23T17:49:21Z

Election day works for me!

dfm mentioned this issue Oct 15, 2020

[WIP]: Implementing Kronecker terms #6

Closed

6 tasks

dfm added the enhancement label Oct 15, 2020
