Design interface for missing data in Kronecker models #8

Open
dfm opened this issue Oct 15, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@dfm
Member

dfm commented Oct 15, 2020

@tagordon, @ericagol: I have a proposal.

I expect that a common use case would be something like Rubin, where you won't ever have multiple bands observed simultaneously, so building the full matrices and then masking them introduces a big overhead. So:

  • For the low rank terms, I think that this could be most efficiently implemented by just allowing the model to have variable kernel amplitudes. I would implement this by allowing some NxJ matrix Alpha, which you would multiply into U and V, and whose square you would sum along the 2nd axis and then multiply into a. I think that this would be equivalent to the low rank Kronecker model with missing data.
  • For the dense version, I think that things will be a bit trickier, and I'm not sure what the best interface is. It would be worth working this through carefully and, honestly, it might be worth writing a paper. It looks to me like we might be able to come up with a pretty efficient algorithm for this, and we'd probably have a lot of users!
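
A minimal NumPy sketch of the low rank bullet, under stated assumptions: each of the J terms is taken to be a simple exponential kernel a_j exp(-c_j |tau|) with generators U = a exp(-c t) and V = exp(c t) (a numerically naive factorization chosen only for illustration, not the library's internal representation), and "multiply into a" is read as contracting Alpha squared against the per-term amplitudes. All variable names are hypothetical. It checks that scaling U and V by Alpha reproduces a dense reference with per-observation amplitudes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, J = 6, 2
t = np.sort(rng.uniform(0.0, 2.0, N))
a = np.array([1.3, 0.4])                # per-term amplitudes (assumed values)
c = np.array([0.5, 2.0])                # per-term decay rates (assumed values)
Alpha = rng.uniform(0.5, 1.5, (N, J))   # amplitude for each time and term

# Generators such that the strict lower triangle of K equals U @ V.T
U = a * np.exp(-c * t[:, None])         # shape (N, J)
V = np.exp(c * t[:, None])              # shape (N, J)

# Proposed interface: scale the generators by Alpha; the diagonal picks up
# Alpha squared, summed over the second axis against the amplitudes.
U_s = Alpha * U
V_s = Alpha * V
diag_s = (Alpha**2 * a).sum(axis=1)

K_s = np.tril(U_s @ V_s.T, -1) + np.triu(V_s @ U_s.T, 1) + np.diag(diag_s)

# Dense reference: sum over terms of (alpha_j alpha_j^T) * K_j, elementwise
tau = np.abs(t[:, None] - t[None, :])
K_ref = sum(
    np.outer(Alpha[:, j], Alpha[:, j]) * a[j] * np.exp(-c[j] * tau)
    for j in range(J)
)

print(np.allclose(K_s, K_ref))
```

Any white noise on the diagonal is left out here; it would be added to `diag_s` without being scaled by Alpha.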

dfm added the enhancement label Oct 15, 2020

@tagordon
Collaborator

@dfm I'm trying to understand your suggestion for the low-rank version. What determines the entries in Alpha?

@dfm
Member Author

dfm commented Oct 16, 2020

Sorry - I wrote it pretty vaguely!

So if I understand @ericagol's proposal for the missing data algorithm, we just need to remove all the rows of a, U, and V that correspond to the missing data (P is actually trickier in that form - more on that later!). If you only observe one band at each time, this is equivalent to multiplying each row of U and V by the alpha value for the band observed at that time, and multiplying that row of a by the square of that alpha (modulo the diag entries, but you know what I mean!). So in its simplest form, Alpha would be an N-vector where each entry is the alpha parameter for the band observed at time n. The reason why I said NxJ is that adding KronTerms together is equivalent to allowing different alphas for different Js, and we might want to support that.

Is this at all clearer? Hard to explain without a whiteboard perhaps when it's all a little muddled in my brain.
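
The "one band per time" equivalence can be checked numerically. The sketch below is an illustration under assumptions, not the library's code: it uses a single exponential kernel, a rank-one band covariance R = alpha alpha^T, time-major ordering of the flattened data, and hypothetical variable names. It builds the full Kronecker matrix, masks it down to the observed band at each time, and compares that with simply scaling the rows of U and V by the per-time alpha.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 5, 3                              # number of times, number of bands
t = np.sort(rng.uniform(0.0, 10.0, N))
alpha = np.array([1.0, 0.7, 1.8])        # one amplitude per band (assumed values)
band = rng.integers(0, M, N)             # the single band observed at each time

amp, c = 1.5, 0.3                        # kernel amp * exp(-c |tau|) (assumed values)
tau = np.abs(t[:, None] - t[None, :])
T = amp * np.exp(-c * tau)               # (N, N) time covariance
R = np.outer(alpha, alpha)               # (M, M) rank-one band covariance

# Expensive route: build the full (N*M, N*M) matrix, then mask
K_full = np.kron(T, R)
keep = np.arange(N) * M + band           # flattened index of each observed point
K_masked = K_full[np.ix_(keep, keep)]

# Cheap route: N-vector Alpha, scale rows of U and V, scale a by the square
Alpha = alpha[band]
U = (amp * np.exp(-c * t))[:, None]      # naive generators, illustration only
V = np.exp(c * t)[:, None]
U_s = Alpha[:, None] * U
V_s = Alpha[:, None] * V
a_s = Alpha**2 * amp                     # diagonal, ignoring any white noise

K_direct = np.tril(U_s @ V_s.T, -1) + np.triu(V_s @ U_s.T, 1) + np.diag(a_s)

print(np.allclose(K_masked, K_direct))
```

The cheap route never forms anything larger than N rows, which is the overhead saving described at the top of the thread.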

Also: the P matrix is harder when masking because most of the rows of P are 1, I think, but at each time we need the row of P corresponding to the first observed band to be evaluated as exp(-c (t_n - t_{n-1})). Currently that row is always the first band, which will change if that band gets masked. This is trivial in the "one band per time" case as described above.
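
One way to picture the P bookkeeping is sketched below. `masked_decay` is a hypothetical helper, not an existing function, and it makes two assumptions beyond the comment above: the first row of P is set to 1 as a placeholder, and a time with no observed bands is skipped, so the decay is measured from the previous time that has at least one observation. Under those assumptions the result matches taking exp(-c * diff) of the flattened observed times, because points sharing a timestamp have zero time difference.

```python
import numpy as np

def masked_decay(t, observed, c):
    """Decay factor for each observed (time, band) point, in time-major order.

    t: (N,) sorted times; observed: (N, M) boolean mask; c: decay rate.
    The first observed band at each time gets exp(-c * (t_n - t_prev));
    every other observed band at that time gets 1.
    """
    P = []
    t_prev = None
    for n in range(len(t)):
        bands = np.flatnonzero(observed[n])
        for k in range(bands.size):
            if k == 0 and t_prev is not None:
                P.append(np.exp(-c * (t[n] - t_prev)))
            else:
                P.append(1.0)
        if bands.size:
            t_prev = t[n]  # only times with data advance the clock
    return np.array(P)

t = np.array([0.0, 1.0, 2.5, 4.0])
observed = np.array([
    [False, True, True],     # first observed band is band 1, not band 0
    [True, False, False],
    [False, False, False],   # nothing observed at this time
    [False, False, True],
])
c = 0.5

P = masked_decay(t, observed, c)

# Cross-check against differences of the flattened observed times
t_flat = np.repeat(t, observed.shape[1])[observed.ravel()]
print(np.allclose(P[1:], np.exp(-c * np.diff(t_flat))))
```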

@ericagol

@dfm Perhaps we should meet about this; I haven't thought about this much, and so could use an explainer.

@dfm
Member Author

dfm commented Oct 23, 2020

Sure, that would be great! Next week is getting a bit overloaded already. Perhaps some time on Nov 3? Y'all vote by mail in Washington, right?

@ericagol

@dfm Voted!

@tagordon
Collaborator

Election day works for me!
