* compute covariance of spots across cell type contributions during MCMC and accumulate statistics -> clustering of spots
* perform DE between sets of spots clustered according to MCMC-accumulated covariance and based on uncertainty propagated to spot-level
* improve the code for freezing the scaling factors, and the CLI to trigger it
* compute means etc. on chain statistics:
- point estimates: mean, standard deviation, percentiles
- interval estimates: equal-tail and highest posterior density (HPD) intervals
- posterior covariance matrix
- posterior correlation matrix
- deviance information criterion (DIC)
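A minimal sketch of such per-parameter chain summaries in plain C++ (illustrative only, not the project's types; the HPD interval is taken as the narrowest window containing 95% of the sorted draws; posterior covariance / correlation and DIC would need the joint draws and are not shown):
  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Point and interval summaries of one scalar parameter's MCMC chain.
  struct ChainSummary {
    double mean, sd, q05, q50, q95, hpd_lo, hpd_hi;
  };

  ChainSummary summarize(std::vector<double> chain) {
    const std::size_t n = chain.size();
    double mean = 0;
    for (double x : chain) mean += x;
    mean /= n;
    double var = 0;
    for (double x : chain) var += (x - mean) * (x - mean);
    var /= (n - 1);
    std::sort(chain.begin(), chain.end());
    auto quantile = [&](double p) { return chain[static_cast<std::size_t>(p * (n - 1))]; };
    // Empirical 95% HPD interval: the narrowest window covering 95% of the draws.
    const std::size_t k = static_cast<std::size_t>(std::ceil(0.95 * n));
    std::size_t best = 0;
    for (std::size_t i = 0; i + k <= n; ++i)
      if (chain[i + k - 1] - chain[i] < chain[best + k - 1] - chain[best]) best = i;
    return {mean,           std::sqrt(var), quantile(0.05), quantile(0.50),
            quantile(0.95), chain[best],    chain[best + k - 1]};
  }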
* determine and tune the acceptance rate of Metropolis-Hastings sampling
* write out predicted means and variances, based on
- the NB conditional posterior
- the NM conditional posterior
- the Poisson conditional posterior
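For reference, the standard moments under the usual parameterizations (textbook facts, not taken from the code; the code's convention for p may differ, and the NM case is omitted here):
  \text{Poisson}(\lambda):\quad \mathrm{E}[x] = \lambda, \qquad \mathrm{Var}[x] = \lambda
  \text{NB}(r, p):\quad \mathrm{E}[x] = \frac{r\,p}{1-p}, \qquad \mathrm{Var}[x] = \frac{r\,p}{(1-p)^2}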
* re-consider split and merge algorithm
* for the splitting / merging:
- independently also optimize the parameters without splitting / merging
* experiment with splitting a single factor explaining everything
* add flag for consistent contribution marginals and expectation marginals
* contribution initialization is unneeded if Gibbs sampling does that as a first step
* compute perplexity, i.e. validation
* automatically determine single experiments in analyze.r
* store all iterations' parameters;
* compute auto-correlation
* print out acceptance / rejection statistics for MH parts
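For this and the acceptance-rate item above, a trivial per-variable counter would do; a sketch (names illustrative, not the code's actual interface):
  #include <cstddef>
  #include <iostream>

  // Per-variable Metropolis-Hastings acceptance bookkeeping.
  struct MHStats {
    std::size_t proposed = 0, accepted = 0;
    void record(bool was_accepted) {
      ++proposed;
      if (was_accepted) ++accepted;
    }
    double rate() const { return proposed ? double(accepted) / proposed : 0.0; }
  };

  // e.g. after each sweep:
  //   std::cerr << "theta prior MH acceptance rate: " << stats.rate() << std::endl;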
* test likelihood optimization using ML
* consider hierarchies of factors
* trans-experiment factors:
- by using data in blocks corresponding to the experiments
- alternative: make sense of trans-experiment profiles post-analysis
* experiment-wise mix-in of factors
- like normal factors (with features and weights in the same matrices), but treated differently when sampling theta
* use faster method for sampling from the Poisson distribution
* bring back statistical summary printing for vectors
* templatize scaling variables
* evaluate log-normal Poisson rate modeling
* differential expression analysis
* check the Dirichlet posteriors
- done! The feature side is as in that publication; the mixing side is different but ultimately analogous; I haven't written this down yet
* analysis script:
- create barplots for expected reads explained, aside from expected spots explained;
- also break-down experiment-wise
- aside from basing on theta, also consider using the assigned contributions
> do this across iterations, for each iteration in decreasing order (because of non-identifiability due to exchangeability!), coloring the first, say, in red and the last in black, to see whether (and if so, for how long) the distribution changes
implement t1, t2, g1, g2, s1, s2 to set limits for sampling
* experiment-constant factors?
* simpler than that would be factors that are zero in the other samples and otherwise free
Hierarchical Dirichlet features
a two-level hierarchy
When sampling a novel type (i.e. one that isn't already present in the global hierarchy), sample a Dirichlet-multinomial as the Dirichlet prior for it, with a scaling similar to that of the initial factors
currently, the nHDP results in trees with a lot of weight on a single branch; it would be good to get more balanced trees instead!
It would be possible to sample multinomially for all counts of a gene in a spot
Kmeans: use a Gaussian mixture approach, rather than 'hard' K-means
Do bagged updates
Consider re-setting the mixing counts from time to time
Consider performing a Dirichlet process in which the observations are count vectors
implement the HDP
Drop template-based approach in favor of inheritance
* is it possible to benefit even more from sparsity? Consider non-zero values; this might make various things faster
less important:
* collect log-likelihoods and write to file
* note that some sums across factors, genes, and / or spots may yield numbers greater than 2^16; however, the count data type uses 32 bits, so this should be unproblematic
* use something like libmagic to figure out file types
* consider using GPU for sampling contributions
* experiment with activating ARMA_NO_DEBUG in types.hpp
* write probabilistic unit tests for sampling routines
* consider using HDF5 for IO
* test possible benefits of JIT compiling with fixed G, S, T
- the speed-up was marginal: in a manual test the runtime went from roughly 2:05 min to 2:00 min. Unfortunately, I forgot whether I used all genes or only the top 1000; some speed-ups only become apparent with more parameters!
* use profile-generate / profile-use gcc flags
* enforcing medians as alternative to means
* print out some sort of Bayesian summary
-> percentiles: 0%, 5%, 25%, 50%, 75%, 95%, 100%
-> mean and variance
If gamma mixture proportions are used then the spot scaling should probably be deactivated
Feature-side hierarchy:
basically, introduce a plate for gene, type, and experiment.
Let its structure be identical to that of the gene and type plate.
I.e. we then have \lambda_{gst} = \phi_{gt} \phi_{gte} \theta_{st} \sigma_{s}, maybe times \sigma_{e}
Terminology: "groups" are what I formerly called "experiments".
It might be nice to turn the experiment-specific hyper-parameters into random variables
This might be feasible because so many parameters share each such hyper-parameter.
Exponential distributions could be used, and estimation be performed by MH.
It might also be nice to have the same kind of hierarchical structure on the mixing side
That way it would be possible for the theta priors to share information across groups.
The same generalized beta prime / compound gamma distributions that are used for the 2nd prior of the mixing components could also be used to sample the 2nd prior of the mixing weights.
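A compound gamma draw can be realized as two chained gamma draws; a hypothetical helper (not the project's sampler), noting that std::gamma_distribution takes shape and scale, i.e. scale = 1/rate:
  #include <random>

  // Draw from a compound gamma distribution:
  //   rate ~ Gamma(shape0, rate0),   x | rate ~ Gamma(shape, rate).
  template <typename RNG>
  double sample_compound_gamma(double shape, double shape0, double rate0, RNG &rng) {
    std::gamma_distribution<double> prior(shape0, 1.0 / rate0);
    const double rate = prior(rng);
    std::gamma_distribution<double> conditional(shape, 1.0 / rate);
    return conditional(rng);
  }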
Resurrect:
* experiment-specific phi prior estimation
* experiment scaling
Determine how similar the "real" local gene expression profiles (the component-wise product of local and global phi matrices) really are to the global ones.
Consider using such a similarity measure to determine homogeneity amongst the "real" local profiles, identifying outliers and associating them with other global profiles.
Fix the likelihood calculation of phi's p prior
Write DGE analysis code
* compare all factors with each other within each experiment
* requires code to compute the distribution
* the question of the baselines arises
- several approaches exist:
> arg min sum log FC
> arg min Jensen-Shannon divergence
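For the second criterion, a plain sketch of the Jensen-Shannon divergence between two normalized profiles (not the project's implementation); the baseline would then be the candidate minimizing it:
  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Jensen-Shannon divergence between two profiles p and q, each summing to 1.
  double jensen_shannon(const std::vector<double> &p, const std::vector<double> &q) {
    auto kl = [](const std::vector<double> &a, const std::vector<double> &b) {
      double d = 0;
      for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] > 0) d += a[i] * std::log(a[i] / b[i]);
      return d;
    };
    std::vector<double> m(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) m[i] = 0.5 * (p[i] + q[i]);
    return 0.5 * kl(p, m) + 0.5 * kl(q, m);
  }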
Verify the implementation of the experiment-dependent baseline feature
i.e.: lambda_gst = phi_gt global_phi_gt experiment_phi_g theta_st sigma_s
Re-implement Dirichlet features and mixing weights
Modularize calling modes:
* only global
* only local
* both
In each case, for the local features, we are NOT learning the priors
Study mean vs CV plots
Make sure the subroutine call in Metropolis Hastings for the prior of theta is inlined!
Simplify template boilerplate code by contracting: form a template struct that carries the combined information of the two current template arguments, and have the model / experiment etc. template types be instantiated with that bespoke template struct.
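A sketch of the contraction (type names are illustrative, not the actual classes):
  #include <vector>

  // Bundle the two current template arguments into one "type" struct ...
  template <typename Feature, typename Mixing>
  struct ModelType {
    using feature_t = Feature;
    using mixing_t = Mixing;
  };

  // ... and let Experiment / Model take only that bundle.
  template <typename Type>
  struct Experiment {
    typename Type::feature_t features;
    typename Type::mixing_t weights;
  };

  template <typename Type>
  struct Model {
    std::vector<Experiment<Type>> experiments;
  };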
It might be good to test for each effective local factor, whether or not it would fit better to other global factors
Add a baseline mixing weight factor?
* what would be the benefit of that?
- it would improve likelihood by pushing shared local regulation into the baseline rather than into each of the factor mixing weights.
- apart from complicated implications on the mixing rate, this should not change much how the weighted theta matrices look
- one could certainly try dimensionality reduction on matrices that leave out the local baseline weights; but since each experiment's sample factor activity distribution around the baseline might differ due to shifting cell type composition entropy, it is unfortunately unclear how much this would benefit dimensionality reduction
Then, what about global mixing weights?
Global baseline feature
Ensure that global_theta_prior is off when using Dirichlet mixtures; similarly actually for all theta priors
Add a sampling mode for the end, in which the theta values are randomized, and then contributions and theta values are updated iteratively.
This would be done multiple times, and the resulting theta matrix reported.
Importantly, in doing so, everything feature-side, as well as the spot scaling variables, and global theta priors should not be touched.
When local theta priors are used, then it's debatable whether those should be fixed or also randomized & updated.
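A pseudocode sketch of this mode (ModelT stands in for the actual model class; the member functions named here are placeholders for the corresponding existing routines):
  #include <cstddef>

  template <typename ModelT>
  void finalize_theta(ModelT &model, std::size_t num_restarts, std::size_t num_iterations) {
    for (std::size_t restart = 0; restart < num_restarts; ++restart) {
      model.randomize_theta();          // only theta is re-initialized
      for (std::size_t it = 0; it < num_iterations; ++it) {
        model.sample_contributions();   // feature side, spot scaling and
        model.sample_theta();           // global theta priors remain untouched
      }
      model.report_theta(restart);      // record this restart's theta matrix
    }
  }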
Feature hierarchy: tree
Perhaps such tree-shaped feature hierarchies might live in global scope, and one would keep the hierarchical experiment-local feature structure
Global baseline feature?
Sets of features with different priors; ones in which genes can be switched off, and other with broad support
Reconsider order of sampling steps in Experiment.
Could have:
0. baseline prior
1. baseline
2. theta prior
3. theta
4. phi prior
5. phi
Alternatively:
0. theta prior
1. theta
2. baseline prior
3. baseline
4. phi prior
5. phi
Up to now baseline was last, and by moving it forward perhaps its dynamics might be accelerated.
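One way to make the ordering explicit and easy to permute (enumerator names mirror the steps above and are illustrative):
  #include <vector>

  enum class Step { baseline_prior, baseline, theta_prior, theta, phi_prior, phi };

  // Current ordering; switching to the alternative is just a rearrangement.
  const std::vector<Step> sampling_order = {Step::baseline_prior, Step::baseline,
                                            Step::theta_prior,    Step::theta,
                                            Step::phi_prior,      Step::phi};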
Reconsider using expected values of contributions instead of a Gibbs sample
Reset experiments
Field models
* difference of log gammas of neighbors as pairwise energy term
* maybe kernel-based weighted sums over the neighborhood's observations and explanations are exactly the way to bring this into a gamma-based formalism
Write out vectors in the same format as matrices - i.e. give a column name
Add global feature updates to experiment resetting
Vectorize the sampling routines to simplify parallelization logic and thus to even out parallel workloads
Try out modified posteriors for baseline features. Pretend there are no local features... -> branch greedy_baseline
It might be beneficial even in single-experiment mode to have local feature baselines.
So perhaps it might be preferable to deactivate the global features for single-experiment mode.
Try out a momentum-enhanced Gibbs sampler.
This would integrate the difference vectors of the previous few Gibbs steps.
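A rough sketch of the idea (illustrative types, not the sampler's interface; note that adding momentum like this would no longer be an exact Gibbs sampler, it is an exploratory heuristic):
  #include <cstddef>
  #include <deque>
  #include <vector>

  // Keep the difference vectors of the last few Gibbs updates and add a damped
  // fraction of them to the newest draw.
  struct Momentum {
    std::deque<std::vector<double>> history;  // recent difference vectors
    std::size_t window = 3;                   // how many past steps to integrate
    double damping = 0.5;                     // fraction of the momentum applied

    std::vector<double> apply(const std::vector<double> &previous,
                              const std::vector<double> &gibbs_draw) {
      std::vector<double> diff(gibbs_draw.size());
      for (std::size_t i = 0; i < diff.size(); ++i)
        diff[i] = gibbs_draw[i] - previous[i];
      history.push_back(diff);
      if (history.size() > window) history.pop_front();
      std::vector<double> next = gibbs_draw;
      for (const auto &d : history)
        for (std::size_t i = 0; i < next.size(); ++i) next[i] += damping * d[i];
      return next;
    }
  };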
Re-introduce experiment scaling (integrate with spot scaling).
Put a prior on the characteristic length-scale and sample from its posterior if possible - otherwise do Metropolis-Hastings.
Use one shared, not gene-dependent feature prior.
Try setting higher prior values to the local weights than to the baseline.
Make loading more memory efficient: don't constantly re-allocate.
Perhaps first do a scan for the names, and then load them all in place?
Also: multi-thread!
A GUI gimmick showing current activities
-> this could also generate plots
Reactivate Dirichlet mixtures (need to fix the field posterior reliance on that type)
CLI capabilities to specify shared coordinate systems.
E.g. based on spot names a:23x22x0
Alternatively, or in combination with that: path prefixes, e.g. a:path.tsv, potentially even a:+0.5:path.tsv to specify both a coordinate system and a z-offset
As a bonus, when multiple samples are specified, one could use an automatic z-offset.
Furthermore, linear transforms might also be a nice thing
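A hypothetical parse of such an argument (nothing of this exists in the code yet; the syntax follows the examples above, i.e. coordsys[:zoffset]:path):
  #include <string>

  struct PathSpec {
    std::string coord_system;  // e.g. "a"
    double z_offset = 0;       // e.g. +0.5
    std::string path;          // e.g. "path.tsv"
  };

  // Parse "a:path.tsv" or "a:+0.5:path.tsv"; a plain path without prefix is kept as-is.
  PathSpec parse_path_spec(const std::string &arg) {
    PathSpec spec;
    auto first = arg.find(':');
    if (first == std::string::npos) {
      spec.path = arg;
      return spec;
    }
    spec.coord_system = arg.substr(0, first);
    auto rest = arg.substr(first + 1);
    auto second = rest.find(':');
    if (second == std::string::npos) {
      spec.path = rest;
    } else {
      spec.z_offset = std::stod(rest.substr(0, second));
      spec.path = rest.substr(second + 1);
    }
    return spec;
  }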
Predict field on a cube.
Investigate the likelihood riddle further.
Certainly, the missing field contribution is partly to blame.
But this issue is older than the fields.
* just checked the implementation of log_gamma and all its calls - and the implementation and handling of the 2nd argument seem to be all correct
* need to check the calls of gamma_distribution, too
* another thing: the p prior (the 2nd argument) in our case is mostly a negative odds... I'll need to check that it is used correctly in sampling and likelihood calculation
* does it have to do with how that p prior is used?
The global feature log likelihood decreases because the values become larger
Merging and splitting could be well compatible with underlying factors.
E.g. in the multiple binary contrast setting, one might consider flipping the membership of factors
Need a tool to construct and visualize iso surfaces
Why does usage of fields apparently entail that weak factors stay away from zero? See below.
The fact that the feature and mixing realizations have different sample function types indicates that the two might best be served by different classes, though possibly derived from the same base type
The problem with the decreasing global feature likelihood seems to be mostly solved. (There appears to be some marginal, slow decrease of the global feature likelihood, but it seems to coincide with an equally slow increase of the data likelihood; overall the joint log posterior is perhaps still decreasing, but not by much - all this with 20 factors on one data set with the top 1000 genes.)
Seems the problem was really caused by the wrong initialization of contributions_gene_type and contributions_spot_type; as there are unequal numbers of spots and genes, they get unequal treatment in the marginalization for the Gibbs sampling of the other.
What remains to be resolved is the problem of the non-vanishing tail of the factor activity distribution, see the factor activity barplot.
This should be analyzed with
multiScoop --localthetapriors -t 20 -v --top 1000 --sample contributions,phi,theta,theta_prior
or something like that.
(-> feature hierarchy does not seem to have this problem - not true?)
The number of genes used is definitely of importance! For the top 1000 genes there is only a linear decrease, but it seems that if more genes are used then there is also a much reduced baseline.
However, the fields do make a difference... without fields on all genes the tail does vanish; using fields on all genes it doesn't
Also, the results seem to be of rather bad quality with
multiScoop -t 20 -v --top 1000 --sample contributions,phi,theta
check whether this has to do with initialization
-> also seems to be the case for feature hierarchy
Both of the above problems (non-vanishing tails and odd quality) seem to persist both with the CLI switch --localthetapriors and when also sampling gene priors.
Generally, problems should be analyzed in a minimal setting, i.e. with
multiScoop -t 20 --localthetapriors -v --top 1000 --sample contributions,phi,theta
Try inheriting the partial models from Matrix
Profile once in a while
Put in warning / fix always putting samples in a joint coordinate system
For flipping factors, consider the following objective functions:
1. sum of Kullback-Leibler divergence between global and local profile
2. likelihood-based; consider the number of gene-type attributed reads locally and globally
In any case, some kind of procedure needs to be derived, as each factor can realistically be compared with each other factor in O(T^2), but we cannot test all assignments.
1. Greedy: do the T^2 comparisons, find the closest pair, link them, then proceed (see the sketch below)
2. Only consider one factor per iteration
a. consider exchange with one other factor
b. consider exchange with all other factors
It seems subtracting the local attributions from the global ones and re-attributing the possible link-candidate would favor shattering.
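A sketch of the greedy variant with the KL objective (plain vectors instead of the project's matrix types; profiles are assumed normalized to sum to 1, and the numbers of local and global factors are assumed equal):
  #include <cmath>
  #include <cstddef>
  #include <limits>
  #include <vector>

  using Profile = std::vector<double>;  // one factor's normalized expression profile

  double kl(const Profile &p, const Profile &q) {
    double d = 0;
    for (std::size_t g = 0; g < p.size(); ++g)
      if (p[g] > 0) d += p[g] * std::log(p[g] / q[g]);
    return d;
  }

  // Greedily link each local factor to the closest still-unlinked global factor.
  std::vector<std::size_t> greedy_link(const std::vector<Profile> &local,
                                       const std::vector<Profile> &global) {
    const std::size_t T = local.size();
    std::vector<std::size_t> link(T);
    std::vector<bool> local_done(T, false), global_done(T, false);
    for (std::size_t step = 0; step < T; ++step) {
      double best = std::numeric_limits<double>::infinity();
      std::size_t bl = 0, bg = 0;
      for (std::size_t l = 0; l < T; ++l)
        for (std::size_t g = 0; g < T; ++g)
          if (!local_done[l] && !global_done[g]) {
            const double d = kl(local[l], global[g]);
            if (d < best) { best = d; bl = l; bg = g; }
          }
      link[bl] = bg;
      local_done[bl] = true;
      global_done[bg] = true;
    }
    return link;
  }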
Two avenues to think ahead:
* factor factorization
* local factor exchange
Two things to urgently do:
1. fields
(1.5 the above factor exchange - if it works it might polish up the last of the BC data glitches - although potentially they're real; then again perhaps not, since - while repeatedly observed - they still appear in odd spatial patterns across the slices).
2. filtering
Evaluate likelihood after contributions have been updated, not just at the end of the iteration since by that time the lambda_gst are outdated and don't fit with the updated spot scaling and feature baseline.
Add a guard mechanism against trying to calculate likelihood when the lambdas are out-of-date. Also relevant after loading.
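A sketch of such a guard (member names are placeholders):
  #include <cassert>

  // Any update that invalidates the cached lambda_gst marks them stale; the
  // likelihood routine then refuses to run until they have been recomputed.
  struct LambdaGuard {
    bool lambdas_up_to_date = false;

    void mark_stale() { lambdas_up_to_date = false; }  // after sampling / loading
    void mark_fresh() { lambdas_up_to_date = true; }   // after recomputing lambda

    void check() const {
      assert(lambdas_up_to_date && "lambda_gst out of date; recompute before likelihood");
    }
  };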
Consider giving the contained classes references to their containing class.
Write command-based CLI, in which the default command is "run". Options are rerun (based on shell call file), continue (possibly on new data), dge, predict.
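A minimal dispatch sketch for this (the handler functions are placeholder stubs for the respective modes):
  #include <functional>
  #include <map>
  #include <string>

  // Placeholder entry points; the real tool would call the respective modes here.
  int run(int, char **) { return 0; }      // default command
  int rerun(int, char **) { return 0; }    // re-issue the stored shell call
  int resume(int, char **) { return 0; }   // "continue" on the CLI ("continue" is a C++ keyword)
  int dge(int, char **) { return 0; }
  int predict(int, char **) { return 0; }

  int dispatch(int argc, char **argv) {
    const std::map<std::string, std::function<int(int, char **)>> commands = {
        {"run", run}, {"rerun", rerun}, {"continue", resume},
        {"dge", dge}, {"predict", predict}};
    if (argc < 2) return run(argc, argv);              // no command given: default to "run"
    auto it = commands.find(argv[1]);
    if (it == commands.end()) return run(argc, argv);  // treat argv[1] as an option to "run"
    return it->second(argc - 1, argv + 1);
  }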
Split & merge
Why do the fields not exhibit sparsity?
How do we determine P(x|x*) for Metropolis-Hastings steps with subsequent Gibbs sampling?