-
Notifications
You must be signed in to change notification settings - Fork 8
/
TODO
24 lines (18 loc) · 954 Bytes
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
[*] separate feature creation from computed index
[ ] incremental indexing
- use mode 'append' but the index needs to be recomputed
[ ] distributed computation of the sparse multiplication
- use multi-processing module
- have workers compute a chuck of the matrix (a sequential list of items)
- merge sort each worker result
- accross machines (not just cores), we need distributed indexes as well
[ ] implement other feature types besides bag of words
- some basic image features (color histogram)
[ ] for bag of words features:
- multiple features in one table
- normalize the feature values
- database agnostic
[ ] SSCursor is better to fetch lots of rows but still has problems:
http://stackoverflow.com/questions/337479/how-to-get-a-row-by-row-mysql-resultset-in-python
[ ] to speed things, we could actually only perform the matrix multiplication on the reamining ids
(either by looping over each item or by manipulating the matrix)