.. -*- rest -*-
.. vim:syntax=rest

========================================
Datarray: Numpy arrays with named axes
========================================

Scientists, engineers, mathematicians and statisticians don't just work with
matrices; they often work with structured data, just like you'd find in a
table. However, Numpy lacks functionality for this, and there are several
efforts to fill the void. This is one of those efforts.

.. warning::

   This code is currently experimental, and its API *will* change! It is meant
   to be a place for the community to work out the right semantics and to
   build a prototype implementation that will ultimately (hopefully) be folded
   back into Numpy.

Datarray provides a subclass of Numpy ndarrays that supports:

- individual dimensions (axes) labeled with meaningful descriptions
- labeled 'ticks' along each axis
- indexing and slicing by named axis
- indexing on any axis with the tick labels instead of only integers
- reduction operations (like ``.sum``, ``.mean``, etc.) that accept named axis
  arguments instead of only integer indices.
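
Conceptually, named axes and ticks form a translation layer: axis names and
tick labels are mapped back to the integer positions that plain Numpy indexing
and reductions expect. The following is a minimal sketch of that translation
using plain Numpy. It rests on stated assumptions: the ``(name, ticks)``
metadata layout and the helpers ``axis_position``, ``take_tick`` and
``sum_over`` are hypothetical illustrations, not datarray's API, which is still
changing.

```python
import numpy as np

# Hypothetical metadata layout (an assumption, not datarray's format):
# one (axis name, tick labels) pair per axis, in axis order.
AXES = (("city", ("Berkeley", "Oakland")),
        ("day", ("mon", "tue", "wed")))


def axis_position(axes, name):
    """Return the integer index of the axis called `name`."""
    for i, (axis_name, _ticks) in enumerate(axes):
        if axis_name == name:
            return i
    raise KeyError(f"no axis named {name!r}")


def take_tick(data, axes, name, tick):
    """Select one tick along a named axis; return (values, remaining axes)."""
    i = axis_position(axes, name)
    j = axes[i][1].index(tick)          # tick label -> integer position
    values = np.take(data, j, axis=i)   # ordinary integer indexing underneath
    return values, axes[:i] + axes[i + 1:]


def sum_over(data, axes, name):
    """Sum over a named axis; return (values, remaining axes)."""
    i = axis_position(axes, name)
    return data.sum(axis=i), axes[:i] + axes[i + 1:]


data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

tue, tue_axes = take_tick(data, AXES, "day", "tue")
totals, total_axes = sum_over(data, AXES, "day")

print(tue.tolist(), [n for n, _ in tue_axes])       # [2.0, 5.0] ['city']
print(totals.tolist(), [n for n, _ in total_axes])  # [6.0, 15.0] ['city']
```

Keeping the metadata beside the array, rather than on it, keeps the sketch
short; datarray instead attaches labels to an ndarray subclass, as the feature
list describes.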

Prior Art
=========

At present, there is no accepted standard solution for dealing with tabular
data such as this. However, judging by the following ad-hoc and proposal-level
implementations, there is *definitely* a demand for it. Some examples, in no
particular order:

* `Tabular <http://bitbucket.org/elaine/tabular/src>`_ implements a
  spreadsheet-inspired datatype, with rows/columns, csv/etc. IO, and fancy
  tabular operations.
* `scikits.statsmodels <http://scikits.appspot.com/statsmodels>`_ sounded as
  though it had some features we'd like to eventually see implemented on top
  of something such as datarray, and `Skipper <http://scipystats.blogspot.com/>`_
  seemed pretty interested in something like this himself.
* `scikits.timeseries <http://scikits.appspot.com/timeseries>`_ also has a
  time-series-specific object that's somewhat reminiscent of labeled arrays.
* `pandas <http://pandas.sourceforge.net/>`_ is based around a number of
  DataFrame-esque datatypes.
* `pydataframe <http://code.google.com/p/pydataframe/>`_ is supposed to be a
  clone of R's data.frame.
* `larry <http://github.com/kwgoodman/la>`_, or "labeled array," often comes up
  in discussions alongside pandas.
* `divisi <http://github.com/commonsense/divisi2>`_ includes labeled sparse and
  dense arrays.
* `pymvpa <https://github.com/PyMVPA/PyMVPA>`_ provides a ``Dataset`` class
  that encapsulates the data together with length-matched sets of attributes
  for the first two dimensions (samples and features). ``Dataset`` is not a
  subclass of the numpy array, so that it can also hold other data structures
  (e.g. sparse matrices).
* `ptsa <http://git.debian.org/?p=pkg-exppsy/ptsa.git>`_ subclasses ndarray to
  provide per-dimension attributes, aiming to ease slicing/indexing by the
  values of the axis attributes.

Project Goals
=============

1. Get something akin to this into the numpy core.
2. Stick to basic functionality, so that projects like scikits.statsmodels and
   pandas can use it as a base datatype.
3. Make an interface that allows for simple, pretty manipulation that doesn't
   introduce confusion.
4. Oh, and make sure that the base numpy array is still accessible.
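
The last goal is attainable because an ndarray subclass can hand back its
plain base array without copying. Below is a minimal sketch, assuming Numpy's
documented subclassing pattern (extra attributes set in ``__new__`` and
propagated in ``__array_finalize__``); the ``AxisNamedArray`` class and its
``names`` attribute are hypothetical and are not datarray's implementation.

```python
import numpy as np


class AxisNamedArray(np.ndarray):
    """Hypothetical minimal ndarray subclass carrying one name per axis."""

    def __new__(cls, data, names):
        obj = np.asarray(data).view(cls)   # reinterpret as subclass, no copy
        if len(names) != obj.ndim:
            raise ValueError("need exactly one name per axis")
        obj.names = tuple(names)
        return obj

    def __array_finalize__(self, obj):
        # Numpy calls this for views, slices and other arrays derived from an
        # instance. Inherit the parent's names only if the number of
        # dimensions is unchanged; otherwise they would no longer describe
        # this array's axes.
        names = getattr(obj, "names", None)
        if names is not None and len(names) == self.ndim:
            self.names = names
        else:
            self.names = None


a = AxisNamedArray(np.arange(6.0).reshape(2, 3), names=("row", "col"))
sub = a[:, 1:]               # slice keeps both axes, so names carry over
row = a[0]                   # integer index drops an axis, so names are reset
plain = a.view(np.ndarray)   # plain base ndarray over the same memory
plain[0, 0] = 100.0          # writes through to `a`

print(sub.names, row.names)                   # ('row', 'col') None
print(type(plain).__name__, float(a[0, 0]))   # ndarray 100.0
```

Resetting the names whenever the number of dimensions changes is a
deliberately simple policy for this sketch; a full labeled array would track
which axes survive each indexing or reduction operation.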

Code
====

You can find our sources and single-click downloads:

* `Main repository`_ on Github.
* Documentation_ for all releases and the current development tree.
* The `current trunk`_ as a tar/zip file.
* Downloads of all `available releases`_.

.. _main repository: http://github.com/fperez/datarray
.. _Documentation: http://fperez.github.com/datarray-doc
.. _current trunk: http://github.com/fperez/datarray/archives/master
.. _available releases: http://github.com/fperez/datarray/downloads