Allow time series of 3D vectors #4913

Closed
ufechner7 opened this issue Sep 21, 2013 · 13 comments

@ufechner7

We use Pandas to analyse flight data. Many of the recorded measurements are 3D vectors of double, e.g. position, velocity, acceleration. Currently I can only store scalars or objects in a time series. Storing the components as scalars makes the dataset very large, and processing it is not very convenient.

I would like to be able to do:
df.velocity.norm().plot()

to plot the norm of the velocity vector that is stored in the data-frame.

Currently I have to type:
pd.Series(np.sqrt(df.velocity_x**2 + df.velocity_y**2 + df.velocity_z**2)).plot()

which is not very convenient.

@jreback
Contributor

jreback commented Sep 21, 2013

have you tried using a Panel? or multi-level frame?

what you are describing is not efficient at all (to store a vector inside a vector).
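
A minimal sketch of the multi-level frame idea, assuming a current pandas release. The column layout, the variable names and the random data are illustrative assumptions, not something taken from the flight data set. Each 3D vector occupies three float columns under a shared top-level label, so the storage stays a plain 2D float block while the vector can still be selected by name:

```python
import numpy as np
import pandas as pd

# Hypothetical layout: the top column level names the quantity,
# the second level names the vector component.
columns = pd.MultiIndex.from_product(
    [["position", "velocity", "acceleration"], ["x", "y", "z"]],
    names=["quantity", "component"],
)
index = pd.date_range("2013-01-01 09:01:01", periods=5, freq="s")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 9)), index=index, columns=columns)

# Selecting the top level returns a sub-frame with columns x, y, z.
velocity = df["velocity"]

# Euclidean norm per time stamp, wrapped in a Series on the same index.
speed = pd.Series(
    np.linalg.norm(velocity.to_numpy(), axis=1),
    index=df.index,
    name="speed",
)
```

With this layout, `speed.plot()` would stand in for the requested `df.velocity.norm().plot()`; a `norm()` method on the frame itself is not part of pandas.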

@ufechner7
Author

Well, why should two dimensional vectors (and a time-series is stored as np.array, as far as I know) be inefficient? I will look at the Panel again to find out why this did not work for me.

@jreback
Contributor

jreback commented Sep 21, 2013

might this be what you are looking for?

In [2]: df = DataFrame(np.random.randn(6,3),columns=['position','velocity','accel'])

In [3]: df
Out[3]: 
   position  velocity     accel
0  0.113437  0.051340  2.102710
1 -0.730743 -0.411552 -0.356178
2  0.790855 -0.630646 -0.213115
3 -0.297808  0.765525  3.008339
4 -1.037385 -0.725488  1.382944
5  0.528578  0.116152  1.253986

In [4]: df['sample'] = [1,1,1,2,2,2]

In [5]: df['time'] = list(date_range('20130101 9:01:01',freq='s',periods=3)) + list(date_range('20130101 9:01:01',freq='s',periods=3))

In [6]: df
Out[6]: 
   position  velocity     accel  sample                time
0  0.113437  0.051340  2.102710       1 2013-01-01 09:01:01
1 -0.730743 -0.411552 -0.356178       1 2013-01-01 09:01:02
2  0.790855 -0.630646 -0.213115       1 2013-01-01 09:01:03
3 -0.297808  0.765525  3.008339       2 2013-01-01 09:01:01
4 -1.037385 -0.725488  1.382944       2 2013-01-01 09:01:02
5  0.528578  0.116152  1.253986       2 2013-01-01 09:01:03

In [9]: df = df.set_index(['sample','time'])

In [10]: df
Out[10]: 
                            position  velocity     accel
sample time                                             
1      2013-01-01 09:01:01  0.113437  0.051340  2.102710
       2013-01-01 09:01:02 -0.730743 -0.411552 -0.356178
       2013-01-01 09:01:03  0.790855 -0.630646 -0.213115
2      2013-01-01 09:01:01 -0.297808  0.765525  3.008339
       2013-01-01 09:01:02 -1.037385 -0.725488  1.382944
       2013-01-01 09:01:03  0.528578  0.116152  1.253986

In [11]: np.sqrt(df['position']**2 + df['velocity']**2 + df['accel']**2)
Out[11]: 
sample  time               
1       2013-01-01 09:01:01    2.106393
        2013-01-01 09:01:02    0.911166
        2013-01-01 09:01:03    1.033724
2       2013-01-01 09:01:01    3.118465
        2013-01-01 09:01:02    1.874843
        2013-01-01 09:01:03    1.365784
dtype: float64

@ufechner7
Author

Ok, a panel can be used if you have three dimensional vectors only.

But two questions remain:
a) We use different coordinate systems. This means that the components are not always
called x, y and z, but also phi, theta and psi, or elevation, azimuth and height.
If a panel consists of data frames, is it possible to use different column names in these data frames?
b) We currently have 99 columns in our data set. 14 of them are 3D vectors; the rest are scalars or structs
of a different dimension. Is there a way to combine data frames with different numbers of columns into a panel?

Example code for 3d vectors only (this works fine):

import pandas as pd
import numpy as np
import numpy.linalg as la

"""
        items: axis 0, each item corresponds to a DataFrame contained inside (pos, vel, acc)
        major_axis: axis 1, it is the index (rows) of each of the DataFrames (timestamps)
        minor_axis: axis 2, it is the columns of each of the DataFrames      (x, y, z)
"""
pa = pd.Panel(np.random.randn(3, 5, 4), items=['pos', 'vel', 'acc'],
              major_axis = pd.date_range(start='2001-01-01 00:00:00', end='2001-01-01 00:00:00.2', freq='50L'),
              minor_axis=['x', 'y', 'z', 'norm'])

pa.pos.norm = 0
pa.vel.norm = 0
pa.acc.norm = 0
pa.pos.norm = la.norm(pa.pos.as_matrix(), axis=1)
pa.vel.norm = la.norm(pa.vel.as_matrix(), axis=1)
pa.acc.norm = la.norm(pa.acc.as_matrix(), axis=1)

print pa.to_frame()
print
print "pa.pos\n", pa.pos

Best regards:
Uwe Fechner
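
For questions a) and b) above, one possible approach is sketched here, assuming a current pandas release (where Panel no longer exists). A column MultiIndex built from tuples lets every vector carry its own component names, and scalar channels can sit in the same frame with an empty second-level label. The group names `orientation` and `kite` are hypothetical, and the data is random:

```python
import numpy as np
import pandas as pd

# Each vector gets its own component names; scalars use "" as component.
# "orientation" and "kite" are made-up group names for illustration.
columns = pd.MultiIndex.from_tuples(
    [
        ("pos", "x"), ("pos", "y"), ("pos", "z"),
        ("orientation", "phi"), ("orientation", "theta"), ("orientation", "psi"),
        ("kite", "elevation"), ("kite", "azimuth"), ("kite", "height"),
        ("v_reel_out", ""), ("u_winch", ""),
    ],
    names=["quantity", "component"],
)
index = pd.date_range("2001-01-01", periods=5, freq="50ms")
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.standard_normal((5, len(columns))), index=index, columns=columns
)

orientation = df["orientation"]        # sub-frame with phi, theta, psi
reel_speed = df[("v_reel_out", "")]    # a single scalar channel as a Series

# One time slice (inclusive on both ends) applies to vectors and scalars alike.
window = df.loc[index[1]:index[3]]
```

Because everything shares one DatetimeIndex, filtering along the time axis stays a single `.loc` call.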

@jreback
Contributor

jreback commented Sep 21, 2013

@ufechner7 I think you could create a class to hold some of this information with variables of a Panel/Multi-index frame.

Don't try to jam too much into a single data structure; sometimes having multiple objects is the way to go.
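
A rough sketch of what such a class could look like, assuming a current pandas release. The name `FlightLog` and its methods `between` and `norm` are hypothetical and only illustrate the idea of keeping vectors and scalars in separate pandas objects behind one interface:

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class FlightLog:
    """Hypothetical container: one frame of 3D vectors, one of scalars."""

    vectors: pd.DataFrame  # columns: MultiIndex (quantity, component)
    scalars: pd.DataFrame  # columns: plain scalar channel names

    def between(self, start, end):
        """Return a new FlightLog restricted to start <= time <= end."""
        return FlightLog(self.vectors.loc[start:end], self.scalars.loc[start:end])

    def norm(self, quantity):
        """Euclidean norm of one vector quantity at every time stamp."""
        block = self.vectors[quantity]
        values = np.linalg.norm(block.to_numpy(), axis=1)
        return pd.Series(values, index=block.index, name=quantity + "_norm")


index = pd.date_range("2001-01-01", periods=6, freq="50ms")
rng = np.random.default_rng(2)
vectors = pd.DataFrame(
    rng.standard_normal((6, 9)),
    index=index,
    columns=pd.MultiIndex.from_product([["pos", "vel", "acc"], ["x", "y", "z"]]),
)
scalars = pd.DataFrame(
    rng.standard_normal((6, 2)), index=index, columns=["v_reel_out", "u_winch"]
)

log = FlightLog(vectors, scalars)
window = log.between(index[1], index[3])  # both frames are sliced together
vel_norm = window.norm("vel")
```

The time filter is written once in `between`, so callers do not have to slice each underlying object by hand.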

@ufechner7
Author

Well, the data I have is kind of limited in size, just the data of many sensors of 2-hour flights with a 20 Hz sampling rate (on average). This means that one dataset has a size of 100-150 MB (uncompressed).

I found out that I can use a Panel4D to store vectors of different types and scalars (all of these data are time series) in one data structure. This works OK, but there is room for improvement w.r.t. the Panel4D implementation.

The following code shows how to store scalars and 3d vectors in one Panel4D object:

""" Test code for using a panel object for storing dataframes with different column sets. """

import pandas as pd
import numpy as np

"""
        items: axis 0, each item corresponds to a DataFrame contained inside (pos, vel, acc)
        major_axis: axis 1, it is the index (rows) of each of the DataFrames (timestamps)
        minor_axis: axis 2, it is the columns of each of the DataFrames      (x, y, z)
"""
XYZ  = ['x','y','z']
NORM = ['norm']
ENU  = ['pos', 'vel', 'acc']
pav = pd.Panel(np.random.randn(3, 6, 3), items=ENU,
              major_axis = pd.date_range(start='2001-01-01 00:00:00', end='2001-01-01 00:00:00.25', freq='50L'),
              minor_axis=['x', 'y', 'z'])

pas = pd.Panel(np.random.randn(5, 6, 1), items=['pos', 'vel', 'acc', 'v_reel_out','u_winch'],
              major_axis = pd.date_range(start='2001-01-01 00:00:00', end='2001-01-01 00:00:00.25', freq='50L'),
              minor_axis=['norm'])

pav.pos.norm = 0
pav.vel.norm = 0
pav.acc.norm = 0

data = { 'ENU'    : pav ,
         'scalar' : pas }

pa = pd.Panel4D(data)

print pa
print pa.ENU.loc[ENU].to_frame()

print pa.ENU.pos[XYZ]
print
print pa.scalar.to_frame()
print pa.scalar.pos[NORM]

@jreback
Contributor

jreback commented Sep 21, 2013

@ufechner7 you can certainly do that but keep in mind that the scalar data is 'replicated', e.g. it is not sparse. this may or may not work for you.

if you are simultaneously using say 3-d and scalars then it makes sense to keep related objects (that are pandas objects), but not necessarily put them in one object....up2u

e.g.

{ 'name' : 'sample1', '3d' : Panel, 'scalars' : Series }, essentially a non-homogeneous container

@ufechner7
Author

Why is the data handling not sparse? Why is it not possible to use Panel4D as container for non-homogeneous data? I want to be able to filter in the time axis, and that becomes complicated if I use a dictionary of pandas objects as container.

@jreback
Contributor

jreback commented Sep 21, 2013

why would you expect it to be sparse?

it's a 4-dim container that is homogeneous in dimensions, i.e. there is a recorded value at every position along each of the dimensions. You are able to put in non-homogeneous types (e.g. floats/strings etc).

This is true of all pandas objects (and numpy objects in general).

Try doing df.values, p.values, or p4d.values and it will become clear
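
The same effect can be reproduced with plain DataFrames, which is useful because recent pandas releases no longer ship Panel or Panel4D. This is only an analogy under that assumption, with made-up data: a 3-column block and a 1-column block are stacked into one object, and the dense array behind it has to be rectangular, so the missing cells are filled with NaN rather than left out.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2001-01-01", periods=4, freq="50ms")
vec = pd.DataFrame(np.ones((4, 3)), index=index, columns=["x", "y", "z"])
scal = pd.DataFrame(np.ones((4, 1)), index=index, columns=["norm"])

# Stack both blocks under an outer row label; columns become the union
# of x, y, z and norm.
stacked = pd.concat({"ENU": vec, "scalar": scal})

# The underlying array is a full rectangle: 8 rows by 4 columns.
dense = stacked.to_numpy()

# Cells that neither block provided are NaN placeholders, not absent.
n_missing = int(np.isnan(dense).sum())
n_cells = dense.size
```

Here `n_missing` counts the padded cells, which is the memory that a dense container spends on values nobody recorded.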

@ufechner7
Author

Well, I think that the idea is that Pandas offers more features than just numpy arrays. I think I will open a new issue "Pandas should support packed, heterogeneous, numeric data structures."

@cpcloud
Member

cpcloud commented Sep 21, 2013

Please don't. We already have one: #3443

@cpcloud
Member

cpcloud commented Sep 21, 2013

@ufechner7 If you'd like to discuss some of your ideas over at #3443, we would love to hear them. Deciding on an API for the proposed RelationalDataFrame is worth plenty of discussion, if we decide that's the way to go. Also, your use cases can help us mold the API.

@jreback
Contributor

jreback commented Oct 3, 2013

closing as not a bug

jreback closed this as completed Oct 3, 2013