Extraction of data from my timeseries fails with ZeroDivisionError #399

Closed
vganapathy-lifesize opened this issue Jul 31, 2018 · 2 comments

@vganapathy-lifesize

vganapathy-lifesize commented Jul 31, 2018

I have the following dataset (this is only an excerpt; the full file is larger):
```
id,date,participants
250,1517344239,2
418,1497884457,6
63,1515513662,3
67,1498667379,2
498,1503235860,2
45,1501160446,10
61,1515016822,3
1,1515169968,2
563,1497884443,8
184,1523390349,3
42,1516111608,3
85,1516095293,2
498,1503531487,2
```

id - identifies a recurring virtual meeting. Each id covers a series of meeting occurrences: some recur weekly, some daily, some every n weeks, and so on. I am treating each meeting id as a separate time series; I hope that's fine.
date - time (UTC, as a Unix timestamp in seconds) when the meeting happened
participants - number of participants that joined the meeting
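To make the structure described above concrete, here is a minimal pandas sketch that loads only the excerpt rows posted here, converts `date` to UTC datetimes, and counts occurrences per `id`. It assumes `date` holds Unix timestamps in seconds, which the 10-digit values suggest; the column name `when` is a hypothetical addition for illustration.

```python
import io

import pandas as pd

# Excerpt of meetings.data as posted in this issue (the full file is larger).
CSV_EXCERPT = """id,date,participants
250,1517344239,2
418,1497884457,6
63,1515513662,3
67,1498667379,2
498,1503235860,2
45,1501160446,10
61,1515016822,3
1,1515169968,2
563,1497884443,8
184,1523390349,3
42,1516111608,3
85,1516095293,2
498,1503531487,2
"""

series = pd.read_csv(io.StringIO(CSV_EXCERPT))

# Assumption: 'date' is a Unix timestamp in seconds (UTC).
series["when"] = pd.to_datetime(series["date"], unit="s", utc=True)

# Each id is one meeting series; order occurrences by time within each series.
series = series.sort_values(["id", "when"])
occurrences_per_id = series.groupby("id").size()
print(occurrences_per_id)
```

In this excerpt most ids appear only once (id 498 is the exception), so each "series" here is very short; the full file presumably has more occurrences per id.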

I am trying to see whether I can apply time series analysis to this kind of data to predict the number of users who might join a future occurrence. I am exploring tsfresh to see whether it can extract additional features from my data that improve my prediction (it's a regression problem). This is the context:

```python
import pandas as pd
from tsfresh import extract_features

series = pd.read_csv('meetings.data')
X = series[[col for col in series.columns if col != 'participants']]  # every column except the target
y = series['participants']
extracted_features = extract_features(X, column_id='id', column_sort='date')
```

This results in the following error:

```
Traceback (most recent call last):
  File "tsfresh_predict.py", line 16, in <module>
    extracted_features = extract_features(X, column_id='id', column_sort='date')
  File "/Users/vganapathy/mcufe/dev3/lib/python3.6/site-packages/tsfresh/feature_extraction/extraction.py", line 152, in extract_features
    distributor=distributor)
  File "/Users/vganapathy/mcufe/dev3/lib/python3.6/site-packages/tsfresh/feature_extraction/extraction.py", line 233, in _do_extraction
    function_kwargs=kwargs)
  File "/Users/vganapathy/mcufe/dev3/lib/python3.6/site-packages/tsfresh/utilities/distribution.py", line 142, in map_reduce
    total_number_of_expected_results = math.ceil(data_length / chunk_size)
ZeroDivisionError: division by zero
```

In my data, the meeting time is the only parameter I have. I would just like to know what is causing this exception.
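The last frame of the traceback computes `math.ceil(data_length / chunk_size)`. The sketch below is a simplified, hypothetical model of that step, not tsfresh's actual code: it assumes the chunk size is derived from the amount of data by ceiling division over a fixed number of chunks (the defaults `n_workers=4` and `chunks_per_worker=5` are arbitrary assumptions). Under that model, empty input gives a chunk size of 0, and the next division fails with the same message as in the traceback.

```python
import math


def best_chunk_size(data_length: int, n_workers: int = 4, chunks_per_worker: int = 5) -> int:
    """Hypothetical heuristic: ceil(data_length / (n_workers * chunks_per_worker))."""
    chunk_size, extra = divmod(data_length, n_workers * chunks_per_worker)
    return chunk_size + 1 if extra else chunk_size


def expected_results(data_length: int) -> int:
    """Simplified model of the failing line: math.ceil(data_length / chunk_size)."""
    chunk_size = best_chunk_size(data_length)
    return math.ceil(data_length / chunk_size)


print(expected_results(100))  # 20 (chunk size 5)

# With nothing to process, data_length is 0, the chunk size becomes 0,
# and the division raises ZeroDivisionError.
try:
    expected_results(0)
    empty_input_error = None
except ZeroDivisionError as exc:
    empty_input_error = exc
print(repr(empty_input_error))
```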

@nikhase
Contributor

nikhase commented Aug 2, 2018

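You get the ZeroDivisionError because you pass no data to extract features from. You call extract_features on the X dataframe, but it consists solely of the columns id and date, and both of those are used as column_id and column_sort, so no value column is left.
You need to pass your series dataframe instead and specify column_value, like this: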

```python
extracted_features = extract_features(series, column_id='id', column_sort='date', column_value='participants')
```
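To see why `X` leaves nothing to extract, here is a minimal pandas sketch. It assumes, as a simplified model rather than tsfresh's own code, that when neither `column_value` nor `column_kind` is given, every column other than `column_id` and `column_sort` is treated as a time series to extract features from. The helper `value_columns` is hypothetical, and the three rows are taken from the excerpt above.

```python
import pandas as pd


def value_columns(df: pd.DataFrame, column_id: str, column_sort: str) -> list:
    """Hypothetical helper: columns left over once the id and sort columns are set aside."""
    return [col for col in df.columns if col not in (column_id, column_sort)]


series = pd.DataFrame(
    {
        "id": [498, 498, 250],
        "date": [1503235860, 1503531487, 1517344239],
        "participants": [2, 2, 2],
    }
)
X = series[[col for col in series.columns if col != "participants"]]

print(value_columns(X, "id", "date"))       # [] -> nothing to extract features from
print(value_columns(series, "id", "date"))  # ['participants']
```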

@MaxBenChrist
Collaborator

Actually, we should add a test and a warning message for this case. I will see that I add a test and an error message.
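One possible shape for such a check and its tests, as a minimal sketch: `check_has_value_columns` is a hypothetical name, not part of tsfresh, and the fix actually adopted upstream may look different. It assumes the same simplified rule as before (without `column_value`, every column other than the id and sort columns is treated as a value column) and fails early with a readable `ValueError` instead of letting the empty input reach the chunk-size division.

```python
import pandas as pd


def check_has_value_columns(df, column_id, column_sort=None, column_value=None):
    """Hypothetical pre-flight check: fail with a clear message instead of a ZeroDivisionError."""
    if column_value is not None:
        if column_value not in df.columns:
            raise ValueError(f"column_value {column_value!r} is not a column of the input dataframe")
        return
    # Assumption: without column_value, every column except id/sort is treated as a time series.
    reserved = {column_id, column_sort} - {None}
    if not [col for col in df.columns if col not in reserved]:
        raise ValueError(
            "No columns left to extract features from: the dataframe only contains "
            f"{list(df.columns)}, which are used as column_id/column_sort. "
            "Pass a dataframe with at least one value column or set column_value."
        )


def test_only_id_and_sort_columns_raises():
    X = pd.DataFrame({"id": [1, 1], "date": [10, 20]})
    try:
        check_has_value_columns(X, column_id="id", column_sort="date")
    except ValueError:
        return
    raise AssertionError("expected a ValueError for a dataframe without value columns")


def test_value_column_passes():
    df = pd.DataFrame({"id": [1, 1], "date": [10, 20], "participants": [2, 3]})
    # Should not raise: 'participants' is an explicit value column.
    check_has_value_columns(df, column_id="id", column_sort="date", column_value="participants")
```

The tests are plain `test_` functions, so a runner such as pytest would collect them; calling them directly works as well.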


4 participants