This repository was archived by the owner on Sep 18, 2024. It is now read-only.

New TimeseriesGenerator #7

Closed · wants to merge 6 commits

Conversation

TanguyUrvoy

changes relative to previous version of TimeseriesGenerator()

  • kept `length` (with a deprecation warning)
  • added `hlength` as a replacement
  • same parameter order

Other changes relative to keras version

  • consolidated the `__init__()` method by adding a host of sanity checks
  • now it also works with text
  • added a prediction `gap` parameter
  • shuffle is now a real shuffle (i.e. without replacement)
  • added a `target_seq` option for seq2seq models
  • added a `dtype` parameter to force the output dtype
  • added a `stateful` option to help with parameter tuning in stateful mode (experimental)
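To illustrate the listed parameters, here is a minimal sketch of the windowing a history length plus a prediction gap imply. The names `hlength` and `gap` follow the PR; the implementation below is illustrative, not the PR's actual code:

```python
import numpy as np

def make_windows(data, targets, hlength, gap):
    # Each input window holds `hlength` consecutive samples; its target
    # is the value `gap` steps after the end of the window.
    x, y = [], []
    for end in range(hlength, len(data) - gap + 1):
        x.append(data[end - hlength:end])   # history window
        y.append(targets[end + gap - 1])    # target `gap` steps ahead
    return np.array(x), np.array(y)

data = np.arange(20.0)
x, y = make_windows(data, data, hlength=4, gap=2)
print(x.shape, y[0])  # (15, 4) 5.0 -- window [0,1,2,3] predicts index 5
```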

raise ValueError('`targets` has to be at least as long as `data`.')

if hlength is None:
if length % sampling_rate != 0:
Please add a DeprecationWarning

self.data = np.asarray(data)
self.targets = np.asarray(targets)

# FIXME: targets must be 2D for sequences output

Is this a new limitation or was it always like that?

with assert_raises(ValueError) as context:
TimeseriesGenerator(data, data, length=50, sampling_rate=0)
error = str(context.exception)
print(error)

avoid the printing please
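One print-free way to phrase that check, sketched with a minimal stand-in class (the real constructor and validation are the ones added in this PR):

```python
# Stand-in with only the sanity check relevant to the test.
class TimeseriesGenerator:
    def __init__(self, data, targets, length, sampling_rate=1):
        if sampling_rate <= 0:
            raise ValueError('`sampling_rate` must be strictly positive.')

try:
    TimeseriesGenerator(range(100), range(100), length=50, sampling_rate=0)
except ValueError as error:
    # Assert on the message instead of printing it.
    assert 'sampling_rate' in str(error)
else:
    raise AssertionError('ValueError was not raised')
```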

@Dref360
Contributor

Dref360 commented Jul 27, 2018

@srjoglekar246, Could I get your input on this PR?

- The data should be at 2D, and axis 0 is expected to be the time dimension.
+ The data should be convertible into a 1D numpy array, if 2D or more, axis 0 is expected to be the time dimension.


nit: Might make this a little easier to understand if we say "and* axis 0"

assert data_gen[-1][1].tostring() == u"."

t = np.linspace(0,20*np.pi, num=1000) # time
x = np.sin(np.cos(3*t)) # input signa


signal*

assert data_gen[-1][0].tostring() == u" is simple"
assert data_gen[-1][1].tostring() == u"."

t = np.linspace(0,20*np.pi, num=1000) # time


space after ","

stateful=False):

# Sanity check


Remove stray newline



def test_TimeseriesGenerator_exceptions():


Please remove newlines at the beginning of functions :-)

expected_len = ceil((len(x) - g.hlength + 1.0) / g.batch_size)
print('gap: %i, hlength: %i, expected-len:%i, len: %i' %
(g.gap, g.hlength, expected_len, g.len))
# for i in range(len(g)):


Remove commented code
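For reference, the batch-count formula in the snippet above can be checked with concrete numbers (the values here are chosen for illustration, not taken from the test):

```python
from math import ceil

# 1000 samples, history length 10, batches of 32:
# the generator should yield ceil((1000 - 10 + 1) / 32) batches.
n, hlength, batch_size = 1000, 10, 32
expected_len = ceil((n - hlength + 1.0) / batch_size)
print(expected_len)  # ceil(991 / 32) = 31
```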

@rragundez rragundez added the sequence Related to sequences label Jan 13, 2019

@wptmdoorn wptmdoorn left a comment


Is there a specific reason why this has not been merged yet?
I think TimeseriesGenerator is a perfect module to use, but on the downside it misses some basic features such as a prediction gap (which was implemented here), so I would be very happy to see this commit actually integrated into the library.

@rjmccabe3701

Looks like this MR is more far-reaching than mine. In my MR I simply added an `inter_batch_stride` option.

I'm hoping that these changes expose the same behavior I'm after (so the first sample of a batch isn't forced to follow the last sample of the previous batch)?

@OverLordGoldDragon

OverLordGoldDragon commented Oct 11, 2019

This class looks like a lost cause in need of a total revision. Correct me if I'm wrong, but these lines will mix the batch dimension with a features dimension (timesteps), which is a big no-no and will obliterate training:

import numpy as np

data = np.random.randn(240, 4)  # (timesteps, channels)
length = 6
batch_size = 8
stride = 1
sampling_rate = 1  # undefined in the original snippet; added for completeness
start_index, end_index, index = 6, 40, 0
i = start_index + batch_size * stride * index
rows = np.arange(i, min(i + batch_size * stride, end_index + 1), stride)

samples = np.array([data[row - length:row:sampling_rate] for row in rows])
print(samples.shape)
# (8, 6, 4)

The only workaround is feeding data compatible with such a manipulation to begin with - as no measurements are ever taken in such a format, it'll require an uneasy preprocessing, maybe on-the-fly for various length settings, which may substantially slow down training. There are other pitfalls I'll spare listing - but as I'm not too familiar with TimeseriesGenerator, maybe I'm missing something - feel free to point out.

@fchollet
Contributor

fchollet commented Feb 9, 2022

Closing outdated PR. Note that the Keras Preprocessing repository is no longer in use; instead, please use the dataset utilities provided in tf.keras.utils (see the docs).
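The replacement utility referred to here is `tf.keras.utils.timeseries_dataset_from_array`. A pure-NumPy sketch of the windowing it performs (my reading of the `sequence_length`/`sequence_stride`/`sampling_rate` semantics, not the actual TensorFlow implementation) looks like:

```python
import numpy as np

def windows(data, sequence_length, sequence_stride=1, sampling_rate=1):
    # Each window holds sequence_length points taken every sampling_rate
    # steps; consecutive windows start sequence_stride steps apart.
    span = (sequence_length - 1) * sampling_rate + 1
    starts = range(0, len(data) - span + 1, sequence_stride)
    return np.array([data[s:s + span:sampling_rate] for s in starts])

data = np.arange(10.0)
print(windows(data, sequence_length=3, sequence_stride=2))
# [[0. 1. 2.]
#  [2. 3. 4.]
#  [4. 5. 6.]
#  [6. 7. 8.]]
```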

@fchollet fchollet closed this Feb 9, 2022