Conversation
raise ValueError('`targets` has to be at least as long as `data`.')

if hlength is None:
    if length % sampling_rate != 0:
Please add a DeprecationWarning
self.data = np.asarray(data)
self.targets = np.asarray(targets)

# FIXME: targets must be 2D for sequences output
Is this a new limitation or was it always like that?
with assert_raises(ValueError) as context:
    TimeseriesGenerator(data, data, length=50, sampling_rate=0)
error = str(context.exception)
print(error)
avoid the printing please
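One way to check the message without printing it — a sketch using `numpy.testing`'s `assert_raises` as a context manager, like the test above does; `make_generator` is a hypothetical stand-in for the real constructor:

```python
import numpy as np
from numpy.testing import assert_raises

def make_generator(data, targets, length, sampling_rate):
    # hypothetical stand-in for the constructor's argument check
    if sampling_rate <= 0:
        raise ValueError('`sampling_rate` must be strictly positive.')

data = np.arange(100)
with assert_raises(ValueError) as context:
    make_generator(data, data, length=50, sampling_rate=0)
# assert on the message instead of printing it
assert 'strictly positive' in str(context.exception)
```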
@srjoglekar246, Could I get your input on this PR?
- The data should be at 2D, and axis 0 is expected to be the time dimension.
+ The data should be convertible into a 1D numpy array, if 2D or more, axis 0 is expected to be the time dimension.
nit: Might make this a little easier to understand if we say "and* axis 0"
assert data_gen[-1][1].tostring() == u"."

t = np.linspace(0,20*np.pi, num=1000)  # time
x = np.sin(np.cos(3*t))  # input signa
signal*
assert data_gen[-1][0].tostring() == u" is simple"
assert data_gen[-1][1].tostring() == u"."

t = np.linspace(0,20*np.pi, num=1000)  # time
space after ","
stateful=False):

# Sanity check
Remove stray newline
def test_TimeseriesGenerator_exceptions():

Please remove newlines at the beginning of functions :-)
expected_len = ceil((len(x) - g.hlength + 1.0) / g.batch_size)
print('gap: %i, hlength: %i, expected-len:%i, len: %i' %
      (g.gap, g.hlength, expected_len, g.len))
# for i in range(len(g)):
Remove commented code
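As a side note, the expected-length formula in that test can be sanity-checked by hand; the concrete values below are assumed for illustration, not taken from the test:

```python
from math import ceil

# assumed example values for a series of 1000 points
n, hlength, batch_size = 1000, 10, 32

# number of full input windows is n - hlength + 1;
# the last, possibly partial batch still counts, hence the ceil
expected_len = ceil((n - hlength + 1.0) / batch_size)
print(expected_len)  # 31
```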
Is there a specific reason why this is not applied yet?
I think TimeseriesGenerator is a perfect module to use, but its downside is that it misses some basic features, such as a prediction gap (which was implemented here), so I would be very happy to see this commit actually being integrated into the library.
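For context, a prediction gap offsets each target a fixed number of steps past the end of the input window. A minimal numpy sketch of the idea (the function and parameter names are illustrative, not this PR's API):

```python
import numpy as np

def windows_with_gap(data, targets, length, gap):
    """Pair each input window data[i:i+length] with the target located
    `gap` steps after the window's end (illustrative sketch)."""
    X, y = [], []
    for i in range(len(data) - length - gap + 1):
        X.append(data[i:i + length])
        y.append(targets[i + length + gap - 1])
    return np.array(X), np.array(y)

series = np.arange(20)
X, y = windows_with_gap(series, series, length=4, gap=3)
print(X.shape)  # (14, 4)
print(y[0])     # window [0, 1, 2, 3] predicts step 6
```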
Looks like this MR is more far-reaching than this one. In my MR I simply added a … I'm hoping that these changes expose the same behavior I'm after (so the first sample of a batch isn't forced to follow the last sample of the previous batch)?
This class looks like a lost cause in need of a total revision. Correct me if I'm wrong, but these lines will mix the batch dimension with a features dimension (timesteps), which is a big no-no and will obliterate training:

```python
import numpy as np

data = np.random.randn(240, 4)  # (timesteps, channels)
length = 6
batch_size = 8
stride = 1
sampling_rate = 1  # added so the snippet runs; implied but not shown in the original comment
start_index, end_index, index = 6, 40, 0

i = start_index + batch_size * stride * index
rows = np.arange(i, min(i + batch_size * stride, end_index + 1), stride)
samples = np.array([data[row - length:row:sampling_rate] for row in rows])
print(samples.shape)
# (8, 6, 4)
```

The only workaround is feeding data compatible with such a manipulation to begin with; as no measurements are ever taken in such a format, it'll require an uneasy preprocessing, maybe on-the-fly, for various …
Closing outdated PR. Note that the Keras Preprocessing repository is no longer in use; instead, please just use the dataset utilities provided in …
Changes relative to previous version of `TimeseriesGenerator()`:

- `length` deprecated (with a deprecation warning)
- `hlength` added as a replacement

Other changes relative to keras version:

- hardened the `init()` method by adding a host of sanity checks
- added a `gap` parameter
- added a `target_seq` option for seq2seq models
- added a `dtype` parameter to force output dtype
- added a `stateful` option to help parameter tuning in stateful mode (experimental)
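As a rough illustration of the `target_seq` option listed above, the sketch below contrasts single-step targets with sequence targets for seq2seq training. The function is hypothetical, written only to show the shape difference, and is not the PR's implementation:

```python
import numpy as np

def window_pairs(data, targets, hlength, target_seq=False):
    """With target_seq=False each sample's target is the single step after
    its window; with target_seq=True it is the whole shifted window
    (illustrative sketch, not the PR's API)."""
    n = len(data) - hlength
    X = np.array([data[i:i + hlength] for i in range(n)])
    if target_seq:
        # target sequence: the input window shifted one step forward
        y = np.array([targets[i + 1:i + hlength + 1] for i in range(n)])
    else:
        # single-step target: the value right after the window
        y = np.array([targets[i + hlength] for i in range(n)])
    return X, y

series = np.arange(10)
_, y_last = window_pairs(series, series, hlength=3)
_, y_seq = window_pairs(series, series, hlength=3, target_seq=True)
print(y_last.shape, y_seq.shape)  # (7,) (7, 3)
```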