
Reading from a tf.data Dataset #171

Closed
modernAlcibiades opened this issue Apr 16, 2019 · 6 comments

Comments

@modernAlcibiades

I have created a batch data iterator using tf.data api and tensorflow iterator but I do not know how to plug this into returnn. Is this functionality already implemented?

@albertz
Member

albertz commented Apr 16, 2019

This is planned and partly implemented. (You can find some outdated discussion in the documentation of TFDataPipeline.py. That was written before the tf.data API existed.)

In principle, when ExternData gets created, it should not create tf.placeholder but use the data iterator instead. That by itself is easy. But the handling of multiple epochs etc. can make it trickier.

Anyway, this is on my TODO list. But not sure when I get time for it.
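The multi-epoch handling mentioned above can be sketched as an iterator that rebuilds its underlying dataset each epoch (which is roughly what re-initializing a tf.data iterator per epoch achieves). This is a hypothetical pure-Python sketch, not RETURNN or tf.data API; `epoch_iterator` and `make_dataset` are illustrative names:

```python
# Hypothetical sketch: multi-epoch iteration over a dataset source.
# tf.data covers this via dataset.repeat() or re-initializing the
# iterator each epoch; shown here in plain Python for clarity.

def epoch_iterator(make_dataset, num_epochs):
    """Yield (epoch, item) pairs, rebuilding the dataset each epoch."""
    for epoch in range(1, num_epochs + 1):
        # make_dataset may e.g. reshuffle differently per epoch.
        for item in make_dataset(epoch):
            yield epoch, item

# Usage: two epochs over a toy dataset.
items = list(epoch_iterator(lambda epoch: [1, 2, 3], num_epochs=2))
# items == [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```

The point is that epoch boundaries are explicit in the iteration, which is the part that placeholder-based feeding handled implicitly and an iterator-based pipeline has to manage itself.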

@modernAlcibiades
Author

Thanks.

@albertz
Member

albertz commented Apr 18, 2019

Btw, just for reference: the current way to implement a dataset is to derive from the RETURNN class Dataset (or maybe better CachedDataset2 or so; see the many existing examples). This is totally independent of TF.
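The subclassing pattern described above might look roughly like this. The real base class is RETURNN's CachedDataset2 and the real return type is DatasetSeq; both are stubbed here so the sketch runs standalone, and the exact signatures should be treated as assumptions, not RETURNN's actual API:

```python
# Hedged sketch of deriving a custom dataset, with RETURNN's
# CachedDataset2 and DatasetSeq replaced by local stand-ins so the
# example is self-contained. Method name mirrors RETURNN's convention.
import numpy as np

class _StubDatasetSeq:  # stand-in for RETURNN's DatasetSeq
    def __init__(self, seq_idx, features):
        self.seq_idx = seq_idx
        self.features = features

class _StubCachedDataset2:  # stand-in for RETURNN's CachedDataset2
    pass

class MyCustomDataset(_StubCachedDataset2):
    """Serves sequences from an in-memory list; no TF involved."""
    def __init__(self, data):
        self.data = data

    def _collect_single_seq(self, seq_idx):
        # Return one sequence, wrapped like a DatasetSeq would be.
        return _StubDatasetSeq(seq_idx, np.asarray(self.data[seq_idx]))

ds = MyCustomDataset([[1.0, 2.0], [3.0, 4.0, 5.0]])
seq = ds._collect_single_seq(1)
```

The key property is the one albertz notes: the dataset only produces NumPy sequences on demand, so it stays independent of the TF graph.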

@modernAlcibiades
Author

modernAlcibiades commented Apr 22, 2019

Yeah, I have currently modified CachedDataset2 locally to take tf.data as input and return that as output to _collect_single_seqs. But I am having trouble replacing TFDataPipeline: direct eval and assignment to extern_data.data[key].placeholder causes shape issues in multiple layers, for reasons I am unable to decipher.
This error occurs in 2018-asr-attention experiment in the s layer inside output rec layer.
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [58,10,1000] vs. shape[1] = [57,10,1000]
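The error above says two tensors disagree in their first (time) dimension, 58 vs 57, while batch and feature dimensions match; any concat or elementwise combine then fails. A common fix is to pad the shorter tensor along the time axis before combining. This is a NumPy illustration only; the (time, batch, feature) axis layout is an assumption read off the error message, not verified against the RETURNN setup:

```python
# Reproduce the mismatch from the error: [58, 10, 1000] vs [57, 10, 1000].
# Concatenating on the feature axis requires all other dims to match,
# so the 58-vs-57 time dimension is the problem. Zero-padding the
# shorter tensor on axis 0 (time) makes the shapes compatible.
import numpy as np

a = np.zeros((58, 10, 1000))
b = np.zeros((57, 10, 1000))

t = max(a.shape[0], b.shape[0])
b_padded = np.pad(b, ((0, t - b.shape[0]), (0, 0), (0, 0)))

merged = np.concatenate([a, b_padded], axis=-1)  # shape (58, 10, 2000)
```

An off-by-one like this often points to inconsistent sequence-length bookkeeping between the two inputs (e.g. one including an end-of-sequence step the other drops), so padding treats the symptom; the length metadata is where to look for the cause.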

@albertz
Member

albertz commented Jul 29, 2019

@wd929 This is not related to this post at all, or is it? To answer your question: you can use any dataset you like, e.g. OggZipDataset (see the tool bliss-to-ogg-zip), HDFDataset (see the tool hdf_dump), or even the original LibriSpeechDataset (just prepare it accordingly). See the documentation.
Please don't ask further unrelated questions here in this issue. Open a new issue if you think there is a bug in RETURNN. If you have questions, please post them on StackOverflow and we can answer them there.

@albertz
Member

albertz commented May 19, 2020

I'm closing this now in favor of #292.

@albertz albertz closed this as completed May 19, 2020