
Reading from a tf.data Dataset #171

Closed
modernAlcibiades opened this issue Apr 16, 2019 · 6 comments

Comments

@modernAlcibiades

I have created a batch data iterator using tf.data api and tensorflow iterator but I do not know how to plug this into returnn. Is this functionality already implemented?

@albertz
Member

albertz commented Apr 16, 2019

This is planned and partly implemented. (You can find some outdated discussion in the documentation of TFDataPipeline.py. That was written before the tf.data API existed.)

In principle, when ExternData gets created, it should not create tf.placeholder but use the data iterator instead. That by itself is easy. But the handling of multiple epochs etc. can make it trickier.

Anyway, this is on my TODO list. But not sure when I get time for it.
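The multi-epoch handling mentioned above can be sketched as an iterator that rebuilds its underlying dataset each epoch (which is roughly what re-initializing a tf.data iterator per epoch achieves). This is a hypothetical pure-Python sketch, not RETURNN or tf.data API; `epoch_iterator` and `make_dataset` are illustrative names:

```python
# Hypothetical sketch: multi-epoch iteration over a dataset source.
# tf.data covers this via dataset.repeat() or re-initializing the
# iterator each epoch; shown here in plain Python for clarity.

def epoch_iterator(make_dataset, num_epochs):
    """Yield (epoch, item) pairs, rebuilding the dataset each epoch."""
    for epoch in range(1, num_epochs + 1):
        # make_dataset may e.g. reshuffle differently per epoch.
        for item in make_dataset(epoch):
            yield epoch, item

# Usage: two epochs over a toy dataset.
items = list(epoch_iterator(lambda epoch: [1, 2, 3], num_epochs=2))
# items == [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```

The point is that epoch boundaries are explicit in the iteration, which is the part that placeholder-based feeding handled implicitly and an iterator-based pipeline has to manage itself.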

@modernAlcibiades
Author

Thanks.

@albertz
Member

albertz commented Apr 18, 2019

Btw, just for reference: the current way to implement a dataset is to derive from the RETURNN class Dataset (or maybe better CachedDataset2 or so; see the many existing examples). This is totally independent of TF.
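The subclassing pattern described above might look roughly like this. The real base class is RETURNN's CachedDataset2 and the real return type is DatasetSeq; both are stubbed here so the sketch runs standalone, and the exact signatures should be treated as assumptions, not RETURNN's actual API:

```python
# Hedged sketch of deriving a custom dataset, with RETURNN's
# CachedDataset2 and DatasetSeq replaced by local stand-ins so the
# example is self-contained. Method name mirrors RETURNN's convention.
import numpy as np

class _StubDatasetSeq:  # stand-in for RETURNN's DatasetSeq
    def __init__(self, seq_idx, features):
        self.seq_idx = seq_idx
        self.features = features

class _StubCachedDataset2:  # stand-in for RETURNN's CachedDataset2
    pass

class MyCustomDataset(_StubCachedDataset2):
    """Serves sequences from an in-memory list; no TF involved."""
    def __init__(self, data):
        self.data = data

    def _collect_single_seq(self, seq_idx):
        # Return one sequence, wrapped like a DatasetSeq would be.
        return _StubDatasetSeq(seq_idx, np.asarray(self.data[seq_idx]))

ds = MyCustomDataset([[1.0, 2.0], [3.0, 4.0, 5.0]])
seq = ds._collect_single_seq(1)
```

The key property is the one albertz notes: the dataset only produces NumPy sequences on demand, so it stays independent of the TF graph.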

@modernAlcibiades
Author

modernAlcibiades commented Apr 22, 2019

Yeah, I have currently modified CachedDataset2 locally to take tf.data as input and return that as output to _collect_single_seqs. But I am having trouble replacing TFDataPipeline: direct eval and assignment to extern_data.data[key].placeholder causes shape issues in multiple layers, for reasons I am unable to decipher.
This error occurs in 2018-asr-attention experiment in the s layer inside output rec layer.
InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [58,10,1000] vs. shape[1] = [57,10,1000]
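The error above says two tensors disagree in their first (time) dimension, 58 vs 57, while batch and feature dimensions match; any concat or elementwise combine then fails. A common fix is to pad the shorter tensor along the time axis before combining. This is a NumPy illustration only; the (time, batch, feature) axis layout is an assumption read off the error message, not verified against the RETURNN setup:

```python
# Reproduce the mismatch from the error: [58, 10, 1000] vs [57, 10, 1000].
# Concatenating on the feature axis requires all other dims to match,
# so the 58-vs-57 time dimension is the problem. Zero-padding the
# shorter tensor on axis 0 (time) makes the shapes compatible.
import numpy as np

a = np.zeros((58, 10, 1000))
b = np.zeros((57, 10, 1000))

t = max(a.shape[0], b.shape[0])
b_padded = np.pad(b, ((0, t - b.shape[0]), (0, 0), (0, 0)))

merged = np.concatenate([a, b_padded], axis=-1)  # shape (58, 10, 2000)
```

An off-by-one like this often points to inconsistent sequence-length bookkeeping between the two inputs (e.g. one including an end-of-sequence step the other drops), so padding treats the symptom; the length metadata is where to look for the cause.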

@albertz
Member

albertz commented Jul 29, 2019

@wd929 This is not related to this post at all, or is it? To answer your question: you can use any dataset you like, e.g. OggZipDataset (see the tool bliss-to-ogg-zip), HDFDataset (see the tool hdf_dump), or even the original LibriSpeechDataset (just prepare it accordingly). See the documentation.
Please don't ask further unrelated questions here in this issue. Open a new issue if you think there is a bug in RETURNN. If you have questions, please post them on StackOverflow and we can answer them there.

@albertz
Member

albertz commented May 19, 2020

I'm closing this now in favor of #292.

@albertz albertz closed this as completed May 19, 2020