
Feed model engine #39

Merged
merged 33 commits into from
Jul 21, 2020

Conversation

davitbzh
Contributor

Work is not ready yet, but this will give you a good idea of what the FeedModel engine looks like:

train_dataset = fs.get_training_dataset("train_dataset", 1)
train_tfdataset = train_dataset.feed(label_name='target', batch_size=32, num_epochs=1).TFRecordDataset()

At the moment only the tfrecords reader is implemented (it can't yet parse array data types).
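The array-type limitation mentioned above can be illustrated with a small sketch. This is pure Python with hypothetical type names and helper names, not the actual hsfs reader code: it shows why a schema-to-TFRecord mapping would reject array columns while handling scalars.

```python
# Sketch: map a training-dataset schema to a TFRecord feature spec.
# Type names and the helper itself are illustrative assumptions,
# not the real hsfs implementation.

# Scalar column types that map cleanly onto TFRecord scalar features.
_SCALAR_TYPES = {
    "int": "int64",
    "bigint": "int64",
    "float": "float32",
    "double": "float32",
    "string": "bytes",
}

def tfrecord_feature_spec(schema):
    """Build a {column_name: tfrecord_dtype} spec from (name, type) pairs,
    rejecting array columns, which the reader cannot parse yet."""
    spec = {}
    for name, col_type in schema:
        if col_type.startswith("array"):
            raise NotImplementedError(
                f"array types are not supported yet: {name} ({col_type})"
            )
        spec[name] = _SCALAR_TYPES[col_type]
    return spec
```

For example, `tfrecord_feature_spec([("target", "int"), ("amount", "double")])` would succeed, while any `array<...>` column would raise `NotImplementedError`, mirroring the limitation described in the comment.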

davitbzh added 11 commits April 28, 2020 00:26
simple bug fix self.training_dataset.training_dataset.schema

adding pydoop to read from hdfs

adding tfrecords schema

adding tfrecords schema

temporary fix for create_tf_record_schema

temporary fix for create_tf_record_schema

temporary fix for create_tf_record_schema

temporary fix for create_tf_record_schema
@moritzmeister moritzmeister added python WIP This issue or pull request is a work in progress labels Apr 30, 2020
@moritzmeister
Contributor

A general comment upfront: this is a lot of logic to convert from Spark types to TensorFlow types, and that logic is currently duplicated in Java. I think we should find a solution that does not require this massive if ... else ... construct. The TFRecord Spark connector already converts between the types somehow, so we should make use of that and call it to infer the TF schema, so that we stay consistent.
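One common way to avoid the big if ... else ... chain the comment describes is a table-driven lookup. The sketch below is a minimal illustration of that idea in plain Python; the Spark type names are real Spark SQL type names, but the mapping values and the function are assumptions for illustration, not the hsfs or connector API.

```python
# Sketch: table-driven Spark -> TensorFlow type conversion instead of a
# long if ... else ... chain. The target dtype strings are illustrative.
SPARK_TO_TF = {
    "IntegerType": "tf.int64",
    "LongType": "tf.int64",
    "FloatType": "tf.float32",
    "DoubleType": "tf.float32",
    "StringType": "tf.string",
    "BooleanType": "tf.int64",
}

def to_tf_type(spark_type_name):
    """Look up the TensorFlow dtype for a Spark SQL type name."""
    try:
        return SPARK_TO_TF[spark_type_name]
    except KeyError:
        raise ValueError(f"unsupported Spark type: {spark_type_name}")
```

A single dict like this is also easier to keep in sync with a Java counterpart than two parallel conditional chains, since the mapping can be reviewed as data rather than control flow.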

@davitbzh
Contributor Author

davitbzh commented Jul 2, 2020

@moritzmeister the schema can now be inferred directly from the tfrecord files; no Spark and no extra information is required to read them. I tested it on both TF1 and TF2. However, I didn't manage to install hsfs to test it properly: I am getting a certificate error when I try to connect to the feature store. Can you please check whether it looks fine in general?

API now looks like this

anomaly_data = fs.get_training_dataset("anomaly_data", 1)
train = anomaly_data.read("train")
train_input = anomaly_data.feed(target_names=['target'],split='train').tf_record_dataset()
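The `feed(target_names=['target'], split='train')` call above suggests that each record is split into a features part and a labels part before being fed to the model. The sketch below shows that splitting step on a plain Python dict; the function name and record layout are assumptions for illustration, not the hsfs internals.

```python
# Sketch: split a record into (features, labels) the way a
# feed(target_names=[...]) style API might. Names are assumptions.
def split_features_labels(record, target_names):
    """Return (features, labels) dicts from one record,
    moving the columns listed in target_names into labels."""
    labels = {k: record[k] for k in target_names}
    features = {k: v for k, v in record.items() if k not in target_names}
    return features, labels

row = {"f1": 0.5, "f2": 1.2, "target": 1}
features, labels = split_features_labels(row, ["target"])
```

In a real pipeline this split would be applied per element of the dataset (e.g. via a map over the parsed records) so that the model receives `(features, labels)` pairs during training.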

@davitbzh davitbzh merged commit 4781152 into logicalclocks:master Jul 21, 2020
davitbzh added a commit that referenced this pull request Jul 21, 2020
moritzmeister added a commit that referenced this pull request Jul 22, 2020
SirOibaf pushed a commit that referenced this pull request Jul 22, 2020
Labels
python WIP This issue or pull request is a work in progress