Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add start_record interface #3128

Merged
merged 6 commits into from
Aug 1, 2017

Conversation

Yancey1989
Copy link
Contributor

Fixed #3127

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One comment.

@@ -54,6 +54,9 @@ def main():

# event_handler to print training and testing info
def event_handler(event):
if isinstance(event, paddle.event.BeginPass):
master_client.paddle_start_get_records(event.pass_id)
Copy link
Contributor

@helinwang helinwang Jul 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think paddle_start_get_records is a implementation detail of cloud reader, we need to hide it inside the reader implementation.
Since the new pass is also indicated by reader() being called again (see here), so maybe it's better to put it here (in the def reader implementation), so every time a new pass begin, reader() is called, then paddle_start_get_records is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need demo

@@ -98,14 +102,17 @@ def recordio(paths, buf_size=100):
if host_name not in os.environ.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the line host_name = "MASTER_SERVICE_HOST" we don't have a "master service" currently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -81,6 +81,9 @@ def reader():
return dec.buffered(reader, buf_size)


pass_num = 0


def recordio(paths, buf_size=100):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the name recordio as cloud_reader is more clear?

Copy link
Contributor Author

@Yancey1989 Yancey1989 Aug 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done.


def reader():
c = cloud(addr, buf_size)
c.set_dataset(paths)
c.paddle_start_get_records(pass_id)
global pass_num
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put global the first line under function define.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -90,6 +93,7 @@ def recordio(paths, buf_size=100):
"""
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need some demo code of how to use this reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -90,6 +93,7 @@ def recordio(paths, buf_size=100):
"""
import os
import paddle.v2.master.client as cloud
import cPickle as pickle
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this, not see anywhere it being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's my mistake, we need pickle.loads() with the record fetched from master.
Done.

Copy link
Contributor

@helinwang helinwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after addressing Wuyi's comments.

Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM exept one comment

"""
Creates a data reader that outputs record one one by one
from given local or cloud recordio path.
Create a data reader that yield a record one bye one from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one bye one => one by one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

@Yancey1989
Copy link
Contributor Author

Yancey1989 commented Aug 1, 2017

I deleted the unit test for the older recordio reader because the newer cloud_reader is dependencies on the etcd mater process, and we use 'https://github.com/PaddlePaddle/Paddle/blob/develop/go/pserver/client/c/test/test_train.py to do the test.

@Yancey1989 Yancey1989 merged commit ec9d4d5 into PaddlePaddle:develop Aug 1, 2017
@Yancey1989 Yancey1989 deleted the fix_py_master_interface branch August 1, 2017 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants