Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement master client for reading training tasks #2468

Merged
merged 8 commits into from
Jun 19, 2017

Conversation

helinwang
Copy link
Contributor

No description provided.

if err != nil {
// TODO(helin): wait before move on with next
// getTask call.
log.Println(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we all use glog instead of log, so we can do glog.Errorf or glog.Infof to separate info log and error log. Use glog.V(10).Infof to log more detailed debug logs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! glog does not work with flag package other than official "flag", can we use https://github.com/sirupsen/logrus instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice lib! This lib seems also able to send logs to "logstash" in json format, this is very useful when run jobs on kubernetes, we can collect logs and search logs using "EFK"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@typhoonzero Thanks for mentioning logstash and EFK, good to know about them!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


s := recordio.NewRangeScanner(f, &chunk.Index, -1, -1)
for s.Scan() {
c.ch <- s.Record()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line will block util some reader read the record from this channel. How to deal with "batches" like paddle.dataset.batch(some_reader(), 32)? Will adding channel buffers like c.ch = make(chan []byte, batch_size) increase the efficiency?

Copy link
Contributor Author

@helinwang helinwang Jun 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use paddle.reader.buffered for this purpose. Still I think that's a good point. Will add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -0,0 +1,19 @@
from setuptools import setup, Distribution
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to add a single python package to install master_client?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with Python packaging, can you let me know what's the better way to do it? I will make the change. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it again, I am planning to put it together within paddle Python package, integrated with paddle's CMake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@helinwang helinwang force-pushed the master_dispatch branch 3 times, most recently from 4ce639e to cadb481 Compare June 16, 2017 02:10
Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except a naming problem.

'plot',
'evaluator',
'image',
'master',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python module master stands for "master client" or "job master client". Using name "master" may be confused for someone reading the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because when user use the package, it will be

import paddle.v2 as paddle
client = paddle.master.client("localhost:3000", 30)
record = client.NextRecord()

I was thinking since there is a client after package name (paddle.master.client), maybe it's clear that it means master's client?

I am actually not very familiar with Python's naming convention, do you have any suggestion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I see. Usage like paddle.master.client is definitely fine.

Copy link
Contributor

@typhoonzero typhoonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@helinwang helinwang merged commit 1a12720 into PaddlePaddle:develop Jun 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants