Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion for Neural Architecture Search with Reinforcement Learning #339

Merged
merged 24 commits into from
Feb 21, 2019

Conversation

jinan-zhou
Copy link
Contributor

@jinan-zhou jinan-zhou commented Jan 23, 2019

Fixes: #338.

The first suggestion we are working on is reinforcement learning. The algorithm follows the paper Neural Architecture Search with Reinforcement Learning by Zoph & Le (https://arxiv.org/abs/1611.01578), and the implementation is based on the github of Efficient Neural Architecture Search via Parameter Sharing (https://github.com/melodyguan/enas).

Related to #257


This change is Reviewable

@googlebot
Copy link

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and have the pull request author add another comment and the bot will run again. If the bot doesn't comment, it means it doesn't think anything has changed.

@jinan-zhou
Copy link
Contributor Author

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@jinan-zhou jinan-zhou changed the title [WIP] Suggestion for Neural Architecture Search with Reinforcement Learning Suggestion for Neural Architecture Search with Reinforcement Learning Jan 30, 2019
@jinan-zhou
Copy link
Contributor Author

/assign @YujiOshima

pkg/suggestion/nasrl_service.py Show resolved Hide resolved
pkg/suggestion/nasrl_service.py Outdated Show resolved Hide resolved
@hougangliu
Copy link
Member

/assign @richardsliu @gaocegege

@hougangliu
Copy link
Member

/assign @johnugeorge

for w in worker_list:
if w.Worker.status == api_pb2.COMPLETED:
for ml in w.metrics_logs:
if ml.name != self.objective_name:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if w.metrics_logs is like [objectivemetrics, othermetrics], line 214 will return othermetrics value (ml is othermetrics)

how about

if ml.name == self.objective_name:
    self.logger.info("Evaluation result of previous candidate: {}".format(ml.values[-1].value))
    return float(ml.values[-1].value)

and keep below to keep same indent with line 204 (completed_count can be removed)

self.logger.warning("Error. No trial has completed.")
return None

Copy link
Contributor Author

@jinan-zhou jinan-zhou Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe w.metrics_logs will only return one selected metric defined in https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L12. The metrics collector will parse the pattern <metrics_name>: <metrics_value> from the log of training container and give it to suggestion.

Currently, the algorithm only spawns one trial each time and uses the metrics of the very last trial to update. That is why I will go through all the available trials and index the last one. In the future, the suggestion will be able to spawn multiple trials. Therefore, I think it is better to store all the metrics first and process the data afterwards. Currently the process is just taking the last one and return . In the future, it may become take the average and return.

Copy link
Member

@hougangliu hougangliu Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

w.metrics_logs will return Collection of https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L12 and https://github.com/kubeflow/katib/blob/master/examples/nasjob-example-RL.yaml#L15. since only_latest_log=True, each of them only return last value, that is, ml.values[-1].value == ml.values[0].value

Maybe you can take a look by debug to verify it

Copy link
Contributor Author

@jinan-zhou jinan-zhou Feb 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right! The program has been modified as suggested.

However, according to my testing run, writing like this causes the program to repeatedly print "Error. No trial has completed." So I removed return None and print error message and everything works fine. It is OK to not return, so that the GetEvaluationResult does not return anything and suggestion will not give any trials. And it is not en error if the status is not completed. There is no need to handle that.

# However, if the user want to minimize the metrics, we can take the negative of the result
if self.opt_direction == api_pb2.MINIMIZE:
result = -result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why delete above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I made a mistake. Should delete the code above that. Already fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, thanks

@hougangliu
Copy link
Member

/lgtm

@hougangliu
Copy link
Member

/approve

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hougangliu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 3bb8b54 into kubeflow:master Feb 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants