[discussion] How to set clusterspec #369

gaocegege · 2018-02-04T10:32:36Z

Currently we set the TF_CONFIG environment variable so that the training code can read the cluster spec from it. This follows the approach used by Google Cloud Machine Learning Engine (Cloud ML Engine).
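For reference, a rough sketch of what the training code sees. TF_CONFIG is a JSON string with a `cluster` map (task type to a list of `host:port` addresses) and a `task` entry (this process's type and index). The job name, hostnames and ports below are made up for illustration, and `read_tf_config` is a hypothetical helper, not part of TensorFlow or the operator.

```python
import json
import os


def read_tf_config(env=None):
    """Return (cluster, task_type, task_index) parsed from TF_CONFIG.

    `env` defaults to os.environ; pass a dict to try it out without
    touching the real environment. read_tf_config is a hypothetical helper.
    """
    env = os.environ if env is None else env
    tf_config = json.loads(env.get("TF_CONFIG", "{}"))
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    return cluster, task.get("type"), task.get("index")


# Hypothetical TF_CONFIG for worker 1 of a job with one PS and two workers.
example_env = {
    "TF_CONFIG": json.dumps({
        "cluster": {
            "ps": ["myjob-ps-0:2222"],
            "worker": ["myjob-worker-0:2222", "myjob-worker-1:2222"],
        },
        "task": {"type": "worker", "index": 1},
    })
}

cluster, task_type, task_index = read_tf_config(example_env)
print(cluster["worker"], task_type, task_index)
```

The point is that the user code never has to know how the addresses were discovered; it only reads one environment variable.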

The TensorFlow distributed training docs use command-line arguments instead. We cannot support both approaches, since we want to hide the service discovery layer from users, so I think we should discuss which one to support. Maybe AI engineers could give us more info.
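For comparison, a sketch of the command-line style. The example trainer in the TensorFlow distributed docs of this period passed flags along the lines of `--ps_hosts`, `--worker_hosts`, `--job_name` and `--task_index`; the parser below assumes those names, but nothing forces a user's script to use them. If the operator rendered these flags into each pod's command, it would be tied to one particular naming scheme.

```python
import argparse


def parse_cluster_args(argv=None):
    """Parse a cluster spec passed as CLI flags (flag names are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", default="",
                        help="Comma-separated host:port list for PS tasks")
    parser.add_argument("--worker_hosts", default="",
                        help="Comma-separated host:port list for workers")
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, default=0)
    args = parser.parse_args(argv)
    # Split the comma-separated lists, dropping empty entries.
    cluster = {
        "ps": [h for h in args.ps_hosts.split(",") if h],
        "worker": [h for h in args.worker_hosts.split(",") if h],
    }
    return cluster, args.job_name, args.task_index


# Hypothetical hosts; in a real job these would come from service discovery.
cluster, job_name, task_index = parse_cluster_args([
    "--ps_hosts=ps0:2222",
    "--worker_hosts=worker0:2222,worker1:2222",
    "--job_name=worker",
    "--task_index=1",
])
print(cluster, job_name, task_index)
```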

/cc @DjangoPeng @ScorpioCPH


jlewi · 2018-02-04T19:26:52Z

TF_CONFIG is a TensorFlow convention that TF APIs like the Estimator API use to get information about the runtime environment and configure the job appropriately.

If users want to use command line arguments, they can write a launcher script that parses TF_CONFIG and sets the command line arguments as needed. This is what we do in the TFCNN example; see https://github.com/kubeflow/kubeflow/blob/master/tf-controller-examples/tf-cnn/launcher.py
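A minimal sketch of such a launcher is below. It is not the actual `launcher.py` from the TFCNN example; it assumes the training script accepts `--ps_hosts`, `--worker_hosts`, `--job_name` and `--task_index`, and `train.py`, `tf_config_to_args` and `build_command` are hypothetical names. It only builds the command; a real launcher would then run it.

```python
import json
import os


def tf_config_to_args(tf_config_json):
    """Translate a TF_CONFIG JSON string into command-line flags.

    The flag names are an assumed convention; adjust them to whatever
    the training script actually parses.
    """
    tf_config = json.loads(tf_config_json)
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    return [
        "--ps_hosts=" + ",".join(cluster.get("ps", [])),
        "--worker_hosts=" + ",".join(cluster.get("worker", [])),
        "--job_name=" + task.get("type", ""),
        "--task_index=" + str(task.get("index", 0)),
    ]


def build_command(script, env=None):
    """Return the argv a launcher would run for `script`."""
    env = os.environ if env is None else env
    return ["python", script] + tf_config_to_args(env.get("TF_CONFIG", "{}"))


# Hypothetical TF_CONFIG for the single PS task of a small job.
example_env = {"TF_CONFIG": json.dumps({
    "cluster": {"ps": ["myjob-ps-0:2222"],
                "worker": ["myjob-worker-0:2222", "myjob-worker-1:2222"]},
    "task": {"type": "ps", "index": 0},
})}

cmd = build_command("train.py", example_env)
# A real launcher would now run it, e.g. subprocess.check_call(cmd).
print(cmd)
```

Note that this sketch silently ignores other task types such as `master`/`chief`, which is exactly the kind of per-script decision that makes a single command-line convention hard to standardize.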

I think the problem with command line arguments is that everyone will use slightly different conventions.

gaocegege · 2018-02-05T01:40:37Z

SGTM, WDYT @DjangoPeng

gaocegege · 2018-02-15T01:19:43Z

I am closing this since it is stale. If there are any further comments, I will reopen it. Thanks.

gaocegege added area/api kind/discussion labels Feb 4, 2018

gaocegege closed this as completed Feb 15, 2018

gaocegege mentioned this issue Mar 6, 2018

Add tf-operator design doc for API v1alpha2 kubeflow/community#30

Merged
