diff --git a/examples/advanced/experiment-tracking/mlflow/README.md b/examples/advanced/experiment-tracking/mlflow/README.md index 9170885d32..a3d986426a 100644 --- a/examples/advanced/experiment-tracking/mlflow/README.md +++ b/examples/advanced/experiment-tracking/mlflow/README.md @@ -42,7 +42,7 @@ app_server app_site-1 app_site-2 log.txt tb_events By default, MLflow will create an experiment log directory under a directory named "mlruns" in the simulator's workspace. If you ran the simulator with "/tmp/nvflare" as the workspace, then you can launch the MLflow UI with: ``` -mlflow ui --backend-store-uri /tmp/nvflare/mlruns/ +mlflow ui --backend-store-uri /tmp/nvflare/server/simulate_job/mlruns/ ``` ### 4. MLflow Streaming diff --git a/examples/advanced/experiment-tracking/mlflow/experiment_tracking.ipynb b/examples/advanced/experiment-tracking/mlflow/experiment_tracking.ipynb index 0a92d40c2d..b1070aceab 100644 --- a/examples/advanced/experiment-tracking/mlflow/experiment_tracking.ipynb +++ b/examples/advanced/experiment-tracking/mlflow/experiment_tracking.ipynb @@ -88,14 +88,38 @@ "\n", "```\n", "nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-pt-tb-mlflow\n", - "```" + "```\n", + "\n", + "or set the PYTHONPATH programmatically. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "faeb00d2-fa1f-4a95-b2b3-0029d3a4a671", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "parent_directory = os.path.dirname(os.getcwd())\n", + "\n", + "# Get the current PYTHONPATH\n", + "current_path = os.environ.get('PYTHONPATH', '')\n", + "\n", + "# Add the path if it's not already there\n", + "if parent_directory not in current_path.split(os.pathsep):\n", + "    os.environ['PYTHONPATH'] = os.pathsep.join(p for p in (parent_directory, current_path) if p)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c8f08cef", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "!nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-pt-tb-mlflow" @@ -120,10 +144,31 @@ "To view training metrics that are being streamed to the server, run:\n", "\n", "```\n", - "tensorboard --logdir=/tmp/nvflare/simulate_job/tb_events\n", + "tensorboard --logdir=/tmp/nvflare/server/simulate_job/tb_events\n", "```" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "40441575-95e6-47ec-907a-af93e1c77949", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!tensorboard --logdir=/tmp/nvflare/server/simulate_job/tb_events" + ] + }, + { + "cell_type": "markdown", + "id": "c85b6330-9a99-4751-ac03-307c4da9f0c5", + "metadata": {}, + "source": [ + ">Note \n", + "Remember to \"stop\" the cell above before running the next cell." + ] + }, { "cell_type": "markdown", "id": "534d7879", @@ -137,7 +182,7 @@ "To view training metrics that are being streamed to the server, run:\n", "\n", "```\n", - "mlflow ui --backend-store-uri=/tmp/nvflare/mlruns\n", + "mlflow ui --backend-store-uri=/tmp/nvflare/server/simulate_job/mlruns\n", "```\n", "\n", "Then \n", @@ -149,11 +194,29 @@ "cell_type": "code", "execution_count": null, "id": "da1e7952-c3e6-4e90-a42e-648a823ede78", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ - "!mlflow ui 
--backend-store-uri=/tmp/nvflare/mlruns" + "!mlflow ui --backend-store-uri=/tmp/nvflare/server/simulate_job/mlruns" ] + }, + { + "cell_type": "markdown", + "id": "278a1d7b-a71d-4b50-bbd8-61bc5122d423", + "metadata": {}, + "source": [ + "> Note: remember to \"stop\" the cell above." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "486046c6-0b74-4d95-925b-e175799df6f9", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -172,7 +235,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.17" + "version": "3.8.18" } }, "nbformat": 4, diff --git a/examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-tb-mlflow/app/config/config_fed_server.conf b/examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-tb-mlflow/app/config/config_fed_server.conf index e9f94a9b82..3dceed2149 100644 --- a/examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-tb-mlflow/app/config/config_fed_server.conf +++ b/examples/advanced/experiment-tracking/mlflow/jobs/hello-pt-tb-mlflow/app/config/config_fed_server.conf @@ -49,6 +49,7 @@ "id": "mlflow_receiver_with_tracking_uri", "path": "nvflare.app_opt.tracking.mlflow.mlflow_receiver.MLflowReceiver", "args": { + "tracking_uri": "file:///{WORKSPACE}/{JOB_ID}/mlruns", "kwargs": { "experiment_name": "hello-pt-experiment", "run_name": "hello-pt-with-mlflow", diff --git a/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb b/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb index 37c71982f0..6a2db8b6bc 100644 --- a/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb +++ b/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb @@ -55,26 +55,12 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": null, "id": "c3dbde69", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Using memory store\n", - "SystemInfo\n", - "server_info:\n", - "status: stopped, start_time: Thu 
Jan 26 11:52:39 2023\n", - "client_info:\n", - "site_a(last_connect_time: Thu Jan 26 14:56:33 2023)\n", - "site_b(last_connect_time: Thu Jan 26 14:56:33 2023)\n", - "job_info:\n", - "\n" - ] - } - ], + "metadata": { + "tags": [] + }, + "outputs": [], "source": [ "import os\n", "from nvflare.fuel.flare_api.flare_api import new_secure_session\n", @@ -98,25 +84,33 @@ "source": [ "### 4. Submit the Job with the FLARE API\n", "\n", - "With a session successfully connected, you can use `submit_job()` to submit your job. You can change `path_to_example_job` to the location of the job you are submitting. If your session is not active, go back to the previous step and connect with a session." + "With a session successfully connected, you can use `submit_job()` to submit your job. You can change `path_to_example_job` to the location of the job you are submitting. If your session is not active, go back to the previous step and connect with a session.\n", + "\n", + "With the POC command, the examples are linked to the following directory: `/tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b3589b60-434b-4b6d-97bc-74e95bbc7b52", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "ls -l /tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer\n" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": null, "id": "c8f08cef", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "cd3f69d4-c78b-47fa-8db0-bb6325cc7be5 was submitted\n" - ] - } - ], + "metadata": { + "tags": [] + }, + "outputs": [], "source": [ - "path_to_example_job = \"/workspace/NVFlare/examples/hello-numpy-sag/jobs/hello-numpy-sag\"\n", + "path_to_example_job = \"hello-world/hello-numpy-sag/jobs/hello-numpy-sag\"\n", "job_id = sess.submit_job(path_to_example_job)\n", "print(job_id + \" was submitted\")" ] @@ -137,32 +131,12 @@ }, { 
"cell_type": "code", - "execution_count": 3, + "execution_count": null, "id": "03fd93d0", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'name': 'hello-numpy-sag', 'job_folder_name': 'hello-numpy-sag', 'resource_spec': {}, 'deploy_map': {'hello-numpy-sag': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'super@a.org', 'submitter_org': 'org_a', 'submitter_role': 'project_admin', 'job_id': 'cd3f69d4-c78b-47fa-8db0-bb6325cc7be5', 'submit_time': 1674763013.0868688, 'submit_time_iso': '2023-01-26T14:56:53.086869-05:00', 'start_time': '2023-01-26 14:56:54.702227', 'duration': 'N/A', 'status': 'RUNNING', 'job_deploy_detail': ['server: OK', 'site_a: OK', 'site_b: OK'], 'schedule_count': 1, 'last_schedule_time': 1674763013.6890023, 'schedule_history': ['2023-01-26 14:56:53: scheduled']}\n", - "{'name': 'hello-numpy-sag', 'job_folder_name': 'hello-numpy-sag', 'resource_spec': {}, 'deploy_map': {'hello-numpy-sag': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'super@a.org', 'submitter_org': 'org_a', 'submitter_role': 'project_admin', 'job_id': 'cd3f69d4-c78b-47fa-8db0-bb6325cc7be5', 'submit_time': 1674763013.0868688, 'submit_time_iso': '2023-01-26T14:56:53.086869-05:00', 'start_time': '2023-01-26 14:56:54.702227', 'duration': 'N/A', 'status': 'RUNNING', 'job_deploy_detail': ['server: OK', 'site_a: OK', 'site_b: OK'], 'schedule_count': 1, 'last_schedule_time': 1674763013.6890023, 'schedule_history': ['2023-01-26 14:56:53: scheduled']}\n", - "{'name': 'hello-numpy-sag', 'job_folder_name': 'hello-numpy-sag', 'resource_spec': {}, 'deploy_map': {'hello-numpy-sag': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'super@a.org', 'submitter_org': 'org_a', 'submitter_role': 'project_admin', 'job_id': 'cd3f69d4-c78b-47fa-8db0-bb6325cc7be5', 'submit_time': 1674763013.0868688, 'submit_time_iso': '2023-01-26T14:56:53.086869-05:00', 'start_time': '2023-01-26 14:56:54.702227', 'duration': 'N/A', 'status': 'RUNNING', 'job_deploy_detail': 
['server: OK', 'site_a: OK', 'site_b: OK'], 'schedule_count': 1, 'last_schedule_time': 1674763013.6890023, 'schedule_history': ['2023-01-26 14:56:53: scheduled']}\n", - "....................\n", - "{'name': 'hello-numpy-sag', 'job_folder_name': 'hello-numpy-sag', 'resource_spec': {}, 'deploy_map': {'hello-numpy-sag': ['@ALL']}, 'min_clients': 1, 'submitter_name': 'super@a.org', 'submitter_org': 'org_a', 'submitter_role': 'project_admin', 'job_id': 'cd3f69d4-c78b-47fa-8db0-bb6325cc7be5', 'submit_time': 1674763013.0868688, 'submit_time_iso': '2023-01-26T14:56:53.086869-05:00', 'start_time': '2023-01-26 14:56:54.702227', 'duration': '0:00:46.957211', 'status': 'FINISHED:COMPLETED', 'job_deploy_detail': ['server: OK', 'site_a: OK', 'site_b: OK'], 'schedule_count': 1, 'last_schedule_time': 1674763013.6890023, 'schedule_history': ['2023-01-26 14:56:53: scheduled']}\n" - ] - }, - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], + "metadata": { + "tags": [] + }, + "outputs": [], "source": [ "from nvflare.fuel.flare_api.flare_api import Session, basic_cb_with_print\n", "\n", @@ -182,25 +156,26 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "id": "b0d8aa9c", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'time': '2023-01-26 14:57:45.968111', 'data': [{'type': 'table', 'rows': [['CLIENT', 'RESPONSE'], ['site_a', 'Shutdown the client...'], ['site_b', 'Shutdown the client...']]}, {'type': 'success', 'data': ''}], 'meta': {'status': 'ok', 'info': ''}, 'status': }\n", - "{'time': '2023-01-26 14:57:46.882510', 'data': [{'type': 'string', 'data': 'FL app has been shutdown.'}, {'type': 'shutdown', 'data': 'Bye bye'}, {'type': 'success', 'data': ''}], 'meta': {'status': 'ok', 'info': ''}, 'status': }\n" - ] - } - ], + "metadata": { + "tags": [] + }, + "outputs": [], "source": [ "print(sess.api.do_command(\"shutdown 
client\"))\n", "print(sess.api.do_command(\"shutdown server\"))\n", "\n", "sess.close()" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "331c0ba2-8abe-47a3-a864-18dcb7489a44", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -219,7 +194,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.17" + "version": "3.8.18" }, "vscode": { "interpreter": { diff --git a/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb b/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb index fa295c0e3b..e046732b4b 100644 --- a/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb +++ b/examples/hello-world/step-by-step/cifar10/sag_mlflow/sag_mlflow.ipynb @@ -71,13 +71,15 @@ "cell_type": "code", "execution_count": null, "id": "de430380", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "! nvflare job create -j /tmp/nvflare/jobs/cifar10_sag_pt_mlflow -w sag_pt_mlflow \\\n", "-f meta.conf min_clients=2 \\\n", "-f config_fed_client.conf app_script=train_with_mlflow.py app_config=\"--batch_size 6 --dataset_path /tmp/nvflare/data/cifar10 --num_workers 2\" \\\n", - "-f config_fed_server.conf num_rounds=5 experiment_name=\"nvflare-sag-pt-experiment\" run_name=\"nvflare-sag-pt-with-mlflow\" tracking_uri=\\\"\\\" \\\n", + "-f config_fed_server.conf num_rounds=5 experiment_name=\"nvflare-sag-pt-experiment\" run_name=\"nvflare-sag-pt-with-mlflow\" tracking_uri=\\\"file:///{WORKSPACE}/{JOB_ID}/mlruns\\\" \\\n", "-sd ../code/fl \\\n", "-force" ] @@ -130,7 +132,9 @@ "cell_type": "code", "execution_count": null, "id": "17323f61", - "metadata": {}, + "metadata": { + "tags": [] + }, "outputs": [], "source": [ "! 
python ../data/download.py --dataset_path /tmp/nvflare/data/cifar10" @@ -171,9 +175,9 @@ "\n", "If the tracking_uri is specified, you can directly go to the tracking_uri to view the results\n", "\n", - "If the tracking_uri is not specified, the results will be saved in `/tmp/nvflare/cifar10_sag_pt_mlflow/mlruns/`\n", + "If the tracking_uri is not specified, the results will be saved in `/tmp/nvflare/cifar10_sag_pt_mlflow/server/simulate_job/mlruns/`\n", "\n", - "You can then run the mlflow command: `mlflow ui --port 5000` inside the directory `/tmp/nvflare/cifar10_sag_pt_mlflow/`\n", + "You can then run the mlflow command: `mlflow ui --port 5000` inside the directory `/tmp/nvflare/cifar10_sag_pt_mlflow/server/simulate_job`\n", "\n", "Then you should be seeing similar thing as the following screenshot:\n", "\n", @@ -182,6 +186,26 @@ "\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "id": "a86a4ab4-00d0-4907-b770-71969ffb15ac", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "!mlflow ui --port 5000 --backend-store-uri /tmp/nvflare/cifar10_sag_pt_mlflow/server/simulate_job/mlruns\n" + ] + }, + { + "cell_type": "markdown", + "id": "0211edd5-35e3-4af5-bc81-bd906325a4c4", + "metadata": {}, + "source": [ + "Make sure you \"stop\" the cell above when you are done reviewing the MLflow results." + ] + }, { "cell_type": "markdown", "id": "58037d1e", @@ -207,7 +231,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.2" + "version": "3.8.18" } }, "nbformat": 4,