---
title: Overview
---
<iframe style={{position: 'absolute', top: '0', left: '0', bottom: '0', right: '0', width: '100%', height: '100%'}} src="https://www.youtube.com/embed/MX3BrXnaULs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; fullscreen" allowfullscreen> </iframe>

ClearML Agent is a virtual environment and execution manager for DL / ML solutions on GPU machines. It integrates with the ClearML Python Package and ClearML Server to provide a full AI cluster solution.
It focuses on two things:

  • Reproducing experiments, including their complete environments.
  • Scaling workflows on multiple target machines.

ClearML Agent executes an experiment or other workflow by reproducing the state of the code from the original machine to a remote machine.

ClearML Agent flow diagram

The preceding diagram demonstrates a typical flow where an agent executes a task:

  1. A task is enqueued for execution on a queue.
  2. The agent pulls the task from the queue.
  3. The agent launches a docker container in which to run the task's code.
  4. The task's execution environment is set up:
    1. Execute any custom setup script configured.
    2. Install any required system packages.
    3. Clone the code from a git repository.
    4. Apply any uncommitted changes recorded.
    5. Set up the python environment and required packages.
  5. The task's script/code is executed.
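The ordering of the steps above can be sketched as a toy model. This is not how ClearML Agent is implemented; the queue, the step names, and the `run_agent_once` helper are all hypothetical and exist only to make the sequence concrete.

```python
from collections import deque

# Setup sub-steps, in the order the list above gives them.
SETUP_STEPS = [
    "run custom setup script",
    "install system packages",
    "clone git repository",
    "apply uncommitted changes",
    "set up python environment and packages",
]


def run_agent_once(queue, log):
    """Pull one task from the queue and walk it through the flow.

    Returns the task name, or None if the queue is empty.
    """
    if not queue:
        return None
    task = queue.popleft()                          # step 2: the agent pulls the task
    log.append(f"{task}: launch docker container")  # step 3
    for step in SETUP_STEPS:                        # step 4: environment setup
        log.append(f"{task}: {step}")
    log.append(f"{task}: execute script")           # step 5
    return task


queue = deque()
queue.append("train-model")  # step 1: a task is enqueued (hypothetical name)
log = []
run_agent_once(queue, log)
for line in log:
    print(line)
```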

:::note Python Version
ClearML Agent uses the Python version available in the environment or docker in which it executes the code. It does not install Python, so make sure to use a docker or environment with the version you need.
:::
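Because the agent does not install Python, a task script can fail fast when it lands in an environment with an older interpreter. The following is a minimal sketch using only the standard library; the `python_is_at_least` helper and the `(3, 8)` requirement are assumptions for illustration, not part of ClearML.

```python
import sys


def python_is_at_least(minimum, current=None):
    """Return True if the interpreter version is at least `minimum`, e.g. (3, 8)."""
    current = tuple(sys.version_info[:2]) if current is None else tuple(current)
    return current >= tuple(minimum)


REQUIRED = (3, 8)  # hypothetical requirement for this illustration

if not python_is_at_least(REQUIRED):
    found = ".".join(str(part) for part in sys.version_info[:3])
    raise RuntimeError(f"Python {REQUIRED[0]}.{REQUIRED[1]}+ is required, found {found}")
```

Placing a check like this at the top of the script surfaces a version mismatch before any training work starts.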

While the agent is running, it continuously reports system metrics to the ClearML Server. You can monitor these metrics on the Orchestration page.

Once ClearML Agent is running on a target machine, you can reproduce experiments and execute automated workflows in either or both of the following ways:

  • Programmatically (using Task.enqueue() or Task.execute_remotely())
  • Through the ClearML Web UI (without working directly with code), by cloning experiments and enqueuing them to the queue that a ClearML Agent is servicing.
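A minimal sketch of the programmatic route, assuming the `clearml` package is installed and configured against a ClearML Server. The task ID, queue name, project name, and task name are placeholders. The optional `task_cls` parameter is a hypothetical hook added here only so the helper can be exercised without a server; it is not part of the ClearML API.

```python
def clone_and_enqueue(template_task_id, queue_name, task_cls=None):
    """Clone an existing task and enqueue the clone for an agent to pick up."""
    if task_cls is None:
        # Imported lazily; requires the clearml package and a configured server.
        from clearml import Task
        task_cls = Task
    template = task_cls.get_task(task_id=template_task_id)
    cloned = task_cls.clone(source_task=template, name=f"{template.name} (clone)")
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned


def run_current_script_remotely(queue_name):
    """Register the running script as a task, then hand it over to an agent."""
    from clearml import Task
    # Placeholder project and task names.
    task = Task.init(project_name="examples", task_name="remote run")
    # Stops local execution and enqueues this task on the given queue.
    task.execute_remotely(queue_name=queue_name, clone=False, exit_process=True)
```

`clone_and_enqueue` mirrors what the Web UI route does by hand, while `run_current_script_remotely` is meant to be called from inside the script you want the agent to run.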

The agent lets you override a task's execution details through the UI, without modifying code. When you modify the configuration of a cloned task, the ClearML Agent that executes the clone uses the modified values in place of the originals:

  • Modified package requirements: the experiment script runs with the updated packages.
  • Modified recorded command line arguments: the ClearML Agent injects the new values in place of the recorded ones.
  • Modified hyperparameters: they override code-level configuration instrumented with Task.connect().
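As an illustration of the last point, a script can route its defaults through `Task.connect()` so that values edited in the UI reach the code when an agent runs a clone. This is a sketch under assumptions: it needs the `clearml` package and a configured server to run `main()`, and the project name, task name, parameter names, and the `connect_hyperparameters` wrapper are hypothetical.

```python
def connect_hyperparameters(task, defaults):
    """Connect a dict of default values to a task and return the values to use."""
    # Run locally, task.connect() records the defaults on the task.
    # When an agent executes a clone, the returned object carries
    # the values that were edited on the clone.
    return task.connect(defaults)


def main():
    from clearml import Task  # requires clearml and a configured server
    # Placeholder project and task names.
    task = Task.init(project_name="examples", task_name="connect demo")
    params = connect_hyperparameters(task, {"learning_rate": 0.01, "epochs": 10})
    print("learning rate in use:", params["learning_rate"])
```

The script always reads from the object returned by `connect()`, never from the literal defaults, which is what allows the override to take effect without touching the code.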

ClearML Agent can be deployed in various setups to suit different workflows and infrastructure needs:

References

For more information, see the following: