-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mlflow implementation of Tracking Interface #768
Conversation
Signed-off-by: Nathan Brake <33383515+njbrake@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd insist on adding the list of JobCreate as workflow params. This will avoid making adaptations to the job and workflow params that would need to be removed later on. We can assume that only the [infer, eval] list will be sent, for the moment.
👍 |
lumigator/backend/backend/tests/integration/api/routes/test_api_workflows.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved with a couple of caveats.
- Adding the jobs directory directly in the PYTHONPATH causes a few issues when the job "pseudo package" has a deeper structure, and the backend starts importing job implementation packages
- The worklow will need to expose the list of jobs configs but this can be done in First attempt at a parametrized JobCreate #740
What's changing
Mlflow implementation of tracking interface to store and retrieve job results and workflow results.
This means we no longer use the ExperimentRepository for tracking experiments.
This PR also elevates workflow management code out of the Job service, and converts the _on_job_complete function into a "wait_for_complete" function this way we don't need to trigger nested background tasks: a workflow is run as a single background task, which encapsulates it into a single function of the workflows.py, to more easily accommodate an expanded range of workflow tasks (future work here).
This pull request includes several changes to the backend of the Lumigator project, focusing on enhancing the functionality of the experiment and workflow services, improving dependency management, and updating the Makefile for testing configurations. The most significant changes include the addition of tracking client dependencies, updates to experiment and workflow routes, and improvements to job creation processes.
Enhancements to Experiment and Workflow Services:
lumigator/backend/backend/api/deps.py: Added
tracking_client_managerdependency and updated
get_experiment_serviceand
get_workflow_serviceto include
tracking_client` as a parameter.inf + eval
workflow, which is the only type of workflow supported at the momentUpdates to Experiment and Workflow Routes:
lumigator/backend/backend/api/router.py
: Included the workflows route in the OpenAPI schema.lumigator/backend/backend/api/routes/experiments.py
: Updated the response types forcreate_experiment_id
andget_experiment_new
routes, and added a new route to delete experiments.lumigator/backend/backend/api/routes/workflows.py
: Added new routes for getting workflow logs and deleting workflows, and updated the request type for creating workflows.Improvements to Job Creation Processes:
lumigator/backend/backend/api/routes/jobs.py
: Removedbackground_tasks
parameter from job creation methods and added asynchronous background tasks for handling job completion and dataset updates.Exception Handling Enhancements:
lumigator/backend/backend/main.py
: Added workflow exception mappings to the FastAPI application.lumigator/backend/backend/services/exceptions/experiment_exceptions.py
: UpdatedExperimentNotFoundError
to usestr
instead ofUUID
for resource ID.lumigator/backend/backend/services/exceptions/workflow_exceptions.py
: AddedWorkflowNotFoundError
class for handling workflow-related exceptions.Testing Configuration Updates:
Makefile
: AddedMLFLOW_TRACKING_URI
environment variable totest-backend-unit
andtest-backend-integration
targets.Deployment Updates:
Refs #527
How to test it
This PR does not yet switch the frontend or SDK over to using workflows, so it can either be tested via curl commands or via the test. The test exercises the integration and makes sure that experiments/workflows/jobs can be created and deleted successfully, and that the logs can be accessed
Steps to test the changes:
2 .make test-backend-integration
Additional notes for reviewers
I already...
/docs
)