[ORCA-229] Implement monitor_workflow() for Tower #22

Merged (1 commit), May 17, 2023
172 changes: 91 additions & 81 deletions Pipfile.lock

2 changes: 1 addition & 1 deletion setup.cfg
@@ -79,6 +79,7 @@ testing =
     pytest-cov~=4.0
     pytest-mock~=3.0
     pytest-dotenv~=0.5.2
+    pytest-asyncio~=0.21.0

 # Dependencies for development (used by Pipenv)
 dev =
@@ -120,7 +121,6 @@ apache_airflow_provider =
 # Comment those flags to avoid this pytest issue.
 addopts =
     --cov "orca" --cov-report "term-missing" --cov-report "xml"
-    -m "not slow and not integration and not acceptance and not cost"
     --verbose
 norecursedirs =
     dist
4 changes: 4 additions & 0 deletions src/orca/services/nextflowtower/models.py
@@ -321,6 +321,10 @@ class Workflow(BaseTowerModel):
         "state": "status",
     }

+    def __repr__(self) -> str:
+        """String representation of a workflow."""
+        return f"Workflow(run_name={self.run_name}, id={self.id}, state={self.state})"

Collaborator:

Thoughts on also adding a __str__ representation that is just the ID?

Contributor Author:

Dataclasses already have default implementations for __repr__() and __str__(). I'm only overriding __repr__() because the full default output is too long for logging, where I want both the human-friendly run name and the computer-friendly ID to appear. I'm aware of the field(..., repr=False) parameter, but I don't want to set it on all but a handful of attributes.
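
To illustrate the point in this reply, here is a minimal sketch using a standard-library dataclass. The trimmed-down `Workflow` class, its `params` field, and the sample values are hypothetical stand-ins, not the real model from `models.py` (which may be built on a different dataclass implementation). It shows that a `__repr__` written in the class body is kept instead of the generated one, and that `str()` and f-strings fall back to it when no `__str__` is defined.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Hypothetical, trimmed-down stand-in for the real Workflow model."""

    id: str
    run_name: str
    state: str
    # Stand-in for the many verbose attributes on the real model.
    params: dict = field(default_factory=dict)

    def __repr__(self) -> str:
        # A __repr__ defined in the class body is kept as-is; @dataclass
        # does not replace it, so only the three key fields are printed.
        return f"Workflow(run_name={self.run_name}, id={self.id}, state={self.state})"


wf = Workflow(id="abc123", run_name="happy_darwin", state="RUNNING", params={"x": 1})
short = repr(wf)
# No __str__ is defined here, so str() and f-strings fall back to __repr__.
same = str(wf) == short and f"{wf}" == short
```

The alternative mentioned in the reply, `field(repr=False)`, would have to be applied to every attribute that should be hidden, which is why a single override can be the shorter route when only a few fields should appear.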

     @property
     def status(self) -> WorkflowStatus:
         """Workflow run status."""
35 changes: 23 additions & 12 deletions src/orca/services/nextflowtower/ops.py
@@ -1,3 +1,4 @@
+import asyncio
 import logging
 from dataclasses import field
 from functools import cached_property
@@ -180,18 +181,6 @@ def get_workflow(self, workflow_id: str) -> Workflow:
         """
         return self.client.get_workflow(workflow_id, self.workspace_id)

-    def get_workflow_status(self, workflow_id: str) -> WorkflowStatus:
-        """Retrieve status of a workflow run.
-
-        Args:
-            workflow_id: Workflow run ID.
-
-        Returns:
-            Workflow status and whether the workflow is done.
-        """
-        workflow = self.get_workflow(workflow_id)
-        return workflow.status
-
     def list_workflows(self, search_filter: str = "") -> list[Workflow]:
         """List available workflows that match search filter.

@@ -261,3 +250,25 @@ def get_latest_previous_workflow(
         # Otherwise, return latest based on submission timestamp
         sorted_runs = sorted(previous_runs, key=lambda x: x.get("submit"))
         return sorted_runs[-1]

Collaborator:

How will this work with the Airflow sensor? Or do you envision Airflow just calling:

workflow = ...get_workflow()
status = workflow.status

Contributor Author:

It would look like this (original version):

@task.sensor(poke_interval=10, timeout=604800, mode="poke")
def monitor_workflow(params, workflow_id):
    hook = NextflowTowerHook(params["conn_id"])
    workflow = hook.ops.get_workflow(workflow_id)
    return PokeReturnValue(workflow.is_done, workflow.state)

Or the class-based equivalent of the above.

+    async def monitor_workflow(
+        self, run_id: str, wait_time: int = 60 * 5
+    ) -> WorkflowStatus:
+        """Wait until the workflow run completes.
+
+        Args:
+            run_id: Workflow run ID.
+            wait_time: Number of seconds to wait between checks.
+                Default is 5 minutes.
+
+        Returns:
+            Final workflow status.
+        """
+        workflow = self.get_workflow(run_id)
+        while not workflow.status.is_done:
+            logger.info(f"{workflow} is not done yet...")
+            await asyncio.sleep(wait_time)
+            workflow = self.get_workflow(run_id)
+
+        logger.info(f"{workflow} is now done!")
+        return workflow.status
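
Because `monitor_workflow()` is a coroutine, a caller needs an event loop to run it. The sketch below shows one way the polling loop could be exercised without a real Tower instance. `FakeStatus`, `FakeWorkflow`, `FakeOps`, the status names, and the rule for `is_done` are all hypothetical stand-ins rather than classes from this repository; only the loop shape is taken from the diff above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeStatus:
    """Hypothetical stand-in for WorkflowStatus."""

    name: str

    @property
    def is_done(self) -> bool:
        # Assumption for this sketch: anything other than RUNNING is terminal.
        return self.name != "RUNNING"


@dataclass
class FakeWorkflow:
    """Hypothetical stand-in for Workflow."""

    status: FakeStatus


class FakeOps:
    """Returns a scripted sequence of statuses, one per get_workflow() call."""

    def __init__(self, states: list[str]) -> None:
        self._states = iter(states)
        self.calls = 0

    def get_workflow(self, run_id: str) -> FakeWorkflow:
        self.calls += 1
        return FakeWorkflow(FakeStatus(next(self._states)))

    async def monitor_workflow(
        self, run_id: str, wait_time: float = 60 * 5
    ) -> FakeStatus:
        # Same loop shape as the method in the diff: poll, sleep, poll again.
        workflow = self.get_workflow(run_id)
        while not workflow.status.is_done:
            await asyncio.sleep(wait_time)
            workflow = self.get_workflow(run_id)
        return workflow.status


ops = FakeOps(["RUNNING", "RUNNING", "SUCCEEDED"])
# wait_time=0 keeps the example fast; asyncio.run() supplies the event loop.
final = asyncio.run(ops.monitor_workflow("run-1", wait_time=0))
```

In a test suite, the `pytest-asyncio` dependency added in `setup.cfg` would presumably let an `async def` test await the method directly instead of calling `asyncio.run()`.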