Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge label-refactor branch and refactor the annotations into kubernetes decorator #1

Conversation

saikonen
Copy link

As part of the previous discussions, I mentioned that there was a need to start refactoring how the Kubernetes labels are being handled, moving much of the logic into the decorator instead of spreading logic across argo_workflows.py and kubernetes.py

As the annotations is going to follow the same patterns, this is a good opportunity to rework those at the same time to be handled inside the Kubernetes decorator as much as possible.

This PR:

romain-intel and others added 26 commits June 9, 2023 01:15
Additional options to have explicit parameter mappings as well
as a more flexible way of specifying project and branch
)

This allows the pattern:
```
try:
  import foo
except ImportError:
  pass
```

to work as expected. Currently, it could raise a RuntimeError which would
then propagate back to the user.
…x#1445)

Co-authored-by: Preetam Joshi <preetamj@netflix.com>
The issue would occur if an exception was both listed as a class
and an exception in the mapping file.
* add email characters to sanitized_name in sensor

* support passing fully qualified flowname to trigger_on_finish containing email address
* Add optional ServerSideEncryption setting for S3 uploads

* lint fixes for pre-commit check failure
* fix quoting issue with argo

* cleanup testing for payload stringification

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>
- There was a bug overwrote the env values set in the environment decorator
- This made the env decorator export all metaflow related env values in the cli instead of the ones set via the cli.
…etflix#1379)

* feat(sfn): delete a workflow

* feat(sfn): delete eventbridge rule

* add output on deletion progress, clean up error handling and add production token validation

* refactor token validation instruction echoing

* address review comments

* add todo for future refactoring

---------

Co-authored-by: Steven Hoelscher <shoelscher@artica.com>
* [bugfix]: Optional check for encrpytion in s3op response

* Addressed comments.
* Job completion check based on container status.

This change modifies the way a Metaflow step running as a Kubernetes job
is marked completed. Previously, Metaflow would wait until the job
itself is marked as completed. However, this can take much longer than
the time the container running in the job is marked completed.

There don't seem to be any obvious tunables in K8s for making it faster
to mark a job as completed.

Given that a Metaflow job runs a single container, this change relies
on checking that containers' status.

Testing Done:

- Basic linear flows run to completion.
- Foreach based flows run to completion.
- Flows that have failed steps show up as failed. Logs were shown on
  stderr and the metaflow ui.
  - Tested with a step that explicitly did a sys.exit()
  - Tested with a step that had it's pod explicitly terminated
    (kubectl delete pod)
- Terminated one step in a foreach. Verified that the run
  shows up as failed
- Tested that flows run just fine with Argo workflows

* Incorporate review feedback.

* Implement has_succeeded, has_failed methods.

* rename new properties as private. remove unused property. cleanup some unnecessary checks

---------

Co-authored-by: Sakari Ikonen <sakari.a.ikonen@gmail.com>
* first draft of suspend/unsuspend for argo

* rename run_id arg to name

* move exception for workflow not found

* move argo workflow name validation to cli

* decouple pause method

* add production token validation

* feature: add validation for what flow the run_id belongs to (Netflix#1452)

* add validation for what flow the run_id belongs to

* test using workflow production token for authorization directly.

* rework run_id validation to perform separate checks for project_name and branch_name, accounting for --name. refactor Argo resource name sanitization
* Add production token validation. add more output to delete process

* add separate output for removing the schedule of a workflow, when applicable.

* rename schedule deletion. use staticmethod instead of classmethod. error message cleanup. Document function behavior

* add comments to delete behavior in argo cli. formatting fix

* fix token instructions output

* feature: add terminate for argo workflows (Netflix#1309)

* introduce terminate command for argo-workflows

* rename parameter, add check to terminate that workflow is not already finished. fix error output

* changing error messages for argo workflow termination

* change run_id from option to argument. refine terminate process output

* Check for termination before issuing terminate on a workflow

* move schedule deletion logic and comments into main delete method.

* feat: refactor sensor deletion (Netflix#1343)

* add sensor deletion as part of argo workflow deletion

* rework token instructions output

* add docstring and note about refactoring to validate_token

* use validate_run_id for terminate
@tylerpotts
Copy link
Owner

LGTM

@tylerpotts tylerpotts merged commit 9a193d4 into tylerpotts:annotation_dev Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants