Stop failing if artifact file exists, but empty #1653


@Ark-kun (Member) commented Oct 7, 2019

Empty output artifacts are a valid and desired use case. (Think of `grep` or any other filtering program where one of the outputs can be empty.)
Argo failing on empty outputs was a pretty bad surprise for us.

Fixes #1642
The problem was introduced in https://github.com/argoproj/argo/pull/1247, which made a step fail when an output artifact file did not exist.

@Ark-kun force-pushed the Stop-failing-if-artifact-file-exists,-but-empty branch 3 times, most recently from 7bd7e96 to c4718fa, October 7, 2019 06:31
@Ark-kun (Member, Author) commented Oct 8, 2019

@jessesuen Can you please take a look at this small PR?

@sarabala1979 merged commit 4a654ca into argoproj:master Oct 8, 2019
decarboxy added a commit to CyrusBiotechnology/argo that referenced this pull request Jan 16, 2020
* Validate ArchiveLocation artifacts (argoproj#1167)

* Update README and preview notice in CLA.

* Update README. (argoproj#1173) (argoproj#1176)

* Argo users: Equinor (argoproj#1175)

* Do not mount unnecessary docker socket (argoproj#1178)

* Issue argoproj#1113 - Wait for daemon pods completion to handle annotations (argoproj#1177)

* Issue argoproj#1113 - Wait for daemon pods completion to handle annotations

* Add output artifacts to influxdb-ci example

* Increased S3 artifact retry time and added log (argoproj#1138)

* Issue argoproj#1123 - Fix 'kubectl get' failure if resource namespace is different from workflow namespace (argoproj#1171)

* Refactor Makefile/Dockerfile to remove volume binding in favor of build stages (argoproj#1189)

* Add Docker Hub build hooks

* Add documentation how to use parameter-file's (argoproj#1191)

* Issue argoproj#988 - Submit should not print logs to stdout unless output is 'wide' (argoproj#1192)

* Fix missing docker binary in argoexec image. Improve reuse of image layers

* Fischerjulian adds ruby to rest docs (argoproj#1196)

* Adds link to ruby kubernetes library.

* Links to a ruby example on how to start a workflow

* Updated OWNERS (argoproj#1198)

* Update community/README (argoproj#1197)

* Issue argoproj#1128 - Use polling instead of fs notify to get annotation changes (argoproj#1194)

* Minor spelling, formatting, and style updates. (argoproj#1193)

* Dockerfile: argoexec base image correction (fixes argoproj#1209) (argoproj#1213)

* Set executor image pull policy for resource template (argoproj#1174)

* Add schedulerName to workflow and template spec (argoproj#1184)

* Issue argoproj#1190 - Fix incorrect retry node handling (argoproj#1208)

* fix dag retries (argoproj#1221)

* Executor can access the k8s apiserver with an out-of-cluster config file (argoproj#1134)

Executor can access the k8s apiserver with an out-of-cluster config file

* Update README with typo fixes (argoproj#1220)

* Update README.md (argoproj#1236)

* Remove extra quotes around output parameter value (argoproj#1232)

Ensure we do not insert extra single quotes when using
valueFrom: jsonPath to set the value of an output parameter for
resource templates.

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>

* Update README.md (argoproj#1224)

* Include stderr when retrieving docker logs (argoproj#1225)

* Add Gardener to "Who uses Argo" (argoproj#1228)

* Add feature to continue workflow on failed/error steps/tasks (argoproj#1205)

* Fix the Prometheus address references (argoproj#1237)

* Fixed Issue#1223 Kubernetes Resource action: patch is not supported (argoproj#1245)

* Fixed Issue#1223 Kubernetes Resource action: patch is not supported

This PR fixes Issue#1223, reported by @shanesiebken: Argo Kubernetes resource workflows failed on the patch action, because `kubectl patch` requires the --patch (-p) option.
This PR passes the manifest YAML to kubectl as the patch argument, which enables the patch action in Argo Kubernetes resource workflows.

This fix supports only the strategic merge patch strategy (kubectl's default) for the patch action.

* updated formatting

* typo, executo -> executor (argoproj#1243)

* Issue#1165 fake outputs don't notify and task completes successfully (argoproj#1247)

* Issue#1165 fake outputs don't notify and task completes successfully

This PR is addressing the Issue#1165 reported by @alexfrieden.

Issue/Bug: Argo finishes the task successfully even if the artifact/file does not exist.

Fix: Validate that the created gzip archive contains the artifact or file. If the file/artifact doesn't exist, the current step/stage/task fails with a log message.

Sample Log:
```
INFO[0029] Updating node artifact-passing-lkvj8[0].generate-artifact (artifact-passing-lkvj8-1949982165) status Running -> Error
INFO[0029] Updating node artifact-passing-lkvj8[0].generate-artifact (artifact-passing-lkvj8-1949982165) message: failed to save outputs: File or Artifact does not exist. /tmp/hello_world.txt
INFO[0029] Step group node artifact-passing-lkvj8[0] (artifact-passing-lkvj8-1067333159) deemed failed: child 'artifact-passing-lkvj8-1949982165' failed  namespace=default workflow=artifact-passing-lkvj8
INFO[0029] node artifact-passing-lkvj8[0] (artifact-passing-lkvj8-1067333159) phase Running -> Failed  namespace=default workflow=artifact-passing-lkvj8
```

* fixed gometalinter errcheck issue

* Git cloning via SSH was not verifying host public key (argoproj#1261)

* Update versions (argoproj#1218)

* Proxy Priority and PriorityClassName to pods (argoproj#1179)

* Error running 1000s of tasks: "etcdserver: request is too large" argoproj#1186 (argoproj#1264)

* Error running 1000s of tasks: "etcdserver: request is too large" argoproj#1186

This PR is addressing the feature request argoproj#1186.
Issue:
The NodeStatus element keeps growing for big workflows. A workflow fails once its total size reaches 1 MB (the max size limit in etcd).
Solution:
Compress the NodeStatus once its size reaches 1 MB, which allows 60% to 80% more steps to execute in compressed mode.

Latest: the Argo CLI and Argo UI can decode and print the node status from the compressedNode element.

Limitation:
kubectl will not decode the compressedNode element.

* added Operator.go

* revert the testing yaml

* Fixed the lint issue

* fixed

* fixed lint

* Fixed Testcase

* incorporated the review comments

* Reverted the change

* incorporated review comments

* fixing gometalinter checks

* incorporated review comments

* Update pod-limits.yaml

* updated few comments

* updated error message format

* reverted unwanted files

* Reduce redundancy pod label action (argoproj#1271)

* Add the `mergeStrategy` option to resource patching (argoproj#1269)

* This adds the ability to pass a mergeStrategy to a patch resource.
  This is valuable because the default merge strategy for Kubernetes is
  'strategic', which does not work with Custom Resources.
* This also updates the resource example to demonstrate how it is used
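kubectl selects the patch type with `--type`, which accepts `strategic` (the default), `merge`, and `json`. Assuming the template's mergeStrategy is forwarded to that flag, a hypothetical `patchArgs` helper (not Argo's implementation) could build the argument list like this:

```go
package main

import (
	"fmt"
	"strings"
)

// patchArgs is a hypothetical helper that builds `kubectl patch` arguments.
// An empty strategy falls back to kubectl's default, "strategic".
func patchArgs(resource, mergeStrategy, patch string) ([]string, error) {
	if mergeStrategy == "" {
		mergeStrategy = "strategic"
	}
	switch mergeStrategy {
	case "strategic", "merge", "json":
		// Values accepted by kubectl patch --type.
	default:
		return nil, fmt.Errorf("unsupported merge strategy %q", mergeStrategy)
	}
	return []string{"patch", resource, "--type", mergeStrategy, "-p", patch}, nil
}

func main() {
	// Custom Resources do not support strategic merge, so use a JSON merge patch.
	args, err := patchArgs("mycrd/example", "merge", `{"spec":{"replicas":2}}`)
	if err != nil {
		panic(err)
	}
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```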

* Fix bug with DockerExecutor's CopyFile (argoproj#1275)

The check to see if the source path was in the tgz archive was wrong
when the source path was a folder: the arguments to strings.Contains were
inverted.
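The inversion is easy to make because `strings.Contains(s, substr)` reads as "s contains substr". A sketch with a hypothetical archive listing and hypothetical helper names, showing why the swapped call misses files inside a folder:

```go
package main

import (
	"fmt"
	"strings"
)

// archiveHasPath reports whether any entry in a tar listing lies under srcPath.
// The (longer) entry name is the string searched; the source path is the substring.
func archiveHasPath(entries []string, srcPath string) bool {
	for _, name := range entries {
		if strings.Contains(name, srcPath) {
			return true
		}
	}
	return false
}

// archiveHasPathInverted reproduces the bug: with the arguments swapped, a
// folder path never "contains" the longer names of the files inside it.
func archiveHasPathInverted(entries []string, srcPath string) bool {
	for _, name := range entries {
		if strings.Contains(srcPath, name) {
			return true
		}
	}
	return false
}

func main() {
	entries := []string{"outputs/", "outputs/result.txt"}
	fmt.Println("correct:", archiveHasPath(entries, "outputs"))
	fmt.Println("inverted:", archiveHasPathInverted(entries, "outputs"))
}
```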

* Add workflow labels and annotations global vars (argoproj#1280)

* Argo CI is currently inactive (argoproj#1285)

* Issue#896 Workflow steps with non-existent output artifact path will succeed (argoproj#1277)

* Issue#896 Workflow steps with non-existent output artifact path will succeed

Issue: argoproj#897
Solution: Added a new element, "optional", to Artifact. The default is false. This flag marks an artifact as optional: the existence check is skipped if the input/output artifact has optional=true.

Output Artifact (optional=true):
The artifact existence check is skipped when saving the artifact to its destination, and the workflow continues.

Input Artifact (optional=true):
The artifact existence check is skipped when loading the artifact from its source, and the workflow continues.

* added end of line

* removed unwanted whitespace

* Deleted test code

* go formatted

* added formatting directives

* updated Codegen

* Fixed format on merge conflict

* format fix

* updated comments

* improved error case

* Fix for Resource creation where template has same parameter templating (argoproj#1283)

* Fix for Resource creation where template has same parameter templating

This PR adds support for custom template variable references.
Solution: Workflow variable reference resolution checks the workflow variable prefix.

* added test

* fixed gofmt issue

* fixed format

* fixed gofmt on common.go

* fixed testcase

* fixed gofmt

* Added unit testcase and documented

* fixed Gofmt format

* updated comments

* Admiralty: add link to blog post, add user (argoproj#1295)

* Add dns config support (argoproj#1301)

* Speed up podReconciliation using parallel goroutine (argoproj#1286)

* Speed up podReconciliation using parallel goroutine

* Fix make lint issue

* put checkandcompress back
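The general pattern behind running per-pod reconciliation in parallel goroutines is a `sync.WaitGroup` plus a mutex around the shared result map. A minimal sketch with a hypothetical `podStatus` type and `assess` callback, not the controller's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// podStatus stands in for a Kubernetes Pod (hypothetical type).
type podStatus struct {
	Name  string
	Phase string
}

// reconcileAll assesses each pod in its own goroutine and collects the
// resulting node phases into a map guarded by a mutex.
func reconcileAll(pods []podStatus, assess func(podStatus) string) map[string]string {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		nodes = make(map[string]string, len(pods))
	)
	for _, p := range pods {
		wg.Add(1)
		go func(p podStatus) {
			defer wg.Done()
			phase := assess(p) // potentially slow per-pod work runs concurrently
			mu.Lock()
			nodes[p.Name] = phase
			mu.Unlock()
		}(p)
	}
	wg.Wait() // all goroutines have written their result
	return nodes
}

func main() {
	pods := []podStatus{{"a", "Succeeded"}, {"b", "Failed"}, {"c", "Running"}}
	nodes := reconcileAll(pods, func(p podStatus) string { return p.Phase })
	fmt.Println(len(nodes), nodes["b"])
}
```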

* Add community meeting notes link (argoproj#1304)

* Add Karius to users in README.md (argoproj#1305)

* Added support for artifact path references (argoproj#1300)

* Added support for artifact path references
Adds new `{{inputs.artifacts.<NAME>.path}}` and `{{outputs.artifacts.<NAME>.path}}`  placeholders.
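As an illustration of what these placeholders resolve to, the sketch below substitutes them with `strings.NewReplacer`. Argo uses its own templating engine, so `resolveArtifactPaths` and the sample paths are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveArtifactPaths replaces {{inputs.artifacts.NAME.path}} and
// {{outputs.artifacts.NAME.path}} placeholders in s with the given paths.
func resolveArtifactPaths(s string, inputs, outputs map[string]string) string {
	var pairs []string // old, new, old, new, ...
	for name, path := range inputs {
		pairs = append(pairs, "{{inputs.artifacts."+name+".path}}", path)
	}
	for name, path := range outputs {
		pairs = append(pairs, "{{outputs.artifacts."+name+".path}}", path)
	}
	return strings.NewReplacer(pairs...).Replace(s)
}

func main() {
	cmd := "cat {{inputs.artifacts.data.path}} > {{outputs.artifacts.result.path}}"
	fmt.Println(resolveArtifactPaths(cmd,
		map[string]string{"data": "/tmp/data"},
		map[string]string{"result": "/tmp/result"}))
}
```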

* Add support for init containers (argoproj#1183)

* Secrets should be passed to pods using volumes instead of API calls (argoproj#1302)

* Secrets should be passed to pods using downward API instead of API calls

* Fixed gofmt format

* fixed file close Gofmt

* updated review comments

* fixed gofmt

* updated review comments

* CheckandEstimate implementation to optimize podReconciliation (argoproj#1308)

* CheckandEstimate implementation

* fixed variable rename

* fixed gofmt

* fixed feedbacks

* Update operator.go

* Update operator.go

* Add alibaba cloud to officially using argo list (argoproj#1313)

* Refactor checkandEstimate to optimize podReconciliation (argoproj#1311)

* Refactor checkandEstimate to optimize podReconciliation

* Move compress function to persistUpdates

* Fix formatting issues in examples documentation (argoproj#1310)

* Fix nil pointer dereference with secret volumes (argoproj#1314)

* Archive location should conditionally be added to template only when needed

* Fix SIGSEGV in watch/CheckAndDecompress. Consolidate duplicate code (resolves argoproj#1315)

* Implement support for PNS (Process Namespace Sharing) executor (argoproj#1214)

* Implements PNS (Process Namespace Sharing) executor
* Adds limited support for Kubelet/K8s API artifact collection by mirroring volume mounts to wait sidecar
* Adds validation to detect when output artifacts are not supported by the executor
* Adds ability to customize executor from workflow-controller-configmap (e.g. add environment variables, append command line args such as loglevel)
* Fixes an issue where daemon steps were not getting terminated properly

* Reorganize manifests to kustomize 2 and update version to v2.3.0-rc1

* Update v2.3.0 CHANGELOG.md

* Export the methods of `KubernetesClientInterface` (argoproj#1294)

All calls to these methods previously generated a panic at runtime
because the calls resolved to the default, panic-always implementation,
not to the overrides provided by `k8sAPIClient` and `kubeletClient`.

Embedding an exported interface with unexported methods into a struct is
the only way to implement that interface in another package.  When doing
this, the compiler generates default, panic-always implementations for
all methods from the interface.  Implementors can override exported
methods, but it's not possible to override an unexported method from the
interface.  All invocations that go through the interface will come to
the default implementation, even if the struct tries to provide an
override.
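The failure mode is easiest to see in a single package: a struct that embeds an interface but leaves it nil still gets the interface's methods promoted, and calling one that is not overridden panics with a nil pointer dereference. The cross-package twist described above is that an unexported method declared in another package is a different identifier, so the struct's "override" never replaces the promoted one. The names below are hypothetical:

```go
package main

import "fmt"

// Client is the interface being embedded.
type Client interface {
	GetLogs() string
	Kill() error
}

// impl embeds Client but leaves it nil, so Kill is promoted from a nil interface.
type impl struct {
	Client
}

// GetLogs is overridden on impl, so calling it is safe.
func (impl) GetLogs() string { return "logs" }

// callKill invokes Kill and reports whether it panicked.
func callKill(c Client) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	_ = c.Kill()
	return false
}

func main() {
	var c Client = impl{}
	fmt.Println(c.GetLogs()) // overridden method: works
	fmt.Println(callKill(c)) // promoted from the nil embedded interface: panics
}
```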

* Update README.md (argoproj#1321)

* Issue1316 Pod creation with secret volumemount  (argoproj#1318)

* CheckandEstimate implementation

* fixed variable rename

* fixed gofmt

* fixed feedbacks

* Fixed the duplicate mountpath issue

* Support parameter substitution in the volumes attribute (argoproj#1238)

* `argo list` was not displaying non-zero priorities correctly

* Fix regression where argoexec wait would not return when podname was too long

* wait will conditionally become privileged if main/sidecar privileged (resolves argoproj#1323)

* Update version to v2.3.0-rc2. Update changelog

* Add documentation on releasing

* Fix missing template local volumes, Handle volumes only used in init containers (argoproj#1342)

* Fix argoproj#1340 parameter substitution bug (argoproj#1345)

Also create podParams map in substitutePodParams

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>

* add / test (argoproj#1240)

* Fix input artifacts with multiple ssh keys (argoproj#1338)

* Fixed : Validate the secret credentials name and key (argoproj#1358)

* CheckandEstimate implementation

* fixed variable rename

* fixed gofmt

* fixed feedbacks

* Fixed Issue1355

* fixed style

* Delete e2e_temp.tmp

* Fix: #1328 argo submit --wait and argo wait quits while workflow is running (argoproj#1347)

* CheckandEstimate implementation

* fixed variable rename

* fixed gofmt

* fixed feedbacks

* Fixed argo submit --wait and argo wait quits while workflow is running

* fixed Style

* Update version to v2.3.0-rc3

* Update release instructions

* Add --status filter for get command (argoproj#1325)

* Support an easy way to set owner reference (argoproj#1333)

* Use golangci-lint instead of deprecated gometalinter (argoproj#1335)

* [Fix argoproj#1242] Failed DAG nodes are now kept and set to running on RetryWorkflow. (argoproj#1250)

* Fixed :  CLI Does Not Honor metadata.namespace argoproj#1288 (argoproj#1352)

* Validate action for resource templates (argoproj#1346)

* Fix issue where a DAG with exhausted retries would get stuck Running (argoproj#1364)

* Update README.md (argoproj#1372)

Add Adevinta.
https://www.adevinta.com/

* Update docs for the v2.3.0 release and to use the stable tag

* Add Max Kelsen to USERS in README.md (argoproj#1374)

Max Kelsen is utilising Argo throughout the organisation to manage data processing and machine learning pipelines.

Incredibly thankful to the great community!

* Fixed: Support hostAliases in WorkflowSpec argoproj#1265 (argoproj#1365)

* Fixed : Support hostAliases in WorkflowSpec argoproj#1265

* Fixed:  failed to save outputs: verify serviceaccount default:default has necessary privileges (argoproj#1362)

Fixed:  failed to save outputs: verify serviceaccount default:default has necessary privileges (argoproj#1362)

* Fixed: make verify-codegen is failing on the master branch (argoproj#1399) (argoproj#1400)

* Fixed: withParam parsing of JSON/YAML lists argoproj#1389 (argoproj#1397)

* Make locating kubeconfig in example os independent (argoproj#1393)

* Added Argo Rollouts to README (argoproj#1388)

* Add Mirantis as an official user (argoproj#1401)

* Update README.md (argoproj#1402)

* Update README.md (argoproj#1404)

Includes SAP Fieldglass in users section.

* Fixed: persistentvolumeclaims already exists argoproj#1130 (argoproj#1363)

* Fixed: persistentvolumeclaims already exists  argoproj#1130

* chore: add IBM to official users section in README.md (argoproj#1409)

* Order users alphabetically (argoproj#1411)

* Update OWNERS (argoproj#1429)

* Typo fix in ARTIFACT_REPO.md (argoproj#1425)

In the non-default artifact repo section, when showing the gcs example the bucket name said 'my-aws-bucket-name'. I've updated this to say 'my-gcs-bucket-name'.

Super minor change but I've been banging my head against artifact repo outputs all day and this was bothering me.

* Add OVH as official user (argoproj#1417)

Add OVH as official user

* Update demo.md (argoproj#1396)

Step 2 instructs the user to create the namespace `argo`, and the coin-flip (at least) uses the service account `argo`, so it makes sense to provide `--serviceaccount=argo:argo` so that the initial experience works, "out of the box".

* Fix typo (argoproj#1431)

* PNS executor intermittently failed to capture entire log of script templates (argoproj#1406)

* Terminate all containers within pod after main container completes (argoproj#1423)

Resolves argoproj#1422

* Ability to configure hostPath mount for `/var/run/docker.sock` (argoproj#1419)

* CheckandEstimate implementation

* fixed variable rename

* fixed gofmt

* fixed feedbacks

* implement the configurable Docker sock path

* Update workflowpod.go

* Style updated

* Fixed:  Implemented Template level service account (argoproj#1354)

Fixed:  Implemented Template level service account (argoproj#1354)

* Add paging function for list command (argoproj#1420)

* Add paging function for list command (argoproj#1420)

* Revert "Update demo.md (argoproj#1396)" (argoproj#1433)

This reverts commit 5635c33.

* Update documentation for workflow.outputs.artifacts (argoproj#1439)

* Improve bash completion (argoproj#1437)

* Add threekit to user list (argoproj#1444)

* Fix demo's doc issue of install minio chart (argoproj#1450)

Signed-off-by: Aisuko <urakiny@gmail.com>

* mention sidecar in failure message for sidecar containers (argoproj#1430)

* Centralized Longterm workflow persistence storage  (argoproj#1344)


* Centralized Longterm workflow persistence storage implementation

* New Feature:  provide failFast flag, allow a DAG to run all branches of the DAG (either success or failure) (argoproj#1443)

* Fix bug: DAG will miss some nodes when another branch node fails

* Add test file

* New Feature:   provide failFast  flag, allow a DAG to run all branches of the DAG (either success or failure)

* Move failFast flag to DAG template spec

* Move test case file to test/e2e/expectedfailures since it is expected to fail
* Remove unused check code

* issue-1445: changing temp directory for output artifacts from root to tmp (argoproj#1458)

* Support PodSecurityContext (argoproj#1463)

* Add doc about failFast feature (argoproj#1453)

* Added Codec to the Argo community list (argoproj#1477)

* fix typo: symboloic > symbolic (argoproj#1478)

* Add --no-color flag to logs (argoproj#1479)

* Fix failFast bug:   When a node in the middle fails, the entire workflow will hang (argoproj#1468)

* Document the insecureIgnoreHostKey git flag (argoproj#1483)

* Fix: 1008 `argo wait` and `argo submit --wait` should exit 1 if workflow fails  (argoproj#1467)

Fix: 1008 `argo wait` and `argo submit --wait` should exit 1 if workflow fails  (argoproj#1467)

* Update OWNERS (argoproj#1485)

* Add Commodus Tech as official user (argoproj#1484)

* Fix: Argo CLI should show warning if there is no workflow definition in file argoproj#1486

Fix: Argo CLI should show warning if there is no workflow definition in file argoproj#1486

* Exposed workflow priority as a variable (argoproj#1476)

* Fix argoproj#1366 unpredictable global artifact behavior (argoproj#1461)

* Fix: Support the List within List type in withParam argoproj#1471 (argoproj#1473)

Fix: Support the List within List type in withParam argoproj#1471 (argoproj#1473)

* Fix a compiler error (argoproj#1500)

Fix a compiler error (argoproj#1500)

* Readme update to add argo and airflow comparison (argoproj#1502)

* Added argo vs airflow presentation

* Update README.md

* change 'continue-on-fail' example to better reflect its description (argoproj#1494)

* Implemented Conditionally annotate outputs of script template only when consumed argoproj#1359 (argoproj#1462)

* Fixed argoproj#1359 Implemented Conditionally annotate outputs of script template only when consumed

* Allow output parameters with .value, not only .valueFrom (argoproj#1336)

Fixed argoproj#1329 Allow output parameters with .value, not only .valueFrom (argoproj#1336)

* Fix the lint target (argoproj#1505)

Fix the lint target

This fixes an issue with the `make lint` target, where if a developer
has golangci-lint installed and also has linter errors, the linter fails
with an error, causing the next case to fall through (and the old linter
is run).

This also fixes all of the linter errors that had somehow cropped up in
the repo.

* Fix a compiler error in a unit test (argoproj#1514)

* Allow Makefile variables to be set from the command line (argoproj#1501)

This changes the assignment operator of various Makefile variables from
the recursive expansion operator (=) to the conditional assignment
operator (?=), such that a developer can define their own values for
those variables. This is highly valuable for a dev who wants to do local
development with a local docker image

* Fixed argoproj#1287 Executor kubectl version is obsolete (argoproj#1513)

Fixed argoproj#1287 Executor kubectl version is obsolete (argoproj#1513)

* Fix issue [Documentation] kubectl get service argo-artifacts -o wide (argoproj#1516)

* Allow overriding workflow labels in 'argo submit' (argoproj#1475)

* Support git shallow clones and additional ref fetches (argoproj#1521)

Implemented a `depth` field for git artifact configuration that, when
specified, will result in a shallow clone (and fetch) of the given
number of commits from the branch tip.

Implemented a `fetch` field for git artifact configuration that fetches
the given refspecs prior to checkout. This is necessary when one wants
to retrieve git revisions that exist in non-branch/-tag refs.

The motivation for these features is to support retrieval of patchset
refs from Gerrit code review (`refs/changes/[n]/[change]/[patch]`) but
these new fields should provide more flexibility to anyone integrating
with other git-based systems.
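For Gerrit, a patch set lives at `refs/changes/XX/CHANGE/PATCHSET`, where `XX` is the last two digits of the change number, zero-padded. A hypothetical helper that builds such a ref, which could then be listed under the `fetch` field:

```go
package main

import "fmt"

// gerritChangeRef builds the ref for a Gerrit patch set:
// refs/changes/<last two digits of change, zero-padded>/<change>/<patch set>.
func gerritChangeRef(change, patchSet int) string {
	return fmt.Sprintf("refs/changes/%02d/%d/%d", change%100, change, patchSet)
}

func main() {
	fmt.Println(gerritChangeRef(1234, 5))
	fmt.Println(gerritChangeRef(5, 1))
}
```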

* Add --dry-run option to `argo submit` (argoproj#1506)

* Fix validation (argoproj#1508)

* Implemented support for WorkflowSpec.ArtifactRepositoryRef (argoproj#1350)

This change allows the workflow to specify a reference to the ConfigMap holding the artifact repository configuration.

* Fix argo logs empty content when workflow run in virtual kubelet env (argoproj#1201)

* Expose all input parameters to template as JSON (argoproj#1488)

* WorkflowTemplate CRD (argoproj#1312)

* Added Architecture doc (argoproj#1515)

Fixed argoproj#894 Added Architecture doc (argoproj#1515)

* Format sources and order imports with the help of goimports (argoproj#1504)

* Update ISSUE_TEMPLATE.md (argoproj#1528)

Edited to follow the current README.md installation guides.

* Introduce podGC strategy for deleting completed/successful pods (argoproj#1234)

* Update CHANGELOG for v2.4 (argoproj#1531)

* Update README.md (argoproj#1533)

* Use cache to retrieve WorkflowTemplates (argoproj#1534)

* Update argo dependencies to kubernetes v1.14 (argoproj#1530)

* Update argo dependencies to kubernetes v1.14

* Update version to v2.4.0-rc1

* Update main.go (argoproj#1536)

* Update main.go (argoproj#1536)

* Remove GLog config from argo executor (argoproj#1537)

* Remove GLog config from argo executor (argoproj#1537)

* Initialize the wfClientset before using it (argoproj#1548)

* docs(readme): fix workflow types link (argoproj#1560)

* Optimize argo binary install documentation (argoproj#1563)

* Document workflow controller dockerSockPath config (argoproj#1555)

* Add coverage make target (argoproj#1557)

* Fix issue saving outputs which overlap paths with inputs (argoproj#1567)

* Support AutomountServiceAccountToken and executor specific service account (argoproj#1480)

* added DataStax as an organization that uses Argo (argoproj#1576)

* Fix inputs and arguments during template resolution (argoproj#1545)

* Add entrypoint label to workflow default labels (argoproj#1550)

* remove redundant codes (argoproj#1582)

Signed-off-by: xiechengsheng <xie1995@whut.edu.cn>

* Fix workflow template in namespaced controller (argoproj#1580)

* Add workflow template permissions to namespaced deployment manifests

* Use filtered shared informer factory for namespaced deployment

* Regard resource templates as leaf nodes (argoproj#1593)

This enables retryStrategy to be respected on resource templates.
This closes argoproj#1370

* Update from github.com/ghodss/yaml to sigs.k8s.io/yaml (argoproj#1572)

* Update Gopkg.toml and Gopkg.lock (argoproj#1596)

* Issue1571  Support ability to assume IAM roles in S3 Artifacts  (argoproj#1587)

* Fixed: Ability to interface with S3 using assumed roles (session tokens)
This PR fixes argoproj#1571

* Added retry around RuntimeExecutor.Wait call when waiting for main container completion (argoproj#1597)

* Do not relocate the mounted docker.sock (argoproj#1607)

The mount path of the docker.sock should not depend on the host path of the docker.sock

* Fix DAG hang in some cases when failFast is enabled (argoproj#1595)

* Fix failFast hang in some cases

* Increased Lint timeout (argoproj#1612)

* Add merge keys to Workflow objects to allow for StrategicMergePatches (argoproj#1611)

* Small code cleanup and add tests (argoproj#1562)

* Added WorkflowStatus and NodeStatus types to the Open API Spec. (argoproj#1614)

* Prevent controller from crashing due to glog writing to /tmp (argoproj#1613)

* Updated the API Rule Violations list (argoproj#1618)

* updated invite link (argoproj#1621)

* Increase timeout of golangci-lint (argoproj#1623)

* Store resolved templates (argoproj#1552)

* Store resolved templates in node status

* Update operator.go (argoproj#1630)

* Update operator.go

* update API

* Fix retry workflow state (argoproj#1632)

* Save stored template ID in nodes (argoproj#1631)

* Grant get secret role to controller to support persistence (argoproj#1615)

* Regenerate installation manifests (argoproj#1638)

* Update CHANGELOG for v2.4.0 (argoproj#1636)

* Update version to v2.4.0

* Add back SetGlogLevel calls

* Fix regression where parallelism could cause workflow to fail (argoproj#1639)

* Fix regression where global outputs were unresolvable in DAGs (argoproj#1640)

* Fix global lint issue (argoproj#1641)

* pin colinmarc/hdfs to the next commit, which no longer has vendored deps (argoproj#1622)

* Delay killing sidecars until artifacts are saved (argoproj#1645)

* fixed example wrong comment (argoproj#1643)

* Fix missing merged changes in validate.go (argoproj#1647)

* Fix DAG output aggregation (argoproj#1648)

* Fix dag output aggregation correctly (argoproj#1649)

* Use stored templates to aggregate step outputs (argoproj#1651)

* Fix child node template handling (argoproj#1654)

* Stop failing if artifact file exists, but empty (argoproj#1653)

* Resolve WorkflowTemplate lazily (argoproj#1655)

* Don't provision VM for empty artifacts (argoproj#1660)

* Update version to v2.4.1

* Fix typo (argoproj#1679)

* Handle sidecar killing properly (argoproj#1675)

* Update README.md  Argo Ansible role: Provisioning Argo Workflows on Kubernetes/OpenShift (argoproj#1673)

* Handle retried node properly (argoproj#1669)

* Store locally referenced template properly (argoproj#1670)

* Update version to v2.4.2

* Fix issue that workflow.priority substitution didn't pass validation (argoproj#1690)

* Added status of previous steps as variables (argoproj#1681)

* Print multiple workflows in one command (argoproj#1650)

* Fix retry node processing (argoproj#1694)

* Apply Strategic merge patch against the pod spec (argoproj#1687)

* fixed broken metrics endpoint per argoproj#1634 (argoproj#1695)

* Fixed incorrect `pod.name` in retry pods (argoproj#1699)

* Added ability to auto-resume from suspended state (argoproj#1715)

* Filter workflows in list  based on name prefix (argoproj#1721)

* Support no-headers flag (argoproj#1760)

* Refactoring Template Resolution Logic (argoproj#1744)

* Fix retry node name issue on error (argoproj#1732)

* Do not resolve remote templates in lint (argoproj#1787)

* Handle operation level errors PVC in Retry (argoproj#1762)

* Added hint when using certain tokens in when expressions (argoproj#1810)

* Added hint when using certain tokens in when expressions

* Minor

* SSL enabled database connection for workflow repository (argoproj#1712) (argoproj#1756)

* Error occurred on pod watch should result in an error on the wait container (argoproj#1776)

* Update version to v2.4.3

* Update version to v2.4.3

* rename

* fixing jenkins, committing extra changes

* jenkins

Co-authored-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
Co-authored-by: Ed Lee <edlee2121@users.noreply.github.com>
Co-authored-by: Erik Parmann <eparmann@gmail.com>
Co-authored-by: Alexander Matyushentsev <AMatyushentsev@gmail.com>
Co-authored-by: kshamajain99 <kshamajain99@gmail.com>
Co-authored-by: Jesse Suen <jessesuen@users.noreply.github.com>
Co-authored-by: Marcin Karkocha <marcin.karkocha@outlook.com>
Co-authored-by: Julian Fischer <ich@julianfischer.name>
Co-authored-by: Anna Winkler <3526523+annawinkler@users.noreply.github.com>
Co-authored-by: Ilias Katsakioris <elikatsis@arrikto.com>
Co-authored-by: jdfalko <43558452+jdfalko@users.noreply.github.com>
Co-authored-by: Greg Roodt <groodt@gmail.com>
Co-authored-by: Naoto Migita <migggy@users.noreply.github.com>
Co-authored-by: shahin <shahin@users.noreply.github.com>
Co-authored-by: Tim Schrodi <tschrodi96@googlemail.com>
Co-authored-by: Matthew Coleman <matthew.e.coleman@gmail.com>
Co-authored-by: Saravanan Balasubramanian <33908564+sarabala1979@users.noreply.github.com>
Co-authored-by: Nick Stott <nick@nickstott.com>
Co-authored-by: Ismail Alidzhikov <i.alidjikov@gmail.com>
Co-authored-by: Xianlu Bird <xianlubird@gmail.com>
Co-authored-by: Ian Howell <ian.howell0@gmail.com>
Co-authored-by: Fred Dubois <169247+duboisf@users.noreply.github.com>
Co-authored-by: Johannes 'fish' Ziemke <github@freigeist.org>
Co-authored-by: Adrien Trouillaud <adrienjt@users.noreply.github.com>
Co-authored-by: xubofei1983 <39540637+xubofei1983@users.noreply.github.com>
Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
Co-authored-by: Clemens Lange <clemens.lange@cern.ch>
Co-authored-by: Chris Chambers <chris-chambers@users.noreply.github.com>
Co-authored-by: Hideto Inamura <h.inamura0710@gmail.com>
Co-authored-by: almariah <abdullahalmariah@gmail.com>
Co-authored-by: Cristian Pop <cristian.pop3009@gmail.com>
Co-authored-by: Jaime <yauma21@gmail.com>
Co-authored-by: Jacob O'Farrell <jacob@maxkelsen.com>
Co-authored-by: Ben Wells <b.v.wells@gmail.com>
Co-authored-by: Paul Brit <paulbrit44@gmail.com>
Co-authored-by: Brandon Steinman <brandon.steinman@sap.com>
Co-authored-by: alex weidner <shimmerjs@us.ibm.com>
Co-authored-by: Alex Collins <alexec@users.noreply.github.com>
Co-authored-by: ianCambrio <50969109+ianCambrio@users.noreply.github.com>
Co-authored-by: Jean-Louis Queguiner <jean-louis.queguiner@gadz.org>
Co-authored-by: Stephen Steiner <ssteiner@juniper.net>
Co-authored-by: Jonathon Belotti <jonathon@canva.com>
Co-authored-by: Semjon Kopp <semjon.kopp@sap.com>
Co-authored-by: Orion Delwaterman <delwaterman@gmail.com>
Co-authored-by: Edwin Jacques <31151721+edwinpjacques@users.noreply.github.com>
Co-authored-by: Ziyang Wang <wangziyang507@gmail.com>
Co-authored-by: Aisuko <urakiny@gmail.com>
Co-authored-by: tralexa <39952205+tralexa@users.noreply.github.com>
Co-authored-by: Alex Capras <alexcapras@gmail.com>
Co-authored-by: mark9white <mark@markwhite.com>
Co-authored-by: Mostapha Sadeghipour Roudsari <sadeghipour@gmail.com>
Co-authored-by: commodus-sebastien <37178563+commodus-sebastien@users.noreply.github.com>
Co-authored-by: Mukulikak <mukulikak@gmail.com>
Co-authored-by: Daniel Duvall <dan@mutual.io>
Co-authored-by: Anes Benmerzoug <Anes.Benmerzoug@gmail.com>
Co-authored-by: Christian Muehlhaeuser <muesli@gmail.com>
Co-authored-by: hidekuro <hidekuro@users.noreply.github.com>
Co-authored-by: jacky <jacky.wucheng@gmail.com>
Co-authored-by: Brian Mericle <bpmericle@users.noreply.github.com>
Co-authored-by: Takayuki Kasai <unblee@users.noreply.github.com>
Co-authored-by: Xie.CS <xie1995@whut.edu.cn>
Co-authored-by: John Wass <jwass3@gmail.com>
Co-authored-by: Premkumar Masilamani <smileprem@users.noreply.github.com>
Co-authored-by: Pablo Osinaga <paguos@gmail.com>
Co-authored-by: David Seapy <ddseapy@ccri.com>
Co-authored-by: Anastasia Satonina <56155326+darthnastya@users.noreply.github.com>
Co-authored-by: Simon Behar <simbeh7@gmail.com>
Co-authored-by: Tobias Bradtke <webwurst@gmail.com>
Co-authored-by: Marek Čermák <prace.mcermak@gmail.com>
Co-authored-by: Rick Avendaño <Avendano.Richard@gmail.com>
Co-authored-by: sang <sanooj.m@gmail.com>
Co-authored-by: Antoine Dao <antoinedao1@gmail.com>
Co-authored-by: gerdos82 <37865635+gerdos82@users.noreply.github.com>
Ark-kun added a commit to kubeflow/pipelines that referenced this pull request Apr 16, 2020
Working around an Argo bug.
Revert this when we upgrade to Argo version which has the fix: argoproj/argo-workflows#1653
k8s-ci-robot pushed a commit to kubeflow/pipelines that referenced this pull request Apr 17, 2020
Working around an Argo bug.
Revert this when we upgrade to Argo version which has the fix: argoproj/argo-workflows#1653
kumare3 pushed a commit to EngHabu/pipelines that referenced this pull request May 27, 2020
* [UI] Show step pod yaml and events in RunDetails page (#3304)

* [UI Server] Pod info handler

* [UI] Pod info tab in run details page

* Change pod info preview to use yaml editor

* Fix namespace

* Adds error handling for PodInfo

* Adjust to warning message

* [UI] Pod events in RunDetails page

* Adjust error message

* Refactor k8s helper to get rid of in cluster limit

* Tests for pod info handler

* Tests for pod event list handler

* Move pod yaml viewer related components to separate file.

* Unit tests for PodYaml component

* Fix react unit tests

* Fix error message

* Address CR comments

* Add permission to ui role

* [Backend]Cache - Cache logic with db interaction (#3266)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* Add initial server logic

* Add const

* Change folder name

* Change execution key name

* Fix unit test

* Add Dockerfile and OWNERS file

This commit adds Dockerfile for building source code and OWNERS file for
easy review. This commit also renames some functions.

* fix go.sum

This PR fixes changes on go.sum

* Add local deployment scripts

This commit adds local deployment scripts which can deploy cache service
to an existing cluster with KFP installed.

* refactor src code

* Add standalone deployment scripts and yamls

This commit adds execution cache deployment scripts and yaml files in
KFP standalone deployment. Including a deployer which will generate the
certification and mutatingwebhookconfiguration and execution cache
deployment.

* Minor fix

* Add execution cache image build in test folder

* fix test cloudbuild

* Fix cloudbuild

* Add execution cache deployer image to test folder

* Add copyright

* Fix deployer build

* Add license for execution cache and cloudbuild for execution cache
images

This commit adds licenses for execution cache source code. Also adds
cloud build step for building cache image and cache deployer image.
Change the manifest name based on changed image.

* Refactor license intermediate data

* Fix execution cache image manifest

* Typo fix for cache and cache deployer images

* Add arguments in ca generation scripts and change deployer base image to google/cloud

* minor fix

* fix arg

* Mirror source code with MPL in execution_cache image

* Minor fix

* minor refactor on error handling

* Refactor cache source code, Docker image and manifest

* Fix variable names

* Add images in .release.cloudbuild.yaml

* Change execution_cache to generic name

* revise readme

* Move deployer job out of upgrade script

* fix tests

* fix tests

* Separate cache service and cache deployer job

* mysql set up

* wip

* WIP

* WIP

* work mysql connection

* initial cache logic

* watcher

* WIP pod watching with mysql

* worked crud

* Add sql unit test

* fix manifest

* Add copyright

* Add watcher check and update cache key generation logic

* test replace container images

* work cache service

* Add configmap for cache service

* refactor

* fix manifest

* Add unit tests

* Remove delete table

* Fix sql dialect

* Add cached step log

* Add metadata execution id

* minor fix

* revert go.mod and go.sum

* revert go.sum and go.mod

* revert go.sum and go.mod

* revert go.mod and go.sum

* SDK - Added support for maxCacheStaleness (#3318)

* SDK - Added support for maxCacheStaleness

* Added the vendor prefix to the annotation

* Update Watson ML example to take output param path (#3316)

* update watson components with output path args to support tekton

* fix store bug and stop batch logs

* update pipeline with explicit helper function

* add missing commit

* SDK - Moved python op pipeline compilation test to bridge tests (#3323)

* SDK - Moved the @python_component decorator test to dsl tests (#3324)

* SDK - Moved the @python_component decorator test to dsl tests

* Deprecate @python_component

* Release be497983cda7a1d17f3883c67e39a969cf0868a9 (#3327)

* Updated component images to version be497983cda7a1d17f3883c67e39a969cf0868a9

* Updated components to version 2df775a28045bda15372d6dd4644f71dcfe41bfe

* update setup.py

* Style - Moved imports to the start of the file (#3325)

* SDK - Support kubernetes client v11 (#3319)

Fixes https://github.com/kubeflow/pipelines/issues/3275

* Bump version to 0.3.0 (#3329)

* Bump version to 0.3.0

* Fix formatting

* More formatting fixes

* More formatting fixes

* update requirements.txt

* update version

* Reduce steps for release cloud build yaml (#3331)

* Reduce steps for release cloud build yaml

* Update .release.cloudbuild.yaml

* Disables cache and cache-deployer temporarily because they block upgrade tests (#3333)

* Add namespace to experiment SDK calls (#3272)

* Post-submit test for Hosted/MKP (mpdev verify) (#3193)

* try generate MKP binary for each submit

* try run

* fix format

* fix format

* fix format

* it works, gcloud builds submit --config test/cloudbuild/mkp_verify.yaml --project ml-pipeline-test

* test commit trigger

* backup codes

* test

* fix

* pass manual test before submit

* 0.3.0

Co-authored-by: Renmin Gu <renming@google.com>

* Update CHANGELOG for 0.3.0 (#3349)

* kfp UI node server support preview and handles gzip, tarball, and raw artifacts in a consistent manner. (#2992)

* Fix README formatting. (#3348)

* Fix README formatting.

* more fixes

* [UI Server] Blocks non public KFP report APIs (#3334)

* [UI Server] Blocks reportMetrics KFP api

* Also reject report workflow endpoint

* Also block report swf endpoint

* Add hostNetwork for marketplace proxy-agent manifest (#3330)

* SDK - Tests - Improved tests for serializing lists containing objects (#3326)

Added test_fail_on_handling_list_arguments_containing_python_objects
Added test_handling_list_arguments_containing_serializable_python_objects
Moved test_handling_list_arguments_containing_pipelineparam to component_bridge_tests

* [UI] Stops experiment list from leaking previous error message (#3350)

* [UI] Stops experiment list from leaking previous error message

* Move the fix to Page component so it's more generic

* Update ExperimentList.test.tsx

* [UI] Add namespace filter for All and Archived Runs page (#3351)

* [UI] Stops experiment list from leaking previous error message

* Move the fix to Page component so it's more generic

* [UI] Add namespace to AllRunsList api request

* [UI] Add namespace to archived run page

* Fix snapshot

* Fix tensorboard image parsing (#3356)

I introduced a bug when parsing the image for Tensorboard in
https://github.com/kubeflow/pipelines/pull/3235. This fixes it.

* Integration test fix (#3357)

* try generate MKP binary for each submit

* try run

* fix format

* fix format

* fix format

* it works, gcloud builds submit --config test/cloudbuild/mkp_verify.yaml --project ml-pipeline-test

* test commit trigger

* backup codes

* test

* fix

* pass manual test before submit

* 0.3.0

* quick fix for test path

Co-authored-by: Renmin Gu <renming@google.com>

* [Manifest] Cache - Fix upgrade manifest (#3338)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Change cache deployer job to stateful set

* Delete cache deployer job

* Delete cache deployer job after it completes

* minor fix

* fix indentation

* Change cache deployer job to statefulset

* Remove extra cluster role for cache deployer

* remove cache in base kustomize file for upgrade test

* minor fix

* Add authorization check on ListExperiments (#3341)

* apiserver: Handle BucketExists() error (#3132)

* [UI] Tensorboard support for multi user (#3355)

* [UI Server] Add namespace argument for tensorboard endpoints

* Allow local node server to talk to minio in cluster

* Use tensorboard namespace in UI

* Add unit tests for tensorboard UI server

* Fix tests

* Fix tensorboard proxy url

* Fix tensorboard proxy failure

* Fix tests

* Remove unnecessary encodeURIComponent

* Add old comment back

* [Test] Add argo retry in sample/integration tests to reduce flakiness. (#3365)

* add retry

* test

* revert test only change

* add retry to e2e tests

* try to parameterize retry limit

* Revert "try to parameterize retry limit"

This reverts commit 46451e3a

* update the retry limit to 2

* update e2e retry

* Manifests: Rename metadata gRPC server's resources to metadata-grpc-* (#3108)

* Manifests: Rename metadata gRPC server's resources to metadata-grpc-*

The metadata service deployed is a gRPC server.

Proper KF installation deploys both an HTTP server, naming the required
resources as 'metadata-deployment' and 'metadata-service', as well as a
gRPC server, naming the corresponding resources
'metadata-grpc-deployment' and 'metadata-grpc-service'.

KFP standalone installation manifests deploy solely the gRPC server, but
use naming identical to the KF's HTTP server one.
Applying them on top of an existing KF cluster breaks Metadata service.

In this PR we change the naming making it not diverge from a proper KF
installation. We also make MetadataWriter aware of that change.

Closes #2889.

Signed-off-by: Ilias Katsakioris <elikatsis@arrikto.com>

* Fix ConfigMaps' label

* metadata-configmap
* metadata-mysql-configmap

* README: Link to KF installation & reference KFP version

* [Sample] CI Sample: Kaggle (#3021)

* kaggle sample

* code path

* fix typo

* visualize table component

* visualize html

* train model step

* submit result

* real image

* fix typo

* push before use

* sed to replace image in component.yaml

* general instructions

* typos; more robust; better code style

* notice about gcp sa and workload identity choice

* [Backend][Multi-user] Adjust/implement run api for multiuser support (#3337)

* Adjust/implement run api for multiuser support

* Fix error message

* use consistent run name in test

* add unit test

* ListRuns must specify filter either by namespace or by experiment

* fix comments

* SDK - Added pinned dependency snapshot (#3303)

* SDK - Added pinned dependency snapshot

* Downgraded zipp

The zipp package has dropped support for python3.5. https://zipp.readthedocs.io/en/latest/history.html#v2-0-0
https://github.com/jaraco/zipp/issues/28

* Fixing sample building in the backend Dockerfile

Installing SDK using pip.
Using SDK's requirements.txt.

* Enabled kubernetes v11

* Reverted the backend/Dockerfile for now

* Fixed the version of kfp-server-api

* pass token outside of SDK for server-to-server case (#3363)

* pass token outside of SDK for server-to-server case

* add more docs

* fix merge issue

* fix merge issue

Co-authored-by: Renmin Gu <renming@google.com>

* Fix lstrip + regex bug in the KFP client (#3396)

* [Backend] Cache - Add cache_enabled label for cache filtering (#3352)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Add cache enabled annotation to pod annotation for cache filtering

* fix go.sum

* Add cache disable annotation value for future use

* Rename annotation key to cache qualified

* revert cache_qualified to cache_enabled

* Fix code comment

* Change cache_enabled annotation to label

* Add value check

* Read cache_enabled flag from config

* Add comments on set template labels

* Testing - Upgraded GKE master version to fix tests (#3404)

* [Backend]Cache - KFP pod filter logic looking for cache_enabled = true label selector (#3368)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Change kfp annotation for pod filtering

* update filter logic

* Remove unused const

* [Manifest]Cache - mkp deployment (#3343)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Add cache manifests for mkp deployment

* revert go.sum

* Add helm on delete policy for cache deployer job

* Change cache deployer job to statefulset

* remove unnecessary cluster role

* separate clusterrole and role

* add role and rolebinding to mkp

* change secret role to clusterrole

* Add cloudsql support to cache

* Fix presubmit failure by avoiding license downloading when building image (#3406)

* Commit licenses of visualization server dependencies into repo to avoid flakiness from download during image building

* Fix script

* Add licenses

* Remove avro

* Remove two packages

* remove two licenses

* update image (#3395)

* quick fix envoy (#3413)

Co-authored-by: Renmin Gu <renming@google.com>

* [Manifest]Fix - Cache mkp deployment (#3414)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Add cache manifests for mkp deployment

* revert go.sum

* Add helm on delete policy for cache deployer job

* Change cache deployer job to statefulset

* remove unnecessary cluster role

* separate clusterrole and role

* add role and rolebinding to mkp

* change secret role to clusterrole

* Add cloudsql support to cache

* fix comma

* [UI] No longer pass namespace to createRun api (#3403)

* revert kfp-cache from Hosted/MKP (#3416)

Co-authored-by: Renmin Gu <renming@google.com>

* enable native Keras + TFMA (#3409)

Co-authored-by: Renmin Gu <renming@google.com>

* [Manifest] Cache - Enable cache and cache deployer in base kustomization file (#3376)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Change cache deployer job to stateful set

* Delete cache deployer job

* Delete cache deployer job after it completes

* minor fix

* fix indentation

* Change cache deployer job to statefulset

* Remove extra cluster role for cache deployer

* remove cache in base kustomize file for upgrade test

* minor fix

* Enable cache and cache-deployer in base kustomization file

* fix

* fix

* test

* test

* test

* Refactor cluster scope resources

* refactor

* Add namespace for sa

* Fix

* Add crds folder to cluster kustomization yaml

* namespace change

* fix

* fix

* fix

* update test

* Rename cluster to cluster-scoped-resource

* test adding namespace in kustomization file

* revert namespace for clusterrolebinding

* fix

* Add db_name in cache_deployment manifest

* rename

* change secret cluster role to role

* [Backend][Multi-user] support multi-user mode for job APIs (#3384)

* Backend multi-user support for job

* Fix UT

* Clean up unused code.

* cleanup, merge duplicate code

* Skip host name preprocess for the IAP case (#3427)

* [Backend]Cache - Fix flag parse (#3429)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Add flag.Parse() to read in flags

* Add new instructions to ensure compatibility for managed ai platform … (#3400)

* Add new instructions to ensure compatibility for managed ai platform pipeline

* change description to AI Platform Pipelines

* add instruction and clarification for AI Platform Pipeline in the first setup notebook

Co-authored-by: luoshixin <luoshixin@google.com>

* [Fix]Cache - Revert objectSelector in mutatingwebhookconfiguration (#3433)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Change kfp annotation for pod filtering

* update filter logic

* Remove unused const

* revert objectSelector in mutatingwebhookconfig

* remove objectSelector

* remove recursive pipeline in e2e test to prevent infinite loop with cache

* updated version (#3421)

* enable CloudSQL+GCSObjStore without default credential (#3378)

* enable CloudSQL+GCSObjStore without default credential

* refresh document

* fix schema

* minio project ID is required

* fix several

* self-throttle GitHub requests to keep the build stable

* can work now

* upsize and lowercase for bucket name

Co-authored-by: Renmin Gu <renming@google.com>

* [SDK][Multi-user] refine sdk for multi-user support (#3417)

* Allow writing/reading user namespace to/from local context file

* update docstring

* Move LOCAL_KFP_CONTEXT into Client class

* Fix docstring

Signed-off-by: Chen Sun <chensun@users.noreply.github.com>

* Make context_setting an instance variable and load from file during init.

* [Backend]Cache - Max cache staleness support (#3411)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Initial max_cache_staleness

* Add max_cache_staleness=-1 support

* Unit tests

* fix test key

* Revise getCacheEntry logic

* minor fix

* [SDK/CLI] Add version param to run_pipeline (#3339)

* [SDK/CLI] Add version param to run_pipeline

* Set PIPELINE_VERSION relationship to CREATOR

Also adds a note about pipeline_id taking precedence over version_id

* SDK  - Components - Fixed bug in loading input-less graph components (#3446)

* [Manifest] Cache - MKP deployment (#3430)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* Add cache manifests for mkp deployment

* revert go.sum

* Add helm on delete policy for cache deployer job

* Change cache deployer job to statefulset

* remove unnecessary cluster role

* separate clusterrole and role

* add role and rolebinding to mkp

* change secret role to clusterrole

* Add cloudsql support to cache

* fix comma

* Change cache secret clusterrole to role

* Adjust sequences of resources

* Update values and schema

* remove extra tab

* Change statefulset to job

* Add pod delete permission to cache deployer role

* Test changing cache deployer job to deployment

* remove extra permission

* remove statefulset check

* Change cache-deployer to strategy recreate (#3456)

* AWS sagemaker : Added license files and updated Dockerfile to use AmazonLinux (#3397)

* Added new LICENSE file

* added 2 more license files

* copy license files into the docker image

* pinned pip packages and rearranged the dockerfile

* [Backend] Keep workflow service account when not default or empty (#3435)

* [Backend] Keep workflow service account when not default or empty

* Fix unit tests

* Rename const to be consistent in style

* Refactor the legacy way of using pipeline id to create run in KFP backend (#3437)

* For the legacy interface, we switch to the new presentation under the hood

* when creating a run, if the user specifies a pipeline, we substitute it with the pipeline's default version

* Add a case where a version and a pipeline are both specified

* comment; get ready pipeline

* comments

* fix upgrade integration test

* comments of todo; expected run/job now has resource references

* fix upgrade test expected value according to the new response

* fix a typo

* a quick hack for upgrade test

* surface err from conversion

* AWS Sagemaker : Updated documents  (#3440)

* Initial readme for Train component

* example input

* add train pipeline

* added simple_train_pipeline

* Updated readme to include kmeans-hpo-pipeline.py

* Updated train component readme

* fix typo

* Update details about how to get sample data for Train component

* update comment and give a default path for output

* change s3 bucket to match other sample pipelines

* Release eb69a6b8ae2d82cd8574ed11f04af4607756061c (#3466)

* Updated component images to version eb69a6b8ae2d82cd8574ed11f04af4607756061c

* Updated components to version 0e794e8a0eff6f81ddc857946ee8311c7c431ec2

* update version number

* Make endpoint_url None (#3374)

* update version (#3467)

* presubmit for MKP/Hosted (#3438)

* presubmit for MKP

* activate service account

Co-authored-by: Renmin <rmgogogo@users.noreply.github.com>

* version bump fix (#3472)

* Release 0.4.0: Update change log (#3468)

* Updated component images to version eb69a6b8ae2d82cd8574ed11f04af4607756061c

* Updated components to version 0e794e8a0eff6f81ddc857946ee8311c7c431ec2

* update version number

* update change log

* [Test] fix upgrade test (#3469)

* update deploy-pipeline-lite.sh

* fix

* fix?

* revert

* [UI - multiuser] Fix pod log namespace source (#3477)

* Fix pod log namespace source

* Fix unit tests

* Update documentation for AWS components (#3410)

* deploy_createModel_readme

* readme for batch and minor updates to deploy and create_model

* updates based on review comments 1

* correct SageMaker typo

* [Fix]Fix release (#3476)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Add cache server and cache deployer in release cloud build file

* backend/metadata_writer: Pin python dependencies (#3408)

* [Frontend] Fix npm reported vulnerabilities (#3480)

* Fix server vulnerability

* Fix vulnerability in frontend

* Fix frontend vulnerabilities

* pin webpack version

* [UI] Show execution details in Run Details Page ML Metadata tab of steps (#3457)

* ML metadata tab in run details page

* Show execution details UI in run details step tab

* Fix tests

* Revert unnecessary changes

* SDK - Components - Restored the yaml formatting style (#3488)

Fixing compatibility with PyYaml 5.3

* [Test]Fix e2e test (#3471)

* Initial execution cache

This commit adds initial execution cache service. Including http service
and execution key generation.

* fix master

* fix go.sum

* Fix e2e test

* Add max_cache_staleness for flipA

* add comments

* [UI] Redirect to experiment list page when namespace changes in RunDetails or Compare page (#3483)

* Redirect to experiment list page when namespace changes

* Fix namespace initialize case

* Update KubeflowClient.tsx

* Components - Add model URL to AutoML - Create model/dataset for tables  (#3486)

* Re-generated the components

* Components - Add model URL to AutoML - Create model for tables

Fixes https://github.com/kubeflow/pipelines/issues/3246

* Added dataset URL to the AutoML - Create dataset for tables component

* Qwiklab caip updates (#3423)

* updates to AI Platform sample for Qwiklabs

* notebook updates

* Qwiklab changes

* Update backend BUILD files (#3455)

* Update BUILD files

* add workspace file

* reenable visualization server in multi-user mode (#3475)

* [Deployment] Move crds to cluster-scoped kustomize folders (#3498)

* [Deployment] Move crds to cluster-scoped kustomize folders

* Fix naming

* Rename folder

* Add STRUCTURE.md, fix bug

* fix

* one project share one default bucket (#3478)

* pass projectID from env/configmap without user input (#3458)

* Metadata Writer - Log pod names (#3479)

Fixes https://github.com/kubeflow/pipelines/issues/3462

* [Testing] Reduce image build flakiness by share and retry cloudbuild jobs (#3492)

* Let presubmit tests share and retry cloudbuild

* Fix ongoing_build_ids

* Add retry for workload identity binding

* Fix build id

* fix

* Parallelize image building for api server and others

* Fix

* fix

* fix

* Fix again

* Allow retry twice instead

* Update deploy-pipeline-lite.sh

* Update batch_build.yaml

* Refine log and retry tests

* Update log and retry

* Update and retry

* Update build-images.sh

* [UI] Groups executions by run if it exists (#3485)

* Fix concurrent IAM policy changes flakiness (#3504)

* Enable NFS dynamic PVC (#3314)

* GPU with Kubeflow Pipeline Standalone (#3484)

* GPU with Kubeflow Pipeline Standalone

* done

* don't check in compiled pipeline

* gpu tpu preemptible

* done

* scope and quota comment

* Update metadata-envoy-deployment.yaml (#3502)

* AWS sagemaker: fixed a bug in ground_truth and updated all components to use images from new docker hub repo (#3474)

* Don't leave active_learning_model_arn.txt empty

* updated readme for ground_truth_pipeline_demo

* update docker repo

* Small changes to readme of ground truth sample pipeline

* [SDK] Make service account configurable for build_image_from_working_dir (#3419)

* Add kfp-container-builder sa

* Allow service account to be configurable

* Fix tests

* Fix test

* Use documentation for service account to introduce compatibility with different types of installation

* updated doc

* clean up

* Update container_builder_test.py

* Update _build_image_api.py

* Update kustomization.yaml

* Add executable permission for presubmit tests mkp.sh

* add user agent header to boto3 client for aws components (#3487)

* add user agent header to boto client

* add component version according to license file

* fetch version from license file at runtime

* Add archive experiment feature in backend (#3359)

* add new field in db schema and api schema

* auto-generated types for experiment storage state

* add archive and unarchive methods to backend for experiments.

* auto-generated archive/unarchive methods for experiments

* add archive and unarchive to client

* set proper storage state when creating experiment

* retrieve storage state when we get/list experiment(s)

* change expectation in test to have storage state

* add storage state in resource manager test

* revise experiment server test

* revise api converter test

* integration test of experiment archive

* archive/unarchive experiment affect the storage state of runs in it

* test all the runs in archive/unarchive experiment

* test all runs are archived/unarchived with their experiment in experiment server

* integration test

* integration test: value type mismatch in assertion

* unused import; default value for storage state

* autogen code for frontend

* reorder the fields in api experiment schema

* switch the position of the two enums to verify a hypothesis

* Put a placeholder to prevent any valid item from taking the value 0

* Get rid of the placeholder since the cause of the issue related to value 0 has been found and fixed.

* The returned api experiment now has storage state field

* create experiment return doesn't contain storage state

* Cleanup needs to clean runs and pipelines now

* a missing client

* use resource reference as filter instead of experiment uuid

* use same namespace in archive unit test

* Leave archive/unarchive experiment integration test to a separate PR

* also need to update jobs when experiments are archived

* Change of unarchiving logic. When experiment is unarchived, jobs/runs in
it stay archived

* add unit test for the job status in archived/unarchived experiment

* change archive state to 3 value enum; add experiment integration test

* make archive state 3 value enum to avoid 0 value mapped to available; add integration test

* run swagger autogen

* fix an expected value

* fix experiment server test

* add job check in experiment server test

* update job crds

* fix a typo

* remove accidentally included irrelevant changes

* add missing licenses for viz server (#3529)

* add missing licenses for viz server

* removing unused licenses.

* SDK - Made YAML dumping more awesome (#3520)

See the root cause explanation in https://github.com/kubeflow/pipelines/issues/3519

* Components - Fixed BugQuery - Query component (#3514)

Working around an Argo bug.
Revert this when we upgrade to Argo version which has the fix: https://github.com/argoproj/argo/pull/1653

* Revise run_pipeline comment (as expected) (#3506)

* revise run_pipeline comment (as expected)

* add the explanation of behavior if the old approach is used

* add periods to the end of the sentences

* Upload pipeline/pipeline version with a description (#3511)

* add description to client interface

* autogen

* version doesn't have description field

* swagger autogen

* remove two accidentally committed local python package

* Fix confusing .gitignore config

* Testing - Fixed python requirements for sample tests (#3536)

Cleaned up requirements.in
Included kfp package requirements.
Fixed version conflicts.
Generated requirements.txt using the improved script:
https://github.com/kubeflow/pipelines/pull/3535

* Infra - Improved the update_requirements script (#3535)

This helps generating requirements for multiple requirements.in files.
It also fixes the locking issues that seem to be caused by mounting directories using Docker.

* use reflect.DeepEqual to compare run structs (#3546)

* Fix list_run bug (#3539)

* add missing license for component image (#3543)

* [Backend][Multi-user] Support creating visualization in user namespace (#3495)

* Add namespace field to CreateVisualizationRequest

* Support getting visualization service URL with namespace

* fix typo

* Add auth checking & allow empty namespace

* Update check-build-image-status.sh (#3533)

* Services - Metadata Writer - Added support for custom_properties in all helper methods (#3556)

Fixes https://github.com/kubeflow/pipelines/issues/3552

* Sort resource references before checking for run struct equality (#3547)

* OSS 1.0 Kustomize part-2 parameterize & fix CloudSQL (#3540)

submit without wait for fix for following as no dependency
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubeflow_pipelines/3540/kubeflow-pipeline-e2e-test/1252173721301422090

* use better sample name (#3558)

* Fix test which uses Kustomize edit image but can't work with valueRef (#3572)

pass upgrade / installation test. submitting. now.

The e2e test fails but not due to this PR. Submit this PR to unlock KIR side

* Clusterrolebinding is using a namespace which is not parameterized (#3573)

submit quickly to make sure others won't get confused

* Update ml-pipeline-ui-deployment.yaml (#3586)

* In single-user mode, experiment APIs shouldn't contain user namespace. (#3544)

* Update kustomization.yaml (#3582)

* Minor updates to packages (#3428)

* Qwiklab caip updates (#3512)

* updates to AI Platform sample for Qwiklabs

* notebook updates

* Qwiklab changes

* minor naming changes in sample notebook

* SDK - CLI - Fixed incompatibility with Python 3.5 (#3567)

* Enable cache-deployer as fixed the root cause in other PR (#3574)

* default to kubeflow

* done

* include cache as we found root cause is namespace

* fix

* change the default to kubeflow, more for manual upgrade

* Release ad9bd5648dd0453005225779f25d8cebebc7ca00 (#3560)

* Updated component images to version ad9bd5648dd0453005225779f25d8cebebc7ca00

* Updated components to version 01a23ae8672d3b18e88adf3036071496aca3552d

* update version to 0.5.0 (#3566)

* Metadata  Writer - Fixed pod name property setting (#3563)

* Remove compiled manifests (#3592)

* [UI] Multi user permission separation for artifact api (#3522)

* [UI Server] Proxy /namespaces/:namespace/artifacts/get requests to namespace specific artifact services

* [UI] Show artifacts by namespace

* Fix minio artifact link tests

* Fix DetailsTable tests

* Fix OutputArtifactLoader.test

* Change artifact proxy to use query param instead

* Add integration tests for artifact proxy

* Fix unit tests

* Rename service name

* Add comment

* add more comments

* Fix import

* Refactored how to spy on internal methods from tests

* [UI] Get execution details from metadata writer (#3553)

* [UI] fix - create run missing-param warning not reacting to param value changes (#3559)

* SDK - Removed the ArtifactLocation feature (#3517)

* SDK - Removed the ArtifactLocation feature

The feature was deprecated in v0.1.34 https://github.com/kubeflow/pipelines/pull/2326

* Removed the artifact_location sample

* fix #2802: Set ImagePullPolicy per pipeline.  (#3534)

* bump version

* default image pull policy

* Update sdk/python/kfp/dsl/_pipeline.py

* task setting should dominate

* Update sdk/python/kfp/dsl/_pipeline.py

* fixed merge mistake

* Add method to schedule a recurring run to python client (#2978)

* python_kfp_client: add method to create recurring run

* client: add list_recurring_runs, get_recurring_run

* kfp_client: swap _create_job_config <-> run_pipeline

* kfp_client: make proper trigger

* [UI Server] Enable strict type checking and fix errors (#3593)

* wip

* Fix typing

* Fix build error

* Add type checking to tests

* Fix server typing

* Clean up

* Fix server typing

* AWS Sagemaker : Use json.dumps() to better organize the input and remove data_locations (#3518)

* construct channel input using json.dumps()

* remover data_location parameters

* add changelog

* Update version in license file and small changes to readme

* [API] Include namespace in visualization.swagger.json (#3588)

* include namespace in CreateVisualization

* include namespace in post body

* put namespace in the path and in front of visualization

* post /apis/v1beta1/visualizations/{namespace}

* [UI] Migrate to namespaced visualization request (#3603)

* Regenerate and use new visualization api

* [UI] Support namespaced visualization api

* [UI] Show cached steps (#3602)

* [UI] Show cached steps

* Tests for parseNodePhase

* Complete unit tests

* Update StatusUtils.test.tsx

* Add index on run details on experiment UUID & conditions & finished time (#3610)

* SDK - Compiler - Include the SDK version information in the compiled workflows (#3583)

* SDK - Compiler - Include the SDK version information in the compiled workflows

* Fixed the unit tests

* Removed the sdk_version annotation.

* [UI] get run id from both property and custom property (#3501)

* [UI] Groups executions by run if it exists

* Get run_id from both custom property and property

* [UI] Fix artifact url for multi namespace (#3605)

* Regenerate and use new visualization api

* [UI] Support namespaced visualization api

* Fix minio artifact link

* Fix tests

* Fixing volume size default value from 1 to 30 (#3598)

* SDK - Components - Task objects now have the .output attribute when component has only one output (#3622)

* [Caching] Add a cached label for cached pods (#3623)

* Update mutation.go

* Update mutation.go

* Update mutation.go

* Update mutation.go

* fix issue of creating default bucket (#3626)

* [Viewer] Service needs port name for istio (#3619)

* [Backend] Authorization service (#3627)

* Authorization service proto

* implement auth service

* Add unit tests

* [SDK] Add pod labels for telemetry purpose. (#3578)

* add telemetry pod labels

* revert the id label

* update compiler tests

* update cli arg

* bypass tfx

* update docstring

* SDK - Enabled file inputs to be optional (#3620)

* SDK - Enabled file inputs to be optional

* Added unit tests

* Components - Added readme for TFX components (#3637)

* Components - Added readme for TFX components

* Resolved review feedback

* Add two scripts to load test our api endpoints with measurement of run durations and api call latencies (#3587)

* script to profile pipeline api endpoint

* two plots

* another run api test

* clear cell output

* add some comments

* pipeline uses create pipeline

* add client

* checkpoint

* polish two scripts

* remove accidentally committed files

* added a success vs non-success plot; only measure run durations for succeeded runs

* Fix source for mlpipeline-ui-metadata in WorkflowParser (#3379)

* WorkflowParser->loadNodeOutputPaths
source: s3.endpoint === 's3.amazonaws.com' ? StorageService.S3 : StorageService.MINIO

* Use isS3Endpoint (server/aws-helper.ts) to identify artifact source

* npm run format

* created src/lib/AwsHelper.ts (for frontend code), because frontend client and frontend server do not share code for now.

* [UI] authorize tensorboard actions (#3639)

* Authorization service proto

* implement auth service

* Add unit tests

* Generate auth api client

* Authorization checks for tensorboard apis

* UI Server authorization checks

* Clean up error parsing

* Revert changes

* Fix portable-fetch not found bug

* Fix unit test

* Include portable-fetch required by api client

* Fix portable-fetch module import error

* Fix portable-fetch again

* Add unit tests

* Address CR comments

* add unit test for header

* Update readme

* Components - Upgraded the TFX components to 0.21.4 (#3641)

* Updated and synced the generated code

There is only one line of component-specific code in each component function (apart from the function signature).

* Updated some components that had older version of the generated code. The generated code is now the same everywhere.
* `input_channels_with_splits` is now generated based on the input artifact types
* TFX broke back compat: Removed `.split` from the artifacts. The components seem to now assume there is a single artifact in the channel.
* TFX broke back compat: changed the way artifact instances are created
* Updated container image to 0.21.4. There might have been backwards incompatible input/output changes - need to check and update.

* Updated component signatures

* Updated the generated component.yaml files

* Updated the sample notebook

* Removed the optional output in Evaluator

Optional outputs are not supported yet. I'm not sure they're even correct according to MLMD.

* Updated the sample

* Sort job resource references before equality check (#3561)

* Cache - Stabilized the deployer script (#3634)

* Cache - Stabilized the deployer script

Fixed the bug in the wait-forever mechanism. Previously, the script would restart if there was any Kubernetes connectivity problem. Should fix https://github.com/kubeflow/pipelines/issues/3609
The script no longer reinstalls the resources if the MutatingWebhookConfiguration already exists.

* Fixed the webhook detection

* Resolved review feedback

* Fix container.set_image_pull_policy documentation string (#3653)

The API function referenced in the docstring was incorrect; it is now fixed.

Signed-off-by: Sascha Grunert <sgrunert@suse.com>

* kfp_client: fix wrong check (#3652)

* SDK - Components - Split load_component functions into loading the spec and creating task factory (#3614)

The PR is a refactoring.
Split all load_component* methods in _components and _component_store into _load_component_spec* and creating task factory from that spec.
This makes it easier to load the spec without having to create task factory functions.

* [Sample] Update base images used in pre-built samples (#3422)

* update base image

* refactor

* fix typo

* Revert "refactor"

This reverts commit 499f9604

* Revert "fix typo"

This reverts commit e2faeb46

* Revert "update base image"

This reverts commit 4f8d0977

* update docker file

* test

* insert line for changing default image

* Backend - Upgraded MLMD client to fix Metadata Writer (#3657)

There was a backwards-incompatible MLMD server-side change that caused the get_artifacts_by_uri API to start silently returning empty lists of artifacts.
Fixes https://github.com/kubeflow/pipelines/issues/3656

* [Backend] Add service account field to run and job api objects (#3649)

* Add service account field to run and job api objects

* Update description

* Fix field casing

* Add comment about the next field number

* Move namespace to cluster-scoped (#3662)

* move namespace to cluster-scoped-resource

* fix doc

* AWS Sagemaker : Add unit tests (#3642)

* Initial changes

* add one test for each component

* Add readme for unit tests

* add empty string test and dockerfile

* added dockerfile

* use python3 in dockerfile

* add coverage report to unit tests

* update readme for PR

* small changes to resolve git comments

* copy requirements.txt separately in dockerfile

* small changes

* pin pip package versions in unit_tests

* fix proxy URL issue (#3663)

* fix proxy URL issue

* fix another issue in same PR

* done (#3665)

* Add an operator to configure toleration of the GKE GPU node taints (#3671)

* Add an operator to configure toleration of the GKE GPU taints

https://cloud.google.com/kubernetes-engine/docs/how-to/gpus#gpu_pool

* Make sure of the google coding style with yapf

* Change the function name to add_gpu_toleration
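GKE taints GPU nodes with `nvidia.com/gpu=present:NoSchedule` (see the GKE GPU documentation linked above), so a pod needs a matching toleration before it can be scheduled on them. The sketch below builds that toleration on a plain Kubernetes pod-spec dict; `with_gpu_toleration` is a hypothetical helper for illustration and is not the SDK's `add_gpu_toleration` operator.

```python
import copy
from typing import Any, Dict

# Toleration matching the taint GKE places on GPU node pools (key=value:effect).
GKE_GPU_TOLERATION: Dict[str, str] = {
    'key': 'nvidia.com/gpu',
    'operator': 'Equal',
    'value': 'present',
    'effect': 'NoSchedule',
}


def with_gpu_toleration(pod_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of pod_spec that tolerates the GKE GPU taint."""
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    tolerations = spec.setdefault('tolerations', [])
    # Adding the same toleration twice is harmless but noisy, so skip duplicates.
    if GKE_GPU_TOLERATION not in tolerations:
        tolerations.append(dict(GKE_GPU_TOLERATION))
    return spec
```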

* raise error when LRO has an error (#3666)

* [AWS SageMaker] Add CodeBuild Steps (#3668)

* Add initial unit test buildspec

* Add docker log output

* Add force no pytest color

* Update docker build to be quiet

* Add pass all environment variables

* Update unit test container env file

* Update env to use different syntax

* Remove daemon mode

* Remove TTY from docker run

* Add dryrun and dockercfg setup

* Update dryrun into CodeBuild logic

* Add mkdir for Docker config

* Update app version temporarily

* Revert app version temporarily

* Update unit test log file

* Add tag minor and major versions

* Update version temporarily

* Add print for major and minor tags

* Revert version back down

* Add deploy version override

* Update path to testing directories

* Fix tab formatting

* Fix pytest log directory

* gpu example wording cleanup (#3682)

* Katib Launcher Experiment Name Conflict (#3508)

* added uuid to experiment name so that it does not conflict when trying katib experiment with same name

* added uuid to experiment name so that it does not conflict when trying katib experiment with same name

* experiment name limited to 64 characters

* updated user name and email for the repo

* fixed the typo

Co-authored-by: Sandhya <sandhya@imanage.com>
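The two Katib launcher changes above (a UUID suffix to avoid name conflicts, plus a 64-character cap) can be combined as sketched below. Truncating the base name rather than the suffix is an assumption made here so that the UUID, which provides the uniqueness, always survives; `unique_experiment_name` is a hypothetical helper, not the launcher's code.

```python
import uuid

MAX_NAME_LENGTH = 64  # limit stated in the commit message


def unique_experiment_name(base_name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Return base_name plus a random UUID suffix, truncated to max_length."""
    suffix = '-' + uuid.uuid4().hex  # '-' followed by 32 hex digits
    if max_length < len(suffix):
        raise ValueError(f'max_length must be at least {len(suffix)}')
    # Trim the base name, never the suffix, so uniqueness is preserved.
    return base_name[:max_length - len(suffix)] + suffix
```

Two launches with the same base name then get distinct experiment names.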

* SDK - Components - Fixed bug in _strip_type_hints_using_lib2to3 (#3679)

* Support execution throttling for executing the pipelines (#3346) (#3439)

* Add parallelism limits to pipeline in kfp sdk

* fix lint error

* Add Nodeselector to pipelineconfig fix issue #2863 (#3616)

* updated version

* added pipeline nodeselector

* removed old legacy

* renaming

* update test

* Update sdk/python/kfp/compiler/compiler.py

* Travis - Disabled coveralls (#3659)

* Fixed the TFX conflicts in tests (#3690)

Fixes https://github.com/kubeflow/pipelines/issues/3689

* Update @kubeflow/frontend with fixes for #3625 and #3648 (#3695)

* [Backend] Use service account passed from api object (#3650)

* Add service account field to run and job api objects

* Update description

* Fix field casing

* Use service account from api object

* Fix bug and add unit test

* Save patched workflow spec into db instead

* Save service account to db model

* Fix unit tests

* Fix integration tests

* Fix upgrade test

* Update upgrade_test.go

* Experiment archiving related UI changes (#3615)

* archive button to experiment details page

* list of archived exp

* remove a prop

* restore; new run only in unarchived experiment

* in all experiments tab, only unarchived ones are listed

* refine dialog messages; disable selection of runs in archived experiment list

* unit test for experiment list component

* more unit tests

* remove unnecessary methods and props

* using tabs instead of radio buttons to switch between archived runs and archived experiments

* added tests

* fix rundetails test

* add tests

* sidenav snapshot

* sidenav snapshot update

* address comments

* format

* Cache - Fixed the cache deployer script (#3700)

The grep version seems to be different in the base image

* Cache - Add namespace to webhook config's name (#3702)

* Cache - Add namespace to webhook config's name

* Update deploy-cache-service.sh

* Updated the ComponentSpec schema (#3698)

* Components - Added support for Dataflow in TFX components (#3684)

* Components - Added support for Dataflow in TFX components

To use Dataflow, pass beam_pipeline_args to a component.
```
transformer_op(
    ...,  # other component arguments elided
    beam_pipeline_args=[
        '--runner=DataflowRunner',
        '--experiments=shuffle_mode=auto',
        '--project=' + project_id,
        '--temp_location=' + gcs_bucket + '/tmp',
        '--region=' + gcp_region,
        '--disk_size_gb=50',
    ],
)
```

These components use URI-based I/O because TFX with Beam's DataflowRunner only supports GCS URIs for inputs and outputs. With URI-based I/O, the user must specify all output URIs themselves (e.g. `CsvExampleGen(..., output_examples_uri=...)`), so do not forget to do so. The `kfp.dsl.EXECUTION_ID_PLACEHOLDER` object can help construct execution-unique URIs, but if the component has multiple output URIs, you will need to add a prefix that is different for each output.

There is a bug in TFX+Beam that prevents using DataflowRunner, but these components contain a workaround. The workaround can be removed when the fixed version of TFX is released: https://github.com/tensorflow/tfx/commit/ddb01c02426d59e8bd541e3fd3cbaaf68779b2df
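Since every output URI must be supplied by the user, a small helper can derive one URI per output from a shared root, the execution ID, and a path segment that differs per output, so outputs of the same execution never collide. This is a sketch: `make_output_uris` is hypothetical, and in a real pipeline the `execution_id` argument would be `kfp.dsl.EXECUTION_ID_PLACEHOLDER`, which is resolved at run time; any literal string stands in for it here.

```python
from typing import Dict, Iterable


def make_output_uris(root_uri: str, execution_id: str,
                     output_names: Iterable[str]) -> Dict[str, str]:
    """Build an execution-unique URI for each output (hypothetical helper)."""
    root = root_uri.rstrip('/')
    # The output name is the per-output segment that keeps the URIs distinct.
    return {name: f'{root}/{execution_id}/{name}' for name in output_names}
```

For example, `uris['examples']` from the returned dict could then be passed as `output_examples_uri`.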

* Added the TFX on KFP Dataflow sample

* Updated the README.md file

* Enabled the blessing output of the Evaluator

The Evaluator does not always write to that URI, but for components with URI-based I/O this does not matter.

* Fixed the indent in YAML

* Addressed the review feedback

* Updated the sample after the component changes

* Fixed the Dataflow casing in the sample name

* Using channel_utils.unwrap_channel_dict

* Updated the sample pipeline

* Shortened the .get expressions

* Updated the sample

* Make it possible to upload a version of the pipeline with CLI (#3672)

* Add upload_pipeline_version to kfp.Client

* Add the 'kfp pipeline upload_version' command to CLI

* Make sure to use upload_pipeline_version without a helper function

* Make sure of the google coding style with yapf

* Fix up the pipeline id option

* Make the version and id options required

* 3674: check if gcs bucket exist before creation (#3675)

* update version (#3694)

* [UI] textbox to select KSA when creating runs/jobs (#3651)

* Add service account field to run and job api objects

* Update description

* Fix field casing

* Use service account from api object

* Fix bug and add unit test

* [UI] Allow choosing Kubernetes service account

* fix unit tests

* fix format

* Also clone service account

* service account UI features

* Add unit test for cloning service account

* Fix frontend integration tests

* Integration tests for AWS SageMaker Components (#3654)

* integration tests for aws sagemaker components with comment

* address comment related to S3 dataset creation

* rev3: bug fix in conda env yaml and reuse sagemaker method to get image URI

* Add createModel test

	- reduce code duplication
	- add some utility methods

* 0.5.1 changelog (#3706)

* [UI] Improve TFX artifact visualization speed (#3712)

* Improve TFX artifact visualization speed

* Update OutputArtifactLoader.ts

* Fix loading

* Add nodeId back as an identifier

* [UI] Hide empty resource op manifest tab in run details page (#3713)

* Hide manifest tab when empty

* fix unit tests

* Fix tests

* [UI] Cleanup, remove types from urls in artifact/execution details page (#3715)

* Remove types from urls in artifact/execution details page

* Remove unused route params

* Fix snapshot tests

* Fixed small syntax error in a sample notebook (#3721)

* remove an accidentally committed debugging log (#3716)

* When patching the {{}} placeholder in parameter, check for possible nil pointer (#3714)

* check nil for a pointer before using it

* if parameter's value is nil pointer, use parameter's default

* [UI] Make visualization tab easier to understand (#3717)

* Rename artifacts tab to visualizations and add documentation link

* Show a banner when no visualizations

* Clean up code

* Update snapshots

* Fix banner tests

* Add unit test for visualization creator

* Update VisualizationCreator.tsx

* Update VisualizationCreator.tsx

* [frontend] Show artifact preview in UI (#2172) (#3707)

* show a preview of an artifact in the ui

* Add styling to preview box

* fix minor typo in unit test

* minor fixes + DetailsTable now accepts a ValueComponent again

* encode uricomponent for generate artifact url

* fix classname typo

* Added valueComponentProps for DetailsTable for better type checking.

* fix mock bug in test. peek -> maxbytes & maxlines for MinioArtifactPreview

* fix format

* mock Editor

* Travis - Made flake8 test optional (#3739)

* SDK - Annotate pods with component_ref (#3727)

* SDK - Annotate pods with component_ref

This preserves the information about the digest of the component and the location from which the component was loaded.

* Fixed compiler tests

* Travis - Use latest pip version (#3732)

* SDK - Prioritize lib2to3 when stripping type annotations (#3724)

* SDK - Prioritize lib2to3 when stripping type annotations

It's a standard Python library (although not well supported), and it does not leave trailing spaces.

* Fixed compiler test data

* [AWS SageMaker] Specify component input types (#3683)

* Replace all string types with Python types

* Update HPO yaml

* Update Batch YAML

* Update Deploy YAML

* Update GroundTruth YAML

* Update Model YAML

* Update Train YAML

* Update WorkTeam YAML

* Updated samples to remove strings

* Update to temporary image

* Remove unnecessary imports

* Update image to newer image

* Update components to python3

* Update bool parser type

* Remove empty ContentType in samples

* Update to temporary image

* Update to version 0.3.1

* Update deploy to login

* Update deploy load config path

* Fix export environment variable in deploy

* Fix env name

* Update deploy reflow env paths

* Add debug config line

* Use username and password directly

* Updated to 0.3.1

* Update field types to JsonObject and JsonArray

* Upgraded Argo to v2.7.5 (#3537)

* Upgraded Argo to v2.7.4

* Downgraded the Argo CLI version to 2.4.3

See https://github.com/argoproj/argo/issues/2793

* Removed the argo cli arg that had been removed

* Updated to Argo 2.7.5

* Added workflowtemplates and cronworkflows to the Role

* Added the new Argo CRDs

* [Backend] Allow capital letters and underscore in metric names (#3741)

* Allow capital letters and underscore in metric names

* Fix tests, add comments

* Update run_metric_util.go
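A sketch of the kind of name check this change relaxes, written in Python rather than the backend's Go. The exact pattern is an assumption for illustration (a letter first, then letters of either case, digits, hyphens, or underscores); the real rule lives in `run_metric_util.go`.

```python
import re

# Hypothetical pattern: must start with a letter; later characters may be
# upper- or lower-case letters, digits, '-' or '_'.
METRIC_NAME_PATTERN = re.compile(r'[A-Za-z][A-Za-z0-9_-]*')


def is_valid_metric_name(name: str) -> bool:
    # fullmatch ensures the whole name, not just a prefix, fits the pattern.
    return METRIC_NAME_PATTERN.fullmatch(name) is not None
```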

* Fix bug in #3707 - href should show full artifact content instead of preview (#3745)

* MetadataStore: Upgrade tool (#3295)

* Show version tag in UI (#3743)

* Show version tag in UI

* Add new arguments to test cloudbuild configuration

* backend cloudbuild should use commit_sha as argument

* Fix minor bug during dev

* [UI] Wrap parameter/urls on overflow (#3747)

* [UI] Wrap parameter/urls on overflow

* Add comment about css

* Let artifact preview take over full width

* SDK - Components - Calculate component hash digest (#3726)

* SDK - Components - Calculate component hash digest

The digest is calculated when loading the component from a URL, file, or text.
Slightly refactored component loading - streams are no longer used, only bytes.
TODO: Calculate the digest if missing
TODO: Report possible digest conflicts

* Updated the test graph component

* Using the actual digest in the test
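Hashing the raw bytes of the component definition, rather than a stream, makes the digest independent of where the component was loaded from: a URL, a file, and inline text all reduce to the same bytes and therefore the same digest. The sketch assumes SHA-256 and UTF-8 encoding for text; the hash algorithm and the helper names are assumptions, not necessarily what the SDK uses.

```python
import hashlib
from pathlib import Path


def component_digest(component_bytes: bytes) -> str:
    """Return a hex digest identifying a component definition (SHA-256 assumed)."""
    return hashlib.sha256(component_bytes).hexdigest()


def digest_from_text(text: str) -> str:
    # Normalize text to bytes so it hashes the same as the file contents.
    return component_digest(text.encode('utf-8'))


def digest_from_file(path: str) -> str:
    return component_digest(Path(path).read_bytes())
```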

* SDK - Made outputs with original names available in ContainerOp.outputs (#3734)

* SDK - Made outputs with original names available in ContainerOp.outputs

Previously, ContainerOp had strict requirements for the output names, so we had to convert all the names before passing them to the ContainerOp constructor. Outputs with non-pythonic names could not be accessed using their original names.
Now ContainerOp supports any output names, so we use the original output names.
However, to support legacy pipelines, we also add output references with pythonic names.

* Fixed the compiler test data

* Fixed the duplicate parameter outputs in the compiled workflow

* Fixed long line

* Stabilized the output naming conflict resolution

* Fix case of missing special outputs
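The approach can be pictured as an outputs dictionary keyed by the original output names that also carries pythonic aliases for legacy code, with an alias added only where it does not clash with an existing key. `to_pythonic` below is a simplified, hypothetical conversion, not the SDK's actual sanitizer.

```python
import re
from typing import Dict, TypeVar

T = TypeVar('T')


def to_pythonic(name: str) -> str:
    """Hypothetical conversion: lower-case; non-identifier runs become '_'."""
    converted = re.sub(r'[^0-9a-zA-Z_]+', '_', name).strip('_').lower()
    if not converted or converted[0].isdigit():
        converted = 'output_' + converted
    return converted


def outputs_with_aliases(outputs: Dict[str, T]) -> Dict[str, T]:
    """Keep the original names and add pythonic aliases that do not collide."""
    result = dict(outputs)
    for name, value in outputs.items():
        # Original names always win; setdefault never overwrites an entry.
        result.setdefault(to_pythonic(name), value)
    return result
```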

* [UI] Fix artifact preview with outdated content (#3749)

* Fix preview with outdated content

* Update snapshot

* [UI] Show tooltip on long version names (#3750)

* [UI] Show tooltip on version name in selector

* Update snapshots

* Add tooltip for pipeline version in run list

* refactor code

* Metadata Writer - Preserve all Argo artifact information (#3725)

* Metadata Writer - Preserve all information in artifact URI

Previously only s3 artifacts were supported and only bucket and key were included (not endpoint, for example).

* Move Argo artifact information to artifact's custom_property

* [UI Server] Refactor for configurable auth header (#3753)

* [UI] Make auth header configurable

* Refactor authorizeFn to move side effects out

* Refactor tests to reduce duplication

* SDK - Components - Improved stability of the input and output renaming (#3738)

In some cases the input and output names need to be converted (for example, the input names need to be converted to python function parameter names).
With naive renaming, multiple inputs might be mapped to the same parameter name in some edge cases. The `generate_unique_name_conversion_table` creates a correct mapping.

However, in some really rare cases the resulting mapping could be confusing since it might rename an input whose name was already a correct parameter name and map a different input name to that parameter. E.g. {'AAA' -> 'aaa', 'aaa' -> 'aaa_2'}.
This PR fixes that. Names that do not change when applying the conversion_func will remain unchanged in the mapping. {'AAA' -> 'aaa_2', 'aaa' -> 'aaa'}.
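A two-pass sketch of the rule described above: first reserve every name that the conversion leaves unchanged, then convert the remaining names, appending `_2`, `_3`, ... on collision. The suffix format is inferred from the example in the commit message; `build_unique_name_table` is a hypothetical illustration, not the SDK's `generate_unique_name_conversion_table`.

```python
from typing import Callable, Dict, Iterable


def build_unique_name_table(
    names: Iterable[str], conversion_func: Callable[[str], str]
) -> Dict[str, str]:
    """Map each name to a unique converted name, keeping fixed points stable."""
    names = list(names)
    table: Dict[str, str] = {}
    used = set()
    # Pass 1: names already valid under the conversion map to themselves.
    for name in names:
        if conversion_func(name) == name and name not in used:
            table[name] = name
            used.add(name)
    # Pass 2: convert the rest, adding _2, _3, ... until the result is free.
    for name in names:
        if name in table:
            continue
        base = conversion_func(name)
        candidate, suffix = base, 2
        while candidate in used:
            candidate = f'{base}_{suffix}'
            suffix += 1
        table[name] = candidate
        used.add(candidate)
    return table
```

With `str.lower` as the conversion, `['AAA', 'aaa']` yields `'aaa' -> 'aaa'` and `'AAA' -> 'aaa_2'`, matching the fixed mapping above.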

* [AWS SageMaker] Unit tests for Training component (#3722)

* Added additional training unit tests

* Add main training function tests

* Add full training test coverage

* Fix import sys

* Fix poorly named test

* Components - Tensorboard visualization (#3760)

* [Servers] Add liveness and readiness probes (#3757)

* probes for ml-pipeline-ui

* clean up comments

* Use wget instead of curl, because wget is included in alpine

* Also update marketplace manifest

* Add readiness/liveness probe for api server

* Add probes for python vis server

* Add probes to metadata grpc service (#3765)

* Add probes to metadata grpc service

* Fix port name length limit

* Update README.md

* manual merge as the change itself is correct

but MKP mpdev:latest has an issue that blocks our tests

* SDK - Moved some data from the component_ref annotation to the component_spec annotation (#3751)

Removing the component spec from component_ref (since it would be a duplicate), but making sure the whole spec is available in component_spec.

* AWS Sagemaker Components - enhance integration test coverage (#3720)

* AWS Sagemaker Components - enhance integration test coverage
	- Add tests for create endpoint, hpo job and batch transform
	- Minor bug fixes and documentation

* rev2: Address comments and clean up generated artifacts

* rev3: address more comments

* rev4: add canary test marker

* Trigger Build

* Add more approvers in AWS sagemaker components (#3740)

* SDK - Components - Removed the deprecated _python_op.get_default_base_image and set_default_base_image functions (#3773)

* SDK - Moved the tests closer to the code (#3774)

This makes switching from code to tests easier

* fix(testing) - Fix "1.14.10-gke.27" is unsupported (#3781)

* [Manifest] Use kustomize native image transformer to override image (#3776)

* [Manifest] Use kustomize native image transformer to override image

* Revert unintended changes

* Fix kustomization.yaml location

* Fix inverse proxy image

* SDK - Tests - Use relative imports (#3784)

This makes testing easier to run in local dev scenarios.

* [Backend] Make user identity header configurable (#3772)

* Make user identity header configurable

* use constants in UT.

* Allow PipelineParams in dict keys too. (#3565)

Co-authored-by: Thi Nguyen <duongnt@users.noreply.github.com>

* [ScheduledWorkflow] Fix events permission missing (#3785)

* Infer artifact store endpoint in metadata writer (#3530)

Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>

* Changing the default volume size to 30 (#3792)

* Client - Added documentation for the generated members (#3787)

* [AWS SageMaker] Integration tests automation (#3768)

* Add initial scripts

Add working pytest script

Add initial scripts

Add environment variable files

Remove old cluster script

Update pipeline credentials to OIDC

Add initial scripts

Add working pytest script

Add initial scripts

Add working pytest script

* Remove debugging mark

* Update example EKS cluster name

* Remove quiet from Docker build

* Manually pass env

* Update env list vars as string

* Update use array directly

* Update variable array to export

* Update to using read for splitting

* Move to helper script

* Update export from CodeBuild

* Add wait for minio

* Update kubectl wait timeout

* Update minor changes for PR

* Update integration test buildspec to quiet build

* Add region to delete EKS

* Add wait for pods

* Updated README

* Add fixed interval wait

* Fix CodeBuild step order

* Add file lock for experiment ID

* Fix missing pytest parameter

* Update run create only once

* Add filelock to conda env

* Update experiment name ensuring creation each time

* Add try/catch with create experiment

* Remove caching from KFP deployment

* Remove disable KFP caching

* Move .gitignore changes to inside component

* Add blank line to default .gitignore

* Add the 'kfp experiment' commands (#3705)

* Add the 'kfp experiment list' command

* Add the 'kfp experiment get' command

* Add the 'kfp experiment create' command

* Add the 'kfp experiment delete' command

* Add a caution to 'kfp experiment delete'

* Use directly the backend api method to list experiments

* Update a message based on the suggestion

https://github.com/kubeflow/pipelines/pull/3705#discussion_r424751792

* AWS SageMaker : Use IAM Roles for Service Account (#3719)

* don't use aws-secret and update readme for sample pipelines

* Addressed comments on PR and few more readme changes

* small changes to readme

* nit change

* Address comments

* [UI] Fix confusion matrix wrong axes (#3817)

* [UI] Fix confusion matrix wrong axes

* Fix confusion matrix background opacity

* Docs - Added kfp.dsl placeholders to docs (#3813)

* Adding HPO unit test (#3791)

* Adding HPO unit test

* Adding best training job

* Addressing comment

* Client - Allow specifying pipeline description when uploading (#3828)

* Client - Allow specifying pipeline description when uploading

Fixes https://github.com/kubeflow/pipelines/issues/3825

* Implemented review feedback

* [UI] Also cloning recurring run schedule, fixes #3761 (#3840)

* [UI] Also cloning recurring run schedule

* Fix unit test for trigger and utils

* Add and fix unit tests for Trigger

* Add NewRun page unit tests

* Fix unit tests

* Fix jest test timezone

* Testing - Pin numpy version to fix TFX installation instability in Travis tests (#3833)

The TFX package has inconsistent dependencies, which makes the installation flaky: a different numpy version gets installed each time, leading to failures.

* [AWS SageMaker] Integration Test for AWS SageMaker GroundTruth Component (#3830)

* Integration Test for AWS SageMaker GroundTruth Component

* Unfix already fixed bug

* Fix the README I overwrote by mistake

* Remove use of aws-secret for OIDC

* Rev 2: Fix linting errors

* Components - Moved TFX components to deprecated directory (#3854)

* Added README for Amazon SageMaker Components for Kubeflow Pipelines (#3824)

* Create README.md

* Added README

Updated page to include information on Amazon SageMaker components

* Update README.md

* Integrated feedback

* A more accurate grpc error code for duplicate pipeline/pipeline version/experiment names (#3846)

* a more accurate grpc error code

* remove accidentally checked in file

* Add labels to plots (#3811)

* 5 runs

* 50 runs

* (1) add labels (2) instead of plotting kde, plotting histogram and rug

Co-authored-by: Yuan (Bob) Gong <gongyuan94@gmail.com>
Co-authored-by: Rui Fang <31815555+rui5i@users.noreply.github.com>
Co-authored-by: Alexey Volkov <avolkov@google.com>
Co-authored-by: Tommy Li <Tommy.chaoping.li@ibm.com>
Co-authored-by: Ajay Gopinathan <ajaygopinathan@google.com>
Co-authored-by: IronPan <panyang06231989@gmail.com>
Co-authored-by: Chen Sun <chensun@users.noreply.github.com>
Co-authored-by: Renmin <rmgogogo@users.noreply.github.com>
Co-authored-by: Renmin Gu <renming@google.com>
Co-authored-by: Eterna2 <eterna2@hotmail.com>
Co-authored-by: Rafael Barreto <rafaelbarreto87@gmail.com>
Co-authored-by: Johannes 'fish' Ziemke <github@freigeist.org>
Co-authored-by: Jiaxiao Zheng <jxzheng@google.com>
Co-authored-by: Ilias Katsakioris <elikatsis@arrikto.com>
Co-authored-by: dldaisy <37493043+dldaisy@users.noreply.github.com>
Co-authored-by: Samuel Ngahane <samuel.ngahane@gmail.com>
Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
Co-authored-by: Shixin <luotigerlsx@users.noreply.github.com>
Co-authored-by: luoshixin <luoshixin@google.com>
Co-authored-by: Niklas Hansson <niklas.hansson@sandvik.com>
Co-authored-by: Paul Selden <pselden@users.noreply.github.com>
Co-authored-by: Kartik Kalamadi <akartsky@gmail.com>
Co-authored-by: jingzhang36 <jingzhangjz@google.com>
Co-authored-by: Suraj Kota <k.suraj1993@gmail.com>
Co-authored-by: dhodun <dhodun@google.com>
Co-authored-by: Kartik Kalamadi <kalamadi@amazon.com>
Co-authored-by: Suraj Kota <surakota@amazon.com>
Co-authored-by: hongye-sun <43763191+hongye-sun@users.noreply.github.com>
Co-authored-by: Mark Mirchandani <40443141+markmirch@users.noreply.github.com>
Co-authored-by: Jonas De Beukelaer <jonas.db@live.co.uk>
Co-authored-by: faweis <47742363+faweis@users.noreply.github.com>
Co-authored-by: frozeNinK <wenquan.xing@getcruise.com>
Co-authored-by: Gautam Kumar <gakumar@fb.com>
Co-authored-by: Dmitry B <dmitry.b@outlook.com>
Co-authored-by: Sascha Grunert <sgrunert@suse.com>
Co-authored-by: Shotaro Kohama <khmshtr28@gmail.com>
Co-authored-by: Nicholas Thomson <RedbackThomson@users.noreply.github.com>
Co-authored-by: Amy <amy@infosleuth.net>
Co-authored-by: Sandhya Gopchandani <s.gopchandani@gmail.com>
Co-authored-by: Sandhya <sandhya@imanage.com>
Co-authored-by: Pavel Taraskin <pahask8@gmail.com>
Co-authored-by: dushyanthsc <43390008+dushyanthsc@users.noreply.github.com>
Co-authored-by: Jiaxin Shan <seedjeffwan@gmail.com>
Co-authored-by: Thi Nguyen <thiduongnguyen@gmail.com>
Co-authored-by: Thi Nguyen <duongnt@users.noreply.github.com>
Co-authored-by: Gautam Kumar <gauta@amazon.com>
Co-authored-by: Meghna Baijal <30911248+mbaijal@users.noreply.github.com>
Co-authored-by: IvyBazan <45951687+IvyBazan@users.noreply.github.com>
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
Working around an Argo bug.
Revert this when we upgrade to Argo version which has the fix: argoproj/argo-workflows#1653
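The behavior that the Argo fix (argoproj/argo-workflows#1653) restores can be summarized as: a declared output artifact is an error only if its path does not exist, while an existing but empty file is a valid output (for example, from a filter such as `grep` that matched nothing). The sketch below expresses that rule in Python; it is not Argo's Go implementation, and `check_output_artifact` is a hypothetical name.

```python
import os


def check_output_artifact(path: str) -> None:
    """Raise only when the declared output path is missing.

    The size is deliberately not checked: an existing but empty file is a
    valid, intentionally empty output.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f'output artifact not found: {path}')
```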

Successfully merging this pull request may close these issues.

[Bug] Workflow fails when artifact file exists, but is empty