Releases: awslabs/kubeflow-manifests

v1.7.0-aws-b1.0.3

01 Sep 22:22
7faf1a5

⚠️ New installations of AWS Kubeflow v1.7.0-aws-b1.0.3 will not work because the old OIDC auth image has been taken down; see kubeflow/manifests#2469 for more details. Existing installations will still function.

What's Changed

  • Terraform-related changes
    • Terraform infrastructure and application deployments now use aws-eks-blueprints v4.32.1 to mitigate a breaking change caused by the refactoring of the aws-eks-blueprints GitHub repository. (#776)
    • Fixed passing the mysql_engine_version variable to the RDS module. By @elanv in #781
    • Increased the default disk space for GPU instances and added a variable to configure the disk size. (#782)
    • Updated the RDS engine version in Terraform deployments. (#785)
    • Migrated to aws-eks-blueprints v5 for Kubernetes add-on installation. For GPU deployments, the NVIDIA operator is installed. (#779)
  • Helm/Kustomize-related changes
    • Updated the AWS Load Balancer Controller’s ALB policy to include AddTags permissions in Create* operations, in accordance with the new AWS policy change. By @ananth102 in #777

New Contributors

Full Changelog: v1.7.0-aws-b1.0.2...v1.7.0-aws-b1.0.3

v1.7.0-aws-b1.0.2

09 Jun 21:17
f3f4125

What's Changed

New Contributors

Full Changelog: v1.7.0-aws-b1.0.0...v1.7.0-aws-b1.0.2

v1.7.0-aws-b1.0.1

11 May 20:17
71af1d0

What's Changed

Full Changelog: v1.7.0-aws-b1.0.0...v1.7.0-aws-b1.0.1

v1.7.0-aws-b1.0.0

25 Apr 00:27
bba1bc9

What’s New

This release offers the following features:

  • Added support for Kubeflow v1.7.0. Upstream Kubeflow component versions are listed in the component versions table.
  • Support for IAM Roles for Service Accounts (IRSA) when using Amazon S3 as the artifact store for Kubeflow Pipelines
    • IRSA can be used to configure Amazon S3 as an artifact store for pipelines. With IRSA, pods make API requests using temporary credentials, and permissions are scoped at the pod level through Kubernetes service accounts. Compared with creating static IAM user credentials to access S3, IRSA follows the security best practices of least privilege and credential isolation. (#571, #601, #613, #680, #685)
    • Starting with this release, we are deprecating the use of IAM user/static credentials in favor of IRSA to configure S3 with Kubeflow Pipelines. We highly recommend migrating to IRSA. For more details about this change, refer to GitHub issue #704.
  • Server-side encryption and blocked public access are now configured by default on the S3 bucket used by Kubeflow Pipelines, as a security best practice (#517, #518)
  • Support for using IRSA with KServe Inference Services. Use this feature to pull images from a private ECR repository or load models directly from an S3 bucket.
  • Support for using Amazon S3 as an object store backend for TensorBoard. Users can now visualize TensorBoard-compatible logs stored in S3, published by model servers and training jobs (including TrainingJobs run on SageMaker), to track experiment metrics such as loss and accuracy and to visualize the model graph.
  • Added the ability to annotate the service account using the AWSIAMforServiceAccount plugin. Users can use this feature if their organizational policies restrict them from using the profile controller to update IAM policies.
    • Setting annotateOnly to true in the AWSIAMforServiceAccount plugin only annotates the service account in the user profile and skips mutating the IAM policy.
  • Support for configuring Amazon S3 as a remote backend for storing Terraform state (#674)
  • Support for automatically stopping idle Jupyter Notebook servers
    • Enabled support for notebook culling. Users can save infrastructure costs by configuring a notebook server to stop after it has been idle for a specified period. (#470)
  • Updated notebook containers with the latest AWS-optimized Deep Learning Containers (DLC) based on TensorFlow 2.12.0 and PyTorch 2.0.0 (#676)
  • Updated training and inference containers with the latest AWS-optimized Deep Learning Containers (DLC) based on TensorFlow 2.12 and PyTorch 2.0, with support for CPU/GPU-based single-node training, distributed training, and inference. For the latest DLC images, refer to the list of DLC images.
  • Updated the following drivers to newer versions:
    • FSx CSI Driver to v0.9.0
    • EFS CSI Driver to v1.5.4
    • AWS Load Balancer Controller to v2.4.7
  • Updated SageMaker Operator for k8s (ACK) to v1.2.1
    • Training Job resource now supports Managed warm pool, heterogeneous clusters through Instance Groups and Retry Strategy
    • Added support for SageMaker Pipeline and Pipeline Execution
    • Training Job resource now supports Update Operations.
    • Support for Deployment guard rails for Endpoint Resource.
    • Support for Serverless Endpoint for Endpoint Config Resource.
    • Support for retaining AWS resources after CR deletion.
  • Supports the latest versions of Amazon EKS (see eks-compatibility)
  • Support for Kustomize v5.0.1
  • Bugfixes and improvements to the automated scripts
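
The IRSA items above can be illustrated with a minimal sketch of the two pieces IRSA ties together: an IAM role trust policy that admits exactly one Kubernetes service account through the cluster's OIDC provider, and the annotation that points the service account at that role. The `sts:AssumeRoleWithWebIdentity` action, the `:sub` and `:aud` condition keys, and the `eks.amazonaws.com/role-arn` annotation are standard for IRSA on EKS. The function names, account ID, OIDC provider ID, namespace, service account, and role name are hypothetical placeholders, and the `aws` partition is an assumption (other partitions use a different ARN prefix). This is an illustration only, not what the deployment scripts in this repository run.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy that lets a single service account assume the role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumes the standard "aws" partition.
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # Pod-level scoping: only this namespace/service account matches.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                    }
                },
            }
        ],
    }


def service_account_annotation(account_id, role_name):
    """Return the annotation that binds a service account to an IAM role."""
    return {"eks.amazonaws.com/role-arn": f"arn:aws:iam::{account_id}:role/{role_name}"}


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    policy = irsa_trust_policy(
        account_id="111122223333",
        oidc_provider="oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
        namespace="kubeflow",
        service_account="example-sa",
    )
    print(json.dumps(policy, indent=2))
    print(service_account_annotation("111122223333", "example-s3-role"))
```

Restricting the `sub` condition to one namespace and service account is what keeps the S3 permissions at pod level, rather than granting them to every workload on the node.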

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.7.0-aws-b1.0.0/docs/

Known Issues:

Full Changelog: release-v1.6.1-aws-b1.0.2...release-v1.7.0-aws-b1.0.0

v1.6.1-aws-b1.0.2

07 Apr 18:57
38edf2b

What's Changed

Known Issues

  • #118 (Workaround documented in issue)

Full Changelog: v1.6.1-aws-b1.0.1...v1.6.1-aws-b1.0.2

v1.6.1-aws-b1.0.1

01 Mar 18:27
80094bc

What's Changed

Known Issues

  • #653 (Resolved, please use the latest version)
  • #118 (Workaround documented in issue)

Update Existing Kubeflow Installations with RDS or S3 integrations

On February 6, 2023, the Kubernetes project announced changes to k8s.gcr.io, the community-owned image registry that hosts its container images. On April 3, 2023, the old registry k8s.gcr.io will be frozen, and no further images for Kubernetes and related subprojects will be pushed to it. The Kubernetes community recommends switching to the new registry, registry.k8s.io, as soon as possible. For more information, read the community blog.

Only the Secrets Store CSI Driver in the AWS Distribution of Kubeflow is affected. To point the image registry to the new registry.k8s.io, follow the instructions documented in the GitHub issue comment below.
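
At its core, the update swaps the registry host in the image reference and leaves the repository path and tag unchanged. A minimal sketch of that rewrite follows. The function name and the image path in the example are hypothetical placeholders, and the authoritative steps remain those in the issue comment.

```python
OLD_REGISTRY = "k8s.gcr.io"
NEW_REGISTRY = "registry.k8s.io"


def migrate_image(image: str) -> str:
    """Swap the frozen registry host for the new one; leave other images untouched."""
    prefix = OLD_REGISTRY + "/"
    if image.startswith(prefix):
        return NEW_REGISTRY + "/" + image[len(prefix):]
    return image


if __name__ == "__main__":
    # Hypothetical image path, used only to show the rewrite.
    print(migrate_image("k8s.gcr.io/example/driver:v1.0.0"))
    # prints "registry.k8s.io/example/driver:v1.0.0"
```

Matching on the `k8s.gcr.io/` prefix, rather than replacing the string anywhere it occurs, avoids touching images hosted on other registries whose paths happen to contain the old host name.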

Full Changelog: v1.6.1-aws-b1.0.0...v1.6.1-aws-b1.0.1

v1.6.1-aws-b1.0.0

14 Oct 22:52
a7e2ce1

What’s New

This release offers the following features:

  • Added support for Kubeflow v1.6.1.
    • Component versions are listed in the component versions table
    • Updated SageMaker Operator for k8s (ACK) to version 0.4.5
    • Updated notebook containers with the latest Deep Learning Containers based on TensorFlow 2.10.0 and PyTorch 1.12.1 (#473)
  • Includes all the features from v1.6.0-aws-b1.0.0 (Preview)
    • Integration of SageMaker with Kubeflow to run hybrid machine learning workflows using SageMaker Operators for Kubernetes (ACK) and SageMaker Components for Kubeflow Pipelines. Documentation
    • Added support for Infrastructure as Code (IaC) 1-click deployment for Kubeflow on AWS using Terraform (preview)
    • Added Helm support for all supported deployment options (preview)
    • Added integration with Prometheus, Amazon Managed Service for Prometheus, and Amazon Managed Grafana to monitor metrics with Kubeflow on AWS. Documentation
    • Automated deployment options are now simpler and more stable (user-friendly make commands, deterministic install/uninstall, etc.)
    • Integration with AWS Deep Learning Containers to run distributed training and inference workloads
    • Supports newer versions of EKS (see eks-compatibility)
  • Added NVIDIA GPU support in Terraform (#396)
  • Enabled KFP Visualizations and Artifact Store with S3 as source (#456)
  • Bugfixes and improvements to the automated scripts

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.6.1-aws-b1.0.0/docs/

Known Issues:

Update Existing Kubeflow Installations with RDS or S3 integrations

On February 6, 2023, the Kubernetes project announced changes to k8s.gcr.io, the community-owned image registry that hosts its container images. On April 3, 2023, the old registry k8s.gcr.io will be frozen, and no further images for Kubernetes and related subprojects will be pushed to it. The Kubernetes community recommends switching to the new registry, registry.k8s.io, as soon as possible. For more information, read the community blog.

Only the Secrets Store CSI Driver in the AWS Distribution of Kubeflow is affected. To point the image registry to the new registry.k8s.io, follow the instructions documented in the GitHub issue comment below.

Full Changelog: release-v1.6.0-aws-b1.0.0...release-v1.6.1-aws-b1.0.0

v1.6.0-aws-b1.0.0 (Preview)

22 Sep 21:03
355b72c

What’s New

This is a preview release for Kubeflow v1.6. The Kubeflow working groups have identified some regressions in v1.6.0 which will be addressed in v1.6.1. More details can be found here.

This release offers the following features:

  • Added support for Kubeflow v1.6.0. Component versions are listed in the component versions table
  • Integration of SageMaker with Kubeflow to run hybrid machine learning workflows using SageMaker Operators for Kubernetes (ACK) and SageMaker Components for Kubeflow Pipelines. Documentation
  • Added Helm support for all supported deployment options
  • Automated deployment options are now simpler and more stable
  • Added support for Infrastructure as Code (IaC) 1-click deployment for Kubeflow on AWS using Terraform (preview)
    • Terraform stacks added for all supported deployment options
      • Creates a VPC and EKS cluster
      • Creates S3 buckets, RDS instances, and/or Cognito resources as needed
      • Configures and deploys Kubeflow
    • Configured using EKS Blueprints for improved customizability and extensibility
  • Configurable S3 endpoint for the S3 and RDS-S3 deployment options, allowing PrivateLink and non-commercial region users to connect to their respective S3 endpoints
  • Added integration with Prometheus, Amazon Managed Service for Prometheus, and Amazon Managed Grafana to monitor metrics with Kubeflow on AWS. Documentation
  • Updated notebook containers with the latest Deep Learning Containers based on TensorFlow 2.9.1 and PyTorch 1.12 (#363)
  • Integration with AWS Deep Learning Containers to run distributed training and inference workloads
  • Enabled usage of HTTPS-only S3 buckets (#335)
  • Support for EKS 1.22 and 1.23

This release includes the following bug fixes:

  • Re-enabled MySQL for the S3-only pipelines deployment (#310)

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.6.0-aws-b1.0.0/docs/

Known Issues

Full Changelog: release-v1.5.1-aws-b1.0.2...release-v1.6.0-aws-b1.0.0

v1.5.1-aws-b1.0.2

17 Sep 03:21

What’s New

This release includes the following bug fixes merged as part of #373:

  • Fix S3 bucket name substitution for all S3 related deployments, i.e. Cognito-RDS-S3, RDS-S3, S3 (#333)
    • See #336 for more details about the issue. This bug was introduced in v1.5.1-aws-b1.0.1.
  • Fix the missing MySQL resources in the S3-only deployment (#310)
  • Enable usage of HTTPS-only S3 buckets (#244)
  • Fix for RDS-S3 test (#341)

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.5.1-aws-b1.0.2/docs/

Known Issues

  • #117 (Workaround documented in issue)
  • #118 (Workaround documented in issue)
  • kubeflow/pipelines#7361 (Terminating a pipeline run does not trigger the deletion logic that a component implements in its signal handler. This affects all components in general, including the terminate functionality in SageMaker Components for Kubeflow Pipelines. The workaround is to stop the training jobs manually.)

Full Changelog: v1.5.1-aws-b1.0.1...v1.5.1-aws-b1.0.2

v1.5.1-aws-b1.0.1

06 Aug 03:22
2a536c3

What’s New

This release includes the following bug fixes:

  • Fix KServe's ingress gateway (#311)
  • Add support for non-root EFS file ownership (#268)
  • Hardcoded S3 endpoint URL in workflow controller ConfigMap (#257)
  • Add CDK created EKS cluster subnet tags for RDS script (#295)
  • Doc fixes: #304, #307

Updated documentation available at: https://awslabs.github.io/kubeflow-manifests/release-v1.5.1-aws-b1.0.1/docs/

Known Issues

  • #117 (Workaround documented in issue)
  • #118 (Workaround documented in issue)
  • kubeflow/pipelines#7361 (Terminating a pipeline run does not trigger the deletion logic that a component implements in its signal handler. This affects all components in general, including the terminate functionality in SageMaker Components for Kubeflow Pipelines. The workaround is to stop the training jobs manually.)

Full Changelog: v1.5.1-aws-b1.0.0...v1.5.1-aws-b1.0.1