Hardcoded S3 endpoint url in workflow controller configmap #257

Closed
Tracked by #309
surajkota opened this issue Jun 9, 2022 · 0 comments · Fixed by #291
Labels
bug Something isn't working

Comments

@surajkota
Contributor

Describe the bug
Although the S3 endpoint URL is configurable in the S3 deployment option, it is hardcoded in the workflow-controller-configmap. This causes failures in regions where the global S3 endpoint does not work, such as GovCloud, which requires the region-specific endpoint s3.us-gov-west-1.amazonaws.com.
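For illustration, a minimal sketch (not the fix shipped in #291) of pointing the artifact repository at a region-specific endpoint by patching the configmap with the Kubernetes Python client; the configmap name, namespace, and artifactRepository data key below follow the default Argo/KFP manifests and may differ in your installation:

# Hedged sketch: point the Argo artifact repository at a region-specific S3
# endpoint instead of the hardcoded global one. Assumes the default layout:
# configmap "workflow-controller-configmap" in namespace "kubeflow" with an
# "artifactRepository" data key in Argo's YAML format.
import yaml
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

cm = core.read_namespaced_config_map("workflow-controller-configmap", "kubeflow")
repo = yaml.safe_load(cm.data["artifactRepository"])
repo["s3"]["endpoint"] = "s3.us-gov-west-1.amazonaws.com"  # region-specific endpoint
cm.data["artifactRepository"] = yaml.safe_dump(repo)
core.patch_namespaced_config_map("workflow-controller-configmap", "kubeflow", cm)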

Steps To Reproduce
Run the data passing sample pipeline in a GovCloud region.

Expected behavior
Successfully pass data between pipeline steps

Environment

  • Kubernetes version:
  • Using EKS (yes/no), if so version:
  • Kubeflow version: 1.4
  • AWS build number: 1.0.0
  • AWS service targeted (S3, RDS, etc.): S3

Additional context
Original issue:
Hi everyone! @Suraj Kota I'm relatively new to Kubeflow and am currently stuck, so I'm looking to see if anyone else has encountered a similar issue. I've successfully run an example Kubeflow pipeline locally on minikube, and I'm now trying to bring it up on our new Kubeflow v1.4 installation on AWS EKS, where it appears I can't pass artifacts between stages. When passing artifacts between stages on minikube, it was implicitly using MinIO by default for InputPath/OutputPath types. Now, when running the same pipeline, I'm getting:

This step is in Error state with this message: Error (exit code 1): failed to put file: The AWS Access Key Id you provided does not exist in our records.  

I have followed these instructions to create a Kubernetes Secret and attach AWS credentials. It's not clear to me whether, when passing artifacts between stages with InputPath/OutputPath, Kubeflow Pipelines is still using MinIO or AWS S3. In either case, the credentials should be there (I've verified that the MinIO credentials are present and unchanged from the default values).
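One way to check whether the configured credentials actually work against the region-specific endpoint (a diagnostic sketch, not part of the original report) is to read the artifact secret and try it with boto3; the secret and key names mlpipeline-minio-artifact / accesskey / secretkey are the KFP manifest defaults and may differ in your installation:

# Hedged diagnostic: read the artifact credentials from the cluster and try
# them directly against the region-specific S3 endpoint. An InvalidAccessKeyId
# error here reproduces the failure seen in the pipeline step.
import base64
import boto3
from kubernetes import client, config

config.load_kube_config()
secret = client.CoreV1Api().read_namespaced_secret("mlpipeline-minio-artifact", "kubeflow")
access_key = base64.b64decode(secret.data["accesskey"]).decode()
secret_key = base64.b64decode(secret.data["secretkey"]).decode()

s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url="https://s3.us-gov-west-1.amazonaws.com",  # region-specific endpoint
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])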

Sample pipeline from Ryan McCaffrey:

import kfp
import kfp.components as comp


def extract_data(df_csv: comp.OutputPath('CSV')):
    """Imports and returns X,y iris data"""
    from sklearn import datasets
    import pandas as pd

    iris = datasets.load_iris(as_frame=True)
    X = iris.data
    y = iris.target

    df = pd.concat([X,y], axis=1)
    df.to_csv(df_csv, index=False, header=True)

create_step_extract_data = kfp.components.create_component_from_func(
    func=extract_data,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])


def split_data(df_path: comp.InputPath('CSV'),
               X_train_csv: comp.OutputPath('CSV'),
               X_test_csv: comp.OutputPath('CSV'),
               y_train_csv: comp.OutputPath('CSV'),
               y_test_csv: comp.OutputPath('CSV')):
    """Split data into train/test sets"""
    from sklearn.model_selection import train_test_split
    import pandas as pd

    df = pd.read_csv(df_path)

    X = df.drop(columns=['target'])
    y = df['target']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    X_train.to_csv(X_train_csv, index=False, header=False)
    X_test.to_csv(X_test_csv, index=False, header=False)
    y_train.to_csv(y_train_csv, index=False, header=False)
    y_test.to_csv(y_test_csv, index=False, header=False)


create_step_split_data = kfp.components.create_component_from_func(
    func=split_data,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])


def train_model(X_train_path: comp.InputPath('CSV'), y_train_path: comp.InputPath('CSV'), model_path: comp.OutputBinaryFile(bytes)):
    """Train model"""

    from sklearn.linear_model import LogisticRegression
    import pandas as pd
    from joblib import dump, load

    X_train = pd.read_csv(X_train_path, header=None)
    y_train = pd.read_csv(y_train_path, header=None)

    model = LogisticRegression()
    model.fit(X_train, y_train)

    dump(model, model_path)

create_step_train_model = kfp.components.create_component_from_func(
    func=train_model,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn', 'joblib'])

def make_preds(model_path: comp.InputBinaryFile(bytes), X_path: comp.InputPath('CSV'), preds_csv: comp.OutputPath('CSV')):
    """Make predictions"""

    from sklearn.linear_model import LogisticRegression
    import pandas as pd
    from joblib import dump, load

    model = load(model_path)
    X = pd.read_csv(X_path, header=None)
    preds = model.predict(X)
    preds = pd.Series(preds)

    preds.to_csv(preds_csv, index=False, header=False)

create_step_make_preds = kfp.components.create_component_from_func(
    func=make_preds,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn', 'joblib'])


def eval_model(preds_path: comp.InputPath('CSV'), y_true_path: comp.InputPath('CSV')):
    """Evaluate model"""
    from sklearn.metrics import accuracy_score
    import pandas as pd

    preds = pd.read_csv(preds_path, header=None)
    y_true = pd.read_csv(y_true_path, header=None)

    print(accuracy_score(preds, y_true))

create_step_eval_model = kfp.components.create_component_from_func(
    func=eval_model,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline():
    extract_data_task = create_step_extract_data()
    split_data_task = create_step_split_data(df=extract_data_task.outputs['df_csv'])
    train_model_task = create_step_train_model(x_train=split_data_task.outputs['X_train_csv'], y_train=split_data_task.outputs['y_train_csv'])
    make_preds_task = create_step_make_preds(model_path=train_model_task.outputs['model_path'], x=split_data_task.outputs['X_train_csv'])
    eval_model_task = create_step_eval_model(preds=make_preds_task.outputs['preds_csv'], y_true=split_data_task.outputs['y_train_csv'])

kfp.compiler.Compiler().compile(
    pipeline_func=my_pipeline,
    package_path='pipeline.yaml')
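
The compiled package can then be submitted with the KFP v1 SDK client; a short sketch, where the host URL and run name are placeholders:

# Hedged sketch: submit the compiled pipeline to a KFP endpoint.
client = kfp.Client(host='http://localhost:8080')  # placeholder KFP endpoint
client.create_run_from_pipeline_package(
    pipeline_file='pipeline.yaml',
    arguments={},
    run_name='data-passing-test')  # placeholder run name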