Hardcoded S3 endpoint url in workflow controller configmap #257

Closed
Tracked by #309
surajkota opened this issue Jun 9, 2022 · 0 comments · Fixed by #291
Labels
bug Something isn't working

Comments

@surajkota
Contributor

Describe the bug
Although the S3 endpoint URL is configurable in the S3 deployment option, it is hardcoded in the workflow-controller-configmap. This causes failures in regions where the global S3 endpoint does not work, such as GovCloud, which requires the region-specific endpoint s3.us-gov-west-1.amazonaws.com.
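For illustration, a minimal sketch (not the fix shipped in #291) of pointing the artifact repository at a region-specific endpoint by patching the configmap with the Kubernetes Python client; the configmap name, namespace, and artifactRepository data key below follow the default Argo/KFP manifests and may differ in your installation:

# Hedged sketch: point the Argo artifact repository at a region-specific S3
# endpoint instead of the hardcoded global one. Assumes the default layout:
# configmap "workflow-controller-configmap" in namespace "kubeflow" with an
# "artifactRepository" data key in Argo's YAML format.
import yaml
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

cm = core.read_namespaced_config_map("workflow-controller-configmap", "kubeflow")
repo = yaml.safe_load(cm.data["artifactRepository"])
repo["s3"]["endpoint"] = "s3.us-gov-west-1.amazonaws.com"  # region-specific endpoint
cm.data["artifactRepository"] = yaml.safe_dump(repo)
core.patch_namespaced_config_map("workflow-controller-configmap", "kubeflow", cm)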

Steps To Reproduce
Run the data passing sample pipeline in a GovCloud region.

Expected behavior
Successfully pass data between pipeline steps

Environment

  • Kubernetes version:
  • Using EKS (yes/no), if so version:
  • Kubeflow version: 1.4
  • AWS build number: 1.0.0
  • AWS service targeted (S3, RDS, etc.): S3

Additional context
Original issue:
Hi everyone! @Suraj Kota I'm relatively new to Kubeflow and am currently stuck, so I'm looking to see if anyone else has encountered a similar issue. I've successfully run an example Kubeflow pipeline locally on minikube, and I'm now trying to bring it up on our new Kubeflow v1.4 installation on AWS EKS, where it appears I can't pass artifacts between stages. When passing artifacts between stages on minikube, it was implicitly using MinIO by default for InputPath/OutputPath types. Now, when running the same pipeline, I'm getting:

This step is in Error state with this message: Error (exit code 1): failed to put file: The AWS Access Key Id you provided does not exist in our records.  

I have followed these instructions to create a Kubernetes Secret and attach AWS credentials. It's not clear to me whether, when passing artifacts between stages with InputPath/OutputPath, Kubeflow Pipelines is still using MinIO or AWS S3. In either case, the credentials should be there (I've verified that the MinIO credentials are present and unchanged from the default values).
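One way to check whether the configured credentials actually work against the region-specific endpoint (a diagnostic sketch, not part of the original report) is to read the artifact secret and try it with boto3; the secret and key names mlpipeline-minio-artifact / accesskey / secretkey are the KFP manifest defaults and may differ in your installation:

# Hedged diagnostic: read the artifact credentials from the cluster and try
# them directly against the region-specific S3 endpoint. An InvalidAccessKeyId
# error here reproduces the failure seen in the pipeline step.
import base64
import boto3
from kubernetes import client, config

config.load_kube_config()
secret = client.CoreV1Api().read_namespaced_secret("mlpipeline-minio-artifact", "kubeflow")
access_key = base64.b64decode(secret.data["accesskey"]).decode()
secret_key = base64.b64decode(secret.data["secretkey"]).decode()

s3 = boto3.client(
    "s3",
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    endpoint_url="https://s3.us-gov-west-1.amazonaws.com",  # region-specific endpoint
)
print([b["Name"] for b in s3.list_buckets()["Buckets"]])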

Sample pipeline from Ryan McCaffrey:

import kfp
import kfp.components as comp


def extract_data(df_csv: comp.OutputPath('CSV')):
    """Imports and returns X,y iris data"""
    from sklearn import datasets
    import pandas as pd

    iris = datasets.load_iris(as_frame=True)
    X = iris.data
    y = iris.target

    df = pd.concat([X,y], axis=1)
    df.to_csv(df_csv, index=False, header=True)

create_step_extract_data = kfp.components.create_component_from_func(
    func=extract_data,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])


def split_data(df_path: comp.InputPath('CSV'),
               X_train_csv: comp.OutputPath('CSV'),
               X_test_csv: comp.OutputPath('CSV'),
               y_train_csv: comp.OutputPath('CSV'),
               y_test_csv: comp.OutputPath('CSV')):
    """Split data into train/test sets"""
    from sklearn.model_selection import train_test_split
    import pandas as pd

    df = pd.read_csv(df_path)

    X = df.drop(columns=['target'])
    y = df['target']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    X_train.to_csv(X_train_csv, index=False, header=False)
    X_test.to_csv(X_test_csv, index=False, header=False)
    y_train.to_csv(y_train_csv, index=False, header=False)
    y_test.to_csv(y_test_csv, index=False, header=False)


create_step_split_data = kfp.components.create_component_from_func(
    func=split_data,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])


def train_model(X_train_path: comp.InputPath('CSV'), y_train_path: comp.InputPath('CSV'), model_path: comp.OutputBinaryFile(bytes)):
    """Train model"""

    from sklearn.linear_model import LogisticRegression
    import pandas as pd
    from joblib import dump, load

    X_train = pd.read_csv(X_train_path, header=None)
    y_train = pd.read_csv(y_train_path, header=None)

    model = LogisticRegression()
    model.fit(X_train, y_train)

    dump(model, model_path)

create_step_train_model = kfp.components.create_component_from_func(
    func=train_model,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn', 'joblib'])

def make_preds(model_path: comp.InputBinaryFile(bytes), X_path: comp.InputPath('CSV'), preds_csv: comp.OutputPath('CSV')):
    """Make predictions"""

    from sklearn.linear_model import LogisticRegression
    import pandas as pd
    from joblib import dump, load

    model = load(model_path)
    X = pd.read_csv(X_path, header=None)
    preds = model.predict(X)
    preds = pd.Series(preds)

    preds.to_csv(preds_csv, index=False, header=False)

create_step_make_preds = kfp.components.create_component_from_func(
    func=make_preds,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn', 'joblib'])


def eval_model(preds_path: comp.InputPath('CSV'), y_true_path: comp.InputPath('CSV')):
    """Evaluate model"""
    from sklearn.metrics import accuracy_score
    import pandas as pd

    preds = pd.read_csv(preds_path, header=None)
    y_true = pd.read_csv(y_true_path, header=None)

    print(accuracy_score(preds, y_true))

create_step_eval_model = kfp.components.create_component_from_func(
    func=eval_model,
    base_image='python:3.8',
    packages_to_install=['pandas==1.3.4', 'scikit-learn'])

@kfp.dsl.pipeline(
  name='My pipeline',
  description='My machine learning pipeline'
)
def my_pipeline():
    extract_data_task = create_step_extract_data()
    split_data_task = create_step_split_data(df=extract_data_task.outputs['df_csv'])
    train_model_task = create_step_train_model(x_train=split_data_task.outputs['X_train_csv'], y_train=split_data_task.outputs['y_train_csv'])
    make_preds_task = create_step_make_preds(model_path=train_model_task.outputs['model_path'], x=split_data_task.outputs['X_train_csv'])
    eval_model_task = create_step_eval_model(preds=make_preds_task.outputs['preds_csv'], y_true=split_data_task.outputs['y_train_csv'])

kfp.compiler.Compiler().compile(
    pipeline_func=my_pipeline,
    package_path='pipeline.yaml')
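
The compiled package can then be submitted with the KFP v1 SDK client; a short sketch, where the host URL and run name are placeholders:

# Hedged sketch: submit the compiled pipeline to a KFP endpoint.
client = kfp.Client(host='http://localhost:8080')  # placeholder KFP endpoint
client.create_run_from_pipeline_package(
    pipeline_file='pipeline.yaml',
    arguments={},
    run_name='data-passing-test')  # placeholder run name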