
Spark DataFrames handled as a type if using spark #267

Merged · 7 commits · Dec 3, 2020

Conversation

@kumare3 (Contributor) commented on Dec 2, 2020:

  • This is also an overhaul of the schema system to support both remote
    IO-based dataframes and regular pandas-style dataframes. New ones can
    be added at the user layer by implementing two classes: SchemaReader
    and SchemaWriter.
  • If a new dataframe is to be added as a supported type, the
    contributor needs to implement the TypeTransformer interface as well.

@wild-endeavor (Contributor) left a comment:

some minor comments

literals[k] = TypeEngine.to_literal(ctx, v, py_type, literal_type)
outputs_literal_map = _literal_models.LiteralMap(literals=literals)
return outputs_literal_map
def pre_execute(self, user_params: ExecutionParameters) -> ExecutionParameters:
@wild-endeavor (Contributor) commented:

Not a huge fan of this function signature... can we think of a way around this? I'd rather pass in the parent FlyteContext and access the user-space params from there. The function name makes it seem like a generic setup call, but it always takes in and returns just the user params; that seems limiting.
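The two shapes being debated can be sketched side by side. `ExecutionParameters` and `FlyteContext` here are bare stand-ins (the `spark_session` field is an assumed example), not the real flytekit classes:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ExecutionParameters:
    """Stand-in for the user-facing params object (illustrative only)."""
    spark_session: object = None


@dataclass
class FlyteContext:
    """Stand-in for the parent context the reviewer suggests passing instead."""
    user_space_params: ExecutionParameters = field(default_factory=ExecutionParameters)


# Signature as written in the PR: the hook can only see and return user params.
def pre_execute(user_params: ExecutionParameters) -> ExecutionParameters:
    return replace(user_params, spark_session="session")


# Reviewer's alternative: receive the whole context and pull user params from
# it, leaving the hook free to read or adjust other context state as well.
def pre_execute_ctx(ctx: FlyteContext) -> FlyteContext:
    ctx.user_space_params = replace(ctx.user_space_params, spark_session="session")
    return ctx
```

The first form is narrower but self-documenting; the second trades that for room to grow.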

flytekit/types/schema.py (outdated; resolved)
return my_spark(df=df)

x = my_wf()
reader = x.open(pandas.DataFrame)
@wild-endeavor (Contributor) asked:

what happens if pandas.DataFrame is not specified?

@kumare3 (Contributor, Author) replied:

default is pandas.DataFrame actually, I can drop it

@kumare3 merged commit 74a566a into annotations on Dec 3, 2020.