
Spark DataFrames handled as a type if using spark #267

Merged · 7 commits · Dec 3, 2020

Conversation

@kumare3 (Contributor) commented on Dec 2, 2020:

  • This is also an overhaul of the schema system to support both remote
    IO-based dataframes and regular pandas-style dataframes. New ones can
    be added at the user layer by implementing two classes: SchemaReader
    and SchemaWriter.
  • If a new dataframe is to be added as a supported type, the
    contributor needs to implement the TypeTransformer interface as well.

@wild-endeavor (Contributor) left a comment:

some minor comments

literals[k] = TypeEngine.to_literal(ctx, v, py_type, literal_type)
outputs_literal_map = _literal_models.LiteralMap(literals=literals)
return outputs_literal_map
def pre_execute(self, user_params: ExecutionParameters) -> ExecutionParameters:
@wild-endeavor (Contributor) commented:

Not a huge fan of this function signature... can we think of a way around this? I'd rather pass in the parent FlyteContext and access the user-space params from there. The function name makes it seem like a generic setup call, but it always takes in and returns just the user params; that seems limiting.
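The two shapes being debated can be sketched side by side. `ExecutionParameters` and `FlyteContext` here are bare stand-ins (the `spark_session` field is an assumed example), not the real flytekit classes:

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ExecutionParameters:
    """Stand-in for the user-facing params object (illustrative only)."""
    spark_session: object = None


@dataclass
class FlyteContext:
    """Stand-in for the parent context the reviewer suggests passing instead."""
    user_space_params: ExecutionParameters = field(default_factory=ExecutionParameters)


# Signature as written in the PR: the hook can only see and return user params.
def pre_execute(user_params: ExecutionParameters) -> ExecutionParameters:
    return replace(user_params, spark_session="session")


# Reviewer's alternative: receive the whole context and pull user params from
# it, leaving the hook free to read or adjust other context state as well.
def pre_execute_ctx(ctx: FlyteContext) -> FlyteContext:
    ctx.user_space_params = replace(ctx.user_space_params, spark_session="session")
    return ctx
```

The first form is narrower but self-documenting; the second trades that for room to grow.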

flytekit/types/schema.py (outdated; resolved)
return my_spark(df=df)

x = my_wf()
reader = x.open(pandas.DataFrame)
@wild-endeavor (Contributor) asked:

what happens if pandas.DataFrame is not specified?

@kumare3 (Contributor, Author) replied:

default is pandas.DataFrame actually, I can drop it

@kumare3 merged commit 74a566a into annotations on Dec 3, 2020.