Cleanup extensibility #202

kumare3 · 2020-10-07T21:04:20Z

Implements two extensions:
a generic SQL task base and an extended Spark task.

codecov-io · 2020-10-07T21:13:02Z

Codecov Report

❗ No coverage uploaded for pull request base (annotations@b8ab312).
The diff coverage is 79.19%.

@@              Coverage Diff               @@
##             annotations     #202   +/-   ##
==============================================
  Coverage               ?   80.50%           
==============================================
  Files                  ?      236           
  Lines                  ?    15590           
  Branches               ?     1343           
==============================================
  Hits                   ?    12550           
  Misses                 ?     2715           
  Partials               ?      325

Impacted Files	Coverage Δ
flytekit/bin/entrypoint.py	`69.56% <33.33%> (ø)`
flytekit/common/tasks/spark_task.py	`65.38% <50.00%> (ø)`
flytekit/annotated/interface.py	`78.82% <60.00%> (ø)`
flytekit/annotated/task.py	`80.51% <85.93%> (ø)`
flytekit/annotated/workflow.py	`81.69% <100.00%> (ø)`
flytekit/engine.py	`31.96% <100.00%> (ø)`
...unit/use_scenarios/unit_testing/test_type_hints.py	`95.03% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b8ab312...2c8887e.

wild-endeavor · 2020-10-07T23:03:57Z

flytekit/annotated/workflow.py

- # logger.debug(f"Var name {output_name} wf output name {outputs[i]} type: {output_literal_type}")
- binding_data = _literal_models.BindingData(promise=out)
- bindings.append(_literal_models.Binding(var=output_name, binding=binding_data))
+ if len(output_names) > 0:


Why this change? Do we not trust workflow_outputs?

wild-endeavor · 2020-10-07T23:05:05Z

flytekit/annotated/task.py

 from flytekit.annotated.promise import Promise, create_task_output
 from flytekit.common import nodes as _nodes, interface as _common_interface
 from flytekit.common.exceptions import user as _user_exceptions
 from flytekit.common.promise import NodeOutput as _NodeOutput
 from flytekit.models import task as _task_model, literals as _literal_models
 from flytekit.models.core import workflow as _workflow_model, identifier as _identifier_model
-import inspect
-from flytekit.annotated.interface import transform_signature_to_typed_interface


 # This is the least abstract task. It will have access to the loaded Python function


Is this comment correct? Do you mean, "most abstract task"?

wild-endeavor · 2020-10-07T23:11:17Z

flytekit/annotated/task.py

@@ -155,53 +184,82 @@ def name(self) -> str:

 class PythonFunctionTask(Task):

- def __init__(self, task_function: Callable, metadata: _task_model.TaskMetadata, *args, **kwargs):
- interface = transform_signature_to_typed_interface(inspect.signature(task_function))
+ def __init__(self, task_function: Callable, metadata: _task_model.TaskMetadata, ignore_input_vars: List[str] = None,


What is ignore_input_vars for? Can we document the use case?

wild-endeavor · 2020-10-07T23:15:28Z

flytekit/annotated/interface.py


 from flytekit import logger
 from flytekit.annotated import type_engine
 from flytekit.common import interface as _common_interface
 from flytekit.models import interface as _interface_models


-def transform_signature_to_typed_interface(signature: inspect.Signature) -> _common_interface.TypedInterface:
+class Interface(object):


Can we talk about this next week?

wild-endeavor · 2020-10-07T23:17:14Z

flytekit/annotated/task.py

+class PysparkFunctionTask(PythonFunctionTask):
+    def __init__(self, task_function: Callable, metadata: _task_model.TaskMetadata, *args, **kwargs):
+        super(PysparkFunctionTask, self).__init__(task_function, metadata,
+                                                  ignore_input_vars=["spark_session", "spark_context"], *args, **kwargs)


I feel like I'd rather add more stuff to PysparkFunctionTask and subclass Task directly, instead of adding functionality at the base layer to remove/add inputs. Or do you foresee other tasks needing this?

Should spark_session and spark_context be valid inputs for data catalog?
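For illustration, here is a minimal sketch of what ignoring input variables could amount to: the declared inputs are derived from the function signature, and any runtime-injected names are skipped. The helper `declared_inputs` and the task `my_spark_task` are hypothetical names, not part of flytekit; only the standard-library `inspect` module is used.

```python
import inspect
from typing import Callable, Dict, Iterable, Optional


def declared_inputs(fn: Callable, ignore: Optional[Iterable[str]] = None) -> Dict[str, type]:
    """Map parameter names to their annotations, skipping any name in `ignore`.

    Hypothetical helper: it mimics the idea behind `ignore_input_vars`,
    it is not the flytekit implementation.
    """
    ignored = set(ignore or ())
    return {
        name: param.annotation
        for name, param in inspect.signature(fn).parameters.items()
        if name not in ignored
    }


# Hypothetical task body: spark_session is supplied by the runtime, not by the caller.
def my_spark_task(a: int, spark_session: object) -> int:
    return a


# Names that are absent from the signature (spark_context here) are skipped silently.
inputs = declared_inputs(my_spark_task, ignore=["spark_session", "spark_context"])
all_inputs = declared_inputs(my_spark_task)
```

With this shape, a Spark-specific subclass could do the filtering itself and leave the base class unaware of injected arguments, which is the alternative raised in the comment above.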

katrogan · 2020-10-07T21:58:33Z

tests/flytekit/unit/use_scenarios/unit_testing/test_type_hints.py

+ dt = t1()
+ sql(ds=dt)
+
+ my_wf()


Did you want to assert something here?

katrogan · 2020-10-07T22:02:42Z

flytekit/annotated/interface.py

+    if vars is None:
+        return self
+    new_inputs = copy.copy(self._inputs)
+    for v in vars:


What happens if none of the vars are in the inputs? Should we log or raise an error?
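One possible answer, sketched under assumptions: warn about names that are not present, with an opt-in strict mode that raises instead. The function `remove_inputs` and its `strict` flag are hypothetical and operate on a plain dict rather than on the `Interface` class from the diff.

```python
import copy
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


def remove_inputs(
    inputs: Dict[str, type], names: Optional[List[str]], strict: bool = False
) -> Dict[str, type]:
    """Return a copy of `inputs` without `names` (hypothetical helper).

    Unknown names are logged by default; with strict=True they raise KeyError.
    """
    if names is None:
        return inputs
    missing = [n for n in names if n not in inputs]
    if missing:
        if strict:
            raise KeyError(f"inputs to remove were not found: {missing}")
        logger.warning("inputs to remove were not found: %s", missing)
    new_inputs = copy.copy(inputs)  # shallow copy, the caller's dict is untouched
    for n in names:
        new_inputs.pop(n, None)
    return new_inputs


original = {"a": int, "spark_session": object}
trimmed = remove_inputs(original, ["spark_session", "spark_context"])
```

Logging keeps a shared ignore list usable across tasks that take only some of the injected arguments, while the strict mode catches typos in the list.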

katrogan · 2020-10-07T23:29:52Z

flytekit/annotated/interface.py


 from flytekit import logger
 from flytekit.annotated import type_engine
 from flytekit.common import interface as _common_interface
 from flytekit.models import interface as _interface_models


-def transform_signature_to_typed_interface(signature: inspect.Signature) -> _common_interface.TypedInterface:
+class Interface(object):


Is this a general-purpose interface? Can we name it something more specific?

katrogan · 2020-10-08T17:59:29Z

flytekit/annotated/task.py

+ else:
+ # Question: How do you know you're going to enumerate them in the correct order? Even if autonamed, will
+ # output2 come before output100 if there's a hundred outputs? We don't! We'll have to circle back to
+ # the Python task instance and inspect annotations again. Or we change the Python model representation


Do the annotations preserve the correct (original) order?
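The ordering concern in the quoted code comment can be reproduced with plain strings: a lexicographic sort places `output100` before `output2`. The sketch below contrasts that with a natural sort on the numeric suffix. The names and the `natural_key` helper are illustrative assumptions, not taken from the diff.

```python
import re
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Split a name into text and integer chunks so digits compare numerically."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]


# Hypothetical auto-generated output names.
names = ["output100", "output2", "output10", "output0"]

lexicographic = sorted(names)          # compares character by character
natural = sorted(names, key=natural_key)  # compares the numeric suffix as an int
```

Here `lexicographic` is `["output0", "output10", "output100", "output2"]` while `natural` is `["output0", "output2", "output10", "output100"]`. Sorting is only a workaround, though: recording the declaration order when the interface is built avoids having to reconstruct it from the names.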

katrogan · 2020-10-08T18:04:45Z

flytekit/annotated/task.py

+ enumerate(native_outputs)}
+
+ # We manually construct a LiteralMap here because task inputs and outputs actually violate the assumption
+ # built into the IDL that all the values of a literal map are of the same type.


Can we change the IDL then?

katrogan · 2020-10-08T18:04:53Z

flytekit/annotated/task.py

+ k: flytekit_engine.python_value_to_idl_literal(ctx, v, self.interface.outputs[k].type) for k, v in
+ native_outputs_as_map.items()
+ })
+ print("Outputs!")


Cleanup extensibility

2c8887e

kumare3 requested review from matthewphsmith and wild-endeavor as code owners October 7, 2020 21:04

kumare3 merged commit 9c118aa into annotations Oct 7, 2020

wild-endeavor reviewed Oct 7, 2020

katrogan reviewed Oct 8, 2020
