-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Union Types RFCs #1926
Union Types RFCs #1926
Conversation
Thank you for opening this pull request! 🙌 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome document. Once we are done with this work, I would love if you can/are willing to present this work (in OSS, Blog, Official Docs).. I've a few questions about the assumptions below... happy to jump on a call if it's easier to hash out the details.
|
||
# Proposed Implementation | ||
|
||
Use a string tag. In Haskell use the symbolic tag. In C++ use the index (serialized to a string) as the tag. In Python use the name of the type transformer (already present on all transformers). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with most of this and I appreciate the deep dive into various implementations in other languages… one goal that I would like to emphasize is that a task written in Java (or C++.. etc.) that accepts a Union should be usable from a workflow written in Python… and vis versa… I could be wrong but I got the sense that there is an assumption of homogeneity between tasks/wfs when it comes to the language used…
This will manifest in how the tag will be filled and subsequently used. We need to be able to consistently populate it across languages as well as consistently consume it in any of the languages…
|
||
In Python the correspondence between a tag and the choice of type transformer must be 1-to-1 i.e. type transformer names must be made unique which is already the case but not formalized. | ||
|
||
The matching procedure for Haskell and C++ is trivial. In Python, we must deal specially with duplicates. For example: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we do "compile" the workflows on the server side, we will need to be able to validate something like this:
@task
def task2(typing.Union[int, str]) -> int:
return 5
@workflow
def my_wf() -> int:
a = task1()
return task2(in=a)
The system will need to know that task2's input accept task1's output (even if task1 is declared in a completely different language)... so the way we represent the type will need to be language agnostic.
Moreover, since the system is able to make this connection, it should also be able to, at compile time, to define the target type binding/choice (i.e. will it be the int
or the str
)...
The target container's SDK/language will also need to understand that of course and be able to map between flyteidl's representation of the target type choice (e.g. int
) to its needed Union's representation (index based tag or otherwise)...
... | ||
``` | ||
|
||
In this case, having the `MyList` tag is not enough to disambiguate the choice of variant. We iterate each candidate variant and try to match it recursively with the literal. It is not required that all type transformers can fail gracefully when given a value of any incorrect type, only that they fail on union `LiteralRepr`s and that the union type transformer can recognize its own literals. The only possible difference between how the types are resolved is in the choice of type transformers, which is made completely unambiguous by traversing the type tree with its tags (since the tags resolve the only ambiguity). This procedure thus guarantees a value will be recovered appropriately from a union IDL representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposal is to make this matching a compile-time matching. Based on the LiteralType of the connection (task1
returns foo
of type blah
, try to match blah
against one of the union variants of x
)...
At runtime, the SDK, if needed, can use the compile-time matching information to instruct the language of which variant is desired... I say 'if needed' because in Python, as you said, we don't actually need to pass the typing information of the literal...
|
||
We could use an index-based tag, but there are multiple reasons why that is not a good idea either: | ||
|
||
- `typing.Union` semantics are that of a set (the order does not matter for equality comparison, duplicates are eliminated), though in practice it the source ordering of the variants can be recovered in cases without duplicates. Note that the CPython implementation actually uses a tuple behind the scenes, so this is more about intent and less about factual behavior of the class. The [PEP](https://www.python.org/dev/peps/pep-0484/#union-types) also specifies that Union is expected to accept a "set" of types, though it is unclear whether this is referring to a specific data structure or just a figure of speech |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new insight for me. The whole section around colliding types within a Union and the equivalence of Unions of different variant ordering... Much appreciated. I think as long as we do not store the tag in the Literal itself and rather in the BindingInfo (that's computed for a particular task version), we will be fine, right? because as you said, a Literal produced by task1 v1 may be used to call task2 v1 in one case or task2 v2 in another... and it has to be portable... The binding, however, is computed when compiling the workflow closure (where all versions are pinned)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind checking DCO guide to make it happy? then we can merge this!
Signed-off-by: maximsmol <maximsmol@gmail.com>
Signed-off-by: maximsmol <maximsmol@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Congrats on merging your first pull request! 🎉 |
PR'd to allow comments/discussion