Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dynamic and Discriminated Unions #1467

Open
irgolic opened this issue Sep 13, 2023 · 14 comments
Open

Dynamic and Discriminated Unions #1467

irgolic opened this issue Sep 13, 2023 · 14 comments
Labels
topic: feature Discussions about new features for Python's type annotations

Comments

@irgolic
Copy link

irgolic commented Sep 13, 2023

When modeling semi-structured data, discriminated unions are an invaluable tool in your arsenal.
It allows you to model a variable unit of data, where each unit concisely specifies its type and payload. This is a common pattern in event-driven architectures, where you have a stream of "events", "commands", or "actions", each of which is a discriminated union of event types.

I'd like to propose two new features to the typing module that formalize dynamic and discriminated unions.

1️⃣ Dynamic Unions

Current state

Assume we have a Status model that is a discriminated union of StatusA and StatusB:

from typing import Literal, Union
from pydantic import BaseModel


class StatusBase(BaseModel):
    status: str


class StatusA(StatusBase):
    status: Literal['a']


class StatusB(StatusBase):
    status: Literal['b']

To define a union that is understood by the type-checker at static-analysis time, you declare it by mentioning every member of the union:

Status = Union[StatusA, StatusB]
reveal_type(Status)  # Type of "Status" is "type[StatusA] | type[StatusB]"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

Whenever adding a new subclass of StatusBase, this declaration must be updated.

I often abuse a pattern of lazily declaring a dynamic union instead of maintaining a static one, by tying all the members of the union to the subclasses of a parent class (in this case StatusBase):

Status = Union[tuple(StatusBase.__subclasses__())]
reveal_type(Status)  # Type of "Status" is "Unknown"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

The type-checker is not able to infer the type of Status, because it's only compiled at runtime.

Proposed feature

I propose a new feature that would allow the type-checker to infer a dynamic union of all StatusBase subclasses, and finally error if more subclasses are declared afterward.

from typing import DynamicUnion  # Currently not a thing

Status = DynamicUnion[StatusBase]
reveal_type(Status)  # Type of "Status" is "type[StatusA] | type[StatusB]"
print(Status)  # typing.Union[__main__.StatusA, __main__.StatusB]

(Maybe call it SubclassUnion instead of DynamicUnion?)

The type-checker should detect if a subclass of StatusBase is declared after Status, and raise an error.

Status = DynamicUnion[StatusBase]

class StatusC(StatusBase):  # Error: Subclass of StatusBase declared after declaration of DynamicUnion[StatusBase]
    status: Literal['c']

This could work like the @sealed decorator was proposed to work in the Sealed Typing PEP draft. More useful information found on this email chain.

The @sealed decorator was designed as part of PEP 622 (structural pattern matching). Think of it as confining a type and its subclasses to a single module, so that the type-checker can dynamically infer a union of the subclasses. These are also known as algebraic data types.

I'd like to also explore the possibility of scoping algebraic data types to a package, or a dynamic scope bound between the parent class declaration and the union declaration.

2️⃣ Discriminated unions

Current state

For lack of a formalization of discriminated unions in core python, data (de)serialization libraries turn to their own objects. For example, pydantic specifies a discriminator value with Field, and uses Annotated to tag the union:

from typing import Annotated
from pydantic import Field

AnnotatedStatus = Annotated[
    Status,
    Field(discriminator='status')
]

This does not raise an error at runtime if any member of the Status union does not have a status field UNTIL AnnotatedStatus is used as a type on the field of a pydantic model declaration.

Annotated is an "escape hatch" that allows clients to express things that type-checkers
might not understand
.

However, there is a push by third-party libraries for objects in the typing module that would canonically be used with the Annotated type. It signals a willingness to move from redundant proprietary internals toward ubiquitous core python. See PEP 727 and related discussion.

Proposed feature

I propose a new feature that would allow the type-checker to infer the type of AnnotatedStatus, by declaring it as a dynamic discriminated union:

from typing import DiscriminatedUnion  # Currently not a thing

AnnotatedStatus = DiscriminatedUnion[Status, 'status']
reveal_type(AnnotatedStatus)  # Type of "AnnotatedStatus" is "type[StatusA] | type[StatusB]"
print(AnnotatedStatus)  # typing.Union[__main__.StatusA, __main__.StatusB]

This would grant data (de)serialization libraries unified plumbing to declare discriminated unions, and would allow the type-checker to throw an error at static-analysis time if any member of the Status union does not have a status field, or if any member of the Status union has a status field with a value that is not unique among all members of the union.

The latter hinges on type-checkers being able to implicitly detect disjoint unions. See pyright#5933 for more information on mutable field invariance and implicit disjoint unions.

Motivation

When I'm writing JSONSchemas or deserializing in event-driven architectures, and I'm iterating fast, I gravitate toward easily being able to add new members to a union.

I usually do this in one of two ways. Either:

  • all in one file, declaring parent, subclasses, then the union (as shown in the example above), or
  • all in one directory, declaring the parent in base.py, each of the subclasses in a separate file, and the union in the module’s __init__.py

While this approach feels more magical than maintaining a static union yourself, it facilitates an accessible plug-in architecture, where users can implement new subclasses of the parent class in a new file, and have it automatically become part of the union without deeply understanding the internals of the library.

@irgolic irgolic added the topic: feature Discussions about new features for Python's type annotations label Sep 13, 2023
@erictraut
Copy link
Collaborator

This concept was discussed at some length in this thread.

@irgolic
Copy link
Author

irgolic commented Sep 14, 2023

Thanks for the link @erictraut! A few thoughts from reading that thread:

The @sealed decorator was faced with backlash

It seems like the consensus in the thread was that @sealed should not exist in its proposed fashion, namely because there's no precedent for restricting type behavior to a module.

The decorator was inspired by a couple of different languages:

Scala

Scala has sealed types.

sealed trait Device
case class Phone(model: String) extends Device:
  def screenOff = "Turning screen off"

case class Computer(model: String) extends Device:
  def screenSaverOn = "Turning screen saver on..."


def goIdle(device: Device): String = device match
  case p: Phone => p.screenOff
  case c: Computer => c.screenSaverOn

It throws an error if extending the type from another file,

case class Telepathy(message: String) extends Notification
           ^
        Cannot extend sealed trait Notification in a different source file

and ALSO throws an error on non-exhaustive pattern matching of the type (like match's assert_never trick).

def showNotification(notification: Notification): String =
  notification match
    case Email(sender, title, _) =>
      s"You got an email from $sender with title: $title"
    case SMS(number, message) =>
      s"You got an SMS from $number! Message: $message"
match may not be exhaustive.

It would fail on pattern case: VoiceRecording(_, _)
Kotlin

Kotlin has sealed classes and interfaces.

sealed interface Error

sealed class IOError(): Error

class FileReadError(val file: File): IOError()
class DatabaseError(val source: DataSource): IOError()

object RuntimeError : Error

The key difference between Scala sealed types and Kotlin sealed classes is that Kotlin allows the subclasses to be scoped to a PACKAGE, not just a file.

They describe the key benefit of using these as removing the need for an else clause in their when expression:

fun log(e: Error) = when(e) {
    is FileReadError -> { println("Error while reading file ${e.file}") }
    is DatabaseError -> { println("Error while reading from database ${e.source}") }
    is RuntimeError ->  { println("Runtime error") }
    // the `else` clause is not required because all the cases are covered
}

Though Kotlin doesn't have unions, sealed classes are the closest thing they have to them. See Union types discussion

Non-exhaustiveness match check is already supported

There's a neat trick that supports writing a pattern match over a union, and making sure it's exhaustive at static analysis time:

match status:
    case StatusA(s=status):
        ...
    case StatusB(s=status):
        ...
    case _ as x:
        assert_never(x)

If a member gets added to the union and is not handled in the match block, a type error is raised.

This is great, my only concern here is that it's an obscure "trick". Not sure about python's philosophy behind the match statement, but perhaps we could do something similar to Kotlin, and infer the assert_never default case if matching on a DynamicUnion?

TypeScript parity

For completeness, see also what TypeScript does:

TypeScript

You can define a discriminated union of interfaces.

interface Circle {
  kind: "circle";
  radius: number;
}
 
interface Square {
  kind: "square";
  sideLength: number;
}
 
type Shape = Circle | Square;

The exhaustiveness check happens similarly to python's assert_never:

function getArea(shape: Shape) {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.sideLength ** 2;
    default:
      const _exhaustiveCheck: never = shape;
      return _exhaustiveCheck;
  }
}

However, TypeScript also allows you to define discriminated unions more functionally:

type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; x: number }
  | { kind: "triangle"; x: number; y: number };

This looks clean and elegant, but I don't think we can use it as inspiration for a base class/union solution. It does highlight a need for easier syntax in defining unions.

It seems like manually declaring a union with all its members and using assert_never for exhaustive matching is pretty good already, and TypeScript supports a similar system.

Where python lacks parity with TypeScript is an easier syntax for writing discriminated unions. Herein lies one of the core differences between python and TypeScript – the latter has anonymous types as first-class citizens.

type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; x: number }
  | { kind: "triangle"; x: number; y: number };

This syntax makes sense in the context of TypeScript, but not in the current form of python. Depending on how inline syntax for TypedDict resolves, this may become viable.

However, I believe DynamicUnion and DiscriminatedUnion are more pythonic; they would simplify, formalize, and unify already established patterns in the python ecosystem.

Let me know what you think, or if you believe we should move this discussion to another platform like the mailing list 🙏 Thanks

@erictraut
Copy link
Collaborator

Pyright already supports an option called reportMatchNotExhaustive. When this option is enabled, pyright reports non-exhaustive match statements without the need for an assert_never. Mypy hasn't yet implemented such an option, but it is an open feature request.

# pyright: reportMatchNotExhaustive=true

from typing import Literal
from pydantic import BaseModel

class StatusBase(BaseModel):
    status: str

class StatusA(StatusBase):
    status: Literal["a"]

class StatusB(StatusBase):
    status: Literal["b"]

class StatusC(StatusBase):
    status: Literal["c"]

Status = StatusA | StatusB | StatusC

def func(status: Status) -> None:
    match status: # Pyright error: Cases within match statement do not handle all values; Unhandled type: "StatusC"
        case StatusA():
            ...
        case StatusB():
            ...

Pyright also supports experimental support for inline syntax for TypedDict. I added this support to inform the discussion about this feature. I recommend not taking a dependency on it because it's likely to change or be removed depending on the final resolution of this topic, but you can play with it to get a sense for what this approach might feel like. Here's what your example would look like with inlined TypedDict support and PEP 695 syntax.

# pyright: enableExperimentalFeatures=true

from typing import Literal

type Shape = (
  dict[{ "kind": Literal["circle"], "radius": float }] |
  dict[{ "kind": Literal["square"], "x": float }] |
  dict[{ "kind": Literal["triangle"], "x": float, "y": float }]
)

@JelleZijlstra
Copy link
Member

Non-exhaustiveness match check is already supported as of python3.11

Just for completeness, there's really nothing new in Python 3.11 here. We added typing.assert_never in 3.11, but (a) you can use it on any other supported Python version with typing_extensions and (b) it's not treated specially by type checkers, you can write def assert_never(x: NoReturn) -> NoReturn: in your own code and it will work the same.

What we did in 3.11 (put this in typing) does help address your point that this is an "obscure trick". Hopefully now that it is in typing it will be more discoverable.

@irgolic
Copy link
Author

irgolic commented Sep 23, 2023

Sorry for the delay in my reply :)

Thanks @JelleZijlstra , I thought match was necessary for the trick but now I'm realizing you could probably achieve the same with an if/elif chain. Great job on getting assert_never in, it's definitely nicer to have standardized syntax for this over rolling your own NoReturn sentinel.

And thanks @erictraut for reportMatchNonExhaustive and the experimental inline TypedDict unions. Was the type keyword preceding Shape on line 5 a typo or intended? Interested to see how people adopt it, and how it complements unions of objects. Similar to protocols, just for properties, if I understand correctly?

I thought to document python/TypeScript parity regarding this feature because it felt sensible given the comparison to Scala and Kotlin. However, my use case extends beyond TypeScript-style TypedDict unions.

The same with a metaclass

Hoping to better illustrate my use case by highlighting an alternate solution...

I was taught that you should really only be using a metaclass if you're:

  • refactoring class attributes
  • registering class declarations

Is there a canonical metaclass implementation that registers class declarations of a base class? Any time I did it, it looked something like this:

classes = {}


class MyMetaclass(type):
    def __new__(meta, name, bases, attrs):
        cls = super().__new__(meta, name, bases, attrs)
        if name != "MyBaseClass":
            classes[name] = cls
        return cls


class MyBaseClass(metaclass=MyMetaclass):
    pass


class A(MyBaseClass):
    pass


class B(MyBaseClass):
    pass


print(classes)  # {'A': <class '__main__.A'>, 'B': <class '__main__.B'>}

As described in the OP motivation, and as implemented in Kotlin (see dropdown), this paradigm allows for declaring subclasses in separate files.

And the subclasses need not only be dataclasses, they can fill abstract methods too. It’s an easily accessible way to add new functionality in a plugin-based architecture, without

Otherwise…

Is the only suggested pattern for fulfilling this use case to manually define the Union?

If there's no way to statically detect a union of a dynamic collection of objects, neither as a union of subclasses, nor as registered by a metaclass, could there be a way to raise an error at static-analysis time if a subclass is declared, but not written down as part of a union's definition? Ideally I would also like to have the ability to raise an error if a subclass is declared outside of a specified file or package, but given how the @sealed decorator flopped, I'm not sure that would be accepted.

@erictraut
Copy link
Collaborator

Is the only suggested pattern for fulfilling this use case to manually define the Union?

That's the pattern I recommend. I use it extensively in TypeScript and Python, and I find that it works great, especially when paired with exhaustion verification techniques.

Any mechanism that involves dynamic registration is going to be a problem for static type analysis, especially if those registrations can be performed anywhere by any code. This is why, for example, ABC.register is not supported by any Python type checkers.

@irgolic
Copy link
Author

irgolic commented Sep 27, 2023

Alright, thanks for the confirmation.

Regarding the DiscriminatedUnion proposal, given that PEP 727 got added to typing_extensions, would it make sense to add something similar for annotating a union type as discriminated? It would allow type-checkers to throw an error on the line of declaration instead of raising an exception at run-time if the type is used as part of a model.
Though I'm not a maintainer of a data serialization library, really it would be nice to hear from @tiangolo or @samuelcolvin if they think it would be useful.

Unrelated sidenote regarding inlined typed dicts
# pyright: enableExperimentalFeatures=true

from typing import Literal

type Shape = (
dict[{ "kind": Literal["circle"], "radius": float }] |
dict[{ "kind": Literal["square"], "x": float }] |
dict[{ "kind": Literal["triangle"], "x": float, "y": float }]
)

This will ultimately encourage errors similar to those inferred by pylyzer to come to light.

image

Are you planning to add similar inference capabilities to pyright?

@samuelcolvin
Copy link

1️⃣ Dynamic Unions

I'm not sure your proposal makes much sense to me, surely you should just make StatusBase an ABC or a protocol - no need for a union.

2️⃣ Discriminated unions

👍 👍 👍 👍

Yes we would love this, I haven't (yet) read through the whole of this issue, let me know if you need me to read it thoroughly, and or respond to specific points.

I actually asked for it at the typing summit at pycon, in that case it was actually to help mypy's performance - see python/mypy#14034. But it would also be very helpful in pydantic itself.

A few unordered thoughts:

  • we're currently working on a dedicated Discriminator marker type, to be used as in foobar: Annotated[Union[Foo, Bar], Discriminator('x')], see Proper support for functional discriminators pydantic/pydantic#7462
  • as per that issue, we will also want Discriminator to be able to take: a function, a tuple/list (indicating the path to the value to use as the discriminator), or AliasPath / AliasChoice which do the same thing more explicitly - obviously this won't be any help to type checkers, I guess in this case they would just need to fallback to the normal union logic
  • if there isn't willingness to add DiscriminatedUnion to typing-extensions yet, I would guess @adriangb could be persuaded to add Discriminator to annotated-types - this might be the best first step?

@JelleZijlstra
Copy link
Member

The policy of typing-extensions is to add new objects when there is a PEP for them. We'll add DiscriminatedUnion if and when there is such a PEP.

I haven't thought about DiscriminatedUnion much recently, but I saw Samuel's talk at PyCon and as I recall it wasn't clear to me how this feature would affect type checker behavior, since a sufficiently strong type checker could already discriminate unions this way, even without the annotations. If a PEP is going to propose DiscriminatedUnion, it will have to be clear about how this new construct is different from Union from a type checker's perspective.

@adriangb
Copy link

I agree with both @samuelcolvin and @JelleZijlstra: I think Pydantic would benefit from a shared Discriminator marker for Annotated that works 3ith other libraries, but (1) Pydantic supports much more complex things than type checkers would (callables, the AliasPath stuff) and (2) like Jelle said it's unclear what benefit this would have for type checkers aside from performance (and in the end mypy solved the issue without this; I'm not a type checker author so I can't say if it would help but they all seem to think it won't so I'll have to go with that).

So overall I don't see a huge benefit to adding anything to typing{_extensions} or annotated-types.

@irgolic
Copy link
Author

irgolic commented Sep 27, 2023

DiscriminatedUnion mostly made sense to me in the context of this proposal (including DynamicUnion). If there's no interest for DynamicUnion, the Discriminator marker type to be used with Annotated seems more elegant.

From a type-checking perspective, I'd just want for there to be an error raised upon declaring the type marked as discriminated.

Pass
class A(BaseModel):
    prop: str

class B(BaseModel):
    prop: str

MyUnion = Union[A, B]

MyDiscriminatedUnion = Annotated[MyUnion, Discriminator("prop")]
Fail
class A(BaseModel):
    prop: str

class B(BaseModel):
    prop: str
    
class C(BaseModel):
    not_prop: str

MyUnion = Union[A, B, C]

MyDiscriminatedUnion = Annotated[MyUnion, Discriminator("prop")]  # static analysis error raised here

Not sure what the precedent is on static analysis over marker types. Sounds like a great avenue for community plugins.

By the way, why did doc go into typing_extensions and not into annotated-types?

@samuelcolvin
Copy link

Your examples would fail at runtime with Pydantic I believe, if that helps.

By the way, why did doc go into typing_extensions and not into annotated-types?

Because it had a PEP.

@samskiter
Copy link

samskiter commented Oct 30, 2023

Just to throw some more cents in here (and while we're comparing to other languages). I come from Swift programming where 'enum' acts as a really powerful discriminated union. It's the number 1 feature I miss from Swift as it allows you to be super expressive and type safe with return types, data etc etc.

In Swift a discriminated union is done with an 'enum with associated value'. I think it's quite an intuitive way to represent the feature as we're already familiar with the idea of an Enum having a limited number of options and 'switching' across them (I see a bunch of thoughts about about using match in Python which I think is perfect for this concept...).

Example:

enum Barcode {
    case upc(Int, Int, Int, Int)
    case qrCode(String)
}
var productBarcode = Barcode.upc(8, 85909, 51226, 3)

switch productBarcode {
case .upc(let numberSystem, let manufacturer, let product, let check): // here the .upc union is matched and it's inner values are assigned out to the 4 new variables
    print("UPC: \(numberSystem), \(manufacturer), \(product), \(check).")
case .qrCode(let productCode):
    print("QR code: \(productCode).")
}

Multiple values in a case are like having a tuple in python:

Barcode = tuple[int, int, int, int] | String

What's interesting is the ability to mix a 'valueless' enum with those with a value:

enum BlogPostStatus {
    case draft
    case published(Date)
    case removed(String)
}

...the ability to have the same type discriminated:

enum SomeStringRequestResponse {
    case success(String)
    case failure(String)
}

.... the abiility to name the values for better readability:

enum Pizza {

  case small(inches: Int) // without 'inches' here, you might be left guessing what 'int' means
}

We could improve Barcode readabilty like this:

enum Barcode {
    case upc(numberSystem: Int, manufacturer: Int, product: Int, check: Int)
    case qrCode(String)
}

And to do more complex matching in a switch statement using the 'where' clause:

func handle(_ error: Error) {
    switch error {
    // Matching against a group of offline-related errors:
    case URLError.notConnectedToInternet,
         URLError.networkConnectionLost,
         URLError.cannotLoadFromNetwork:
        showOfflineView()
    // Matching against a specific error:
    case let error as HTTPError where error == .unauthorized:
        logOut()
    // Matching against our networking error type:
    case is HTTPError:
        showNetworkErrorView()
    // Fallback for other kinds of errors:
    default:
        showGenericErrorView(for: error)
    }
}

Not to mention all the usual niceties of encapsulation that an Enum provides (utility functions etc):

enum CustomTextFieldTypes {

  case cardType
  case cardNumber
  case cardExpiryDate
  case cardName
  case ccvNumber

  func inputCardNumber(cardNumber: String!, cardNumberTextField: XCUIElement?) {
    if case .cardNumber = self {
        cardNumberTextField?.typeText(cardNumber)
    }
  }
}

These are all 1st class language features and have made using other languages feel a little jarring/limiting for me and others I know - either indicating the Swift enum has become a crutch or something very useful... ;)

I'm not knowledgeable or involved enough with the Python language to think about writing a PEP or contributing to implementation or understand the nuances of how proposals interact with other language features, but as a 'user' this thread makes me excited for the direction of the language and I wanted to show the Swift perspective to hopefully act as inspiration/conversation :)

@irgolic
Copy link
Author

irgolic commented Jan 13, 2024

How Documentation Metadata in Typing resolves will likely inform the resolution for this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: feature Discussions about new features for Python's type annotations
Projects
None yet
Development

No branches or pull requests

6 participants