From 791f6451774bc15a0dbf71d5229ae118d1470199 Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Tue, 4 Jun 2024 15:46:34 +0200 Subject: [PATCH 1/7] DESIGN: Add a sequel to the design doc on function composition This document sketches out some alternatives for the machinery provided to enable function composition. The goal is to provide an exhaustive list of alternatives. --- exploration/function-composition-part-2.md | 394 +++++++++++++++++++++ 1 file changed, 394 insertions(+) create mode 100644 exploration/function-composition-part-2.md diff --git a/exploration/function-composition-part-2.md b/exploration/function-composition-part-2.md new file mode 100644 index 000000000..d36656943 --- /dev/null +++ b/exploration/function-composition-part-2.md @@ -0,0 +1,394 @@ +# Function Composition - Part 2 + +Status: **Proposed** + +
+ Metadata
+ Contributors: @catamorphism
+ First proposed: 2024-06-xx
+ Pull Requests: #000
+ +## Objective + +_What is this proposal trying to achieve?_ + +[Part 1](https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md) of this document +explained ambiguities in the existing spec +when it comes to function composition. + +The goal of this document is to present a _complete_ list of +alternatives that may be considered by the working group. + +Each alternative corresponds to a different concrete +definition of "resolved value". + +This document is meant to logically precede +[the "Data Flow for Composable Functions" design document](https://github.com/catamorphism/message-format-wg/blob/79ceb57fa305204f26c6635fd586d0e3057cf460/exploration/dataflow-composability.md). +Once an alternative from this document is chosen, +then that document will be revised. + +## Background + +See https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md for more details. + +Depending on the chosen semantics for composition, +functions can either "pipeline the input" (preservation model) or +"operate on the output" (formatted value model), +or both. + +Also, depending on the chosen functions, resolved options +might or might not be part of the value returned +by a function implementation. + +This suggests several alternatives: +1. Pipeline input, but don't pass along options +2. Pipeline input and pass along options +3. Don't pipeline input (one function operates on the output of another) but do pass along options (is this useful?) +4. Don't pipeline input and don't pass along options + +Options 1 and 3 do not seem useful. +This document presents options 2 and 4, and a few variations on them. + +Not addressed here: the behavior of compositions of built-in functions +(but the choice here will determine what behaviors are possible). + +Not addressed here: the behavior of compositions of custom functions +(which is up to the custom function implementor). 
+ +## Requirements + +A message that has a valid result in one implementation +should not result in an error in a different implementation. + +## Constraints + +One prior decision is that the same definition of +"resolved value" appears in multiple places in the spec. +If "resolved value" is defined broadly enough +(an annotated value with rich metadata), +then this prior decision need not be changed. + +A second constraint is +the difficulty of developing a precise definition of "resolved value" +that can be made specific in the interface for custom functions, +which is implementation-language-neutral. + +A third constraint is the "typeless" nature of the existing MessageFormat spec. +The idea of specifying which functions are able to compose with each other +resembles the idea of specifying a type system for functions. +Specifying rules for function composition, while also remaining typeless, +seems difficult and potentially unpredictable. + +## Introducing type names + +It's useful to be able to refer to two types: + +* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). +* `ValueType`: This type encompasses strings, numbers, date/time values, +all other possible implementation-specific types that input variables can be +assigned to, +and all possible implementation-specific types that custom and built-in +functions can construct. +Conceptually it's the union of an "input type" and a "formatted value". + +It's tagged with a string tag so functions can do type checks. + +``` +interface ValueType { + type(): string + value(): unknown +} +``` + +## Alternatives to consider + +In lieu of the usual "Proposed design" and "Alternatives considered" sections, +we offer some alternatives already considered in separate discussions. + +Because of our constraints, implementations are **not required** +to use the `MessageValue` interface internally as described in +any of the sections. 
+The purpose of defining the interface is to guide implementors. +An implementation that uses different types internally +but allows the same observable behavior for composition +is compliant with the spec. + +Five alternatives are presented: +1. Typed functions +2. Formatted value model +3. Preservation model +4. Allow both kinds of composition +5. Don't allow composition + +Alternatives 2 and 3 should be familiar to readers of part 1. +Alternative 4 is an idea from a prior mailing list discussion +of this problem. Alternative 1 is similar to Alternative 3 +but introduces additional notation to make composition +easier to think about (which is why it's presented first). +Alternative 5 is included for completeness. + +### Typed functions + +The following option aims to provide a general mechanism +for custom function authors +to specify how functions compose with each other. + +This is an extension of the "preservation model" +from part 1 of this document. + +Here, `ValueType` is the most general type +in a system of user-defined types. +Using the function registry, +each custom function could declare its own argument type +and result type. + +This does not imply the existence of any static typechecking. +A function passed the wrong type could signal a runtime error. +This does require some mechanism for dynamically inspecting +the type of a value. + +Consider Example B1 from part 1 of the document: + +Example B1: +``` + .local $age = {$person :getAge} + .local $y = {$age :duration skeleton=yM} + .local $z = {$y :uppercase} +``` + +Informally, we can write the type signatures for +the three custom functions in this example: + +``` +getAge : Person -> Number +duration : Number -> String +uppercase : String -> String +``` + +`Number` and `String` are assumed to be subtypes +of `MessageValue`. 
Thus, + +The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) +attempts to do some of this, but does not define +the structure of the values produced by functions. + +An optional static typechecking pass (linting) +would then detect any cases where functions are composed in a way that +doesn't make sense. For example: + +Semantically invalid example: +``` +.local $z = {$person: uppercase} +``` + +A person can't be converted to uppercase; or, `:uppercase` expects +a `String`, not a `Person`. So an optional tool could flag this +as an error, assuming that enough type information +was included in the registry. + +The resolved value type is similar to what was proposed in +[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/). + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +The `resolvedOptions()` method is renamed to `properties`. +This is to suggest that individual function implementations +may not pass all of the options through into the resulting +`MessageValue`. + +Instead of using `unknown` as the result type of `getValue()`, +we use `ValueType`, mentioned previously. +Instead of using `unknown` as the value type for the +`properties()` object, we use `MessageValue`, +since options can also be full `MessageValue`s with their own options. + +Because `ValueType` has a type tag, +custom function implementations can easily +signal dynamic errors if passed an operand of the wrong type. + +The advantage of this approach is documentation: +with type names that can be used in type signatures +specified in the registry, +it's easy for users to reason about functions and +understand which combinations of functions +compose with each other. 
+ +### Formatted value model (Composition operates on output) + +This is an elaboration on the "formatted model" from part 1. + +A less general solution is to have a single "resolved value" +type, and specify that if function `g` consumes the resolved value +produced by function `f`, +then `g` operates on the output of `f`. + +``` + .local $x = {$num :number maxFrac=2} + .local $y = {$x :number maxFrac=5 padStart=3} +``` + +In this example, `$x` would be bound to the formatted result +of calling `:number` on `$num`. So the `maxFrac` option would +be "lost" and when determining the value of `$y`, the second +set of options would be used. + +For built-ins, it suffices to define `ValueType`as something like: + +``` +FormattedNumber | FormattedDateTime | String +``` + +because no information about the input needs to be +incorporated into the resolved value. + +However, to make it possible for custom functions to return +a wider set of types, a wider `ValueType` definition would be needed. + +The `MessageValue` definition would look as in #728, but without +the `resolvedOptions()` method: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + selectKeys(keys: string[]): string[] +} +``` + +`MessageValue` is effectively a `ValueType` with methods. + +Using this definition would make some of the use cases from part 1 +impractical. + +### Preservation model (composition can operate on input and options) + +This is an extension of +the "preservation model" from part 1, +if resolved options are included in the output. +This model can also be thought of as functions "pipelining" +the input through multiple calls. 
+ +A JSON representation of an example resolved value might be: +``` +{ + input: { type: "number", value: 1 }, + output: { type: "FormattedNumber", value: FN } + properties: { "maximumFractionDigits": 2 } +} +``` + +(The number "2" is shown for brevity, but it would +actually be a `MessageValue` itself.) + +where `FN` is an instance of an implementation-specific +`FormattedNumber` type, representing the number 1. + +The resolved value interface would include both "input" +and "output" methods: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getInput(): ValueType + getOutput(): ValueType + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +Without a mechanism for type signatures, +it may be hard for users to tell which combinations +of functions compose without errors, +and for implementors to document that information +for users. + +### Allow both kinds of composition (with different syntax) + +By introducing new syntax, the same function could have +either "preservation" or "formatted value" behavior. + +Consider (this suggestion is from Elango Cheran): + +``` + .local $x = {$num :number maxFrac=2} + .pipeline $y = {$x :number maxFrac=5 padStart=3} + {{$x} {$y}} +``` + +If `$num` is `0.33333`, +then the result of formatting would be + +``` +0.33 000.33333 +``` + +An extra argument to function implementations, +`pipeline`, would be added. + +`.pipeline` would be a new keyword that acts like `.local`, +except that if its expression has a function annotation, +the formatter would pass in `true` for the `pipeline` +argument to the function implementation. + +The `resolvedOptions()` method should be ignored if `pipeline` +is `false`. 
+ +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getInput(): MessageValue + getOutput(): unknown + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +### Don't allow composition for built-in functions + +Another option is to define the built-in functions this way, +notionally: + +``` +number : Number -> FormattedNumber +date : Date -> FormattedDate +``` + +Then it would be a runtime error to pass a `FormattedNumber` into `number` +or to pass a `FormattedDate` into `date`. + +The resolved value type would look like: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + selectKeys(keys: string[]): string[] +} +``` + +As with the formatted value model, this restricts the +behavior of custom functions. + +### Non-alternative: Allow composition in some implementations + +Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). + +This violates the portability requirement. From d28e494d1c6aa2204b621ab8346349b46243fedd Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 10:47:34 +0200 Subject: [PATCH 2/7] Remove 'part 2' document and move contents to the end of part 1 --- exploration/function-composition-part-1.md | 332 ++++++++++++++++- exploration/function-composition-part-2.md | 394 --------------------- 2 files changed, 315 insertions(+), 411 deletions(-) delete mode 100644 exploration/function-composition-part-2.md diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index ca392386f..347e43b87 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -838,7 +838,10 @@ so that functions can be passed the values they need. 
It also needs to provide a mechanism for declaring when functions can compose with each other. -Other requirements: +### Guarantee portability + +A message that has a valid result in one implementation +should not result in an error in a different implementation. ### Identify a set of use cases that must be supported @@ -975,26 +978,321 @@ Hence, revisiting the extensibility of the runtime model now that the data model is settled may result in a more workable solution. -## Proposed design and alternatives considered +## Alternatives to be considered + +The goal of this section is to present a _complete_ list of +alternatives that may be considered by the working group. + +Each alternative corresponds to a different concrete +definition of "resolved value". + +## Introducing type names + +It's useful to be able to refer to two types: + +* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). +* `ValueType`: This type encompasses strings, numbers, date/time values, +all other possible implementation-specific types that input variables can be +assigned to, +and all possible implementation-specific types that custom and built-in +functions can construct. +Conceptually it's the union of an "input type" and a "formatted value". -These sections are omitted from this document and will be added in -a future follow-up document, -given the length so far and need to agree on a common vocabulary. +It's tagged with a string tag so functions can do type checks. -We expect that any proposed design -would fall into one of the following categories: +``` +interface ValueType { + type(): string + value(): unknown +} +``` -1. Provide a general mechanism for custom function authors +## Alternatives to consider + +In lieu of the usual "Proposed design" and "Alternatives considered" sections, +we offer some alternatives already considered in separate discussions. 
+ +Because of our constraints, implementations are **not required** +to use the `MessageValue` interface internally as described in +any of the sections. +The purpose of defining the interface is to guide implementors. +An implementation that uses different types internally +but allows the same observable behavior for composition +is compliant with the spec. + +Five alternatives are presented: +1. Typed functions +2. Formatted value model +3. Preservation model +4. Allow both kinds of composition +5. Don't allow composition + +Alternatives 2 and 3 were presented earlier in this document. +Alternative 4 is an idea from a prior mailing list discussion +of this problem. Alternative 1 is similar to Alternative 3 +but introduces additional notation to make composition +easier to think about (which is why it's presented first). +Alternative 5 is included for completeness. + +### Typed functions + +The following option aims to provide a general mechanism +for custom function authors to specify how functions compose with each other. -1. Specify composition rules for built-in functions, -but not in general, allowing custom functions -to cooperate in an _ad hoc_ way. -1. Recommend a rich representation of resolved values -without specifying any constraints on how these values -are used. -(This is the approach in [PR 645](https://github.com/unicode-org/message-format-wg/pull/645).) -1. Restrict function composition for built-in functions -(in order to prevent unintuitive behavior). + +This is an extension of the "preservation model" +from part 1 of this document. + +Here, `ValueType` is the most general type +in a system of user-defined types. +Using the function registry, +each custom function could declare its own argument type +and result type. + +This does not imply the existence of any static typechecking. +A function passed the wrong type could signal a runtime error. +This does require some mechanism for dynamically inspecting +the type of a value. 
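The dynamic check described above can be sketched as follows. This is a minimal illustration only, assuming the `ValueType` interface introduced earlier; the `makeValue` helper, the `uppercase` function body, and the `"String"` and `"Person"` tag names are hypothetical and are not defined by the spec or the registry.

```typescript
// Mirrors the ValueType interface introduced earlier in this document.
interface ValueType {
  type(): string;
  value(): unknown;
}

// Hypothetical helper: pairs a raw value with a string type tag.
function makeValue(tag: string, raw: unknown): ValueType {
  return { type: () => tag, value: () => raw };
}

// Hypothetical custom function with the informal signature String -> String.
function uppercase(operand: ValueType): ValueType {
  // Dynamic check: nothing here relies on a static typechecking pass.
  if (operand.type() !== "String") {
    throw new TypeError(`:uppercase expects String, got ${operand.type()}`);
  }
  return makeValue("String", String(operand.value()).toUpperCase());
}

const ok = uppercase(makeValue("String", "abc"));
console.log(ok.value()); // prints "ABC"
```

Passing a value tagged `"Person"` to `uppercase` in this sketch throws a `TypeError` at runtime, which is the kind of error a function could signal when no static check exists.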
+ +Consider Example B1 from part 1 of the document: + +Example B1: +``` + .local $age = {$person :getAge} + .local $y = {$age :duration skeleton=yM} + .local $z = {$y :uppercase} +``` + +Informally, we can write the type signatures for +the three custom functions in this example: + +``` +getAge : Person -> Number +duration : Number -> String +uppercase : String -> String +``` + +`Number` and `String` are assumed to be subtypes +of `MessageValue`. + +The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) +attempts to do some of this, but does not define +the structure of the values produced by functions. + +An optional static typechecking pass (linting) +could then detect any cases where functions are composed in a way that +doesn't make sense. For example: + +Semantically invalid example: +``` +.local $z = {$person :uppercase} +``` + +A person can't be converted to uppercase; put differently, `:uppercase` expects +a `String`, not a `Person`. So an optional tool could flag this +as an error, assuming that enough type information +was included in the registry. + +The resolved value type is similar to what was proposed in +[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/). + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +The `resolvedOptions()` method is renamed to `properties()`. +This is to suggest that individual function implementations +may not pass all of the options through into the resulting +`MessageValue`. + +Instead of using `unknown` as the result type of `getValue()`, +we use `ValueType`, mentioned previously. +Instead of using `unknown` as the value type for the +`properties()` object, we use `MessageValue`, +since options can also be full `MessageValue`s with their own options. 
+ +Because `ValueType` has a type tag, +custom function implementations can easily +signal dynamic errors if passed an operand of the wrong type. + +The advantage of this approach is documentation: +with type names that can be used in type signatures +specified in the registry, +it's easy for users to reason about functions and +understand which combinations of functions +compose with each other. + +### Formatted value model (Composition operates on output) + +This is an elaboration on the "formatted value model" from part 1. + +A less general solution is to have a single "resolved value" +type, and specify that if function `g` consumes the resolved value +produced by function `f`, +then `g` operates on the output of `f`. + +``` + .local $x = {$num :number maxFrac=2} + .local $y = {$x :number maxFrac=5 padStart=3} +``` + +In this example, `$x` would be bound to the formatted result +of calling `:number` on `$num`. So the first `maxFrac` option would +be "lost", and when determining the value of `$y`, only the second +set of options would be used. + +For built-ins, it suffices to define `ValueType` as something like: + +``` +FormattedNumber | FormattedDateTime | String +``` + +because no information about the input needs to be +incorporated into the resolved value. + +However, to make it possible for custom functions to return +a wider set of types, a wider `ValueType` definition would be needed. + +The `MessageValue` definition would look as in #728, but without +the `resolvedOptions()` method: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + selectKeys(keys: string[]): string[] +} +``` + +`MessageValue` is effectively a `ValueType` with methods. + +Using this definition would make some of the use cases from part 1 +impractical. 
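One possible reading of the formatted value model can be sketched as follows. This is an illustration under stated assumptions, not a description of any implementation: `formatNumber` is a hypothetical stand-in for `:number`, only `maxFrac` is modeled (`padStart` is left out), the `MessageValue` interface is trimmed to two methods, and a `FormattedNumber` is assumed to be representable as the formatted text in a fixed `en-US` locale.

```typescript
interface ValueType {
  type(): string;
  value(): unknown;
}

// Trimmed version of the MessageValue interface shown above.
interface MessageValue {
  formatToString(): string;
  getValue(): ValueType;
}

// Hypothetical stand-in for `:number`; only `maxFrac` is modeled.
function formatNumber(operand: number | MessageValue, maxFrac: number): MessageValue {
  // The operand is either a raw input or the *output* of an earlier call.
  // Parsing formatted text back into a number is a simplification that
  // only works for the fixed "en-US" locale assumed in this sketch.
  const n = typeof operand === "number" ? operand : Number(operand.getValue().value());
  const text = new Intl.NumberFormat("en-US", {
    maximumFractionDigits: maxFrac,
    useGrouping: false,
  }).format(n);
  // Neither the options nor the original input are stored in the result.
  return {
    formatToString: () => text,
    getValue: () => ({ type: () => "FormattedNumber", value: () => text }),
  };
}

const x = formatNumber(0.33333, 2); // like {$num :number maxFrac=2}
const y = formatNumber(x, 5);       // like {$x :number maxFrac=5}
console.log(x.formatToString(), y.formatToString()); // prints "0.33 0.33"
```

In this sketch the second call cannot recover the digits dropped by the first, because only the formatted output flows between the two functions.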
+ +### Preservation model (composition can operate on input and options) + +This is an extension of +the "preservation model" from part 1, +if resolved options are included in the output. +This model can also be thought of as functions "pipelining" +the input through multiple calls. + +A JSON-like representation of an example resolved value might be: +``` +{ + input: { type: "number", value: 1 }, + output: { type: "FormattedNumber", value: FN }, + properties: { "maximumFractionDigits": 2 } +} +``` + +Here, `FN` is an instance of an implementation-specific +`FormattedNumber` type, representing the number 1. + +(The number "2" is shown for brevity, but it would +actually be a `MessageValue` itself.) + +The resolved value interface would include both "input" +and "output" methods: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getInput(): ValueType + getOutput(): ValueType + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +Without a mechanism for type signatures, +it may be hard for users to tell which combinations +of functions compose without errors, +and for implementors to document that information +for users. + +### Allow both kinds of composition (with different syntax) + +By introducing new syntax, the same function could have +either "preservation" or "formatted value" behavior. + +Consider (this suggestion is from Elango Cheran): + +``` + .local $x = {$num :number maxFrac=2} + .pipeline $y = {$x :number maxFrac=5 padStart=3} + {{$x} {$y}} +``` + +If `$num` is `0.33333`, +then the result of formatting would be + +``` +0.33 000.33333 +``` + +An extra argument to function implementations, +`pipeline`, would be added. + +`.pipeline` would be a new keyword that acts like `.local`, +except that if its expression has a function annotation, +the formatter would pass in `true` for the `pipeline` +argument to the function implementation. 
+ +The `resolvedOptions()` method should be ignored if `pipeline` +is `false`. + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getInput(): MessageValue + getOutput(): unknown + properties(): { [key: string]: MessageValue } + selectKeys(keys: string[]): string[] +} +``` + +### Don't allow composition for built-in functions + +Another option is to define the built-in functions this way, +notionally: + +``` +number : Number -> FormattedNumber +date : Date -> FormattedDate +``` + +Then it would be a runtime error to pass a `FormattedNumber` into `number` +or to pass a `FormattedDate` into `date`. + +The resolved value type would look like: + +```ts +interface MessageValue { + formatToString(): string + formatToX(): X // where X is an implementation-defined type + getValue(): ValueType + selectKeys(keys: string[]): string[] +} +``` + +As with the formatted value model, this restricts the +behavior of custom functions. + +### Non-alternative: Allow composition in some implementations + +Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). + +This violates the portability requirement. ## Acknowledgments diff --git a/exploration/function-composition-part-2.md b/exploration/function-composition-part-2.md deleted file mode 100644 index d36656943..000000000 --- a/exploration/function-composition-part-2.md +++ /dev/null @@ -1,394 +0,0 @@ -# Function Composition - Part 2 - -Status: **Proposed** - -
- Metadata
- Contributors: @catamorphism
- First proposed: 2024-06-xx
- Pull Requests: #000
- -## Objective - -_What is this proposal trying to achieve?_ - -[Part 1](https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md) of this document -explained ambiguities in the existing spec -when it comes to function composition. - -The goal of this document is to present a _complete_ list of -alternatives that may be considered by the working group. - -Each alternative corresponds to a different concrete -definition of "resolved value". - -This document is meant to logically precede -[the "Data Flow for Composable Functions" design document](https://github.com/catamorphism/message-format-wg/blob/79ceb57fa305204f26c6635fd586d0e3057cf460/exploration/dataflow-composability.md). -Once an alternative from this document is chosen, -then that document will be revised. - -## Background - -See https://github.com/unicode-org/message-format-wg/blob/main/exploration/function-composition-part-1.md for more details. - -Depending on the chosen semantics for composition, -functions can either "pipeline the input" (preservation model) or -"operate on the output" (formatted value model), -or both. - -Also, depending on the chosen functions, resolved options -might or might not be part of the value returned -by a function implementation. - -This suggests several alternatives: -1. Pipeline input, but don't pass along options -2. Pipeline input and pass along options -3. Don't pipeline input (one function operates on the output of another) but do pass along options (is this useful?) -4. Don't pipeline input and don't pass along options - -Options 1 and 3 do not seem useful. -This document presents options 2 and 4, and a few variations on them. - -Not addressed here: the behavior of compositions of built-in functions -(but the choice here will determine what behaviors are possible). - -Not addressed here: the behavior of compositions of custom functions -(which is up to the custom function implementor). 
- -## Requirements - -A message that has a valid result in one implementation -should not result in an error in a different implementation. - -## Constraints - -One prior decision is that the same definition of -"resolved value" appears in multiple places in the spec. -If "resolved value" is defined broadly enough -(an annotated value with rich metadata), -then this prior decision need not be changed. - -A second constraint is -the difficulty of developing a precise definition of "resolved value" -that can be made specific in the interface for custom functions, -which is implementation-language-neutral. - -A third constraint is the "typeless" nature of the existing MessageFormat spec. -The idea of specifying which functions are able to compose with each other -resembles the idea of specifying a type system for functions. -Specifying rules for function composition, while also remaining typeless, -seems difficult and potentially unpredictable. - -## Introducing type names - -It's useful to be able to refer to two types: - -* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). -* `ValueType`: This type encompasses strings, numbers, date/time values, -all other possible implementation-specific types that input variables can be -assigned to, -and all possible implementation-specific types that custom and built-in -functions can construct. -Conceptually it's the union of an "input type" and a "formatted value". - -It's tagged with a string tag so functions can do type checks. - -``` -interface ValueType { - type(): string - value(): unknown -} -``` - -## Alternatives to consider - -In lieu of the usual "Proposed design" and "Alternatives considered" sections, -we offer some alternatives already considered in separate discussions. - -Because of our constraints, implementations are **not required** -to use the `MessageValue` interface internally as described in -any of the sections. 
-The purpose of defining the interface is to guide implementors. -An implementation that uses different types internally -but allows the same observable behavior for composition -is compliant with the spec. - -Five alternatives are presented: -1. Typed functions -2. Formatted value model -3. Preservation model -4. Allow both kinds of composition -5. Don't allow composition - -Alternatives 2 and 3 should be familiar to readers of part 1. -Alternative 4 is an idea from a prior mailing list discussion -of this problem. Alternative 1 is similar to Alternative 3 -but introduces additional notation to make composition -easier to think about (which is why it's presented first). -Alternative 5 is included for completeness. - -### Typed functions - -The following option aims to provide a general mechanism -for custom function authors -to specify how functions compose with each other. - -This is an extension of the "preservation model" -from part 1 of this document. - -Here, `ValueType` is the most general type -in a system of user-defined types. -Using the function registry, -each custom function could declare its own argument type -and result type. - -This does not imply the existence of any static typechecking. -A function passed the wrong type could signal a runtime error. -This does require some mechanism for dynamically inspecting -the type of a value. - -Consider Example B1 from part 1 of the document: - -Example B1: -``` - .local $age = {$person :getAge} - .local $y = {$age :duration skeleton=yM} - .local $z = {$y :uppercase} -``` - -Informally, we can write the type signatures for -the three custom functions in this example: - -``` -getAge : Person -> Number -duration : Number -> String -uppercase : String -> String -``` - -`Number` and `String` are assumed to be subtypes -of `MessageValue`. 
Thus, - -The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) -attempts to do some of this, but does not define -the structure of the values produced by functions. - -An optional static typechecking pass (linting) -would then detect any cases where functions are composed in a way that -doesn't make sense. For example: - -Semantically invalid example: -``` -.local $z = {$person: uppercase} -``` - -A person can't be converted to uppercase; or, `:uppercase` expects -a `String`, not a `Person`. So an optional tool could flag this -as an error, assuming that enough type information -was included in the registry. - -The resolved value type is similar to what was proposed in -[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/). - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - properties(): { [key: string]: MessageValue } - selectKeys(keys: string[]): string[] -} -``` - -The `resolvedOptions()` method is renamed to `properties`. -This is to suggest that individual function implementations -may not pass all of the options through into the resulting -`MessageValue`. - -Instead of using `unknown` as the result type of `getValue()`, -we use `ValueType`, mentioned previously. -Instead of using `unknown` as the value type for the -`properties()` object, we use `MessageValue`, -since options can also be full `MessageValue`s with their own options. - -Because `ValueType` has a type tag, -custom function implementations can easily -signal dynamic errors if passed an operand of the wrong type. - -The advantage of this approach is documentation: -with type names that can be used in type signatures -specified in the registry, -it's easy for users to reason about functions and -understand which combinations of functions -compose with each other. 
- -### Formatted value model (Composition operates on output) - -This is an elaboration on the "formatted model" from part 1. - -A less general solution is to have a single "resolved value" -type, and specify that if function `g` consumes the resolved value -produced by function `f`, -then `g` operates on the output of `f`. - -``` - .local $x = {$num :number maxFrac=2} - .local $y = {$x :number maxFrac=5 padStart=3} -``` - -In this example, `$x` would be bound to the formatted result -of calling `:number` on `$num`. So the `maxFrac` option would -be "lost" and when determining the value of `$y`, the second -set of options would be used. - -For built-ins, it suffices to define `ValueType`as something like: - -``` -FormattedNumber | FormattedDateTime | String -``` - -because no information about the input needs to be -incorporated into the resolved value. - -However, to make it possible for custom functions to return -a wider set of types, a wider `ValueType` definition would be needed. - -The `MessageValue` definition would look as in #728, but without -the `resolvedOptions()` method: - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - selectKeys(keys: string[]): string[] -} -``` - -`MessageValue` is effectively a `ValueType` with methods. - -Using this definition would make some of the use cases from part 1 -impractical. - -### Preservation model (composition can operate on input and options) - -This is an extension of -the "preservation model" from part 1, -if resolved options are included in the output. -This model can also be thought of as functions "pipelining" -the input through multiple calls. 
- -A JSON representation of an example resolved value might be: -``` -{ - input: { type: "number", value: 1 }, - output: { type: "FormattedNumber", value: FN } - properties: { "maximumFractionDigits": 2 } -} -``` - -(The number "2" is shown for brevity, but it would -actually be a `MessageValue` itself.) - -where `FN` is an instance of an implementation-specific -`FormattedNumber` type, representing the number 1. - -The resolved value interface would include both "input" -and "output" methods: - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getInput(): ValueType - getOutput(): ValueType - properties(): { [key: string]: MessageValue } - selectKeys(keys: string[]): string[] -} -``` - -Without a mechanism for type signatures, -it may be hard for users to tell which combinations -of functions compose without errors, -and for implementors to document that information -for users. - -### Allow both kinds of composition (with different syntax) - -By introducing new syntax, the same function could have -either "preservation" or "formatted value" behavior. - -Consider (this suggestion is from Elango Cheran): - -``` - .local $x = {$num :number maxFrac=2} - .pipeline $y = {$x :number maxFrac=5 padStart=3} - {{$x} {$y}} -``` - -If `$num` is `0.33333`, -then the result of formatting would be - -``` -0.33 000.33333 -``` - -An extra argument to function implementations, -`pipeline`, would be added. - -`.pipeline` would be a new keyword that acts like `.local`, -except that if its expression has a function annotation, -the formatter would pass in `true` for the `pipeline` -argument to the function implementation. - -The `resolvedOptions()` method should be ignored if `pipeline` -is `false`. 
- -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getInput(): MessageValue - getOutput(): unknown - properties(): { [key: string]: MessageValue } - selectKeys(keys: string[]): string[] -} -``` - -### Don't allow composition for built-in functions - -Another option is to define the built-in functions this way, -notionally: - -``` -number : Number -> FormattedNumber -date : Date -> FormattedDate -``` - -Then it would be a runtime error to pass a `FormattedNumber` into `number` -or to pass a `FormattedDate` into `date`. - -The resolved value type would look like: - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - selectKeys(keys: string[]): string[] -} -``` - -As with the formatted value model, this restricts the -behavior of custom functions. - -### Non-alternative: Allow composition in some implementations - -Allow composition only if the implementation requires functions to return a resolved value as defined in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). - -This violates the portability requirement. 
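
To make the difference between the formatted value model and the preservation model more concrete, here is a toy TypeScript sketch of a `:number`-like function under each model. It is not taken from the spec or from any implementation. The names `NumberOptions`, `Preserved`, `render`, `numberFormatted` and `numberPreserving` are invented for illustration. It further assumes that the formatted value model re-parses the formatted string, and that the preservation model merges options so that later options override earlier ones. Only `Intl.NumberFormat` is a real API.

```ts
// Hypothetical toy model; only Intl.NumberFormat is a real API here.
interface NumberOptions {
  maxFrac: number;
}

// Shared formatting step, using the standard ECMAScript Intl API.
function render(n: number, opts: NumberOptions): string {
  return new Intl.NumberFormat("en-US", {
    maximumFractionDigits: opts.maxFrac,
  }).format(n);
}

// Formatted value model: the resolved value is only the formatted output,
// so a second call can see nothing but that output (re-parsed here, by assumption).
function numberFormatted(operand: number | string, opts: NumberOptions): string {
  const n = typeof operand === "string" ? Number(operand) : operand;
  return render(n, opts);
}

// Preservation model: the input and the options travel along with the output.
interface Preserved {
  input: number;
  output: string;
  properties: NumberOptions;
}

function numberPreserving(operand: number | Preserved, opts: NumberOptions): Preserved {
  const input = typeof operand === "number" ? operand : operand.input;
  // Assumption: options from the later call override those passed earlier.
  const properties: NumberOptions =
    typeof operand === "number" ? opts : { ...operand.properties, ...opts };
  return { input, output: render(input, properties), properties };
}

const num = 0.33333;

// Formatted value model: the extra digits are lost after the first call.
const x1 = numberFormatted(num, { maxFrac: 2 }); // "0.33"
const y1 = numberFormatted(x1, { maxFrac: 5 });  // "0.33"

// Preservation model: the second call still sees the original input.
const x2 = numberPreserving(num, { maxFrac: 2 }); // output "0.33"
const y2 = numberPreserving(x2, { maxFrac: 5 });  // output "0.33333"
```

In this sketch the second call recovers the full precision of `$num` only in the preservation model, which is the kind of observable difference the alternatives above are meant to pin down.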
From 66640845ed772809ba41cea249d3d1c704ef5f3e Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 10:51:43 +0200 Subject: [PATCH 3/7] Revise introduction to reflect the changed goal --- exploration/function-composition-part-1.md | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index 347e43b87..b0b4d83ec 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -14,19 +14,11 @@ Status: **Proposed** -## Objective +## Objectives -_What is this proposal trying to achieve?_ - -### Non-goal - -The objective of this design document is not to make -a concrete proposal, but rather to explore a problem space. -This space is complicated enough that agreement on vocabulary -is desired before defining a solution. - -Instead of objectives, we present a primary problem -and a set of subsidiary problems. +* Present a complete list of alternative designs for how to +provide the machinery for function composition. +* Create a shared vocabulary for discussing these alternatives. ### Problem statement: defining resolved values From 3c87cb4ec38399ecaf3cae0d6451cb1152fbecf9 Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 12:51:57 +0200 Subject: [PATCH 4/7] Edited for conciseness --- exploration/function-composition-part-1.md | 194 +++++---------------- 1 file changed, 40 insertions(+), 154 deletions(-) diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index b0b4d83ec..bdd020d66 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -1028,25 +1028,18 @@ Alternative 5 is included for completeness. ### Typed functions -The following option aims to provide a general mechanism -for custom function authors -to specify how functions compose with each other. 
+Types are a way for users of a language +to reason about the kinds of data +that functions can operate on. +The most ambitious solution is to specify +a type system for MessageFormat functions. -This is an extension of the "preservation model" -from part 1 of this document. - -Here, `ValueType` is the most general type +`ValueType` is the most general type in a system of user-defined types. Using the function registry, each custom function could declare its own argument type and result type. - This does not imply the existence of any static typechecking. -A function passed the wrong type could signal a runtime error. -This does require some mechanism for dynamically inspecting -the type of a value. - -Consider Example B1 from part 1 of the document: Example B1: ``` @@ -1055,8 +1048,9 @@ Example B1: .local $z = {$y :uppercase} ``` -Informally, we can write the type signatures for -the three custom functions in this example: +In an informal notation, +the three custom functions in this example +have the following type signatures: ``` getAge : Person -> Number @@ -1064,94 +1058,21 @@ duration : Number -> String uppercase : String -> String ``` -`Number` and `String` are assumed to be subtypes -of `MessageValue`. Thus, - The [function registry data model](https://github.com/unicode-org/message-format-wg/blob/main/spec/registry.md) -attempts to do some of this, but does not define -the structure of the values produced by functions. +could be extended to define `Number` and `String` +as subtypes of `MessageValue`. +A custom function author could use the custom +registry they define to define `Person` as +a subtype of `MessageValue`. An optional static typechecking pass (linting) would then detect any cases where functions are composed in a way that -doesn't make sense. For example: - -Semantically invalid example: -``` -.local $z = {$person: uppercase} -``` - -A person can't be converted to uppercase; or, `:uppercase` expects -a `String`, not a `Person`. 
So an optional tool could flag this -as an error, assuming that enough type information -was included in the registry. - -The resolved value type is similar to what was proposed in -[PR 728](https://github.com/unicode-org/message-format-wg/pull/728/). - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - properties(): { [key: string]: MessageValue } - selectKeys(keys: string[]): string[] -} -``` - -The `resolvedOptions()` method is renamed to `properties`. -This is to suggest that individual function implementations -may not pass all of the options through into the resulting -`MessageValue`. - -Instead of using `unknown` as the result type of `getValue()`, -we use `ValueType`, mentioned previously. -Instead of using `unknown` as the value type for the -`properties()` object, we use `MessageValue`, -since options can also be full `MessageValue`s with their own options. - -Because `ValueType` has a type tag, -custom function implementations can easily -signal dynamic errors if passed an operand of the wrong type. - -The advantage of this approach is documentation: -with type names that can be used in type signatures -specified in the registry, -it's easy for users to reason about functions and -understand which combinations of functions -compose with each other. +doesn't make sense. The advantage of this approach is documentation. ### Formatted value model (Composition operates on output) -This is an elaboration on the "formatted model" from part 1. - -A less general solution is to have a single "resolved value" -type, and specify that if function `g` consumes the resolved value -produced by function `f`, -then `g` operates on the output of `f`. - -``` - .local $x = {$num :number maxFrac=2} - .local $y = {$x :number maxFrac=5 padStart=3} -``` - -In this example, `$x` would be bound to the formatted result -of calling `:number` on `$num`. 
So the `maxFrac` option would -be "lost" and when determining the value of `$y`, the second -set of options would be used. - -For built-ins, it suffices to define `ValueType`as something like: - -``` -FormattedNumber | FormattedDateTime | String -``` - -because no information about the input needs to be -incorporated into the resolved value. - -However, to make it possible for custom functions to return -a wider set of types, a wider `ValueType` definition would be needed. - -The `MessageValue` definition would look as in #728, but without +To implement the "formatted value" model, +the `MessageValue` definition would look as in [PR 728](https://github.com/unicode-org/message-format-wg/pull/728), but without the `resolvedOptions()` method: ```ts @@ -1165,31 +1086,13 @@ interface MessageValue { `MessageValue` is effectively a `ValueType` with methods. -Using this definition would make some of the use cases from part 1 +Using this definition would make some of the use cases impractical. -### Preservation model (composition can operate on input and options) - -This is an extension of -the "preservation model" from part 1, -if resolved options are included in the output. -This model can also be thought of as functions "pipelining" -the input through multiple calls. - -A JSON representation of an example resolved value might be: -``` -{ - input: { type: "number", value: 1 }, - output: { type: "FormattedNumber", value: FN } - properties: { "maximumFractionDigits": 2 } -} -``` +### Preservation model (Composition can operate on input and options) -(The number "2" is shown for brevity, but it would -actually be a `MessageValue` itself.) - -where `FN` is an instance of an implementation-specific -`FormattedNumber` type, representing the number 1. +In the preservation model, +functions "pipeline" the input through multiple calls. 
The resolved value interface would include both "input" and "output" methods: @@ -1205,6 +1108,18 @@ interface MessageValue { } ``` +Compared to PR 728: +The `resolvedOptions()` method is renamed to `properties`. +Individual function implementations +choose which options to pass through into the resulting +`MessageValue`. + +Instead of using `unknown` as the result type of `getValue()`, +we use `ValueType`, mentioned previously. +Instead of using `unknown` as the value type for the +`properties()` object, we use `MessageValue`, +since options can also be full `MessageValue`s with their own options. + Without a mechanism for type signatures, it may be hard for users to tell which combinations of functions compose without errors, @@ -1224,34 +1139,12 @@ Consider (this suggestion is from Elango Cheran): {{$x} {$y}} ``` -If `$num` is `0.33333`, -then the result of formatting would be - -``` -0.33 000.33333 -``` -An extra argument to function implementations, -`pipeline`, would be added. `.pipeline` would be a new keyword that acts like `.local`, except that if its expression has a function annotation, -the formatter would pass in `true` for the `pipeline` -argument to the function implementation. - -The `resolvedOptions()` method should be ignored if `pipeline` -is `false`. - -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getInput(): MessageValue - getOutput(): unknown - properties(): { [key: string]: MessageValue } - selectKeys(keys: string[]): string[] -} -``` +the formatter would apply the "preservation model" semantics +to the function. ### Don't allow composition for built-in functions @@ -1263,19 +1156,12 @@ number : Number -> FormattedNumber date : Date -> FormattedDate ``` -Then it would be a runtime error to pass a `FormattedNumber` into `number` -or to pass a `FormattedDate` into `date`. 
- -The resolved value type would look like: +The resolved value type would be the same as +in the formatted value model. -```ts -interface MessageValue { - formatToString(): string - formatToX(): X // where X is an implementation-defined type - getValue(): ValueType - selectKeys(keys: string[]): string[] -} -``` +The difference is that built-in functions +would not accept a "formatted result" +(would signal a runtime error in these cases). As with the formatted value model, this restricts the behavior of custom functions. From aaff56ccd1860a3b57ac4391dab96c42205e109e Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 12:54:20 +0200 Subject: [PATCH 5/7] Further edits for conciseness --- exploration/function-composition-part-1.md | 9 --------- 1 file changed, 9 deletions(-) diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index bdd020d66..7312164b4 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -1019,13 +1019,6 @@ Five alternatives are presented: 4. Allow both kinds of composition 5. Don't allow composition -Alternatives 2 and 3 were presented earlier in this document. -Alternative 4 is an idea from a prior mailing list discussion -of this problem. Alternative 1 is similar to Alternative 3 -but introduces additional notation to make composition -easier to think about (which is why it's presented first). -Alternative 5 is included for completeness. 
- ### Typed functions Types are a way for users of a language @@ -1139,8 +1132,6 @@ Consider (this suggestion is from Elango Cheran): {{$x} {$y}} ``` - - `.pipeline` would be a new keyword that acts like `.local`, except that if its expression has a function annotation, the formatter would apply the "preservation model" semantics From 34b5723b36afa971d2fe310e11e72ddf632e74da Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 13:03:20 +0200 Subject: [PATCH 6/7] Give a name to InputType and use it --- exploration/function-composition-part-1.md | 33 ++++++++++++++-------- 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index 7312164b4..01afc7a5c 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -980,15 +980,13 @@ definition of "resolved value". ## Introducing type names -It's useful to be able to refer to two types: +It's useful to be able to refer to three types: -* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). -* `ValueType`: This type encompasses strings, numbers, date/time values, +* `InputType`: This type encompasses strings, numbers, date/time values, all other possible implementation-specific types that input variables can be -assigned to, -and all possible implementation-specific types that custom and built-in -functions can construct. -Conceptually it's the union of an "input type" and a "formatted value". +assigned to. The details are implementation-specific. +* `MessageValue`: The "resolved value" type; see [PR 728](https://github.com/unicode-org/message-format-wg/pull/728). +* `ValueType`: This type is the union of an `InputType` and a `MessageValue`. It's tagged with a string tag so functions can do type checks. @@ -1027,8 +1025,10 @@ that functions can operate on. 
The most ambitious solution is to specify a type system for MessageFormat functions. -`ValueType` is the most general type +In this solution, `ValueType` is not what is defined above, +but instead is the most general type in a system of user-defined types. +(The internal definitions are omitted.) Using the function registry, each custom function could declare its own argument type and result type. @@ -1087,6 +1087,15 @@ impractical. In the preservation model, functions "pipeline" the input through multiple calls. +The `ValueType` definition is different: + +```ts +interface ValueType { + type(): string + value(): InputType | MessageValue +} +``` + The resolved value interface would include both "input" and "output" methods: @@ -1096,7 +1105,7 @@ interface MessageValue { formatToX(): X // where X is an implementation-defined type getInput(): ValueType getOutput(): ValueType - properties(): { [key: string]: MessageValue } + properties(): { [key: string]: ValueType } selectKeys(keys: string[]): string[] } ``` @@ -1110,7 +1119,7 @@ choose which options to pass through into the resulting Instead of using `unknown` as the result type of `getValue()`, we use `ValueType`, mentioned previously. Instead of using `unknown` as the value type for the -`properties()` object, we use `MessageValue`, +`properties()` object, we use `ValueType`, since options can also be full `MessageValue`s with their own options. Without a mechanism for type signatures, @@ -1147,8 +1156,8 @@ number : Number -> FormattedNumber date : Date -> FormattedDate ``` -The resolved value type would be the same as -in the formatted value model. +The `MessageValue` type would be defined the same way +as in the formatted value model. 
The difference is that built-in functions would not accept a "formatted result" From 00fbc02524a5cff15fafaccfd363c7e71b6d876a Mon Sep 17 00:00:00 2001 From: Tim Chevalier Date: Wed, 5 Jun 2024 13:21:57 +0200 Subject: [PATCH 7/7] Refer to motivating examples --- exploration/function-composition-part-1.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/exploration/function-composition-part-1.md b/exploration/function-composition-part-1.md index 01afc7a5c..7ae42a803 100644 --- a/exploration/function-composition-part-1.md +++ b/exploration/function-composition-part-1.md @@ -1080,7 +1080,10 @@ interface MessageValue { `MessageValue` is effectively a `ValueType` with methods. Using this definition would make some of the use cases -impractical. +impractical. For example, the result of Example A4 +might be surprising. Also, Example 1.3 from +[the dataflow composability design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md) +wouldn't work because options aren't preserved. ### Preservation model (Composition can operate on input and options) @@ -1121,6 +1124,13 @@ we use `ValueType`, mentioned previously. Instead of using `unknown` as the value type for the `properties()` object, we use `ValueType`, since options can also be full `MessageValue`s with their own options. +(The motivation for this is Example 1.3 from +[the "dataflow composability" design doc](https://github.com/unicode-org/message-format-wg/blob/main/exploration/dataflow-composability.md).) + +This solution allows functions to pipeline input, +operate on output, or both; as well as to examine +previously passed options. Any example from this +document can be implemented. Without a mechanism for type signatures, it may be hard for users to tell which combinations