question: are refinement types viable? #429

dzmitry-lahoda opened this issue Dec 12, 2024 · 6 comments

dzmitry-lahoda commented Dec 12, 2024

what

Refinement types allow producing new types by restricting some values of a base type with predicates.

Wuffs is one industry-successful application of the idea. Jolie is something like WIT, but not popular.

I do not know what the best syntax would be,
or whether it should be mixed with attributes.

The most typical refinements are ranges, and possibly we can define a dictionary via refinements too.

I will share some examples, and then some reflection on who should validate refinements.

examples

type a = u8; // unrefined base type
type b = u8 where range(1..42]; // refined type, still 8-bit on the wire and in memory, but limited in usage
type c = u32 where range[1..42]; // refined type, differs from b in size in memory and on the wire

record x {
   a: a,
   b: c,
}

type z = list<(a, b)> where sorted by 0; // sorted by the first element of the tuple

type y = list<x> where sorted by a; // just sorted, in either direction

type k = list<x> where sorted asc by a; // sorted ascending

type u = list<x> where sorted by a, b; // sorted by a, then by b

type d = list<x> where sorted by a unique by a; // a dictionary

type o = x where a < b; // requires a and b to be comparable

type l = list<u8> where length(2..42);

type p = x where a + b < u64::MAX; // fails to compile: the predicate is always true, since a + b can never reach u64::MAX

type t = x where a + b < 4096;

how to validate

Validating numeric ranges is easy.

Validating sortedness is easy too; the first two distinct elements establish the direction.

This also requires < or > refinements (a comparison is needed to check sortedness).

Sortedness can be constrained to desc or asc if needed.

Sorted unique is also easy to validate.

IMHO defining unique without sorted should be rejected, or at least feature-gated behind a DDoS flag :) (checking uniqueness of unsorted data is expensive).
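
A minimal sketch of what such generated checks could look like in Rust; the function names are hypothetical and not the output of any existing tool:

// `type b = u8 where range(1..42]`: exclusive lower bound, inclusive upper bound.
fn validate_b(value: u8) -> Result<u8, &'static str> {
    if value > 1 && value <= 42 {
        Ok(value)
    } else {
        Err("b is out of range (1..42]")
    }
}

// `where sorted asc by a`: one pass over adjacent pairs confirms ascending order.
fn validate_sorted_asc(items: &[u8]) -> Result<(), &'static str> {
    if items.windows(2).all(|w| w[0] <= w[1]) {
        Ok(())
    } else {
        Err("list is not sorted ascending")
    }
}

// `where sorted by a unique by a`: strict inequality between neighbours gives
// sortedness and uniqueness in the same O(n) pass.
fn validate_sorted_unique(items: &[u8]) -> Result<(), &'static str> {
    if items.windows(2).all(|w| w[0] < w[1]) {
        Ok(())
    } else {
        Err("list is not sorted with unique elements")
    }
}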

who should validate

By default, nobody should validate; there is no spec requirement to do so.

If the reader trusts the writer on the other side of the ABI, then the reader does not need to validate.

If the reader does not trust the writer, it should validate.

The wit-bindgen generator can generate validation code, or the reader can validate manually.

wit-bindgen can also validate data provided by the writer, perhaps only in debug mode.

type system

It may be safe to allow refinements only on fixed-size entities by default, with other features gated;
length is the exception, since it specifically targets variable-length data.

Generators for target languages that support some kind of narrowing,
for example via inheritance,
can generate a type for each refinement and allow a function expecting the broader type to accept the narrower one.
An implementer reading refined data can, for example, decide to store a refined u8 in a 5-bit field of some structure if it fits.

But adding such type safety to targets is optional.
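
A hedged sketch of such narrowing in Rust, using a hypothetical generated newtype (none of these names come from an existing generator):

// Hypothetical generated type for `type b = u8 where range(1..42]`.
#[derive(Clone, Copy, Debug)]
pub struct B(u8);

impl B {
    // Construction is the only place where the refinement must be checked.
    pub fn new(value: u8) -> Option<B> {
        if value > 1 && value <= 42 { Some(B(value)) } else { None }
    }
}

// Narrowing: the refined type fits anywhere the broad base type is accepted.
impl From<B> for u8 {
    fn from(b: B) -> u8 {
        b.0
    }
}

fn takes_broad(value: u8) -> u8 {
    value.wrapping_add(1)
}

fn example(b: B) -> u8 {
    // A refined value can be passed wherever the base type is expected.
    takes_broad(b.into())
}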

So refinements act as:

- documentation (always)
- generation of validators for the reader or the writer (optional)
- input to the type systems of target languages (optional)

Refinements would be less ad hoc and clearer, rather than people starting to put refinements into #58.

options

It seems Wuffs, Candid (https://github.com/dfinity/candid/blob/master/spec/Candid.md), and Jolie do not come with dependent types.

So many of their validation types are attempts at refinements.

WIT does not have in-place backward compatibility. Protobuf has it implicitly embedded and ad hoc.
Candid has in-place versioning, but it seems to pay for it with extra bytes (https://github.com/dfinity/candid/blob/master/spec/Candid.md).
Maybe refinements give another option for versioning without binary overhead.

links

https://en.wikipedia.org/wiki/Refinement_type
https://www.jolie-lang.org/index.html
https://github.com/google/wuffs/blob/main/doc/glossary.md#refinement-type
https://www.reddit.com/r/dependent_types/comments/ay7d86/what_can_refinement_types_do_that_dependent_types/
https://ucsd-progsys.github.io/liquidhaskell/
https://json-schema.org/understanding-json-schema/reference
https://zod.dev/
https://github.com/gcanti/io-ts
https://github.com/stefan-hoeck/idris2-refined/blob/main/docs/src/Intro.md
https://dafny.org/dafny/DafnyRef/DafnyRef#sec-module-refinement
https://flux-rs.github.io/flux/
https://plv.mpi-sws.org/refinedrust/
microsoft/TypeScript#43505
microsoft/TypeScript#4895
https://fstar-lang.org/tutorial/book/part1/part1_getting_off_the_ground.html
https://www.w3.org/TR/xmlschema11-1/
https://smithy.io/2.0/spec/constraint-traits.html

yoshuawuyts commented Dec 13, 2024

Speaking entirely for myself here: I do believe refinement types might eventually make sense to include for WIT.

In Rust we've been toying with the idea of introducing a limited form of this in the form of pattern types. This is more limited because it doesn't allow for arbitrary pre/post-conditions, but basically just makes Rust's pattern language work in more places. Here is a basic example of what pattern types in Rust would look like using number ranges:

// a function which can only operate 
// on numbers between 1 and 10
fn refined(num: u8 is 1..10) { .. }

Earlier this year I wrote a section on the challenges Rust will face when trying to extend existing APIs with pattern types (link). I suspect that WIT will have an easier time adding refinement/pattern types than a language like Rust does. That's because even if we refine APIs in existing WIT docs, that doesn't mean that bindings generators need to change their codegen to match. And vice versa: codegen may change even if the WIT docs remain stable. I believe there may be relatively few downsides to extending WITs type system with some version of refinement. Though there are probably a number of other features we should prioritize first.

@lukewagner

I've definitely seen some use cases come up that seem like they'd benefit from refinement types. One part of the design space that has stumped me in the past, and that you also touch on, concerns runtime validation. While we could consider the predicates to be optionally validated, I worry this will create scenarios of mismatched expectations where a consumer of a value assumes the predicate holds, ultimately leading to security vulnerabilities in cases that should have been guaranteed traps. This makes me wonder whether we should instead define a small expression language that can be validated by the C-M lifting/lowering machinery, so that a value consumer could trust that the predicate held, but this of course raises a lot of design questions.
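
One way to picture such a restricted expression language is a small predicate AST that canonical lifting/lowering could evaluate without calling back into guest code. This is purely illustrative; none of these names exist in the Component Model or in any tool:

// Illustration only: a declarative predicate shape the lifting/lowering
// machinery could evaluate directly on lifted values.
enum Predicate {
    // the value must lie in [min, max]
    RangeU64 { min: u64, max: u64 },
    // the list length must lie in [min, max]
    LengthRange { min: u32, max: u32 },
    // list elements must be sorted ascending, optionally with no duplicates
    SortedAscU64 { unique: bool },
}

fn holds_for_u64(pred: &Predicate, value: u64) -> bool {
    match pred {
        Predicate::RangeU64 { min, max } => value >= *min && value <= *max,
        _ => false,
    }
}

fn holds_for_list_u64(pred: &Predicate, items: &[u64]) -> bool {
    match pred {
        Predicate::LengthRange { min, max } => {
            let len = items.len() as u32;
            len >= *min && len <= *max
        }
        Predicate::SortedAscU64 { unique: false } => items.windows(2).all(|w| w[0] <= w[1]),
        Predicate::SortedAscU64 { unique: true } => items.windows(2).all(|w| w[0] < w[1]),
        _ => false,
    }
}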

@yoshuawuyts

Yeah, arbitrary pre/post conditions seem like they would be hard to encode. I believe that if we could constrain refinements to some declarative subset, we might get most of the benefits without the downsides.

I wonder how well pattern types could map onto WIT? WIT doesn't have patterns, but it does have strings, enums, and numbers — maybe we could define a pattern language? Is that roughly what you were thinking too?

dzmitry-lahoda commented Dec 15, 2024

(I am for defining constraint expressions/predicates on types, which can be matched as needed by wit-bindgen extensions, but against explicit pattern matching in functions in the WIT language. wit-bindgen extensions can match the expressions and decide how to express them in generated code.)

Refinement types will lead to more consistency in WIT language usage and healthy competition among runtimes (forks of source-generator extensions).

Some refinements are simple arithmetic predicates.
Others are categorical, like being ordered or unique.

A mix of arithmetic rules, numeric limits, and categorical properties allows WIT to validate certain constraints at compile time. Forked wit-bindgen extensions can extend this validation as needed.

Extensions may pattern-match expressions to determine the types they generate and decide which WIT definitions to reject.

Matching refinements can help decide on fixed types, dictionaries/maps/lists of tuples, attributes, and traits to generate in code—all from simple math or categorical rules.

Industry Impact

Refinements will make WIT more consistent across the industry.
Just as SQL enabled compatibility across database systems, WIT can enable compatibility across WASM runtimes and solutions.

Each solution/runtime can fork the extensions they need, deciding how to handle source generation, runtime execution, and expression matching. Canonical ABI and official guidance (focused on safety) will serve as the default.

Look at the official Rust generator: it already defines two cases, one in which strings are validated to be UTF-8 and another in which they are treated as raw bytes. So it seems things will progress in other directions regarding types and validation.

Examples

  1. A native Wasm host allows Wasm to call other Wasm, Wasm to call the host, and the host to call Wasm. Different code has different levels of trust and maturity and different performance requirements, so generated code can vary in how typed/validated things are.
  2. A language with some (approximation of) refinement types may run a solver over the WIT AST and generate code that allows one type to be used in place of another, as long as it fits.
  3. A wit-bindgen extension fork can conditionally add JsonSchema or BorshSchema attributes to types.
  4. For the unique expression pattern, some extension forks can generate a list of tuples, a hash map, or a btree map, or emit a compile-time warning when unique is not combined with sorted, fitting whatever usage-pattern and performance requirements apply (see the sketch after this list).
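
A small sketch of example 4, assuming a hypothetical extension hook that picks the generated representation from the categorical refinements on a key (nothing here is real wit-bindgen API):

// Categorical refinements an extension could inspect on `list<(k, v)>`.
enum KeyRefinement {
    Unsorted,
    UniqueUnsorted,
    UniqueSorted,
}

// The Rust type name the extension could emit in generated bindings.
fn representation(refinement: KeyRefinement) -> &'static str {
    match refinement {
        // No uniqueness guarantee: keep the wire shape as-is.
        KeyRefinement::Unsorted => "Vec<(K, V)>",
        // Unique but not sorted: a hash map, or a compile-time warning if the
        // DDoS-flag policy discussed above is enforced.
        KeyRefinement::UniqueUnsorted => "HashMap<K, V>",
        // Unique and sorted: a btree map preserves the ordering guarantee.
        KeyRefinement::UniqueSorted => "BTreeMap<K, V>",
    }
}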

Relevant Links

  • Example of an expression language used in the industry:
    Google CEL (Common Expression Language)
    (Protobuf appears to have introduced refinements retroactively, and they are not available for use by codegen.)

  • Non-composable refinements with single runtime validation policies:
    Smithy Constraint Traits.
    These are not composable with each other, so it is a half-broken solution.

> While we could consider the predicates to be optionally validated, I worry this will create scenarios of mismatched expectations where a consumer of a value assumes the predicate holds, ultimately leading to security vulnerabilities in cases that should have been guaranteed traps. This makes me wonder whether we should instead define a small expression language that can be validated by the C-M lifting/lowering machinery, so that a value consumer could trust that the predicate held, but this of course raises a lot of design questions.

Mismatched expectations will exist regardless of what is done to WIT; that is kind of expected.
But refinement expressions (predicates) can reduce mismatches and make them more manageable.

oovm commented Dec 24, 2024

Refinement types as an enhancement to type aliases can be helpful.

For example, define

type non-nan-f32 = f32 where not nan // NonNan<f32>

NonNan<f32> usually has better properties; it supports Eq and Ord.

Another example

type vec = list<u8>
type hash-set<u8> = list<u8> where elements unique
type btree-set<u8> = list<u8> where elements unique and sorted
type index-set<u8> = list<u8> where elements unique and indexed

The corresponding concrete interfaces usually come with better mechanisms.

I think we can formulate a set of syntax through @refine(...), so that we can avoid changes to the core grammar for now and make some harmless experiments.


I suggest using @predicate to export check functions that the guest language can use for type casts, compile-time checks, or runtime checks.

A predicate function is a transformation T -> result<refined<T>, ?> and should have its function signature checked.

Its mapped form is try_refine(raw: T) -> Result<T, _>.

The boolean version is_ok(T) -> bool is easily inlined and does not need to be exported.

It is up to the user to decide whether to check.
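
A sketch of what such an exported predicate could map to on the Rust side for `type non-nan-f32 = f32 where not nan`; the NonNanF32 newtype and the try_refine name are hypothetical, not generated by any current tool:

// Hypothetical refined newtype; construction goes through the predicate.
#[derive(Clone, Copy, Debug, PartialEq, PartialOrd)]
pub struct NonNanF32(f32);

// Excluding NaN makes the total-order instances sound.
impl Eq for NonNanF32 {}
impl Ord for NonNanF32 {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        // Safe to unwrap: NaN is ruled out by construction.
        self.0.partial_cmp(&other.0).unwrap()
    }
}

// T -> result<refined<T>, ?>: the runtime check the user may choose to run.
pub fn try_refine(raw: f32) -> Result<NonNanF32, f32> {
    if raw.is_nan() { Err(raw) } else { Ok(NonNanF32(raw)) }
}

// The boolean version; trivially inlined, so it need not be exported.
#[inline]
pub fn is_ok(raw: f32) -> bool {
    !raw.is_nan()
}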

oovm commented Dec 24, 2024

Another benefit is the ability to add an OpenAPI-like extension definition to wrpc, which could then generate a good user interface similar to Swagger UI.
