Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type Inference on function arguments #15114

Closed
rtpg opened this issue Apr 11, 2017 · 26 comments
Closed

Type Inference on function arguments #15114

rtpg opened this issue Apr 11, 2017 · 26 comments
Labels
Duplicate An existing issue was already created

Comments

@rtpg
Copy link

rtpg commented Apr 11, 2017

TypeScript Version:
2.3.0

Code

function f(x: number){
   return x;
}

function g(x){ return x};

function h(x){ return f(x); };

Expected behavior:
I would expect the inferred type of g to be (x:any)=>any, and the infered type of h to be (x:number) => number (due to the restrictions placed by the call of f)

When compiling with noImplicitAny, I would only expect an error on g, not on h

Actual behavior:
Running tsc --declaration on this snippet gives me:

declare function f(x: number): number;
declare function g(x: any): any;
declare function h(x: any): number;

the x argument on h does not seem to use the f(x) call to tighten the definition.

Considering the simplicity of the example, I imagine I might be missing an important detail in the type system that makes this the proper inference result, but I haven't figured it out yet.

@vkurchatkin
Copy link
Contributor

Typescript never infers function arguments. So, unless you specify them, they are always going to be any

@rtpg
Copy link
Author

rtpg commented Apr 11, 2017

Right, but is there a reason for them not to? It feels within reach, and not much different than other kinds of inference

@RyanCavanaugh
Copy link
Member

It's very different and very complex. When you consider that any of those calls might have union types or overloaded signatures in their parameter types, it's not even clear what should happen for a single call (let alone multiple calls):

declare function f(n: number, s: string): void;
declare function f(s: string, n: number): void;
declare function f(x: boolean, y: boolean): void;

// g(a: ??, b: ??)
function g(a, b) {
    f(a, b);
}

Current inference is very straightforward: types almost always come from initializers or contextual types. Both of those are static one-pass things that you can follow back as a human. Trying to do inference from call sites looks neat in simple examples but can make it very hard to trace errors to their actual sources, and is unscalable to large programs.

@bcherny
Copy link

bcherny commented Apr 11, 2017

@RyanCavanaugh Is there some sort of design doc for how TypeScript's type inferer works, and what kind of system it's based on?

Eg. I believe a Hindley-Milner inferer would give a: boolean | number | string, b: boolean | number | string for your example. This seems to be @rtpg's expected behavior.

@RyanCavanaugh
Copy link
Member

That type would be incorrect because it allows string, string as an argument list.

The TypeScript spec https://github.com/Microsoft/TypeScript/blob/master/doc/spec.md defines how inference works at each declaration site. There is no global inference algorithm to describe because TS doesn't do global inference ala H-M.

@bcherny
Copy link

bcherny commented Apr 11, 2017

I see. So we would have to infer the same signature as f() itself.

What was the rationale for not using global inference? Is it that TS tries to do the best it can in the absence of a sound type system?

@RyanCavanaugh
Copy link
Member

RyanCavanaugh commented Apr 11, 2017

Soundness is mostly orthogonal.

I think the general problem is that global type inference in an imperative language is much more expensive than in a functional language. The difference between OCaml and Scala is instructive here. There's a reason Flow limits its "global" inference to file boundaries - it's not practical to scale up global inference to an entire large JS program.

Global inference also doesn't detect a lot of errors in imperative programs. Consider this code, which is a totally legal JavaScript program:

var x = { kind: 'a' };
if (Math.random() > 0.5) {
  x = { knd: 'b' };
}
x.type = "foo";

Flow says this program has one error, TypeScript says this program has two errors, H-M I believe says this program has zero errors (x ends up with as { kind: string, type: string } | { knd: string, type: string }). Who's right? I think one vs two is arguable, but it's almost certainly not zero (despite being a program that does not throw exceptions or observably mistype a value).

Global inference is also bad at producing sane errors

function doSomething(x) {
  console.log(x.y);
}

var arr = [{x: 100, y: 100}, {x: 200, z: 200}];
doSomething(arr.pop());

Flow helpfully says

2:   x.y;
       ^ property `y`. Property not found in
2:   x.y;
     ^ object literal

which does not help you at all at figuring out the typo elsewhere in the program.

So basically any sane programmer is going to put type annotations on parameters anyway (since you must at module boundaries anyway, and you must if you want sane errors). Once you have type annotations on type parameters, local inference is sufficient for the vast majority of cases.

@bcherny
Copy link

bcherny commented Apr 11, 2017

@RyanCavanaugh As always, thank you for the detailed and well thought out responses.

I think one vs two is arguable, but it's almost certainly not zero (despite being a program that does not throw exceptions or observably mistype a value).

This seems to be a philosophical question with tradeoffs.

1-2 errors is restrictive because it requires me to explicitly declare a type with 2 optional members before my program will compile, but as you said, the benefit is errors are local, which makes debugging fast and pleasant.

0 errors is more correct (the union type should be handled downstream, and if it's not, the compiler can warn about an unexhaustive match or possibly missing member), but the drawback is nonlocal errors when you're wondering "why is the compiler saying this a union type?".

[From an earlier comment] Trying to do inference from call sites looks neat in simple examples but can make it very hard to trace errors to their actual sources, and is unscalable to large programs.

How do Haskell, OCaml, and Elm engineers manage to use their languages at scale? From this SO post, it sounds like those languages are more restrictive than JS in what they can express (eg. no class extensions), which makes H-M usable.

@rtpg
Copy link
Author

rtpg commented Apr 12, 2017

@RyanCavanaugh relative to your initial example, where you call f (with 3 different type definitions):

I feel like in that case it would be reasonable to have this generate 3 different types for g. Maybe this leads to combinatorial explosion of types in some cases, but I imagine in most cases it won't because in real world examples function wrappers exist to make more specific calls to the inner functions.

TypeScript already has the control flow analysis in place to keep track of when x:string|number ends up actually being string or number, so it feels like you could do some of this work in reverse? Obviously easier said than done.

For full context: I really want to use noImplicitAny, but our codebase includes a lot of wrapper functions of the form function(a){ return g(a,'some_literal')}, and these end up being any. If we have to explicitly declare the type signatures on all of these it adds a lot of busy work and makes the "strict type" pitch that much harder.

Is it not possible to do local inference based on the function body (that is to say: ignoring the function's calling context)?

Bringing back the flow example (which I agree is weird):

function doSomething(x) {
  console.log(x.y);
}

var arr = [{x: 100, y: 100}, {x: 200, z: 200}];
doSomething(arr.pop());

I feel like the "proper" set of errors that should happen would be:

  • doSomething gets an inferred type of (x:{y: any}):void
  • arr is a little weird, so two possibilities:
    • arr requires an explicit type, given that the dictionaries are very different
    • arr gets a type like ({x:number, y:number} | {x:number, z:number})[]
  • In the case of the union type, doSomething(arr.pop()) fails because the union type doesn't align with the type of {x: any}

I realize "well let's just have giant types" feels very hand-wavy...

I think a reasonable alternative would be "try to deduce argument types from usage within the function, and opt for any if all collected possibilities in the function body have any inconsistencies (for example number and string, but not {x:string, y:string} and {x:string}, which ends up with {x:string, y:string}). This would allow for small anonymous functions to get the "obvious" typing while still requiring to be explicit in cases where there's no good answer.

Purescript (and its record types) has some functionality close to this, so inference works pretty well (though you tend to lose type aliases. This is already the case in TypeScript though)

@aluanhaddad
Copy link
Contributor

aluanhaddad commented Apr 14, 2017

So basically any sane programmer is going to put type annotations on parameters anyway (since you must at module boundaries anyway, and you must if you want sane errors). Once you have type annotations on type parameters, local inference is sufficient for the vast majority of cases.

I fully agree with this. The reasoning is very straightforward. It is also worth noting that no type annotations need be placed on the parameters declared by function literals passed as callback or assigned to an already typed local.

I am glad that you referenced Scala because it also has essentially the same requirements for where type annotations must be placed as TypeScript does under --noImplicitAny, they are only required on the parameters of functions that are not callbacks and not a value assigned to a typed local.

You also mention OCaml which while it has much stronger type inference than TypeScript or Scala, exhibits some very counterintuitive behavior with respect to inference of function parameter types
Consider this OCaml:

let square x = x * x 
let i = square 4
(* 16 *)
let f = square 4.0
(* Error: This expression has type float but an expression was expected of type int *)

@bcherny
Copy link

bcherny commented Apr 14, 2017

@aluanhaddad Nit: your example is an Ocamlism, and is not a statement about H-M in general. Eg. Haskell infers x as Num, which compiles:

let square x = x * x
let i = square 4 -- 16
let f = square 4.0 -- 16.0

@vkurchatkin
Copy link
Contributor

@bcherny well, this works in Haskell because of type classes, but they are not a part of H-M

@mhegazy
Copy link
Contributor

mhegazy commented Apr 27, 2017

This is a duplicate of #15196, related discussion can be found in #1265, #15114 and #11440.

Please see my response in #15196 (comment)

@mhegazy mhegazy closed this as completed Apr 27, 2017
@mhegazy mhegazy added the Duplicate An existing issue was already created label Apr 27, 2017
@rtpg
Copy link
Author

rtpg commented Apr 28, 2017

@mhegazy I don't want to litigate this too much, but what I'm asking for (using the function definition itself to infer parameter types) doesn't seem like non-local type inference? Since the function body is well scoped and doesn't belong to multiple files.

I see the argument for not having Haskell-style square inference, but what I'm talking about feels like it's much smaller and more well-defined in scope (see usages of parameters in the function body only, infer from that).

Maybe I'm missing a detail here or misunderstanding the meaning of "non-local" here.

@RyanCavanaugh
Copy link
Member

@rtpg there's nothing in TypeScript today that infers information about a variable from its usage. I agree it's smaller than Haskell-style inference... but it would be a very large architectural change. And it's still nonlocal since presumably code like this would be expected to produce inferences.

function fn(x) {
  bar(x);
}
function bar(y) {
  baz(y);
}
function baz(z) {
  return Math.sqrt(z);
}

@joewood
Copy link

joewood commented Jun 7, 2017

@RyanCavanaugh I thought type guards and flow analysis is doing exactly that - narrowing a type based on usage? Isn't the example above just an extension of that? At least for the simplest cases, it would be nice for TypeScript to achieve parity with Flow.

@RyanCavanaugh
Copy link
Member

@joewood type guards / flow analysis are straightforward (so to speak...) because they're "top-down" - given a statement in a function, it's relatively easy to determine which control paths it's reachable from, because JavaScript doesn't have control flow structures that can go "up" (e.g. COMEFROM). This is very different from a function call - you can be in a function on line 1500 and the first call might be on line 4000. Or maybe you're indirectly called via some callback, etc..

The dream of inference from function calls is really not clear as some people would imply. Here's an example in Flow:

declare function swapNumberString(n: string): number;
declare function swapNumberString(n: number): string;

// Pass-through function
function subs(s) {
  return swapNumberString(s);
}

// This stops being an error if you comment
// out the call in g() below. ??
const s: string = subs(12);

function g() {
  subs("");
}

This is "spooky action at a distance" at its most maximal. You can have the call to subs in g, or the call in the s initializer, but not both. Huh? And why does the wrapper function change the behavior of this at all (replacing subs with swapNumberString makes the errors go away), if it's all inferential magic?

Realistically there are two cases that usually happen if you use inference from call sites / data flow analysis:

  • Your file typechecks
  • You get an error in a correctly-implemented function body due to a bad call

If your file typechecks, cool, no work required.

In the other case, the first thing you do to diagnose the problem is... drumroll... add a parameter type annotation so you can figure out where the bad call is coming from. If there's indirection, you'll probably have to do this in layers. You'll end up with a file full of parameter type annotations, which is good since you'll need them anyway for cross-file typechecks. So it's great for a single-file playground demo but for "real" software development, you'll end up with approximately the same number of parameter type annotations as if you used a more local inference algorithm.

@joewood
Copy link

joewood commented Jun 9, 2017

Thanks @RyanCavanaugh for the extensive explanation. I agree, this could become a nightmare of different permutations with contradicting definitions. But, I'm wondering if this is all or nothing? Where a parameter is used definitively inside a function (e.g. against an explicit declared type with no overloads), can a deduction be made about its type? Then, if there's any whiff of ambiguity, including Unions and Overloads, then it backs off to the default any. I agree, the bottom approach is different to the type guard processing logic.

Another approach is to make this a tooling feature. An "autofix" feature could be added to fix untyped parameters based on their usage inside the function. This could speed up fixing "no implicit any" migrations.

@bcherny
Copy link

bcherny commented Jun 9, 2017

An "autofix" feature could be added to fix untyped parameters based on their usage inside the function. This could speed up fixing "no implicit any" migrations.

That reminds me of Flow's existential types, which is a sort of opt-in H-M prover. I imagine it's a lot of work to build this if it's just some optional feature, though (as opposed to feature everyone would use). The very fact that it's opt-in (while the default type is still any) signals that it may not always be helpful.

@vkurchatkin
Copy link
Contributor

That reminds me of Flow's existential types, which is a sort of opt-in H-M prover

It's not like H-M in any way

The very fact that it's opt-in (while the default type is still any)

This is not true

@bcherny
Copy link

bcherny commented Jun 9, 2017

@vkurchatkin Can you explain? This is how I understand the docs I linked.

@vkurchatkin
Copy link
Contributor

@bcherny If you omit type annotation it behaves kind of like *, i.e. type is inferred

@masaeedu
Copy link
Contributor

masaeedu commented Jul 28, 2017

@RyanCavanaugh There is a simple mechanism for producing sound (but rarely useful) signatures for intersections and unions of functions, described in #14107 (comment). It would work for your example with g, and would infer a as number | string | boolean and b as string & number & boolean. This is a totally valid use of fs supported API, but is unfortunately useless because in this instance no one can actually supply a string & number & boolean (except by casting).

In reality g should be inferred as having an identical overloaded signature to f, but it requires some thought to figure out some sensible rules from which this behavior emerges. I am confident it isn't impossible to do "intuitive" type inference, even in a type system with the constraints TypeScript is under.

@masaeedu
Copy link
Contributor

@RyanCavanaugh For the following example:

declare function swapNumberString(n: string): number;
declare function swapNumberString(n: number): string;

// Pass-through function
function subs(s) {
  return swapNumberString(s);
}

// This stops being an error if you comment
// out the call in g() below. ??
const s: string = subs(12);

function g() {
  subs("");
}

The ask is for subs to be inferred as having the overloaded type ((s: string) => number) & ((s: number) => string), based on the usage of s within the body of subs.

If you only add this feature, and nothing else, how would this give rise to the "spooky action at a distance" problem you are describing? Both subs("") and subs(12) are valid uses of such an overloaded function.

@masaeedu
Copy link
Contributor

masaeedu commented Jul 28, 2017

Similarly, for the following example:

function fn(x) {
  bar(x);
}
function bar(y) {
  baz(y);
}
function baz(z) {
  return Math.sqrt(z);
}

The inference would be non-local only in the sense that the return type inference is non-local. Just as an inferred return type may affect the inferred return type of a caller, the inferred parameter type of a function may affect the inferred parameter type of a caller. The inference is in unidirectional (barring edge cases like recursion), and in inverse direction to the call graph.

To avoid complicated edge cases, you could start with inferring types for parameters that:

  • don't already have type constraints
  • are only used once
  • are used in positions that do not involve interaction with overloaded functions

It is fine to bail out and infer any for any cases for which a reasonable inference strategy is currently unknown, at which point a user who has noImplicitAny enabled will go through the usual rigmarole.

@masaeedu
Copy link
Contributor

masaeedu commented Aug 10, 2017

I've made a rudimentary attempt at this in master...masaeedu:master that makes the case in the OP (and some others I am interested in) work. The compiler can build itself, although not all tests pass.

The algorithm is as follows. For all parameters that cannot otherwise be typed, an additional step is tried before bailing out and inferring any:

  1. The function body is searched for invocation expressions that pass the parameter as an argument
  2. When such an invocation is found, we attempt to determine the type of the corresponding parameter in the invoked function's signature (which may result in recursion)
  3. If a type is determined, we add said type to a list of "usage types" of the parameter
  4. Finally, we infer the parameter to have an intersection of all its "usage types".

Obviously this is nowhere near complete, and the implementation does not even fit with the codebase style, but it is something concrete to play with.

Some simple work to add to this:

  • Look for other kinds of "usage" of the parameter, such as assignment to well-typed references, property access, use as a function with well-typed parameters, etc.

Some hard work to add to this:

  • Work out all the inevitable hairy edge cases, none of which I've thought about, and bail out in some sensible and graceful way (usually just return undefined, which results in the parameter being anyly typed as usual)
  • Deal with cycles properly (the algorithm is identical to the stack based one for determining the function's return type, just need to work it into the currently naive implementation)
  • Deal with invocation expressions involving overloaded functions by inferring an overloaded signature for the current function with respect to the used parameter

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Duplicate An existing issue was already created
Projects
None yet
Development

No branches or pull requests

8 participants