Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feedback from PhpStorm #5

Closed
wbars opened this issue Jan 14, 2021 · 9 comments
Closed

Feedback from PhpStorm #5

wbars opened this issue Jan 14, 2021 · 9 comments

Comments

@wbars
Copy link

wbars commented Jan 14, 2021

Hello, I want to share purely technical feedback about possible support for this RFC in PhpStorm.

Disclaimer

In the following text by "support" I will mean general IDE features like code analysis and completion.
Also, this feedback is preliminary because it's based on the draft, and it's neither a commitment nor a refusal of any of the support, be it current or future one. We still evaluate the necessity of features and needed effort for them individually based on various factors.

Support for correctly parsing and highlighting generic PHP doc tags is out of scope of this feedback because PhpStorm will most certainly be able to support any formal syntax for this kind of job.

Also please note that this is overview of a current PhpStorm engine state and it may change in the future.

Technical problems

In its current form it’s unlikely that PhpStorm will be able to fully support the current proposal. To find out why I want to shed some light on internals of our engine.

  1. Most of our local code analysis and type inference is performed “on the fly”. For the sake of simplicity we can assume that code analysis pass is being restarted on every change of the editor content, for example, typing of a letter. It means that analysis has to be “fast”, and we achieve this by ensuring that we don’t directly access the content of other files during analysis since it is a very expensive operation in our terms (a file needs to be read, parsed, etc.). So instead we use indices built for such files.

Consider the following example:

<?php
// a.php

$a = f();
$a->x();
<?php
// b.php
class A {
function m() {}
}

function f(): A {
return new A();
}

Suppose we run a local inspection "undefined method" on the file "a.php". To know which type the function call f() will have, we need to access the content of the file "b.php". Since we're not allowed to access this content directly, we need to build indices for the file "b.php" and access them instead. In those indices there will be some information like "Function \f has return type \A".

  1. Speaking about indices, they are built for each file separately and are rebuilt for appropriate file changes. It means that if the content of the file 'b.php' changes, then only the indices for the file 'b.php' will be rebuilt and the indices for 'a.php' will stay the same.

It's an important property of indices and to preserve it, there is the following principle for building them:
index information for the file should be computed based solely on the content of the file being indexed.
It means that when indices for "b.php" is built we:

  • can't access other indices (they are not built yet)
  • can't access the content of other files (to ensure granular invalidation).
  1. Since PHP is a dynamically typed language, we also keep the inferred types of a function in indexes in case of no explicit return type declaration.
<?php
function f() { // we still want to be able to work with this function calls in different files, so inferring type locally during indexing
return new A();
}

If the content of the current file is not enough to infer the full type of the function, we encode non-complete type information which will be used as a guide to resolve the full type during code analysis.

<?php
function f() {
return b(); // type of `\b()` is unknown during indexing, so in indexes there will be encoded information about the fact that we need to search for the return type of function `\b`.
            // this info can be used only after all indices are built including local code analysis
}

For this file, such guide may look like this:
Find function \b(); its return type is the return type of \f().
It will be stored on disk in indices, while the original content of the function f will be inaccessible during type resolve.

These 3 points can be summarized as the following statement: the more IDE is able to know about the code based solely on the content of the current file, the better support will be.

Now, with understanding of these points let's try to imagine how support of the current proposal may look like.
Consider the following code:

<?php
function a() {
$f = another_create(new F(), f(), 1);
return create(new A(), f(), 1)->x($f);
}

Suppose we want to index this file and provide a return type of a(). Since the type is not fully known during indexing, we will encode some guiding information on how to resolve this type.
If there are no generics, a guide may look like this:

  1. Find function create(), see its return type
  2. In this return type, find method x(). Its return value is the return type of the function \\a.

The important property of a computed guide is that its size is a linear function of the size of the method being indexed. That said, if we add one more method call into a chain, the size of this "guide" will only grow one step.

Now let's try to solve the same task in the potential presence of generics. It’s irrelevant what the actual declarations of these functions are because an index has to assume every possible scenario based on the content of only the current file. That said, a file will be indexed in a manner that every function call can potentially be a generic one:

  1. Find function 'create'. Since during indexing we don't know whether the function is a generic one and if so, on which parameter, we need to record all argument types:
    1. Class \A
    2. Result of calling \f()
    3. int
  2. For each of this type suppose that ‘create’ can be generic over this type, map these types to an incomplete type for ‘create’ functions, and for each of these types find the method x().
  3. Again, since for the method x we don't know whether the method is a generic one, we need to record all argument types (in this case, $f) and in doing so basically enumerate all pairs from
    parameters lists of 'another_create' call and ‘create’.

The size of such a guide grows way faster than linear progression and to the point where performance becomes unacceptable considering the size of real PHP projects.

This is the problem. There are potential ways to solve this both from the IDE side and via standard means:

  1. IDE may provide a hard limit on added types. From our experience, this produces unstable code analysis results (e.g. flaky behavior) which we consider a major problem.
  2. IDE may take into account generics only during the "global" inspections pass or invent new lifecycle. This will make the task pretty much the same as the task that current static analysis tools are facing. Unfortunately we didn't invent a lot in this direction before, since our users mostly benefit from "on-the-fly" analysis. As a result, a huge effort is required here.
  3. The proposal itself may be adjusted to follow the earlier statement: the more IDE is able to know about the code based solely on the content of the current file, the better support will be.
    This is not an easy task to come up with such a proposal, so I don't have a concrete scheme that may suite us, but I do have general examples which may work and show possible paths to investigate:
    1. Some way to indicate from call-site that this function is a generic one and if so, on which type from. For this example, it may be some PHP syntax construct like create<F>(f()): from this signature, it's clear which type the function is returning.
    2. It may be relaxed a bit like create<>(f()): from this call, there is no way to tell which type is returned, but it serves an important purpose of marking generic calls.

Hope you find it useful.
Kirill.

@DaveLiddament
Copy link
Owner

Thanks @wbars for this, I really appreciate you taking the time to read the proposal and respond in such detail.

I'm digesting this information and will follow up with any questions / suggestions I've got. Thanks once again for your input.

@DaveLiddament
Copy link
Owner

@wbars thanks once again for your input to this.

Considering the constraints placed by using indicies and the requirement for fast analysis, given the following code:

Queue.php

/** @template T */
class Queue {

  /** @param T $item */
  public function add($item): void {...}

  /** @return T */
 public function next() {...}
}

Cat.php

class Cat {
  public function meow(): void {...}
}

Dog.php

class Dog {
  public function bark(): void {...}
}

I assume the following could be inferred on the fly by looking at the indicies:

file_1.php

// Look up Queue from indicies, see that it takes template type T. 
// From comment see that for $cats instance of Queue, T is Cat. Cat can also be looked up from indicies.
/** @var Queue<Cat> $cats */
$cats = new Queue(); // 

$cats->add(new Cat()); // OK
$cats->add(new Dog()); // ERROR: Expecting $item of type Cat

$cats->next()->meow(); // OK
$cats->next()->bark(); // ERROR 

// Look up Queue from indicies. In this case no substitute value for T is given.
$items = new Queue(); // No idea what T is for Queue

$items->add(new Cat); // Not definitely an error so assume OK 
$items->add(new Dog); // Not definitely an error so assume OK

$items->next()->bark(); // As we don't know type of $item can not say for sure this is an error. So don't report an error. (which is same as today's behaviour for unknown types).

Is it also correct that even this would also be possible within indicies and fast analysis constraints?

CatQueueProvider.php

// Stores in indicies that getCatQueue returns Queue where T is Cat
/** @return Queue<Cat> */
function getCatQueue(): Queue {...}

file2.php

// From indicies look see that getCatQueue returns Queue where T is Cat.
// Then lookup Queue and Cat from indicies.
$cats = getCatQueue();

$cats->add(new Cat()); // OK
$cats->add(new Dog()); // ERROR: Expecting $item of type Cat

$cats->next()->meow(); // OK
$cats->next()->bark(); // ERROR: Cats can't bark

If I have understood your initial comment it appears to me the above could be done given the indicies and fast analysis constraints?

I'm aware that both PHPStan and Psalm have a caching mechanism. I assume when this is used it places similar limits as PhpStorm has. Looking at the documentation for PHPStan's result cache are the reasons for partial or complete result cache invalidation similar to triggers for rebuilding some or all of PhpStorm's indicies?

I realise if people do lots of wired and wonderful things in PHP due to its dynamic nature. But people who write "simple" code like the above will get lots of benefits from this level of support for generics. And the people that do wired and wonderful PHP probably don't care about type safety (at static analysis time) anyway.

@wbars
Copy link
Author

wbars commented Jan 28, 2021

While inline phpdocs could help IDE I afraid people won't be happy to write such docs everywhere.

@wbars
Copy link
Author

wbars commented Jan 28, 2021

Regarding last paragraph I get your point. Issue is that once enabled we have to assume that every projects uses it while indexing and we won't be able to differentiate projects which contains and not contains with current architecture, so overhead will be applied to every user.

@muglug
Copy link

muglug commented Feb 2, 2021

As long as its own index is populated, Psalm is able to analyse a single file without parsing & analysing other files.

It seems the only big difference is Psalm's inference of return types – it doesn't store any inferred return type for B::foo when it scans the following:

class B {
  public static function foo() {
    return new A();
  }
}

(it can add return types to such functions as PhpStorm can, but that's a separate thing).

Psalm has a language server, and is still able to perform autocomplete in a moderately timely fashion (even though it uses PHP). I'm not quite understanding what the issue is. @wbars happy to talk offline to describe how Psalm handles this!

@wbars
Copy link
Author

wbars commented Feb 8, 2021

Adding return types based on content of local file is the core functionality which will have all stated issues. Since generic can't be resolved locally we'll have to add all possible generic variants to index to resolve later, this will end up with exponential index growth.

@muglug
Copy link

muglug commented Feb 8, 2021

But you don’t have to add the variants, right? You can just type it as mixed if not given. And it’s a reasonably safe assumption that the sort of developers who use templates are also likely to be the ones to use explicit return types for functions.

@muglug
Copy link

muglug commented Feb 8, 2021

To put this another way: how does WebStorm handle TypeScript, which has the same basic constructs?

@wbars
Copy link
Author

wbars commented Feb 10, 2021

You can just type it as mixed if not given

Unfortunately, many of our code analysis skip elements with mixed to avoid false positives so this may bring more harm than good.

And it’s a reasonably safe assumption that the sort of developers who use templates are also likely to be the ones to use explicit return types for functions.

We have diverse user base and for us such assumptions may not always be true.

To put this another way: how does WebStorm handle TypeScript, which has the same basic constructs?

I didn't manage to make non-trivial example type work for WS (notice any type instead of string). Also we have different infrastructure (also it of course means we can change it if necessary to suite more needs)

Screenshot 2021-02-10 at 21 37 31

Folks, it seems this thread becoming a bit vague to follow, main reason of it was to provide our feedback on the proposal. With concrete feature requests please open new issues in our issue tracker. I hope you found this feedback valuable regarding proposal itself, if you will need additional feedback on changes on this proposal, or another one, please let us know, thank you for consideration.

@wbars wbars closed this as completed Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants