-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generics experimentation (squashed) #4
base: master
Are you sure you want to change the base?
Conversation
Co-authored-by: Arnaud Le Blanc <arnaud.lb@gmail.com>
1e0c564
to
4ff8ada
Compare
Updated the branch with partial support for declaration-site variance, bounds, and type inference. Opcache is probably broken again. The branch can be built without opcache. When testing, the result of inference can be observed with
My goal with type inference was to find blockers and to measure the impact on performance. I didn't find blockers, but performance will be an issue and a considerable amount of work is still necessary to make it acceptable. Here are some details on semantics and implementation: As a reminder, type inference allows to omit type arguments in The basic principle is that we can determine types by matching the type of constructor parameters with constructor arguments. The semantics can be described as follows:
This is applied to all constructor parameters. Whenever a type variable is matched multiple times, its type is the union of all its instances. Maybe this is better described as pseudo code:
Examples:
The last case would result in The end result is a Now the non-trivial part is to find the type of the constructor arguments. Using their runtime type would be too specific. To illustrate why, consider the following snippet:
The variable That's why we need to infer types statically, ideally at compile time. We do so with a forward data flow analysis reusing the same infrastructure as zend_inference.c. Unlike zend_inference.c however, the analysis doesn't have to be conservative, and can be designed to be unsound if it improves usability. For example we ignore references. We could also consider that arithmetic operations never promote to Since we don't have access to all function signatures and classes at compile time, we have to represent some types in an abstract way, with symbols. For example the type of Symbolic types can be resolved at runtime when needed after functions and classes have been loaded, at a fraction of what is would cost to run the entire analysis at runtime. This operation can be cached (in runtime cache) for the duration of the request. These can represent an infinity of types and prevent the analysis from reaching a fixpoint. E.g. the variable $a in the following code will successively have the types
We handle these by using a variation of the algorithm decribed in: Weiss G, Schonberg E. Typefinding recursive structures: A data-flow analysis in the presence of infinite type sets. New York University. Courant Institute of Mathematical Sciences. Computer Science Department; 1986. This can be summarized as follows: Operands that depend on the operation that uses them are considered recursive, and are represented by a placeholder. We use SCCs to find such operands. After a fixpoint is reached we can resolve the placeholders. Based on the assumption that operations are indempotent (type wise), we can eliminate recursivity: Now for some issues: Symbolic types are contagious: If an operation has a symbolic operand, its type must also be represented as a symbolic type. For example, the operation I anticipated that symbolic type expressions could often be folded, but it's not the case for two reasons: First, operations can be overloaded so we can not assume basic properties about them. And second, functions may be generic so their return type depend on arguments and we have to represent them. We only need to resolve them when the function happens to be actually generic, though, so most An other issue is that merging union types when resolving symbols is O(mn), or worse when intersections are involved. There are potential mitigations to these, such as inline caches, or generalizing the link/inheritance cache to types, or eagerly loading functions/class, but these will take time to implement. Additionally, type checking is also affected: Type variables imply that some assumptions we make in zend_type_check are not valid anymore (e.g. a type list can effectively contain simple types, and an intersection may contain unions or other intersections). Also, while checking a union type against a value is O(n), checking it against an other union type (when checking In theory more types should result in fewer type checks at runtime, which would mitigate that, but we are not there yet (one reason is we can not rely on external functions/classes at compile time). So overall performance may be an issue, and there is a lot of work remaining. This branch implements what is described above, but not the potential performance mitigations, and is in a PoC state. Many things will not work. Generic class properties are broken (type checking is not fully implemented). Memory management needs to be reworked (everything is allocated on the arena). Opcache is not supported. |
@arnaud-lb Can you please check this particular example again? From my understanding, inference should only work if there is no ambiguity and by assigning
|
We can experiment with different behaviors before we decide which one is the most useful and practicable. Could you elaborate on why this would be a better behavior? TypeScript and PHPStan behave like in the original example, and from experience this is useful. From a performance perspective, this may not be the best choice however. |
TypeScript is a bit different, because in that example the type is inferred to From my point of view, static typing is a way of avoiding errors by being explicit about the types. Inferring T is fine when all uses of T are of the same type, but once you mix types (that possibly cannot be replaced with a superset, like string) there is a significant chance that this was unintentional. In your example it is clear that the different types are intended, but variables often originate from somewhere else so it may not always be obvious. Erring on asking the user to be explicit when the type cannot be inferred without doubt may make more sense in my opinion. |
even without sanitizers, it is reproducible but with the following ``` <?php $g = gmp_init(256); var_dump(gmp_pow($g, PHP_INT_MAX)); ``` we get this ``` AddressSanitizer:DEADLYSIGNAL ================================================================= ==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0) #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44 #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26 #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286 #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312 #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075 #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439 #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842 php#11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578 php#12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964 php#13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334 php#14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 php#15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360 php#16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation ==286922==ABORTING ```
even without sanitizers, it is reproducible but with the following ``` <?php $g = gmp_init(256); var_dump(gmp_pow($g, PHP_INT_MAX)); ``` we get this ``` AddressSanitizer:DEADLYSIGNAL ================================================================= ==286922==ERROR: AddressSanitizer: FPE on unknown address 0x03e8000460ca (pc 0x7faf6c69de5c bp 0x400000000000004 sp 0x7ffe9843c740 T0) #0 0x7faf6c69de5c in __pthread_kill_implementation nptl/pthread_kill.c:44 #1 0x7faf6c649c81 in __GI_raise ../sysdeps/posix/raise.c:26 #2 0x7faf6db9386c in __gmp_exception (/lib/x86_64-linux-gnu/libgmp.so.10+0xd86c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #3 0x7faf6db938d3 in __gmp_overflow_in_mpz (/lib/x86_64-linux-gnu/libgmp.so.10+0xd8d3) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #4 0x7faf6dbac95c in __gmpz_realloc (/lib/x86_64-linux-gnu/libgmp.so.10+0x2695c) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #5 0x7faf6dba9038 in __gmpz_n_pow_ui (/lib/x86_64-linux-gnu/libgmp.so.10+0x23038) (BuildId: 1af68a49fe041a5bb48a2915c3d47541f713bb38) #6 0x5565ae1ccd9f in zif_gmp_pow /home/dcarlier/Contribs/php-src/ext/gmp/gmp.c:1286 #7 0x5565aee96ea9 in ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:1312 #8 0x5565af144320 in execute_ex /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:56075 #9 0x5565af160f07 in zend_execute /home/dcarlier/Contribs/php-src/Zend/zend_vm_execute.h:60439 #10 0x5565aed6fafe in zend_execute_scripts /home/dcarlier/Contribs/php-src/Zend/zend.c:1842 php#11 0x5565aeae70a8 in php_execute_script /home/dcarlier/Contribs/php-src/main/main.c:2578 php#12 0x5565af532f4e in do_cli /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:964 php#13 0x5565af535877 in main /home/dcarlier/Contribs/php-src/sapi/cli/php_cli.c:1334 php#14 0x7faf6c633d67 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 php#15 0x7faf6c633e24 in __libc_start_main_impl ../csu/libc-start.c:360 php#16 0x5565adc04040 in _start (/home/dcarlier/Contribs/php-src/sapi/cli/php+0x2604040) (BuildId: 949049955bdf8b7197390b1978a1dfc3ef6fdf38) AddressSanitizer can not provide additional info. SUMMARY: AddressSanitizer: FPE nptl/pthread_kill.c:44 in __pthread_kill_implementation ==286922==ABORTING ``` close phpGH-16384
This is nikic#3, rebased from 7.x to master.
Can be built with
--disable-all --enable-opcache --disable-opcache-jit
.I didn't review the rebase yet, and may have introduced errors, but Zend/tests pass.
TODO:
new Foo<A,B>()
)new T()
)zend_object now has a zend_class_reference member instead of a zend_class_entry one. Existing code accessing the
ce
member was updated to useOBJ_CE()
instead, for now.The
ZEND_NEW
OP1 is now a PNR (zend_packed_name_reference
) literal, to encode generic type arguments in addition to the class name.Instantiating generic classes requires a zend_class_reference, which are allocated on the fly and cached in an hashtable (
EG(generic_class_entries)
) to reduce overhead and to simplify memory management. The lookup key is stored in PNRs during compilation (but will need to be computed at runtime for dynamic instantiations).The overhead on ZEND_NEW is light, especially after the first instantiation, and negligible for non-generic instantiations. Here are the possible pathes, from the happiest to the less happy ones:
Type argument checks are done when allocating a zend_class_reference.
OP1+1 remains a
zend_string
representation of the lower case class name. We still need this in both non-generic and generic instantiations: In the former case the PNR doesn't have a key because it's just a packedzend_string
, and in the latter case we need both the generic key (e.g.foo<bar>
) and the base key (e.g.foo
) for the unhappy path.PNR literals have a new zval type (IS_PNR), similarly to IS_ALIAS_PTR, allowing op array consumers (e.g. Opcache) to be aware of them. Currently they are always allocated during compilation, on the arena, which simplifies memory management (but still requires some special cases in Opcache). In order to support dynamic instantiations, and to remove Opcache special cases, we need to eventually refcount or intern PNR literals.
Micro benchmark shows a small (1-2%) regression in this branch when using a simple generic class compared to the base branch when using a non-generic fully typed class. The overhead can be fully attributed to type checking (zend_check_type), and can be reversed with some type checking optimizations (which also benefits non-generic code).
On complexity: