IOperation representational issues when raising IL #8640
Labels
Area-Analyzers
Concept-API
This issue involves adding, removing, clarification, or modification of an API.
Feature - IOperation
IOperation
Milestone
This is a list of the representational issues that I've encountered with raising IL method bodies to IOperation trees.
I believe that we can add expressiveness to IOperation after the initial release, but I'd like some eyes on this list to help validate that there isn't anything that would require a breaking design-change before v1.
See microsoft/binskim#10 for the (still prototype quality) IL -> IOperation code that triggered these obervations. In particular, ILOperations.Custom.cs shows the nodes that I've invented to cover the gaps below.
cc @JohnHamby @genlu @jaredpar
Operations that are possible in C# but not modeled by IOperation
Bugs are filed for these because there should not (even in v1) be valid C# syntax that cannot be surfaced as IOperation.
stackalloc
(stackalloc should have a public-facing IOperation representation #8392)checked
(IOperation nodes should indicate when they are checked for arithmetic overflow #8395)__makeref
,__refvalue
,__reftype
and__arglist
(Should __arglist, __makeref, __refvalue, __reftype have public IOperation nodes? #8394)Nodes for these would be useful for corresponding IL instructions.
By-ref handling
In IL, managed pointers (type
&
) are basically just like unmanaged pointers except they are reported to the GC. Passing something by reference is explicitly done by taking its address as a managed pointer and passing that. This is distinct from the IOperation model where the by-ref semantics are captured by how IReferenceExpressions are used (e.g. arguments toIParameterSymbol
withIsByRef == true
or as a value typeIInvocationExpresion.Instance
).For the common cases, we can (and do) raise the IL up to the IOperation model, but there are a few cases that are problematic:
By-ref locals and returns
This should be solved once Ref returns and Locals #8030 lands. There will be a few amendments to IOperation needed to accommodate it. Discussion on that is in progress on the PR.
Note that this is not just an issue for when the actual IL has byref locals and returns, but also for when we need to put managed pointer stack values in to generated temporaries to deal with branching with a non-empty stack.
By-ref pointer arithmetic and comparison
Right now, I've taken the approach of adding explicit conversions to unmanaged pointer types for these cases, but it incorrectly looks as though GC tracking would stop.
Exception handling
Fault blocks
A fault block is effectively a finally block that only runs when an exception is raised. There is no direct C# or VB equivalent.
This is close:
but gets the semantics as to when the exception is considered unhandled wrong.
This is closer:
but a
succeeded = true
would need to be injected at every branch out (leave instruction) in the try block, which would be painful to implement and difficult for analyzers to reason about.Given that, it would be best IMHO to just model fault blocks directly in IOperation. Adding
IBlockStatement ITryStatement.FaultHandler
would work. It adds a slight wrinkle of having to define an ordering between fault and finally if both were present on the same try node, but I think it's perfectly intuitive to say that finally is always last. (Nested tries can be used to force a different order.)Filters
EDIT: This section was written before seeing PR #8624, which changed the filter to an IOperation and blurred expression/statement boundary). I have tentatively represented this as a BlockStatement node with a non-null Type and the implied semantics that the type of the final operation in the block is its value in an expression context. I am still discussing on #8624 to understand if this is a suitable direction in general (reusing
XxxStatement
nodes as expressions).ICatch
hasIExpression Filter
, which is sufficient for C# or VB's usage, but an arbitrary filter in IL cannot necessarily be expressed as a singleIExpression
.We could resolve this with a new node of the form:
The semantics would be to evaluate
Block
, thenExpression
for its result.Alternatively, we could move the filter body into a lambda and invoke it from the filter:
The latter would capture the semantics and decompile to C#, but convey an expense (closure + delegate allocation, invocation) that does not exist in the IL.
In general, we have to decide if we want to make these sorts of transformations that lose fidelity to the IL. I'd prefer that we didn't. I can see use cases evolving (decompilation) where we'd want to raise the tree up to something C# can produce to the full extent possible, but I think those should be handled with separate, optional passes.
Opcodes that have no direct C#/VB/IOperation equivalents
I've noted possible ways to raise them to higher level constructs, but again I would prefer that there were explicit nodes for them.
"unordered" floating point comparison
There are no
BinaryOperationKind
s to express the different handling of floating point comparisons vs. NaN values when e.g.cgt.un
is used in place ofcgt
to compare floating point values.Could possibly add explicit NaN checks instead.
unbox instruction
Unlike the
unbox.any
variant,unbox
puts a by-ref to the value onthe heap on to the stack. It is therefore not just a conversion from
reference type to value type.
Adding
interface IUnboxReferenceExpression : IReferenceExpression
couldresolve this.
ldftn/ldvirtftn instructions
Although they can be pattern-matched to delegate creation in the common case, we can't represent arbitrary use with the current set of nodes.
calli instruction
I don't see an alternative to having an explict node.
jmp instruction
Could possibly raise as an explicit tail call (but see
tail.
) below.cpblk instructions
Could possibly raise as naive loop or Buffer.BlockCopy call.
ckfinite instruction
Could possibly raise as explicit:
initblk instruction
Could possibly raise as naive loop.
break instruction
Could possibly raise as call to
Debugger.Break
.ldtoken instruction
For types, We can pattern match to typeof(X) and fall back to typeof(X).TypeHandle, but there is no way to represent loading field and method tokens.
volatile. prefix
Could possibly raise as
Volatile.Read/Write
calls.unaligned. prefix
Might be OK to ignore.
tail. prefix
If not represented, analyzers would be given a false impression of stack growth.
no.{typecheck|rangecheck|nullcheck} prefix
Probably OK to ignore. It is only well-defined to use these when the skipped check could not have failed. A CLI implementation is allowed to ignore them.
readonly. prefix
Probably OK to ignore. It would just put limitations on the subsequent use that would not be infringed upon by valid IL.
The text was updated successfully, but these errors were encountered: