IOperation representational issues when raising IL #8640

nguerrera · 2016-02-12T19:20:13Z

This is a list of the representational issues that I've encountered with raising IL method bodies to IOperation trees.

I believe that we can add expressiveness to IOperation after the initial release, but I'd like some eyes on this list to help validate that there isn't anything that would require a breaking design-change before v1.

See microsoft/binskim#10 for the (still prototype quality) IL -> IOperation code that triggered these obervations. In particular, ILOperations.Custom.cs shows the nodes that I've invented to cover the gaps below.

cc @JohnHamby @genlu @jaredpar

Operations that are possible in C# but not modeled by IOperation

Bugs are filed for these because there should not (even in v1) be valid C# syntax that cannot be surfaced as IOperation.

stackalloc (stackalloc should have a public-facing IOperation representation #8392)
checked (IOperation nodes should indicate when they are checked for arithmetic overflow #8395)
__makeref, __refvalue, __reftype and __arglist (Should __arglist, __makeref, __refvalue, __reftype have public IOperation nodes? #8394)

Nodes for these would be useful for corresponding IL instructions.

By-ref handling

In IL, managed pointers (type &) are basically just like unmanaged pointers except they are reported to the GC. Passing something by reference is explicitly done by taking its address as a managed pointer and passing that. This is distinct from the IOperation model where the by-ref semantics are captured by how IReferenceExpressions are used (e.g. arguments to IParameterSymbol with IsByRef == true or as a value type IInvocationExpresion.Instance).

For the common cases, we can (and do) raise the IL up to the IOperation model, but there are a few cases that are problematic:

By-ref locals and returns

This should be solved once Ref returns and Locals #8030 lands. There will be a few amendments to IOperation needed to accommodate it. Discussion on that is in progress on the PR.

Note that this is not just an issue for when the actual IL has byref locals and returns, but also for when we need to put managed pointer stack values in to generated temporaries to deal with branching with a non-empty stack.
By-ref pointer arithmetic and comparison

Right now, I've taken the approach of adding explicit conversions to unmanaged pointer types for these cases, but it incorrectly looks as though GC tracking would stop.

Exception handling

Fault blocks

A fault block is effectively a finally block that only runs when an exception is raised. There is no direct C# or VB equivalent.

This is close:

try { 
    ... 
} catch {
    ...
    throw;
}

but gets the semantics as to when the exception is considered unhandled wrong.

This is closer:

    bool succeeded = false;
    try {
        ...
        succeeded = true;
    } finally {
        if (!succeeded) {
            ...
        }
    }

but a succeeded = true would need to be injected at every branch out (leave instruction) in the try block, which would be painful to implement and difficult for analyzers to reason about.

Given that, it would be best IMHO to just model fault blocks directly in IOperation. Adding IBlockStatement ITryStatement.FaultHandler would work. It adds a slight wrinkle of having to define an ordering between fault and finally if both were present on the same try node, but I think it's perfectly intuitive to say that finally is always last. (Nested tries can be used to force a different order.)

Filters

EDIT: This section was written before seeing PR #8624, which changed the filter to an IOperation and blurred expression/statement boundary). I have tentatively represented this as a BlockStatement node with a non-null Type and the implied semantics that the type of the final operation in the block is its value in an expression context. I am still discussing on #8624 to understand if this is a suitable direction in general (reusing XxxStatement nodes as expressions).

ICatch has IExpression Filter, which is sufficient for C# or VB's usage, but an arbitrary filter in IL cannot necessarily be expressed as a single IExpression.

We could resolve this with a new node of the form:

    interface IBlockExpression : IExpression {
        IBlockStatement Block { get; }
        IExpression Expression { get; }
    }

The semantics would be to evaluate Block, then Expression for its result.

Alternatively, we could move the filter body into a lambda and invoke it from the filter:

    Func<Exception, bool> filter = ex => { 
        ...; 
    }
    try {
       ...
    } catch (Exception ex) when(filter(ex)) {
       ...
    }

The latter would capture the semantics and decompile to C#, but convey an expense (closure + delegate allocation, invocation) that does not exist in the IL.

In general, we have to decide if we want to make these sorts of transformations that lose fidelity to the IL. I'd prefer that we didn't. I can see use cases evolving (decompilation) where we'd want to raise the tree up to something C# can produce to the full extent possible, but I think those should be handled with separate, optional passes.

Opcodes that have no direct C#/VB/IOperation equivalents

I've noted possible ways to raise them to higher level constructs, but again I would prefer that there were explicit nodes for them.

"unordered" floating point comparison

There are no BinaryOperationKinds to express the different handling of floating point comparisons vs. NaN values when e.g. cgt.un is used in place of cgt to compare floating point values.

Could possibly add explicit NaN checks instead.

unbox instruction

Unlike the unbox.any variant, unbox puts a by-ref to the value on
the heap on to the stack. It is therefore not just a conversion from
reference type to value type.

Adding interface IUnboxReferenceExpression : IReferenceExpression could
resolve this.

ldftn/ldvirtftn instructions

Although they can be pattern-matched to delegate creation in the common case, we can't represent arbitrary use with the current set of nodes.

calli instruction

I don't see an alternative to having an explict node.

jmp instruction

Could possibly raise as an explicit tail call (but see tail.) below.

cpblk instructions

Could possibly raise as naive loop or Buffer.BlockCopy call.

ckfinite instruction

Could possibly raise as explicit:

    if (Double.IsNan(x) || Double.IsInfinity(x)) 
        throw new ArithmeticException());

initblk instruction

Could possibly raise as naive loop.

break instruction

Could possibly raise as call to Debugger.Break.

ldtoken instruction

For types, We can pattern match to typeof(X) and fall back to typeof(X).TypeHandle, but there is no way to represent loading field and method tokens.

volatile. prefix

Could possibly raise as Volatile.Read/Write calls.

unaligned. prefix

Might be OK to ignore.

tail. prefix

If not represented, analyzers would be given a false impression of stack growth.

no.{typecheck|rangecheck|nullcheck} prefix

Probably OK to ignore. It is only well-defined to use these when the skipped check could not have failed. A CLI implementation is allowed to ignore them.

readonly. prefix

Probably OK to ignore. It would just put limitations on the subsequent use that would not be infringed upon by valid IL.

The text was updated successfully, but these errors were encountered:

ManishJayaswal · 2016-05-05T22:16:58Z

@JohnHamby

nguerrera · 2016-08-02T19:16:02Z

For byref arithmetic and comparison, we can consider raising calls to https://github.com/dotnet/corefx/issues/10451

nguerrera added the Area-Analyzers label Feb 12, 2016

nguerrera self-assigned this Feb 12, 2016

nguerrera added the Discussion label Feb 12, 2016

genlu added the Feature - IOperation IOperation label Feb 18, 2016

srivatsn added Area-Interactive and removed Area-Analyzers labels May 3, 2016

jinujoseph modified the milestone: 15.3 Mar 10, 2017

jinujoseph added the Area-Analyzers label Mar 13, 2017

jinujoseph removed the Area-Interactive label Mar 31, 2017

jinujoseph mentioned this issue Apr 19, 2017

[IOperation] API/Design Review #18066

Closed

48 tasks

jinujoseph modified the milestones: Unknown, 15.3 Apr 19, 2017

jinujoseph added Concept-API This issue involves adding, removing, clarification, or modification of an API. and removed Discussion labels Apr 19, 2017

jinujoseph mentioned this issue Apr 19, 2017

Implement Ioperation Parent pointer #18601

Closed

2 tasks

jinujoseph added the 0 - Backlog label Apr 26, 2017

jinujoseph removed the 0 - Backlog label Sep 29, 2017

jinujoseph unassigned nguerrera Jan 17, 2018

jinujoseph modified the milestones: Unknown, 16.0 Jun 23, 2018

jinujoseph added the _Product-level triaged label Jun 23, 2018

jinujoseph modified the milestones: 16.0, 16.1 Jan 16, 2019

jinujoseph modified the milestones: 16.1, Backlog Feb 25, 2019

jinujoseph removed the _Product-level triaged label Feb 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IOperation representational issues when raising IL #8640

IOperation representational issues when raising IL #8640

nguerrera commented Feb 12, 2016

ManishJayaswal commented May 5, 2016

nguerrera commented Aug 2, 2016

IOperation representational issues when raising IL #8640

IOperation representational issues when raising IL #8640

Comments

nguerrera commented Feb 12, 2016

Operations that are possible in C# but not modeled by IOperation

By-ref handling

Exception handling

Fault blocks

Filters

Opcodes that have no direct C#/VB/IOperation equivalents

"unordered" floating point comparison

unbox instruction

ldftn/ldvirtftn instructions

calli instruction

jmp instruction

cpblk instructions

ckfinite instruction

initblk instruction

break instruction

ldtoken instruction

volatile. prefix

unaligned. prefix

tail. prefix

no.{typecheck|rangecheck|nullcheck} prefix

readonly. prefix

ManishJayaswal commented May 5, 2016

nguerrera commented Aug 2, 2016