Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor decompilation of nested logical expressions #1133

Closed
Chicken-Bones opened this issue May 13, 2018 · 10 comments
Closed

Poor decompilation of nested logical expressions #1133

Chicken-Bones opened this issue May 13, 2018 · 10 comments

Comments

@Chicken-Bones
Copy link
Contributor

Chicken-Bones commented May 13, 2018

As the number of nested || and && operators within a statement increases, the decompiler makes increasingly questionable decisions on how to represent the final control flow.

There are many semantically equivalent representations, and some of them can only be decided via heuristics. For a simple example if(!a) return; vs nesting the entire method in if (a) {}. However the comparison tests below leave plenty of room for improvement.

Basic if statement side by side http://www.mergely.com/RkWr5GxE/
It is reasonable to expect the decompiler to pick a simplified control flow representation, and to favour nested if statements over arbitrarily long and complex expressions, however the introduction of gotos in tests 4a and 5 shows that even some simple patterns which you would expect to a code review are badly handled.

If the expression target is not a branch condition, then it handles better, but there's still room for improvement http://www.mergely.com/xWlXZH7o/

Referencing #1094
ILAst analysis to come

@dgrunwald
Copy link
Member

The current support for short-circuit operators is based on few hardcoded code patterns in ConditionDetection.HandleIfInstruction().
It might be that there's just a few patterns missing until the combination of patterns can handle the general case, or it might be that the current approach isn't viable and needs to be replaced altogether...

@Chicken-Bones
Copy link
Contributor Author

Further analysis reveals that these expressions always result in a pattern of two successive if statements with the same branch target, which could be merged with logic.or

Example:

	if (a && b)
		goto IL_0020;
	if (c && d)
		goto IL_0020;
	Console.WriteLine(2);
	return;
	IL_0020:
	Console.WriteLine(1);
	BlockContainer {
		Block IL_0000 (incoming: 1) {
			if (logic.and(ldfld a(ldloc this), ldfld b(ldloc this))) Block IL_0008 {
				br IL_0020
			}
			if (logic.and(ldfld c(ldloc this), ldfld d(ldloc this))) br IL_0020
			call WriteLine(ldc.i4 2)
			leave IL_0000 (nop)
		}

		Block IL_0020 (incoming: 2) {
			call WriteLine(ldc.i4 1)
			leave IL_0000 (nop)
		}
	}

I'll test a fix and submit a PR.
I might also propose some heuristic for when to early return vs inverting the condition and in-lining the else block

@Chicken-Bones
Copy link
Contributor Author

merging successive inlined ifs works well, but matching and transformation flow of ConditionDetection will need to be split up and reorganised as too many steps need to be re-run with slightly altered assumptions after the transform. Is there anything I should be aware of when doing this, or just hope the tests cover enough?

Subquestion: Is there any need to keep single branch blocks when transforming, or can the result of UnpackBlockContainingOnlyBranch be inserted directly as IfInstruction.TrueInst

@dgrunwald
Copy link
Member

In addition to the tests, what we're usually doing with significant decompiler changes:
We decompile a bunch of assemblies (usually the RoundtripTests, sometimes some .NET System.* assemblies) with both the old and new ILSpy code version, then take a look at the resulting diff.

Subquestion: Is there any need to keep single branch blocks when transforming, or can the result of UnpackBlockContainingOnlyBranch be inserted directly as IfInstruction.TrueInst

For branch instructions there shouldn't be a difference. In general, Block { inst } differs from inst in that the block always evaluates to type void, but the instruction within the block might have a different result type (if the return value is ignored). Since ILAst if acts like the C# ?: operator when the TrueInst/FalseInst evaluate to a value, removing a block might end up changing the semantics. (it'll likely result in an assertion if the TrueInst and FalseInst produce different types)

@Chicken-Bones
Copy link
Contributor Author

I had a lot of success adding better pattern matching cases and inversions to the ConditionDetection, however on review it was hard to determine the necessity of when to recheck for or perform certain simplifications or inversions so I'm now looking at a more fundamental approach based on just inlines, inversions and removing redundant gotos. Currently all the commented test cases in Loops.cs succeed, as well as some more complex tests of my own but it's not quite complete.

Are there any other known, previously marked as "TODO" control flow tests?

@dgrunwald
Copy link
Member

I'm not aware of any other than those you found in Loops.cs.
I guess an easy way to find additional problematic control flow would be to decompile some assemblies and search for goto in the decompiled code (though you'll have to compare with the original C# code -- sometimes there really was a goto in there).

@Chicken-Bones
Copy link
Contributor Author

Chicken-Bones commented May 23, 2018

I'm in the process of organising tests and I'm wondering if there's a way to add "counter cases" things which produce a known different output when decompiled. For example, one of the restored test cases

public int InterestingLoop()
{
	int num = 0;
	if (num % 11 == 0) {
		while (true) {
			if (num % 4 == 0) {
				if (num % 7 == 0) {
					if (num % 11 == 0) {
						// use a continue here to prevent moving the if (i%7) outside the loop
						continue;
					}
					Console.WriteLine("7");
				} else {
					// this block is not part of the natural loop
					Console.WriteLine("!7");
				}
				break;
			}
			num++;
		}
		// This instruction is still dominated by the loop header
		num = -2147483648;//int.MinValue;
	}
	return num;
}

actually decompiles to

public int InterestingLoop()
{
	int num = 0;
	if (num % 11 == 0)
	{
		while (true)
		{
			if (num % 4 == 0)
			{
				if (num % 7 != 0)
				{
					Console.WriteLine("!7");
					break;
				}
				if (num % 11 != 0)
				{
					Console.WriteLine("7");
					break;
				}
			}
			else
				num++;
		}
		num = -2147483648;
	}
	return num;
}

Which is actually quite good to see it successfully removes use of "continue" as a priority.

Another example would be the switch

switch (I) {
	case 1:
	case 4:
	case 7:
		Console.WriteLine();
		break;
}

decompiles to

if (i == 1 || i == 4 || i == 7)
	Console.WriteLine();

@Chicken-Bones
Copy link
Contributor Author

Chicken-Bones commented May 23, 2018

Additionally, the

public void WhileWithGoto()
{
	while (Condition("Main Loop")) {
		if (Condition("Condition")) {
			goto IL_000f;
		}
		goto IL_0026;
		IL_000f:
		Console.WriteLine("Block1");
		if (Condition("Condition2")) {
			continue;
		}
		goto IL_0026;
		IL_0026:
		Console.WriteLine("Block2");
		goto IL_000f;
	}
}

test compiles fine, but there's no label matching, it'd be nice if there was a way to specify a given test will only pass exactly with certain compile/debug configurations

@Chicken-Bones
Copy link
Contributor Author

Chicken-Bones commented May 23, 2018

Finally, is there a way to specify that a test won't pass on certain compiler combinations? For example, often generated temporaries in non-optimised code make it impossible to perform certain transformations.

Edit: I was able to change the transformation pipeline slightly so that these tests pass again

@dgrunwald
Copy link
Member

You can use #if in pretty tests to include a test only with certain compilers. (e.g. the names of temporaries might change with #if ROSLYN && OPT)
See Tester.GetPreprocessorSymbols for the full list.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants