Proposal: Use value types to store query comprehensions' intermediate variables #372

benjamin-hodgson · 2017-03-28T13:35:58Z

Problem statement

In high-throughput situations it's often desirable to minimise garbage. You often see advice like "don't use LINQ in hot code because it allocates a lot". One reason for this is that query comprehensions are translated to the query pattern using anonymous objects which live on the heap.

var q =
    from x in xs
    from y in ys
    from z in zs
    select x + y + z;
// translates to...
var q = xs
    .SelectMany(x => ys, (x, y) => new { x, y })  // new { x, y } is a reference type that lives on the heap
    .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);

(Of course in practice dummy will be a transparent identifier.) While dummy will often be short-lived and won't survive the nursery, if your goal is to minimise garbage it's still preferable to avoid allocating it altogether.

You can achieve this by writing your query manually and storing intermediate variables in a custom value type. The example below will run with O(1) allocations:

var q = xs
    .SelectMany(x => ys, (x, y) => new MyStruct(x, y))
    .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);
// or
var q =
    from dummy in (
        from x in xs
        from y in ys
        select new MyStruct(x, y)
    )
    from z in zs
    select dummy.x + dummy.y + z;

struct MyStruct
{
    public int x { get; }
    public int y { get; }
    public MyStruct(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

When your query is long or complicated this translation gets rather tedious rather quickly (although C#7's new ValueTuple certainly eases some of the pain). I'd like to be able to use the nice original query syntax but be confident that it won't allocate a lot at run time.

Proposed solution

My proposal is to (optionally) translate the original query into one which looks like the manually-written version, by generating MyStruct at compile time, much like how anonymous objects already work.

It's not always desirable to use value types - it can be expensive to copy large value types around, and existing query providers may not understand expressions that don't use anonymous objects. So I propose having this behaviour disabled by default. Users can enable the value-type translation on a per-method level using an attribute:

[StructQueries]  // all query comprehensions in this method will use an anonymous value type for their intermediate identifiers
public void MyMethod()
{
    var q =
        from x in xs
        from y in ys
        from z in zs
        select x + y + zs;
}

// translates to...
[StructQueries]
public void MyMethod()
{
    var q =
        .SelectMany(x => ys, (x, y) => new <>AnonymousStruct0(x, y))
        .SelectMany(dummy => zs, (dummy, z) => dummy.x + dummy.y + z);
}

[CompilerGenerated]
struct <>AnonymousStruct0
{
    public int x { get; }
    public int y { get; }
    public MyStruct(int x, int y)
    {
        this.x = x;
        this.y = y;
    }
}

The text was updated successfully, but these errors were encountered:

HaloFour · 2017-03-28T13:40:46Z

On the Roslyn repo it was mentioned that the compiler may switch to using tuples instead of anonymous types for range projections:

public void MyMethod()
{
    var q =
        from x in xs
        from y in ys
        from z in zs
        select x + y + zs;
}

public void MyMethod()
{
    var q =
        .SelectMany(x => ys, (x, y) => (x, y))
        .SelectMany(tuple => zs, (tuple, z) => tuple.Item1 + tuple.Item2 + z);
}

That would get you the performance benefit you seek effectively for free and without requiring an opt-in.

HaloFour · 2017-03-28T13:41:42Z

Here's the Roslyn conversation:

dotnet/roslyn#8192

/cc @gafter

benjamin-hodgson · 2017-03-28T13:41:47Z

@HaloFour That'd be awesome! It's more or less exactly what I'm looking for. (The main downside is that it requires you to depend on ValueTuple, but that's not so bad.) Do you have a link to that comment?

Edit: Thanks for the link! 😄

svick · 2017-03-28T16:07:11Z

I think that doing this is a good idea, but it should not effect Expression-based LINQ providers at all. They generally translate the query to some other form, so they wouldn't benefit from switching to a value type. So this would be a pretty big breaking change with no benefits that I can see.

Because of that, I don't see much point in making this opt-in. It might be worth considering if a value type should be used only when the copying overhead is not significant, based on some heuristic. But I think the compiler is likely going to have a better chance of making the right decision in the common case. (And if performance matters to you so much that you actually care about this in a specific piece of code, you're probably still better of not using LINQ there in the first place.)

bbarry · 2017-03-28T18:24:37Z

I think if the compiler can detect use of LINQ statements on IEnumerable<T> in ways that would not compile to Expression instances, it should perform a number of optimizations on them including use of stack variables and replacing lambdas with iterative code using local method variables. Conversions for eager invocations could be done in place, while lazy ones could compile to a hidden method to delay the evaluation.

As it stands today, codebases are actively avoiding usage of LINQ when under this condition (or occasionally banning it altogether).

HaloFour · 2017-03-28T18:32:48Z

@bbarry

That is several orders of magnitude more involved. Currently the compiler has zero knowledge of IEnumerable<T> when it comes to LINQ. The entire feature is based around extension methods and lambdas. Even when using the query syntax the compiler will blindly attempt to apply the expected methods, which is why you can write partial implementations of LINQ on top of any object. The compiler has no clue what Select or Where does, it only cares that it can emit an attempt to call such an instance (or extension) method with the expected lambda as an argument.

The only thing that the compiler cares about when it comes to determining whether to emit lambdas as functions and delegates is whether the argument accepts a delegate directly, or if it accepts an Expression<T>. If the latter, the compiler will emit an expression tree. The Queryable extension methods all accept Expression<T> instead of the normal delegates, which is what the Enumerable extension methods all accept.

orthoxerox · 2017-03-28T19:21:16Z

@HaloFour the compiler can very well lower the calls on IEnumerable<T> and emit a loop or a generator. https://github.com/antiufo/roslyn-linq-rewrite does exactly that, after all.

benjamin-hodgson · 2017-03-28T23:26:47Z

@orthoxerox I think @HaloFour's point wasn't that inlining iterators and lambdas is impossible or a bad idea, just that it's much more complex to implement in Roslyn than this (modest) proposal. In any case I think it's an altogether different feature than stack-allocating intermediate results, and it deserves its own issue.

HaloFour · 2017-03-28T23:52:29Z

@orthoxerox

Sure, the compiler could do lots of things. But every time this has come up it seems that the compiler team has been very reluctant to consider this. It would effectively mean baking all of the implementation details of LINQ directly into the language and the spec, which would be a large undertaking. The opinion seems to be that such optimization belongs in the JIT or some ahead-of-time compilation/optimization.

That said, I wouldn't at all be opposed to the compiler team taking up such an endeavor. I am a fan of improved performance. 😄

gafter · 2017-03-29T00:48:30Z

Is there any language issue here? It sounds more like a compiler feature being requested.

HaloFour · 2017-03-29T01:10:00Z

@gafter

As this request doesn't affect the syntax of the language at all, nor does it involve any modifications to the language spec, I assume that it belongs on the Roslyn repo? In that case a ping on dotnet/roslyn#8192 seems sufficient.

gafter · 2017-03-29T03:31:12Z

Yes, I believe that dotnet/roslyn#8192 is the right place to continue this discussion.

jnm2 · 2018-10-04T12:47:24Z

Fyi, work has started on stack-allocated objects. dotnet/coreclr#20251

benjamin-hodgson mentioned this issue Mar 28, 2017

Use (value) tuples for query translation dotnet/roslyn#8192

Open

gafter closed this as completed Mar 29, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal: Use value types to store query comprehensions' intermediate variables #372

Proposal: Use value types to store query comprehensions' intermediate variables #372

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading

HaloFour commented Mar 28, 2017

HaloFour commented Mar 28, 2017

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading

svick commented Mar 28, 2017

bbarry commented Mar 28, 2017

HaloFour commented Mar 28, 2017

orthoxerox commented Mar 28, 2017

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading

HaloFour commented Mar 28, 2017

gafter commented Mar 29, 2017

HaloFour commented Mar 29, 2017

gafter commented Mar 29, 2017

jnm2 commented Oct 4, 2018

Proposal: Use value types to store query comprehensions' intermediate variables #372

Proposal: Use value types to store query comprehensions' intermediate variables #372

Comments

benjamin-hodgson commented Mar 28, 2017 • edited Loading

Problem statement

Proposed solution

HaloFour commented Mar 28, 2017

HaloFour commented Mar 28, 2017

benjamin-hodgson commented Mar 28, 2017 • edited Loading

svick commented Mar 28, 2017

bbarry commented Mar 28, 2017

HaloFour commented Mar 28, 2017

orthoxerox commented Mar 28, 2017

benjamin-hodgson commented Mar 28, 2017 • edited Loading

HaloFour commented Mar 28, 2017

gafter commented Mar 29, 2017

HaloFour commented Mar 29, 2017

gafter commented Mar 29, 2017

jnm2 commented Oct 4, 2018

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading

benjamin-hodgson commented Mar 28, 2017 •

edited

Loading