Make emitted egal code more loopy #54121

Merged: 1 commit merged into master from kf/loopyegal on Apr 25, 2024
Conversation

@Keno (Member) commented Apr 17, 2024:

The strategy here is to look at (data, padding) pairs and RLE them into loops, so that repeated adjacent patterns use a loop rather than getting unrolled. On the test case from #54109, this makes compilation essentially instant, while also being faster at runtime (turns out LLVM spends a massive amount of time AND the answer is bad).

There are some obvious further enhancements possible here:

  1. The memcmp constant is small. LLVM has a pass to inline these with better code. However, we don't have it turned on. We should consider vendoring it, though we may want to add some shortcutting to it to avoid having it iterate through each function.
  2. This only does one level of sequence matching. It could be recursed to turn things into nested loops.

However, this solves the immediate issue, so hopefully it's a useful start. Fixes #54109.
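
(To illustrate the run-length-encoding idea described above, here is a minimal, hedged C sketch; the chunk_t/run_t/rle_chunks names are hypothetical and are not the PR's actual codegen data structures.)

#include <stddef.h>

/* Hypothetical per-chunk descriptor: `data` bytes that must be compared,
   followed by `padding` bytes that must be skipped. */
typedef struct { size_t data; size_t padding; } chunk_t;

/* A run of `count` identical adjacent chunks: the emitted comparison can
   loop `count` times over one (data, padding) pattern instead of being
   unrolled `count` times. */
typedef struct { chunk_t chunk; size_t count; } run_t;

/* Run-length encode adjacent equal chunks; `runs` must have room for up
   to `n` entries. Returns the number of runs written. */
static size_t rle_chunks(const chunk_t *chunks, size_t n, run_t *runs)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n;) {
        size_t j = i + 1;
        while (j < n && chunks[j].data == chunks[i].data &&
                        chunks[j].padding == chunks[i].padding)
            j++;
        runs[nruns].chunk = chunks[i];
        runs[nruns].count = j - i;
        nruns++;
        i = j;
    }
    return nruns;
}

(Each run with count > 1 can then be lowered to a loop whose body compares `data` bytes and advances past `data + padding` bytes, rather than emitting that comparison `count` times.)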

@gbaraldi (Member):

We do run the pass that inlines memcmps, but it's a backend pass. I've been chipping away at moving it to the middle end in llvm/llvm-project#77370, but it still needs some love
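
(For context: the expansion referred to here replaces a small, fixed-size memcmp call with wide loads and integer compares. A hedged, hand-written C sketch of the general idea for an equality-only 16-byte compare, not LLVM's actual output:)

#include <stdint.h>
#include <string.h>

/* Equality-only expansion of memcmp(a, b, 16): two 8-byte loads per side,
   XORed and ORed together instead of a call into libc. memcpy keeps the
   loads well-defined regardless of alignment; compilers lower it to plain
   load instructions. */
static int bytes16_equal(const void *a, const void *b)
{
    uint64_t a0, a1, b0, b1;
    memcpy(&a0, a, 8);
    memcpy(&a1, (const char *)a + 8, 8);
    memcpy(&b0, b, 8);
    memcpy(&b1, (const char *)b + 8, 8);
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}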

@Keno (Member, Author) commented Apr 18, 2024:

That probably explains why the performance is better than I expected.

@gbaraldi (Member):

I played a bit with this in #52719, which is what made me start doing the LLVM work.

@Keno (Member, Author) commented Apr 21, 2024:

@nanosoldier runtests()

@nanosoldier (Collaborator):

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@gbaraldi (Member):

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier (Collaborator):

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@gbaraldi (Member):

Performance looks good

@oscardssmith added the labels "performance (Must go faster)" and "compiler:codegen (Generation of LLVM IR and native code)" on Apr 23, 2024
@Keno (Member, Author) commented Apr 23, 2024:

Performance is fine, but as shown in pkgeval, we're missing some edge cases, since we're double-booking the meaning of `haspadding` to also mean bits-comparable. I need to think about whether to implement the edge case or just fall back.
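
(Background for the remark above: padding is one reason a raw bitwise comparison is not always a valid egal check, and the `isbitsegal` layout flag visible in the diff below appears intended to track bitwise-comparability separately from padding. A hedged C illustration with a hypothetical type, not code from the PR:)

#include <stdint.h>

/* On typical 64-bit ABIs this struct has 7 padding bytes after `tag` whose
   contents are unspecified, so two logically equal values may differ when
   compared byte-for-byte with memcmp. */
struct padded {
    int8_t  tag;    /* followed by 7 padding bytes */
    int64_t value;
};

/* A sound equality check compares only the defined fields and skips the
   padding. */
static int padded_equal(const struct padded *a, const struct padded *b)
{
    return a->tag == b->tag && a->value == b->value;
}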

@Keno force-pushed the kf/loopyegal branch 2 times, most recently from b5a5ea1 to fe67c79, on April 24, 2024 at 05:46
@Keno (Member, Author) commented Apr 24, 2024:

@nanosoldier runtests(["ReduceWindows", "NonlinearSystems", "StatsModels", "GeoParquet", "GeoStatsModels", "MicroCanonicalHMC", "DistributedStwdLDA", "Agents", "SurfaceCoverage", "JumpProblemLibrary", "SDEProblemLibrary", "ConceptualClimateModels", "ReactionNetworkImporters", "Phylo", "ModelOrderReduction", "LibTrixi", "ReactionSensitivity", "Biofilm", "Turkie", "BloqadeODE", "IonSim"])

@nanosoldier (Collaborator):

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@Keno force-pushed the kf/loopyegal branch 3 times, most recently from ee760e0 to 1ac52df, on April 25, 2024 at 18:13
@Keno merged commit 50833c8 into master on Apr 25, 2024 (5 of 7 checks passed)
@Keno deleted the kf/loopyegal branch on April 25, 2024 at 23:21
@vtjnash (Member) left a comment:

Very fancy! Sgtm (though I didn't review the recursive algorithm fully, I trust our tests would have caught any issues with it)

@@ -621,18 +624,17 @@ void jl_compute_field_offsets(jl_datatype_t *st)
// if we have no fields, we can trivially skip the rest
if (st == jl_symbol_type || st == jl_string_type) {
// opaque layout - heap-allocated blob
-    static const jl_datatype_layout_t opaque_byte_layout = {0, 0, 1, -1, 1, {0}};
+    static const jl_datatype_layout_t opaque_byte_layout = {0, 0, 1, -1, 1, { .haspadding = 0, .fielddesc_type=0, .isbitsegal=1, .arrayelem_isboxed=0, .arrayelem_isunion=0 }};
Member:

I thought this was a GCC extension. Did it finally make it into the standard with reasonable behavior? It is certainly much better syntax for readability/maintainability!

@Keno (Member, Author):

I believe it's in C99. The C++ version was a GCC extension until C++20, so that may be what you're thinking of.
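
(For reference, a minimal example of the syntax in question; the struct below is illustrative, not the actual jl_datatype_layout_t definition. Designated initializers are standard C since C99; C++ only gained them, in a restricted form requiring declaration order, in C++20.)

/* Fields are named explicitly; anything omitted is zero-initialized, which
   is easier to keep correct than a positional initializer list. */
struct flags_example {
    unsigned haspadding : 1;
    unsigned isbitsegal : 1;
    unsigned arrayelem_isboxed : 1;
};

static const struct flags_example opaque_flags = {
    .haspadding = 0,
    .isbitsegal = 1,
    .arrayelem_isboxed = 0,
};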

// Emit memcmp. TODO: LLVM has a pass to expand this for additional
// performance.
Value *this_answer = ctx.builder.CreateCall(prepare_call(memcmp_func),
{ ptr1,
Member:

This load from a GC value appears to be unrooted (as required by the ABI for memcmp). Please copy the code from the other memcmp branch to add the roots to this (unless I missed something).

Keno added a commit that referenced this pull request on Apr 26, 2024: "As requested in post-commit review on #54121."
Labels: compiler:codegen (Generation of LLVM IR and native code), performance (Must go faster)
Closes: Excessive LLVM time in egal codegen of large struct (#54109)
5 participants