Primitive operator functions are not inlined #198

ChengCat · 2018-09-29T06:06:59Z

Compile the following program with dalec a.dt -s ir -O4 --no-dale-stdlib (Dale is compiled with LLVM 6.0.1),

(import cstdio)

(def main (fn extern-c int (void)
  (def sum (var auto \ 0))
  (for (i \ 0) (< i 500000000) (incv i)
    (setv sum (+ sum i)))
  (printf "%d\n" sum)
  0))

I get:

[...]

; Function Attrs: alwaysinline norecurse nounwind readonly
define weak_odr i32 @"_Z1$2bii"(i32 %a, i32 %b) local_unnamed_addr #0 {
entry:
  %0 = add nsw i32 %b, %a
  ret i32 %0
}

[...]

; Function Attrs: nounwind
define i32 @main() local_unnamed_addr #2 {
entry:
  %0 = tail call i8 @"_Z1$3cii"(i32 0, i32 500000000)
  %1 = and i8 %0, 1
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %_dale_internal_label_breaklabel_2, label %_dale_internal_label_continuelabel_1.preheader

_dale_internal_label_continuelabel_1.preheader:   ; preds = %entry
  br label %_dale_internal_label_continuelabel_1

_dale_internal_label_continuelabel_1:             ; preds = %_dale_internal_label_continuelabel_1.preheader, %_dale_internal_label_continuelabel_1
  %.03 = phi i32 [ %3, %_dale_internal_label_continuelabel_1 ], [ 0, %_dale_internal_label_continuelabel_1.preheader ]
  %.012 = phi i32 [ %4, %_dale_internal_label_continuelabel_1 ], [ 0, %_dale_internal_label_continuelabel_1.preheader ]
  %3 = tail call i32 @"_Z1$2bii"(i32 %.03, i32 %.012)
  %4 = tail call i32 @"_Z1$2bii"(i32 %.012, i32 1)
  %5 = tail call i8 @"_Z1$3cii"(i32 %4, i32 500000000)
  %6 = and i8 %5, 1
  %7 = icmp eq i8 %6, 0
  br i1 %7, label %_dale_internal_label_breaklabel_2, label %_dale_internal_label_continuelabel_1

_dale_internal_label_breaklabel_2:                ; preds = %_dale_internal_label_continuelabel_1, %entry
  %.0.lcssa = phi i32 [ 0, %entry ], [ %3, %_dale_internal_label_continuelabel_1 ]
  %8 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_dvidrl0, i64 0, i64 0), i32 %.0.lcssa)
  ret i32 0
}

[...]

Note those _Z1$2bii calls, the primitive operator functions are not inlined, despite being marked as alwaysinline. This leads to very bad performance, and on my machine it's 10x slower than an equivalent C program compiled with gcc.

After some investigation, I think this is caused by inefficient use of LLVM. As indicated by LLVM command line tools, LLVM can easily optimize the whole thing into printing a constant. With the following commands,

> llvm-as a.dt.ll
> opt -O1 a.dt.bc -o O1.bc
> llvm-dis O1.bc

A file O1.ll is produced:

[...]

; Function Attrs: nounwind
define i32 @main() local_unnamed_addr #2 {
_dale_internal_label_breaklabel_2:
  %0 = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @_dvidrl0, i64 0, i64 0), i32 1711656320)
  ret i32 0
}

[...]

I also think --no-dale-stdlib should always be enabled. Those Dale run-time functions are very small, and without them being properly inlined, performance of both compiled programs and compilation time (to run the macros) is largely degraded.

The text was updated successfully, but these errors were encountered:

…n/static-modules settings (#198).

This means that those functions can always be linked statically, which in turn means that they are inlined regardless of optimisation settings.

tomhrr · 2018-10-06T13:43:10Z

Thanks for this: I had noticed some of these problems when working on
some recent updates to support newer versions of LLVM, but forgot to
go back and fix them afterwards. The basic arithmetical/relational
functions are now always inlined, and for the example program
(a.dt), O1 or higher will cause it to be optimised into the
printing of a constant.

tomhrr · 2018-10-07T04:48:47Z

There was an additional problem with those basic functions being retained
in the output even when they weren't needed, which has now been fixed.

ChengCat · 2018-10-08T13:26:06Z

Hi, I have just tried to help testing the fixes. Dale now works beautifully with that example. Thanks!

I noticed two minor issues.

; in a.dt
(import cstdio)
(import macros)
(import b)

(def m (macro extern (void)
  (def sum (var auto \ 0))
  (for (i \ 0) (< i 500000000) (incv i)
    (setv sum (+ sum i)))
  (std.macros.mnfv mc sum)))
(def f (fn extern int (void)
  (def sum (var auto \ 0))
  (for (i \ 0) (< i 500000000) (incv i)
    (setv sum (+ sum i)))
  sum))

(def main (fn extern-c int (void)
  (printf "%d\n" (m))
  (printf "%d\n" (f))
  (printf "%d\n" (b.m))
  (printf "%d\n" (b.f))))

; in b.dt
(module b)
(import stdlib)
(import macros)
(namespace b
  (def m (macro extern (void)
    (def sum (var auto \ 0))
    (for (i \ 0) (< i 500000000) (incv i)
      (setv sum (+ sum i)))
    (std.macros.mnfv mc sum)))
  (def f (fn extern int (void)
    (def sum (var auto \ 0))
    (for (i \ 0) (< i 500000000) (incv i)
      (setv sum (+ sum i)))
    sum)))

The above two files are compiled as:

dalec -c b.dt -O4
dalec a.dt -O4 -o a

It is slow to use a macro from the same module, leading to slow compile time. Currently, it is already efficient to use macros from any other module, so I think this is minor issue.
The basic operator functions are retained in libb.bc.

#198).

tomhrr · 2018-10-11T09:29:08Z

Thanks for the extra comments. 2 is now fixed, but I don't think there's any way to fix 1, since the slowness appears to be due to compilation of the macro.

ChengCat · 2018-10-14T06:59:38Z

I thought the slowness of 1 was mainly caused by that, JIT compilation is another code path, and inline is not properly handled in that code path. But I am not sure.

I am closing this issue for now, since it is not important anyway.

ChengCat · 2018-10-14T08:24:25Z

I forgot to say.. Thank you!

tomhrr added a commit that referenced this issue Oct 4, 2018

[gh-198] Add script to show executable size for different optimisatio…

14c7e31

…n/static-modules settings (#198).

tomhrr added a commit that referenced this issue Oct 4, 2018

[gh-198] Expand to cover different output types.

3dfc6e1

tomhrr added a commit that referenced this issue Oct 4, 2018

[gh-198] Internalise functions when using LTO.

9905c1a

tomhrr added a commit that referenced this issue Oct 4, 2018

[gh-198] Re-enable inlining.

307f733

tomhrr added a commit that referenced this issue Oct 6, 2018

[gh-198] Use standard for loop for compatibility.

59f9d6e

tomhrr added a commit that referenced this issue Oct 6, 2018

[gh-198] Internalise variables when using LTO.

2837f54

tomhrr added a commit that referenced this issue Oct 6, 2018

[gh-198] Move basic arithmetical/relational functions out of drt.

ec36560

This means that those functions can always be linked statically, which in turn means that they are inlined regardless of optimisation settings.

tomhrr added a commit that referenced this issue Oct 6, 2018

[gh-198] Fix dependencies.

8bca206

tomhrr added a commit that referenced this issue Oct 6, 2018

[gh-198] Fix dependencies (2).

61acb31

tomhrr added a commit that referenced this issue Oct 7, 2018

[gh-198] Always internalise basic arithmetical/relational functions.

7e63a0b

tomhrr added a commit that referenced this issue Oct 11, 2018

[master] Internalise basic functions in modules as well as executables (

e813064

#198).

ChengCat closed this as completed Oct 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Primitive operator functions are not inlined #198

Primitive operator functions are not inlined #198

ChengCat commented Sep 29, 2018

tomhrr commented Oct 6, 2018

tomhrr commented Oct 7, 2018

ChengCat commented Oct 8, 2018

tomhrr commented Oct 11, 2018

ChengCat commented Oct 14, 2018

ChengCat commented Oct 14, 2018

Primitive operator functions are not inlined #198

Primitive operator functions are not inlined #198

Comments

ChengCat commented Sep 29, 2018

tomhrr commented Oct 6, 2018

tomhrr commented Oct 7, 2018

ChengCat commented Oct 8, 2018

tomhrr commented Oct 11, 2018

ChengCat commented Oct 14, 2018

ChengCat commented Oct 14, 2018