Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #642 Bad performance/memory leak for long list literals #643

Merged
merged 1 commit into from
Nov 15, 2019

Conversation

folkertdev
Copy link
Contributor

A one-character fix that speeds up the formatting of long literals
and drastically reduces memory usage.

A very simple example is

module Small exposing (x)

x =
    [ 1, 2, 3
     , 4]

But with a list literal of many thousands of elements. The newline
forces all elements to get their own line. This is implemented as a
fold, but in practice this would often perform

longList ++ shortList

By reversing the fold (from foldl to foldr) that turns into

shortList ++ longList

Which requires less traversals and memory.

There are test files and profiling outputs in the issue #642.

A one-character fix that speeds up the formatting of long literals
and drastically reduces memory usage.

A very simple example is

```elm
module Small exposing (x)

x =
    [ 1, 2, 3
     , 4]
```

But with a list literal of many thousands of elements. The newline
forces all elements to get their own line. This is implemented as a
fold, but in practice this would often perform

```
longList ++ shortList
```

By reversing the fold (from `foldl` to `foldr`) that turns into

```
shortList ++ longList
```

Which requires less traversals and memory.
@avh4
Copy link
Owner

avh4 commented Nov 8, 2019

Thanks, and great catch! I'll be happy to merge this.

One thing I'm curious about first... since writing the original code I have learned more about haskell, specifically that foldl' is generally preferred over foldl. https://wiki.haskell.org/Foldr_Foldl_Foldl%27 Clearly foldl1 is not the right thing to use here, but I wonder what the tradeoffs here are between foldl1' and foldr1?

@avh4 avh4 added this to the 0.8.3 milestone Nov 8, 2019
@folkertdev
Copy link
Contributor Author

Usage of foldl1' didn't make a difference in this case, but in general using foldl' is a good idea.

In this case, a foldr just works, and it's actually not a bad choice in a lazy language. See also this blog post as a github comment.

There are 6 occurences of foldl remaining:

folkertdev@folkertdev ~/e/elm-format> grep "foldl" src/**/*.hs
src/ElmFormat/Render/Box.hs:          |> foldl step (ReversedList.empty, ReversedList.empty)
src/ElmFormat/Render/Box.hs:    foldl step (ReversedList.empty, Set.empty) input |> fst |> ReversedList.toList
src/ElmFormat/Render/Box.hs:            case foldl stepChildren (ReversedList.empty, seen) next |> (\(a,b) -> (ReversedList.toList a, b)) of
src/ElmFormat/Upgrade_0_19.hs:                (terms'', multiline'') = foldl step (ReversedList.empty, multiline) terms'
src/ElmFormat/Upgrade_0_19.hs:                        newBody = foldl (\e (name, value) -> mapExpr (inlineVar name (appMultiline == FASplitFirst) value) e) body mappings
src/Util/List.hs:        List.foldl step init list

where only one function from Util.List is used, which is also in ElmFormat.Render.Box. I could add the conversion to foldl' to this PR or open a new one for it if you want.

@avh4
Copy link
Owner

avh4 commented Nov 15, 2019

Just confirming that this change has some clear improvement for memory usage and no significant increase in memory usage or execution time for all of the existing test files:

-	total alloc =   3,798,304 bytes  (excludes profiling overheads)
+	total alloc =   3,797,656 bytes  (excludes profiling overheads)
-	total alloc =   1,805,976 bytes  (excludes profiling overheads)
+	total alloc =   1,803,360 bytes  (excludes profiling overheads)
-	total alloc =   2,668,800 bytes  (excludes profiling overheads)
+	total alloc =   2,665,488 bytes  (excludes profiling overheads)
-	total alloc =   1,119,040 bytes  (excludes profiling overheads)
+	total alloc =   1,119,344 bytes  (excludes profiling overheads)
-	total alloc =   1,254,960 bytes  (excludes profiling overheads)
+	total alloc =   1,255,136 bytes  (excludes profiling overheads)
-	total alloc =   3,122,560 bytes  (excludes profiling overheads)
+	total alloc =   3,118,200 bytes  (excludes profiling overheads)
-	total alloc =   2,501,000 bytes  (excludes profiling overheads)
+	total alloc =   2,501,712 bytes  (excludes profiling overheads)
-	total alloc =   1,505,760 bytes  (excludes profiling overheads)
+	total alloc =   1,506,376 bytes  (excludes profiling overheads)
-	total alloc =   1,450,184 bytes  (excludes profiling overheads)
+	total alloc =   1,450,456 bytes  (excludes profiling overheads)
-	total alloc =   2,000,256 bytes  (excludes profiling overheads)
+	total alloc =   2,000,872 bytes  (excludes profiling overheads)
-	total alloc =   1,140,888 bytes  (excludes profiling overheads)
+	total alloc =   1,141,192 bytes  (excludes profiling overheads)
-	total alloc =  75,428,736 bytes  (excludes profiling overheads)
+	total alloc =  74,520,992 bytes  (excludes profiling overheads)
-	total alloc =  15,000,728 bytes  (excludes profiling overheads)
+	total alloc =  14,948,400 bytes  (excludes profiling overheads)
-	total alloc =  15,461,496 bytes  (excludes profiling overheads)
+	total alloc =  15,380,240 bytes  (excludes profiling overheads)
-	total alloc =  17,956,168 bytes  (excludes profiling overheads)
+	total alloc =  17,838,680 bytes  (excludes profiling overheads)
-	total alloc =  16,303,536 bytes  (excludes profiling overheads)
+	total alloc =  16,213,856 bytes  (excludes profiling overheads)
-	total alloc =  12,749,080 bytes  (excludes profiling overheads)
+	total alloc =  12,691,024 bytes  (excludes profiling overheads)
-	total alloc =   1,862,776 bytes  (excludes profiling overheads)
+	total alloc =   1,863,784 bytes  (excludes profiling overheads)
-	total alloc =   3,782,888 bytes  (excludes profiling overheads)
+	total alloc =   3,782,240 bytes  (excludes profiling overheads)
-	total alloc =  20,492,984 bytes  (excludes profiling overheads)
+	total alloc =  20,390,720 bytes  (excludes profiling overheads)
-	total alloc =  59,016,856 bytes  (excludes profiling overheads)
+	total alloc =  58,403,464 bytes  (excludes profiling overheads)
-	total alloc =   4,800,496 bytes  (excludes profiling overheads)
+	total alloc =   4,778,616 bytes  (excludes profiling overheads)
-	total alloc =  13,300,336 bytes  (excludes profiling overheads)
+	total alloc =  13,199,344 bytes  (excludes profiling overheads)
-	total alloc =   1,849,696 bytes  (excludes profiling overheads)
+	total alloc =   1,850,704 bytes  (excludes profiling overheads)
-	total alloc =   4,940,128 bytes  (excludes profiling overheads)
+	total alloc =   4,938,920 bytes  (excludes profiling overheads)
-	total alloc = 179,215,360 bytes  (excludes profiling overheads)
+	total alloc = 176,197,448 bytes  (excludes profiling overheads)
-	total alloc =   1,154,984 bytes  (excludes profiling overheads)
+	total alloc =   1,155,328 bytes  (excludes profiling overheads)
-	total alloc = 338,898,296 bytes  (excludes profiling overheads)
+	total alloc = 327,299,256 bytes  (excludes profiling overheads)
-	total alloc =   3,798,608 bytes  (excludes profiling overheads)
+	total alloc =   3,797,960 bytes  (excludes profiling overheads)
-	total alloc = 246,916,120 bytes  (excludes profiling overheads)
+	total alloc = 236,894,176 bytes  (excludes profiling overheads)
-	total alloc =  17,267,944 bytes  (excludes profiling overheads)
+	total alloc =  17,108,976 bytes  (excludes profiling overheads)
-	total alloc =  93,388,144 bytes  (excludes profiling overheads)
+	total alloc =  90,377,024 bytes  (excludes profiling overheads)
-	total alloc =   5,128,864 bytes  (excludes profiling overheads)
+	total alloc =   5,093,608 bytes  (excludes profiling overheads)
-	total alloc =   4,519,512 bytes  (excludes profiling overheads)
+	total alloc =   4,515,408 bytes  (excludes profiling overheads)
-	total alloc =   9,944,424 bytes  (excludes profiling overheads)
+	total alloc =   9,851,968 bytes  (excludes profiling overheads)
-	total alloc =   4,008,400 bytes  (excludes profiling overheads)
+	total alloc =   3,996,072 bytes  (excludes profiling overheads)
-	total alloc =   3,122,864 bytes  (excludes profiling overheads)
+	total alloc =   3,118,504 bytes  (excludes profiling overheads)
-	total alloc =  22,628,720 bytes  (excludes profiling overheads)
+	total alloc =  22,482,256 bytes  (excludes profiling overheads)
-	total alloc =   4,162,992 bytes  (excludes profiling overheads)
+	total alloc =   4,153,136 bytes  (excludes profiling overheads)
-	total alloc =  59,228,384 bytes  (excludes profiling overheads)
+	total alloc =  58,636,272 bytes  (excludes profiling overheads)
-	total alloc =   1,176,456 bytes  (excludes profiling overheads)
+	total alloc =   1,175,824 bytes  (excludes profiling overheads)
-	total alloc =   2,700,760 bytes  (excludes profiling overheads)
+	total alloc =   2,701,304 bytes  (excludes profiling overheads)
-	total alloc =   5,342,768 bytes  (excludes profiling overheads)
+	total alloc =   5,337,928 bytes  (excludes profiling overheads)
-	total alloc =   1,684,408 bytes  (excludes profiling overheads)
+	total alloc =   1,683,496 bytes  (excludes profiling overheads)
-	total alloc =  15,260,640 bytes  (excludes profiling overheads)
+	total alloc =  15,107,120 bytes  (excludes profiling overheads)
-	total alloc =   4,280,992 bytes  (excludes profiling overheads)
+	total alloc =   4,263,376 bytes  (excludes profiling overheads)
-	total alloc =   4,254,520 bytes  (excludes profiling overheads)
+	total alloc =   4,246,440 bytes  (excludes profiling overheads)
-	total alloc =   2,110,680 bytes  (excludes profiling overheads)
+	total alloc =   2,112,152 bytes  (excludes profiling overheads)
-	total alloc =  13,427,632 bytes  (excludes profiling overheads)
+	total alloc =  13,342,376 bytes  (excludes profiling overheads)
-	total alloc =  22,987,752 bytes  (excludes profiling overheads)
+	total alloc =  22,450,168 bytes  (excludes profiling overheads)
-	total alloc =  28,399,568 bytes  (excludes profiling overheads)
+	total alloc =  28,363,464 bytes  (excludes profiling overheads)
-	total alloc =   4,065,144 bytes  (excludes profiling overheads)
+	total alloc =   4,051,304 bytes  (excludes profiling overheads)
-	total alloc =   4,222,616 bytes  (excludes profiling overheads)
+	total alloc =   4,221,032 bytes  (excludes profiling overheads)
-	total alloc =   8,484,824 bytes  (excludes profiling overheads)
+	total alloc =   8,433,752 bytes  (excludes profiling overheads)
-	total alloc =  33,422,344 bytes  (excludes profiling overheads)
+	total alloc =  33,406,480 bytes  (excludes profiling overheads)
-	total alloc =   1,141,192 bytes  (excludes profiling overheads)
+	total alloc =   1,141,496 bytes  (excludes profiling overheads)
-	total alloc =   1,863,080 bytes  (excludes profiling overheads)
+	total alloc =   1,864,088 bytes  (excludes profiling overheads)
-	total alloc =   1,132,984 bytes  (excludes profiling overheads)
+	total alloc =   1,133,288 bytes  (excludes profiling overheads)
-	total alloc =  53,508,584 bytes  (excludes profiling overheads)
+	total alloc =  53,283,568 bytes  (excludes profiling overheads)
-	total alloc =  51,917,440 bytes  (excludes profiling overheads)
+	total alloc =  51,728,104 bytes  (excludes profiling overheads)
-	total alloc =  33,764,280 bytes  (excludes profiling overheads)
+	total alloc =  33,714,640 bytes  (excludes profiling overheads)
-	total alloc =  62,300,400 bytes  (excludes profiling overheads)
+	total alloc =  62,032,168 bytes  (excludes profiling overheads)
-	total alloc =  19,602,248 bytes  (excludes profiling overheads)
+	total alloc =  19,558,248 bytes  (excludes profiling overheads)
-	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
 	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (1 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (2 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
 	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (2 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (2 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
 	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
-	total time  =        0.07 secs   (68 ticks @ 1000 us, 1 processor)
+	total time  =        0.08 secs   (81 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (21 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (18 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (22 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (17 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (15 ticks @ 1000 us, 1 processor)
+	total time  =        0.03 secs   (25 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (14 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (20 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (13 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (18 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (8 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (14 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (16 ticks @ 1000 us, 1 processor)
-	total time  =        0.06 secs   (55 ticks @ 1000 us, 1 processor)
+	total time  =        0.04 secs   (44 ticks @ 1000 us, 1 processor)
 	total time  =        0.01 secs   (8 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (18 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (15 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (9 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
-	total time  =        0.12 secs   (123 ticks @ 1000 us, 1 processor)
+	total time  =        0.12 secs   (121 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
-	total time  =        0.29 secs   (289 ticks @ 1000 us, 1 processor)
+	total time  =        0.28 secs   (277 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (8 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.18 secs   (182 ticks @ 1000 us, 1 processor)
+	total time  =        0.17 secs   (169 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (23 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (21 ticks @ 1000 us, 1 processor)
-	total time  =        0.10 secs   (95 ticks @ 1000 us, 1 processor)
+	total time  =        0.09 secs   (92 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (9 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (12 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (13 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (9 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (8 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (22 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (21 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
-	total time  =        0.05 secs   (51 ticks @ 1000 us, 1 processor)
+	total time  =        0.06 secs   (59 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (9 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (13 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (15 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (9 ticks @ 1000 us, 1 processor)
 	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (11 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (13 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (21 ticks @ 1000 us, 1 processor)
+	total time  =        0.03 secs   (25 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (24 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (23 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (6 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (5 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (8 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
-	total time  =        0.01 secs   (12 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (13 ticks @ 1000 us, 1 processor)
-	total time  =        0.03 secs   (26 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (21 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (2 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (4 ticks @ 1000 us, 1 processor)
+	total time  =        0.01 secs   (7 ticks @ 1000 us, 1 processor)
-	total time  =        0.00 secs   (2 ticks @ 1000 us, 1 processor)
+	total time  =        0.00 secs   (3 ticks @ 1000 us, 1 processor)
-	total time  =        0.05 secs   (48 ticks @ 1000 us, 1 processor)
+	total time  =        0.05 secs   (46 ticks @ 1000 us, 1 processor)
-	total time  =        0.05 secs   (52 ticks @ 1000 us, 1 processor)
+	total time  =        0.04 secs   (35 ticks @ 1000 us, 1 processor)
-	total time  =        0.03 secs   (27 ticks @ 1000 us, 1 processor)
+	total time  =        0.03 secs   (34 ticks @ 1000 us, 1 processor)
-	total time  =        0.05 secs   (52 ticks @ 1000 us, 1 processor)
+	total time  =        0.07 secs   (69 ticks @ 1000 us, 1 processor)
-	total time  =        0.02 secs   (23 ticks @ 1000 us, 1 processor)
+	total time  =        0.02 secs   (22 ticks @ 1000 us, 1 processor)

@avh4
Copy link
Owner

avh4 commented Nov 15, 2019

Thanks! this'll be merged via #646 where I started adding some scripts to compare performance between builds.

@avh4 avh4 merged commit e7c1e74 into avh4:master Nov 15, 2019
avh4 added a commit that referenced this pull request Nov 15, 2019
Add profiling script and merge #643
kutyel pushed a commit to kutyel/elm-format that referenced this pull request Apr 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants