-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider design change in light of GHC join point work #156
Comments
... filter can't fuse without skips can it?
…On Tue, Feb 21, 2017 at 2:49 PM David Feuer ***@***.***> wrote:
The join points paper
<https://www.microsoft.com/en-us/research/publication/compiling-without-continuations/>
suggests that it may now be possible to avoid the need for Skip
altogether. It would be worth checking if that's actually the case. If so,
it might be possible to simplify the implementation considerably.
Furthermore, I seem to recall that the presence of skips (which allow
vectors of the same length to be represented by streams of different
lengths) prevents one or more optimizations. I vaguely remember trying to
do something a bit clever with replicate and finding it impossible
because of skips, but I don't remember any details.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#156>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAQwtSu6MWDlo1qGWdrgLswl4Muet-aks5rez-1gaJpZM4MHx21>
.
|
I believe the join point paper claims the new approach would allow it to do
so, yes. Whether that is actually the case as implemented, I don't know.
On Feb 23, 2017 6:02 AM, "Carter Tazio Schonwald" <notifications@github.com>
wrote:
… ... filter can't fuse without skips can it?
On Tue, Feb 21, 2017 at 2:49 PM David Feuer ***@***.***>
wrote:
> The join points paper
> <https://www.microsoft.com/en-us/research/publication/compiling-without-
continuations/>
> suggests that it may now be possible to avoid the need for Skip
> altogether. It would be worth checking if that's actually the case. If
so,
> it might be possible to simplify the implementation considerably.
> Furthermore, I seem to recall that the presence of skips (which allow
> vectors of the same length to be represented by streams of different
> lengths) prevents one or more optimizations. I vaguely remember trying to
> do something a bit clever with replicate and finding it impossible
> because of skips, but I don't remember any details.
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub
> <#156>, or mute the thread
> <https://github.com/notifications/unsubscribe-auth/
AAAQwtSu6MWDlo1qGWdrgLswl4Muet-aks5rez-1gaJpZM4MHx21>
> .
>
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#156 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABzi_d0j-U5DDGe2FmQYYxi0Dk9RF-_Vks5rfWdHgaJpZM4MHx21>
.
|
The paper says the new join point handling makes it so that recursive stepper functions are no longer a problem. So However, making this change would break (fusion-wise) the functions that require |
And we still need to measure impact on compile time/ code side /
performance for vector.
That would change how loop breakers etc are chosen sometimes right?
…On Fri, Feb 24, 2017 at 12:50 AM dolio ***@***.***> wrote:
The paper says the new join point handling makes it so that recursive
stepper functions are no longer a problem. So Skip is no longer needed.
However, making this change would break (fusion-wise) the functions that
require Skip on every compiler version that doesn't have the new join
point handling. I assume that's everything before 8.2, or at best 8.0. So
I'm not sure such a change can be justified for quite some time.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#156 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAAQwr2jc6OW3lYepHs9qySRvlRAcjpGks5rfm-lgaJpZM4MHx21>
.
|
The one way I can think of to accelerate this kind of thing is to decouple the fusion from vectors. Have modules that provide plain vectors, and other modules that somehow add in fusion. Then give a choice of which fusion framework to use. However, I'm not very sure how to accomplish something like this. The most obvious thing to say is, "this is where you want ML modules," because fusion is sort of a thing that transforms one implementation of an API into another. But backpack is probably more of a limiting factor than the join point stuff. Perhaps a newtype could work: newtype Fused v a = F (v a) Where you use Anyhow, this is a research project on top of a research project. :) |
@treeowl whats the best way to explore the join point cleanup in your mind? |
Any thoughts on this recently? From the paper it sound very interesting:
Would it be worthwhile to do a small prototype to see e.g. how much it improves runtime and/or compiler performance? Perhaps some of the zip or zipWith functions ? I picked those based on the following from the paper, but I'm not sure if it applies to zip for vectors:
|
It is interesting, but it is also an enormous amount of work. @GeorgeCo You feel like working on this and submitting a PR with a prototype? If I were to work on this I would start with implementing and verifying performance of We already have a plan for putting all the streaming functionality that uses |
I've just done a quick scan over the code. Now I'm stuck and it seems to me like it is impossible to remove drop :: Monad m => Int -> Stream m a -> Stream m a
{-# INLINE_FUSED drop #-}
drop n (Stream step t) = Stream step' (t, Just n)
where
{-# INLINE_INNER step' #-}
step' (s, Just i) | i > 0 = do
r <- step s
case r of
Yield _ s' -> step' (s', Just (i-1))
Done -> return Done
step' (s, _) = liftM (\r ->
case r of
Yield x s' -> Yield x (s', Nothing)
Done -> Done
) (step s) But here the Or is there another way to write this so that it can use a join point? |
This is a very cool observation, would having a flavor of monads that
guarantee they respect tail calls, in a way ghc understands, solve that?
I’ve hit a similar technical issue in some of my own prjects over time,
though with different motivations. I’ll ask around
…On Mon, Sep 26, 2022 at 11:07 AM Jaro Reinders ***@***.***> wrote:
I've just done a quick scan over the code. It seems to me like it is
impossible to remove Skip for monadic streams. Take for example the drop
function. I've removed the Step constructor and replaced its usage by a
recursive call:
drop :: Monad m => Int -> Stream m a -> Stream m a
{-# INLINE_FUSED drop #-}drop n (Stream step t) = Stream step' (t, Just n)
where
{-# INLINE_INNER step' #-}
step' (s, Just i) | i > 0 = do
r <- step s
case r of
Yield _ s' -> step' (s', Just (i-1))
Done -> return Done
step' (s, _) = liftM (\r ->
case r of
Yield x s' -> Yield x (s', Nothing)
Done -> Done
) (step s)
But here the step' (s', Just (i - 1)) recursive call is not guaranteed to
be a tail call (that depends on what the Monad m is), so GHC can't mark
step' as a join point in general (if I understand it correctly).
—
Reply to this email directly, view it on GitHub
<#156 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQXHQZAQDOBSHR2XCUTWAG33LANCNFSM4DA7DW2Q>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
That's exactly what I was thinking! I wonder what that looks like. I know somebody asked SPJ once after one of his talks if join points could be made explicit in the surface language and SPJ considered the possibility but didn't see an immediate use case. I also think this is related to Alexis King's "Effects for Less" talk which mentions that it is really important that the monadic bind operator gets inlined. Perhaps if we use her |
It might be worthwhile to draw SPJ and the other author of the paper into this conversation. |
ping @simonpj @pdownen @lukemaurer (I couldn't find Zena Ariola's github account if she even has one) |
We are taking about fusion here, correct? That's only going to happen if you inine the function (in this case I rather doubt that you'll achieve fusion for an arbitrary monad. A good way to move this converation on would be to provide a concrete example of
Being concrete aids comprehension, and saves misinderstandings. |
Oh, that sounds right. Now I feel silly for missing that. To be much more concrete. Take for example the Quickhull benchmark. Before my changes it takes 55.7 ms and after removing Skip it takes 72.2 ms. Looking at the core reveals a bunch of bindings like the following which I think are to blame for this discrepancy. These are not present in the core dumps from before my changes. letrec {
$s$wstep'_sebJ :: Int# -> Int# ->
Id (Step (Int, Int, Maybe (Double, Double)) ((Double, Double), Double))
$s$wstep'_sebJ
= \ (sc_sebF :: Int#) (sc1_sebE :: Int#) ->
case >=# sc1_sebE ww_sdOU of {
__DEFAULT ->
case indexDoubleArray# ww2_sdP0 (+# ww1_sdOY sc1_sebE) of wild2_a6SW { __DEFAULT ->
case indexDoubleArray# ww4_sdP7 (+# ww3_sdP5 sc1_sebE) of wild1_Xe { __DEFAULT ->
case >=# sc_sebF ww7_sdOz of {
__DEFAULT ->
case indexDoubleArray# ipv3_adv1 sc_sebF of wild3_Xh { __DEFAULT ->
case >## wild3_Xh 0.0## of {
__DEFAULT -> $s$wstep'_sebJ (+# sc_sebF 1#) (+# sc1_sebE 1#); -- <<< recursive call here
1# ->
(Yield
((D# wild2_a6SW, D# wild1_Xe), D# wild3_Xh)
(I# (+# sc1_sebE 1#), I# (+# sc_sebF 1#), Nothing))
`cast` <Co:15>
}};
1# -> Done `cast` <Co:15>
}}};
1# -> Done `cast` <Co:15>
}; } in Here are the full pre and post dumps of all the benchmarks: And the particular benchmarks that got noticeably worse are: listrank, awshcc, hybcc, quickhull, findIndexR, findIndexR_manual. Especially awshcc got more than 3x slower and hybcc about 2x slower. |
Maybe it is some kind of cross-function case of case optimization that is lacking. After reducing an example to just a letrec {
$wstep'_s3Uf :: Int# -> Id (Step Int Double)
$wstep'_s3Uf
= \ (ww2_s3Ud :: Int#) ->
case >=# ww2_s3Ud ipv1_s3Po of {
__DEFAULT ->
case indexDoubleArray# ipv2_s3Pp (+# ipv_s3Pn ww2_s3Ud)
of wild2_i3d4
{ __DEFAULT ->
case >## wild2_i3d4 10.0## of {
__DEFAULT -> $wstep'_s3Uf (+# ww2_s3Ud 1#);
1# ->
(Yield (D# wild2_i3d4) (I# (+# ww2_s3Ud 1#))) `cast` <Co:5>
}
};
1# -> Done `cast` <Co:5>
}; } in
joinrec {
$s$wfoldlM'_loop_s3WN
:: State# RealWorld -> Int# -> Int# -> Vector Double
$s$wfoldlM'_loop_s3WN (sc_s3WM :: State# RealWorld)
(sc1_s3WK :: Int#)
(sc2_s3WJ :: Int#)
= case ($wstep'_s3Uf sc1_s3WK) `cast` <Co:4> of { -- <<< $wstep' can't be a join point
Yield x_a3P4 s'_a3P5 ->
case x_a3P4 of { D# x1_a2wq ->
case writeDoubleArray#
ipv4_i3En
sc2_s3WJ
(+## x1_a2wq x1_a2wq)
(sc_s3WM `cast` <Co:5>)
of s'#_i3S3
{ __DEFAULT ->
case s'_a3P5 of { I# ww3_X4 ->
jump $s$wfoldlM'_loop_s3WN
(s'#_i3S3 `cast` <Co:4>) ww3_X4 (+# sc2_s3WJ 1#)
}
}
};
Done ->
case unsafeFreezeByteArray# ipv4_i3En (sc_s3WM `cast` <Co:5>) of
{ (# ipv5_i3ci, ipv6_i3cj #) ->
(Vector 0# sc2_s3WJ ipv6_i3cj) `cast` <Co:5>
}
}; } in
jump $s$wfoldlM'_loop_s3WN (ipv3_i3Em `cast` <Co:4>) 0# 0# Maybe I'm running into https://gitlab.haskell.org/ghc/ghc/-/issues/16335 again? |
Maybe this is too hard, but could you distil out a standalone repro case? Or (less good but still OK) supply precise repro instructions for your "reduced example". By precise I mean "check out this repo, cd to there, execute these commands..." Looking at the code you show, I think that FloatIn should move the binding for |
Here's a standalone reproducer: https://gist.github.com/noughtmare/f979c0fc9eed30abcd0849d19cf0f746 with the interesting core here: https://gist.github.com/noughtmare/f979c0fc9eed30abcd0849d19cf0f746#file-all-dump-simpl-L2737-L2794 Commands you can run:
|
If I translate the produced core back to Haskell: module T where
import Control.Monad.Primitive
import Data.Primitive.Array
import System.IO.Unsafe (unsafePerformIO)
data Step s a = Yield a s | Done
uninitialised = undefined
test :: Int -> Int -> Array Double -> (Int, Int, Array Double)
test off n oldArr = unsafePerformIO $ do
newArr <- newArray n uninitialised
let
step' i
| i >= n = Done
| otherwise =
let x = indexArray oldArr (off + i) in
if x > 10
then Yield x (i + 1)
else step' (i + 1)
loop i j = do
case step' i of
Yield x s' -> do
writeArray newArr j (x + 1)
loop s' (j + 1)
Done -> do
out <- unsafeFreezeArray newArr
return (0, j, out)
loop 0 0 And then compile that again I do get the nice core with two join points where the intermediate $wtest :: Int -> Int# -> Array Double -> (Int, Int, Array Double)
$wtest
= \ (w :: Int) (ww :: Int#) (w1 :: Array Double) ->
runRW#
(\ (s :: State# RealWorld) ->
case noDuplicate# s of s' { __DEFAULT ->
case newArray# ww uninitialised (s' `cast` <Co:3>) of
{ (# ipv, ipv1 #) ->
joinrec {
$s$wloop
:: State# RealWorld -> Int# -> Int# -> (Int, Int, Array Double)
$s$wloop (sc :: State# RealWorld) (sc1 :: Int#) (sc2 :: Int#)
= join {
$j :: (Int, Int, Array Double)
$j
= case unsafeFreezeArray# ipv1 (sc `cast` <Co:3>) of
{ (# ipv2, ipv3 #) ->
lazy (test1, I# sc1, Array ipv3)
} } in
case >=# sc2 ww of {
__DEFAULT ->
case w of { I# x ->
case w1 of { Array ds2 ->
case indexArray# ds2 (+# x sc2) of { (# ipv2 #) ->
case ipv2 of { D# x1 ->
case >## x1 10.0## of {
__DEFAULT ->
joinrec {
$wstep' :: Int# -> (Int, Int, Array Double)
$wstep' (ww1 :: Int#)
= case >=# ww1 ww of {
__DEFAULT ->
case indexArray# ds2 (+# x ww1) of { (# ipv3 #) ->
case ipv3 of { D# x2 ->
case >## x2 10.0## of {
__DEFAULT -> jump $wstep' (+# ww1 1#);
1# ->
case writeArray#
ipv1 sc1 (D# (+## x2 1.0##)) (sc `cast` <Co:3>)
of s'#
{ __DEFAULT ->
jump $s$wloop (s'# `cast` <Co:2>) (+# sc1 1#) (+# ww1 1#)
}
} } };
1# -> jump $j
}; } in
jump $wstep' (+# sc2 1#);
1# ->
case writeArray# ipv1 sc1 (D# (+## x1 1.0##)) (sc `cast` <Co:3>)
of s'#
{ __DEFAULT ->
jump $s$wloop (s'# `cast` <Co:2>) (+# sc1 1#) (+# sc2 1#)
}
} } } } };
1# -> jump $j
}; } in
jump $s$wloop (ipv `cast` <Co:2>) 0# 0#
} }) So is the simplifier just running out of steam? |
Hi Jaro
Good progress on this! wrt
So is the simplifier just running out of steam?
Did you try any of the increasing the count of any of the simplifier
options in
https://downloads.haskell.org/ghc/latest/docs/users_guide/using-optimisation.html
?
Unfortunately I am very far from being an expert but the one that looked
most likely to be was the following and the associated -fddump-simpl-stats
…-fsimpl-tick-factor=⟨n⟩
Default: 100
GHC’s optimiser can diverge if you write rewrite rules (Rewrite rules
<https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/rewrite_rules.html#rewrite-rules>)
that don’t terminate, or (less satisfactorily) if you code up recursion
through data types (Bugs in GHC
<https://downloads.haskell.org/ghc/latest/docs/users_guide/bugs.html#bugs-ghc>).
To avoid making the compiler fall into an infinite loop, the optimiser
carries a “tick count” and stops inlining and applying rewrite rules when
this count is exceeded. The limit is set as a multiple of the program size,
so bigger programs get more ticks. The -fsimpl-tick-factor flag lets you
change the multiplier. The default is 100; numbers larger than 100 give
more ticks, and numbers smaller than 100 give fewer.
If the tick-count expires, GHC summarises what simplifier steps it has
done; you can use -fddump-simpl-stats to generate a much more detailed
list. Usually that identifies the loop quite accurately, because some
numbers are very large.
<https://downloads.haskell.org/ghc/latest/docs/users_guide/using-optimisation.html#ghc-flag--fsimpl-tick-factor=%E2%9F%A8n%E2%9F%A9>
On Tue, Sep 27, 2022 at 11:17 AM Jaro Reinders ***@***.***> wrote:
If I translate the produced core back to Haskell:
module T where
import Control.Monad.Primitiveimport Data.Primitive.Arrayimport System.IO.Unsafe (unsafePerformIO)
data Step s a = Yield a s | Done
uninitialised = undefined
test :: Int -> Int -> Array Double -> (Int, Int, Array Double)
test off n oldArr = unsafePerformIO $ do
newArr <- newArray n uninitialised
let
step' i
| i >= n = Done
| otherwise =
let x = indexArray oldArr (off + i) in
if x > 10
then Yield x (i + 1)
else step' (i + 1)
loop i sc2 = do
let c = step' i
case c of
Yield x s' -> do
writeArray newArr sc2 (x + 1)
loop s' (sc2 + 1)
Done -> do
out <- unsafeFreezeArray newArr
return (0, sc2, out)
loop 0 0
And then compile that again I do get the nice core with two join points
where the intermediate Yield and Done constructors are eliminated:
$wtest :: Int -> Int# -> Array Double -> (Int, Int, Array Double)$wtest
= \ (w :: Int) (ww :: Int#) (w1 :: Array Double) ->
runRW#
(\ (s :: State# RealWorld) ->
case noDuplicate# s of s' { __DEFAULT ->
case newArray# ww uninitialised (s' `cast` <Co:3>) of
{ (# ipv, ipv1 #) ->
joinrec {
$s$wloop
:: State# RealWorld -> Int# -> Int# -> (Int, Int, Array Double)
$s$wloop (sc :: State# RealWorld) (sc1 :: Int#) (sc2 :: Int#)
= join {
$j :: (Int, Int, Array Double)
$j
= case unsafeFreezeArray# ipv1 (sc `cast` <Co:3>) of
{ (# ipv2, ipv3 #) ->
lazy (test1, I# sc1, Array ipv3)
} } in
case >=# sc2 ww of {
__DEFAULT ->
case w of { I# x ->
case w1 of { Array ds2 ->
case indexArray# ds2 (+# x sc2) of { (# ipv2 #) ->
case ipv2 of { D# x1 ->
case >## x1 10.0## of {
__DEFAULT ->
joinrec {
$wstep' :: Int# -> (Int, Int, Array Double)
$wstep' (ww1 :: Int#)
= case >=# ww1 ww of {
__DEFAULT ->
case indexArray# ds2 (+# x ww1) of { (# ipv3 #) ->
case ipv3 of { D# x2 ->
case >## x2 10.0## of {
__DEFAULT -> jump $wstep' (+# ww1 1#);
1# ->
case writeArray#
ipv1 sc1 (D# (+## x2 1.0##)) (sc `cast` <Co:3>)
of s'#
{ __DEFAULT ->
jump $s$wloop (s'# `cast` <Co:2>) (+# sc1 1#) (+# ww1 1#)
}
} } };
1# -> jump $j
}; } in
jump $wstep' (+# sc2 1#);
1# ->
case writeArray# ipv1 sc1 (D# (+## x1 1.0##)) (sc `cast` <Co:3>)
of s'#
{ __DEFAULT ->
jump $s$wloop (s'# `cast` <Co:2>) (+# sc1 1#) (+# sc2 1#)
}
} } } } };
1# -> jump $j
}; } in
jump $s$wloop (ipv `cast` <Co:2>) 0# 0#
} })
So is the simplifier just running out of steam?
—
Reply to this email directly, view it on GitHub
<#156 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABQIJ65QW6AXKBWUDW4Q7ODWAL6XHANCNFSM4DA7DW2Q>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I did try increasing the max simplifier iterations which did not help. But now I've rewritten it using the lower level unboxed operations: {-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
module T where
import GHC.Exts
import GHC.IO
data Step s a = Yield a s | Done
uninitialised = undefined
test :: Int# -> Int# -> Array# Double -> (# Int#, Int#, Array# Double #)
test off n oldArr = runRW# $ \s0 ->
case newArray# n uninitialised s0
of { (# s1, newArr #) ->
let
step' i
| isTrue# (i >=# n) = Done
| otherwise =
let (# D# x #) = indexArray# oldArr (off +# i) in
if isTrue# (x >## 10.0##)
then Yield (D# x) (I# (i +# 1#))
else step' (i +# 1#)
loop i j s2 =
case step' i of
Yield x (I# s') ->
case writeArray# newArr j (x + 1) s2
of { s3 ->
loop s' (j +# 1#) s3
}
Done ->
case unsafeFreezeArray# newArr s2
of { (# s3, out #) ->
(# 0#, j, out #)
}
in
loop 0# 0# s1
} And compiling this program does show the optimisation failure: test
= \ off n oldArr ->
runRW#
(\ s0 ->
case newArray# n uninitialised s0 of { (# ipv, ipv1 #) ->
letrec {
step'
= \ i ->
case >=# i n of {
__DEFAULT ->
case indexArray# oldArr (+# off i) of { (# ipv2 #) ->
case ipv2 of wild { D# x ->
case >## x 10.0## of {
__DEFAULT -> step' (+# i 1#);
1# -> Yield wild (I# (+# i 1#))
}
}
};
1# -> Done
}; } in
join {
exit j s2
= case unsafeFreezeArray# ipv1 s2 of { (# ipv2, ipv3 #) ->
(# 0#, j, ipv3 #)
} } in
joinrec {
loop i j s2
= case step' i of {
Yield x ds1 ->
case ds1 of { I# s' ->
case writeArray#
ipv1 j (case x of { D# x1 -> D# (+## x1 1.0##) }) s2
of s3
{ __DEFAULT ->
jump loop s' (+# j 1#) s3
}
};
Done -> jump exit j s2
}; } in
jump loop 0# 0# ipv
}) |
I think it is time to open a proper GHC issue for this, so I've done that: https://gitlab.haskell.org/ghc/ghc/-/issues/22227 |
this is great :)
…On Tue, Sep 27, 2022 at 11:08 AM Jaro Reinders ***@***.***> wrote:
I think it is time to open a proper GHC issue for this, so I've done that:
https://gitlab.haskell.org/ghc/ghc/-/issues/22227
—
Reply to this email directly, view it on GitHub
<#156 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABBQVAEPOXFMAY4EJVU4DWAMEWBANCNFSM4DA7DW2Q>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
Good news! I have been able to manually implement the "loopification" optimization that GHC is missing (see this WIP merge request of GHC) with cfaf0d4. Now almost all benchmarks are the same or faster (tridiag speeds up the most from 86ms to 19.3ms) except awshcc which is still about 2x slower (406ms to 872ms). I'll look into that now. For now, I can already say is that it is able to fuse properly so the problem must be something else. |
Wow ! |
Would love to see the table of timing changes |
I've now reached the point that all benchmarks are (approximately) equal or faster, see #448 (comment). |
@noughtmare That is all pretty great news. If I understand correctly the proper speed up is only observed with the ghc patch How does the regression look for all the older ghc versions. We do support all the way down to ghc-8.0 |
No, I've manually implemented the optimization that GHC would do with that patch. For now I've tested with GHC 9.2.4. I'll do some more tests with other GHC versions, but I believe this only really requires join points (I don't know when those were introduced, but I think GHC 8 already had them). |
Now I understand, that is great! Awesome work! Let's make sure we don't mess up performance for anyone and I don't see a reason why we shouldn't get this change into vector. |
It's great to see that you can make a loopified definition work today with GHC, but do note that fusion will break across module boundaries. For example, module A where
import Data.Stream
drop6 :: Stream a -> Stream a
drop6 xs = drop 6 xs
module B where
import A
main = print $ sum $ drop6 [1,2,3] note that the stepper of This is kind of the same restriction that we have about foldr/build fusion at the moment, because we rely on compiler phases (too much). |
The join points paper suggests that it may now be possible to avoid the need for
Skip
altogether. It would be worth checking if that's actually the case. If so, it might be possible to simplify the implementation considerably. Furthermore, I seem to recall that the presence of skips (which allow vectors of the same length to be represented by streams of different lengths) prevents one or more optimizations. I vaguely remember trying to do something a bit clever withreplicate
and finding it impossible because of skips, but I don't remember any details.The text was updated successfully, but these errors were encountered: