Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some operations are slow due to fusion oriented definitions (replicateM) #113

Closed
mgsloan opened this issue Apr 22, 2016 · 6 comments
Closed
Labels

Comments

@mgsloan
Copy link

mgsloan commented Apr 22, 2016

replicateM round trips through a list, even though we immediately know the size of the result and could allocate it. It's much faster (5x for my application) to do something like this

replicateM n f = do
    mut <- MV.new n
    forM_ [0..n-1] $ \i -> f >>= MV.write mut i
    V.unsafeFreeze mut

Relevant chunks of code:

From Data.Vector:

replicateM :: Monad m => Int -> m a -> m (Vector a)
{-# INLINE replicateM #-}
replicateM = G.replicateM

From Data.Vector.Generic:

replicateM :: (Monad m, Vector v a) => Int -> m a -> m (v a)
{-# INLINE replicateM #-}
replicateM n m = unstreamM (MBundle.replicateM n m)
unstreamM s = do
                xs <- MBundle.toList s
                return $ unstream $ Bundle.unsafeFromList (MBundle.size s) xs
@cartazio
Copy link
Contributor

@mgsloan 1) what GHC version 2) what optimization levels 3) do we currently have any rewrite rules that should have accomplished this?

@cartazio
Copy link
Contributor

either way, thanks for including the unrolling of the definitions on the ticket, that will help contextulize discussing this

@mgsloan
Copy link
Author

mgsloan commented Apr 22, 2016

-O2 with ghc-7.10.3. I do not see any rewrite rules that attempt to accomplish an optimization here.

I'm not sure what to do about this, we can either prioritize fusion working or prioritize normal vector performance. It would be interesting to benchmark how well some multi-step fusion thing performs with the non-fusion definition of replicateM. It's quite possible that it's only a little bit slower. In that case, I think it makes sense to adopt the non-fusion definition for this and related functions.

@cartazio
Copy link
Contributor

since i'm on 8.0Tip atm, I'll have a go at digging into this this weekend (if you or @dolio or @hvr nudge me), since build times etc for vector are much more pleasant in 8.0 land.

I do agree with what you're saying, and that does remind me of some of the other improvements for making imperative ish code much nicer to write in vector, esp since fusion isn't a cureall (though it can be darn nice)

@cartazio
Copy link
Contributor

@mgsloan did you have any good comparative conclusions here?

@cartazio
Copy link
Contributor

if you have some some benchmarking code you can share comparing implementations/performance, please share, closing for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants