
Type check / compilation: a basic problem #2423

Closed
nicolabotta opened this issue Jul 8, 2015 · 16 comments

@nicolabotta

The program

https://github.com/nicolabotta/SeqDecProbs/blob/master/frameworks/14-/SeqDecProbMonadicTheoryRVExample2.lidr

type checks and compiles in roughly linear time in maxColumnO2. maxColumnO2 is defined at line 94. The times it takes to compile the application for maxColumnO2 equal to 5, 10, 20, 40, 80 and 160 on my machine are:

user    1m14.184s
user    1m35.068s
user    2m15.896s
user    3m32.116s
user    6m32.520s
user    12m17.776s

respectively. For larger values of maxColumnO2 the machine starts swapping and compilation does not appear to terminate. Thus, compiling the application for realistic values of maxColumnO2 is virtually impossible. In contrast, the program

https://github.com/nicolabotta/SeqDecProbs/blob/master/frameworks/14-/SeqDecProbMonadicTheoryRVExample2b.lidr

type checks and compiles in about 50 seconds, independently of the value of maxColumnO2. The difference between the two programs is just the point of definition of maxColumnO2: in the second example, this is at line 566.

Thus, it seems that the point of definition of parameters can have a dramatic impact on the time it takes to type check or compile a program. This is a serious problem. It implies that developers have to be careful to defer parameter definitions or otherwise accept erratic type check and compile time behavior. Deferring definitions not only leads to spaghetti code; it also makes the development of sizeable applications very tedious.

I believe that issue #2418 is an instance of the same problem but I do not have a simple solution for that case.

I guess the problem is related to the way reduction operates. The question is whether it is possible to inhibit reduction of certain expressions in certain scopes. This seems to be what happens when the definition of maxColumnO2 is deferred to line 566.

@ziman (Contributor) commented Jul 17, 2015

I believe the problem is caused by "too much"* reduction in the types. Deferring the definition makes the variable abstract in the meantime, which prevents reduction.

[*] The definition of "too much" here is "I don't like how much time it takes on my computer". The algorithm is most probably correct; making it more efficient is probably a hard problem, and I have no idea what the rules/heuristics should be.

This is a (reasonably) minimal failing example:

-- First, define something exponentially-big-typed
-- along with some utility functions to talk about it.
data Exp : Nat -> Type where
  EZ : Exp 1
  ES : Exp n -> Exp (n + n)

exp : Nat -> Nat
exp  Z    = 1
exp (S n) = exp n + exp n

mkExp : {n : Nat} -> Exp (exp n)
mkExp {n = Z}   = EZ
mkExp {n = S n} = ES (mkExp {n = n})

------------

-- Now, we define some size constant. 2^20 seems to be enough.
size : Nat
-- size = 20   -- ### uncomment this to make typechecking slow ###

-- At this point of elaboration, the value of `size` is not known
-- and it's probably an opaque/abstract/irreducible value.
-- 
-- However, the types don't need `size` to reduce in order to match.
bigTypedVal : Exp (exp size)
bigTypedVal = mkExp {n = size}

-- Now we're free to give a concrete value to `size`
-- because it will no longer be reduced.
size = 20

-- However, if `size` were fully defined *before* elaborating
-- the type of `bigTypedVal`, it would reduce to an exponentially-sized type.
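
To put a number on "exponentially big": by the definition of exp above, exp n = 2^n, so with size = 20 the fully reduced index is

exp 20 = 2^20 = 1048576

that is, a unary numeral more than a million S constructors deep, even though the value mkExp {n = 20} itself is only 21 constructors deep.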

On my computer, the difference is instantaneous typechecking vs. a runaway program that I have to kill before it eats my memory.

If we could teach the typechecker to recognise whether we need to reduce something for typechecking or whether it will typecheck without reducing, we could improve its performance. Can we at least figure out some rules that humans could use to work around this?

@ziman (Contributor) commented Jul 17, 2015

Making size abstract doesn't help prevent its δ-reduction.

@ziman (Contributor) commented Jul 18, 2015

Making it abstract probably affects only exports; %freeze size helps within the module.
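
For concreteness, a sketch of how that might look in the minimal example above (same definitions as before; the only change is defining size up front and freezing it immediately):

size : Nat
size = 20

%freeze size

-- With size frozen, exp size is (hopefully) left unreduced while this
-- is being checked, even though size already has a concrete value.
bigTypedVal : Exp (exp size)
bigTypedVal = mkExp {n = size}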

@nicolabotta (Author)

This is another instance of #761 and #172. Could we have a %default freeze directive (and a %thaw one)? I would expect that, at the level of applications, most declarations should be frozen by default.

@nicolabotta (Author)

I have committed a new version of

https://github.com/nicolabotta/SeqDecProbs/blob/master/frameworks/14-/SeqDecProbMonadicTheoryRVExample2.lidr

with %freeze annotations. As expected, most definitions are now frozen and the code type checks very fast and in constant time. Freezing definitions is a viable workaround, but a better approach would be to have all definitions frozen by default and to "thaw" them via directives, for a local scope, I think. Something like

lala : Nat -> Nat
lala = ...

%thaw lala
f  : Nat -> Nat
f = ...

The idea is that the implementor of f is in a better position (than the implementor of lala and, possibly, than the type checker) to know (or to understand from error messages) when the implementation of f needs delta-reduction of lala for type checking and when it doesn't.
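
With the directives available today the same intent has to be expressed the other way round, roughly (a sketch with made-up definitions):

lala : Nat -> Nat
lala k = k + k

%freeze lala   -- today: everything is reducible unless frozen by hand

f : Nat -> Nat
f k = lala k   -- checked against the type of lala only, not its body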

@heather-alexander

I have a possibly-related issue; I found this one when searching for a related existing issue report so it seems appropriate to report here.

With Idris 0.9.20, the following program takes at least linear time (in the size of the Finite n Int instance) to typecheck. It still takes linear (albeit much less) time if I leave the elements function of the instance for Int as a metavariable. On my computer, an instance for Int with a size of 512 takes about 5 seconds to compile, making large (i.e. machine-word-sized) instances impractical. This behavior didn't really surprise me (and Idris does compile this program more quickly than GHC compiles a much more awkward equivalent). Still, I would love to be able to write similar programs involving much larger sizes.

I'm not sure how to apply "%freeze" annotations to the program, but I'm messing with it now and I'll post an update if I figure anything out.

module Finite                                                                                       
import Data.Vect                                                                                    

class Finite (n : Nat) a where elements : Vect n a                                                  
instance Finite 2 Bool where elements = [True, False]                                               
instance Finite 512 Int where elements = fromList [0..511]  
-- I edited the file to time "instance Finite n Int where elements = fromList [0..n-1]" for various `n`. 

@ziman (Contributor) commented Nov 21, 2015

I did some benchmarking on three variants of your program: your version, a strict variant, and a lazy variant.

The benchmarking script ran idris --check finite.idr three times for each vector length.

This is the result:
[Benchmark plot]

Surprisingly enough, the lazy variant is only marginally faster than the strict variant, and both are much faster than your version.

@ziman (Contributor) commented Nov 21, 2015

From the log-log graph, it seems that these runtimes are non-polynomial (a polynomial would show up as a straight line on a log-log plot):
[Log-log plot of runtimes]

@heather-alexander

Thanks for doing the extra analysis, Ziman.

I did suspect that the fromList might be part of it, and I did do some experiments with alternate approaches, eventually ending up with something like your enum-based version. The part that particularly concerned me was that the runtime increased even if I supplied Idris with a program to typecheck that had little actual runtime work to be done... something like the example above, but with an empty Finite instance for Int, like:

instance Finite 512 Int where     -- no methods defined

... which still appeared to run in super-linear time...

@ziman (Contributor) commented Nov 24, 2015

It's probably the big Nat index that slows things down here.
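
For scale (assuming nothing special-cases Nat literals during elaboration): in the unary representation the index of the slow instance is

512 = S (S (S (... (S Z) ...)))   -- 512 nested S constructors

so even a single traversal of that index during unification or conversion checking is linear in its value, before the Vect on the right-hand side is considered at all.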

@ziman (Contributor) commented Nov 24, 2015

An exponential fits these curves well (about 3 s · 2^(N/1000)), which supports the Nat explanation.
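
As a rough sanity check (and ignoring that the numbers come from different machines), for the N = 512 instance reported earlier this fit gives

3 s · 2^(512/1000) ≈ 3 s · 1.43 ≈ 4.3 s

which is in the same ballpark as the "about 5 seconds" quoted above.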

@nicolabotta (Author)

With 0.9.20.1-git:140d06a I can "idris --check" both

module Finite                                                                                       
import Data.Vect                                                                                    
n : Nat
n = 1511
class Finite (m : Nat) a where elements : Vect m a                                                  
instance Finite 2 Bool where elements = [True, False]                                               
instance Finite (S n) Int where elements = fromList [0..n]

and

module Main                
import Data.Vect           
n : Nat                    
n = 1600                    
nats : Vect n Nat          
nats = fromList [1..n]     
main : IO ()               
main = putStrLn $ show n 

in constant time in n as one would expect. Best, Nicola

@ziman (Contributor) commented Nov 24, 2015

Right, so we could regard these issues as the same problem: some things are better left abstract for typechecking (but some are not).

@ahmadsalim

I think this is a duplicate of #172, which is already tagged as a major issue and slated for release 0.12. I will therefore close this issue.

@nicolabotta (Author)

@ahmadsalim I am not sure that #2418, or even this issue, is a duplicate of #172. There are similarities (in both cases, deferring a definition helps circumvent strange compile-time behavior), but I suspect that the causes of such behavior could be different in the two issues.

I have constructed another example that might help clarify what is going on here: on a fresh installation of https://github.com/nicolabotta/SeqDecProbs, you should be able to do

cd SeqDecProbs-master/frameworks/14-/
time make example1

This should yield something like

idris -p contrib -p effects -V SeqDecProbsExample1.lidr -o example1
Type checking ./Sigma.lidr
...
Type checking ./SeqDecProbsExample1.lidr

real    2m24.838s
user    2m24.016s
sys 0m1.548s

Now, try commenting out line 182

> %freeze SeqDecProbsHelpers.finiteCtrl

of SeqDecProbsExample1.lidr and doing

rm example1
make clean
time make example1

You should now get something like

Type checking ./Sigma.lidr
...
Type checking ./SeqDecProbsExample1.lidr

real    12m56.022s
user    12m39.540s
sys 0m9.320s

The example shows that type checking certain expressions under the premise that

> finiteCtrl : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Ctrl t x) 
> finiteCtrl _ = finiteLeftAheadRight

can cost significantly more than type checking the same expressions under the (weaker!) premise

> finiteCtrl : {t : Nat} -> {n : Nat} -> (x : State t) -> Finite (Ctrl t x) 

This behavior seems to violate the (quite reasonable, I think) understanding that stronger assumptions should make type checking easier.

What are the expressions on which type checking stumbles when the definition of finiteCtrl is not frozen?

Using emacs' idris-mode, it is easy to see (by tracking coloring times) that, when the definition of finiteCtrl is not frozen, type checking gets stuck at line 192. It takes just a few seconds to get to line 192 and then about 12 minutes to type check lines 192-196!

This is a bit bizarre since lines 192-196 seem very similar to lines 186-190, which type check really fast! The example also suggests that, here, the problem could be quite different from the one in #172, where type checking time was found to be linear (instead of constant) in n.

Please, let me know if you want me to open a new issue with this specific example.

@ahmadsalim

@nicolabotta I think the underlying issue is the same, which is that too much reduction is happening in the types. Of course it comes in a different guise than in #172, but I think it would be nice if we could have a single focused issue.

However, if you feel strongly about it, I am more than happy to reopen this issue.
