Investigate different representation for `readByteArray` #346

acfoltzer · 2016-06-29T21:15:10Z

Perhaps due to indexing into ByteString, large files wind up taking a lot more memory than we'd hope.

The text was updated successfully, but these errors were encountered:

atomb · 2019-10-25T23:00:34Z

Related to #319.

brianhuffman · 2019-10-28T17:57:22Z

The source of the memory usage is in the function readFileCmd in module Cryptol.REPL.Command:

cryptol/src/Cryptol/REPL/Command.hs

Lines 738 to 747 in 3ea5e9e

    
           readFileCmd :: FilePath -> REPL () 
        
           readFileCmd fp = do 
        
             bytes <- replReadFile fp (\err -> rPutStrLn (show err) >> return Nothing) 
        
             case bytes of 
        
                 Nothing -> return () 
        
                 Just bs -> 
        
                   do pm <- getPrimMap 
        
                      let expr = T.eString pm (map (toEnum . fromIntegral) (BS.unpack bs)) 
        
                          ty   = T.tString (BS.length bs) 
        
                      bindItVariable ty expr

It turns the bytes from the file into an expression, consisting of a list literal full of 8-bit numeric literals. Building this expression takes a lot of memory: Using :readByteArray on a 20 megabyte file takes about 40 seconds (of which 80% is garbage collection) and allocates 32 GB, of which nearly 6 GB remains in use.

  32,002,815,144 bytes allocated in the heap
  26,486,133,976 bytes copied during GC
   5,864,512,960 bytes maximum residency (9 sample(s))
      16,683,584 bytes maximum slop
            5592 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       471 colls,     0 par   17.380s  19.024s     0.0404s    0.4306s
  Gen  1         9 colls,     0 par   15.297s  27.798s     3.0887s    14.7761s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0(0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.004s elapsed)
  MUT     time    8.129s  ( 12.228s elapsed)
  GC      time   32.677s  ( 46.822s elapsed)
  EXIT    time    0.001s  (  0.014s elapsed)
  Total   time   40.807s  ( 59.068s elapsed)

  Alloc rate    3,936,934,763 bytes per MUT second

  Productivity  19.9% of total user, 20.7% of total elapsed

To reduce the memory footprint of the expression, I tried building a different expression with split applied to a single very large integer literal. The maximum resident memory usage is now very efficient: It grows linearly with the file size, with a constant factor of 2 (compared to a factor of nearly 300 for the current version), so a 20 MB file should only take up 40 MB in memory when loaded.

However, the bottleneck now is the conversion from ByteString to Integer: I used BS.foldl' (\a b -> a `shiftL` 8 .|. toInteger b) 0 bs. Unfortunately this results in a quadratic amount of total memory allocation: On a 1.25 MB file, it is already at 1.4 terabytes and takes more than 2 minutes to run. For a 20 MB file, it would be in the hundreds of terabytes, and I'm not patient enough to let it finish.

I can think of a few different approaches we might try to fix the problem:

Use the split-with-single-big-integer-literal approach, but find a more efficient ByteString -> Integer function
Add a new expression constructor for what would essentially be ByteString literals, so we don't have to do the conversion at all
Refactor the REPL code so that we don't need to create an expression at all, just a value

brianhuffman · 2019-10-28T21:48:28Z

The first option worked: Converting ByteString to Integer can be done much more efficiently with a balanced binary fold. Now we can read a 20 MB file with readByteArray in about a second, with practically no size blowup.

acfoltzer added the feature request Asking for new or improved functionality label Jun 29, 2016

acfoltzer added this to the Cryptol 2.5 milestone Jun 29, 2016

atomb modified the milestones: Cryptol 2.5, Cryptol 2.6 Feb 5, 2018

atomb removed this from the Cryptol 2.6 milestone Apr 16, 2018

brianhuffman closed this as completed in 42ab1ea Oct 28, 2019

qsctr mentioned this issue Sep 6, 2022

Using result of :readByteArray on large file is slow #1428

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate different representation for `readByteArray` #346

Investigate different representation for `readByteArray` #346

acfoltzer commented Jun 29, 2016

atomb commented Oct 25, 2019

brianhuffman commented Oct 28, 2019

brianhuffman commented Oct 28, 2019

Investigate different representation for readByteArray #346

Investigate different representation for readByteArray #346

Comments

acfoltzer commented Jun 29, 2016

atomb commented Oct 25, 2019

brianhuffman commented Oct 28, 2019

brianhuffman commented Oct 28, 2019

Investigate different representation for `readByteArray` #346

Investigate different representation for `readByteArray` #346