
Memory usage on large inputs #56

@melisgl

Description

I'm using the per-block implementation in parse-doc, but with large %blocks it's still fairly easy to run out of memory. For example:

CL-USER> (time
          (let ((input (with-output-to-string (out)
                         (loop repeat 100000
                               do (format out "- ~A ~A ~A ~A~%"
                                          (random 1000000) (random 1000000)
                                          (random 1000000) (random 1000000))))))
            (3bmd-grammar::parse-doc input)
            (length input)))
Evaluation took:
  12.364 seconds of real time
  12.371129 seconds of total run time (11.750773 user, 0.620356 system)
  [ Run times consist of 5.771 seconds GC time, and 6.601 seconds non-GC time. ]
  100.06% CPU
  37,030,481,562 processor cycles
  15,570,202,368 bytes consed
  
2955662
CL-USER> (/ 15570202368 2955662.0)
5267.924

This example uses a bulleted list because it is probably the worst offender, but a large paragraph behaves similarly.

According to time, consing scales linearly with the number of repeats, which is good. Perhaps 5267 bytes per character is too high, but I suspect that the main problem is that the maximum size of the working set also scales linearly.
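
For reference, here is a small measurement sketch (not part of the reproduction above; it assumes SBCL, where SB-EXT:GET-BYTES-CONSED returns cumulative allocation, and that 3bmd is loaded) for checking the per-character consing at different input sizes:

;; Sketch only: measures total consing per input character for the same
;; generated bulleted-list input as above. Assumes SBCL's
;; SB-EXT:GET-BYTES-CONSED and a loaded 3bmd.
(defun bytes-consed-per-char (repeats)
  (let ((input (with-output-to-string (out)
                 (loop repeat repeats
                       do (format out "- ~A ~A ~A ~A~%"
                                  (random 1000000) (random 1000000)
                                  (random 1000000) (random 1000000))))))
    (let ((before (sb-ext:get-bytes-consed)))
      (3bmd-grammar::parse-doc input)
      ;; Cumulative allocation is unaffected by intervening GCs, so this
      ;; matches the "bytes consed" figure reported by TIME.
      (/ (- (sb-ext:get-bytes-consed) before)
         (float (length input))))))

Comparing, say, (bytes-consed-per-char 10000) with (bytes-consed-per-char 100000) should give roughly the same ratio if consing really is linear in the input size. It says nothing about the working set, though, since the whole parse tree for the document is still live at once.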
