Description
This issue was originally filed by zundel@google.com
When running some unit tests with the heap constrained to 32M, two tests run out of memory:
=== debugia32 dartc co19/LibTest/core/List/sort/List/sort/A01/t06 ===
=== debugia32 dartc co19/LibTest/core/List/sort/List/sort/A01/t05 ===
At the top of the heap histogram are DartScanner.Location, DartScanner.Position, and DartScanner.TokenData, with over 200K instances each, adding up to 75M of the 32M heap.
-
The scanner currently tokenizes the entire file into memory, which may be overly aggressive. If we tokenize the file in chunks and do not keep references to these objects throughout the parse, the GC may be able to reclaim them.
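To illustrate the idea of scanning on demand, here is a minimal Java sketch. It is not the DartScanner API: the class name LazyTokenStream, its methods, and the whitespace-only tokenization are all assumptions made for illustration. The point is only that the stream holds at most one buffered token at a time, so tokens the parser has consumed and dropped become eligible for collection.

```java
/** Hypothetical lazy token source: scans one token at a time instead of the whole file up front. */
final class LazyTokenStream {
  private final String source;
  private int pos = 0;
  // At most one token is buffered; consumed tokens are not retained here.
  private String peeked;

  LazyTokenStream(String source) {
    this.source = source;
  }

  /** Returns the next token without consuming it, or null at end of input. */
  String peek() {
    if (peeked == null) {
      peeked = scanOne();
    }
    return peeked;
  }

  /** Consumes and returns the next token, or null at end of input. */
  String next() {
    String token = peek();
    peeked = null;
    return token;
  }

  // Toy scanner (assumption): tokens are maximal runs of non-whitespace characters.
  private String scanOne() {
    while (pos < source.length() && Character.isWhitespace(source.charAt(pos))) {
      pos++;
    }
    if (pos >= source.length()) {
      return null;
    }
    int start = pos;
    while (pos < source.length() && !Character.isWhitespace(source.charAt(pos))) {
      pos++;
    }
    return source.substring(start, pos);
  }
}
```

A real parser may need more than one token of lookahead or the ability to rewind, in which case the buffer would have to grow to cover that window. Whether this is practical depends on how the parser uses the token list today.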
-
In DartScanner.Location, the code currently stores Position objects for the start and end of each token. Each Position object contains 3 integers: line number, column number, and offset.
We could reduce memory usage by storing only 2 integers in Location: the start and end offsets of the token, measured from the start of the file. Then, for each source file, keep an index that records the offset at which each line begins. Since the line and column position is rarely accessed, the index could be as simple as an array, with binary search used to find the line number for a given offset.
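A rough Java sketch of this proposal follows. The names OffsetLocation and LineIndex are hypothetical and not part of the existing scanner, and the sketch assumes offsets are char indices into the source string with lines terminated by '\n'. The index is one int array per file, and line and column are computed only when asked for.

```java
import java.util.Arrays;

/** Hypothetical replacement for Location: two ints instead of two Position objects. */
final class OffsetLocation {
  final int start;
  final int end;

  OffsetLocation(int start, int end) {
    this.start = start;
    this.end = end;
  }
}

/** Hypothetical per-file index mapping offsets to 1-based line and column numbers. */
final class LineIndex {
  // lineStarts[i] is the offset of the first character of line i + 1.
  private final int[] lineStarts;

  LineIndex(String source) {
    int count = 1;
    for (int i = 0; i < source.length(); i++) {
      if (source.charAt(i) == '\n') {
        count++;
      }
    }
    lineStarts = new int[count]; // lineStarts[0] is 0: line 1 starts at offset 0
    int line = 1;
    for (int i = 0; i < source.length(); i++) {
      if (source.charAt(i) == '\n') {
        lineStarts[line++] = i + 1;
      }
    }
  }

  /** Returns the 1-based line number containing the given offset. */
  int lineOf(int offset) {
    if (offset < 0) {
      throw new IllegalArgumentException("negative offset: " + offset);
    }
    int idx = Arrays.binarySearch(lineStarts, offset);
    if (idx < 0) {
      // On a miss, binarySearch returns -(insertionPoint) - 1;
      // the containing line starts at insertionPoint - 1.
      idx = -idx - 2;
    }
    return idx + 1;
  }

  /** Returns the 1-based column of the given offset within its line. */
  int columnOf(int offset) {
    return offset - lineStarts[lineOf(offset) - 1] + 1;
  }
}
```

With this layout, an error message for a token would call lineOf(loc.start) and columnOf(loc.start) at reporting time, costing a binary search per lookup instead of three stored integers per Position.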