
Maximizing PE array utilization in convolution runs #267

Open
DanP114 opened this issue Jun 13, 2024 · 2 comments
DanP114 commented Jun 13, 2024

Hello there,

I am currently designing a 200x200 PE convolution accelerator. I started from the base template provided in the exercise and have read through some of the documentation, but my mapping strategies come back with only about 1-2% utilization.

Attached are my input architecture file, parsed/processed input, generated mapping, and the statistics showing the utilization.

My inner PE spatial loop bounds seem to unroll only along the Y axis, with nothing along the X axis. I believe the issue comes from my constraints definition, but I also have the intuition that the problem dimensions (VGG) are not well suited to a large PE array, which is why I try mapping more batches.
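For what it's worth, that intuition can be checked with a quick back-of-the-envelope script (this is not Timeloop code; the VGG-style loop bounds below are illustrative assumptions, not taken from the attached files):

```python
# Rough sanity check (not Timeloop): Timeloop-style mappings use exact
# factorizations of each loop bound, so a spatial factor must divide
# the loop bound it is drawn from.
def best_single_dim_utilization(dim: int, lanes: int) -> float:
    """Utilization of `lanes` PEs when the largest divisor of `dim`
    that fits is unrolled across them."""
    best = max(d for d in range(1, min(dim, lanes) + 1) if dim % d == 0)
    return best / lanes

# Illustrative VGG-style conv bounds (assumed for this sketch):
# K = output channels, C = input channels, P = output rows, R = filter rows.
if __name__ == "__main__":
    for name, dim in [("K", 64), ("C", 64), ("P", 56), ("R", 3)]:
        u = best_single_dim_utilization(dim, 200)
        print(f"{name}={dim:3d} -> utilization of one 200-lane axis: {u:.1%}")
```

None of these bounds has a divisor anywhere near 200, so a single problem dimension can fill at most about a third of one 200-wide axis, which is consistent with trying to spread extra batches across the array.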

Any input is appreciated.

arch_conv.txt
parsed-processed-input-large-pe-array-multi-batch.txt

timeloop-mapper.stats.txt
timeloop-mapper.map.txt

@angshuman-parashar
Collaborator

There's something odd. Your spec appears to be creating a 200x200 array but the stats.txt reports 16x16 instances at all inner levels of the hierarchy. Are you sure the stat dump is from this arch?

Overall a 200x200 array is hard to fill spatially. Most mappings will be underutilized, so I suspect the mapper search is just giving up too quickly. Try tweaking the hyperparameters to make it try harder. Also, in your innermost buffer constraints you should add a min parallelism constraint (e.g., 0.5). This will early-reject any mappings that don't have at least 50% utilization. You won't prevent the search heuristic from visiting such mappings, but you will elide the expensive evaluation cost for these mappings.
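The early-reject behavior described above can be sketched as follows (a conceptual sketch only, not Timeloop internals; the mapping representation and cost function here are made up for illustration):

```python
MIN_PARALLELISM = 0.5  # reject mappings that use < 50% of the PE array

def spatial_utilization(mapping, total_pes=200 * 200):
    # Stand-in: fraction of PEs covered by the mapping's spatial factors.
    used = 1
    for f in mapping["spatial_factors"]:
        used *= f
    return used / total_pes

def search(candidates, evaluate):
    """Visit every candidate mapping, but only pay the evaluation
    cost for mappings that clear the parallelism floor."""
    best, evaluated, rejected = None, 0, 0
    for m in candidates:
        if spatial_utilization(m) < MIN_PARALLELISM:
            rejected += 1          # cheap check, no model evaluation
            continue
        evaluated += 1
        cost = evaluate(m)         # the expensive part
        if best is None or cost < best[0]:
            best = (cost, m)
    return best, evaluated, rejected
```

The point is that the search heuristic still enumerates low-utilization mappings; the constraint just skips their expensive evaluation, which is why the knob is cheap to apply.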

@chipletstu

> ...in your innermost buffer constraints you should add a min parallelism constraint (e.g., 0.5)...

Hello,

I have a question about how to add a min parallelism constraint (e.g., 0.5) in my innermost buffer constraints. Can you give me an example?

Thanks.
