Hello there,

I am currently working on designing a 200x200-PE convolution accelerator. I took the base template from the provided exercise and read through some of the documentation, but my mapping strategies come back with only about 1-2% utilization.

Attached are my input architecture files, the parsed input, the generated mapping, and the statistics showing the utilization.

The inner PE spatial loop bounds seem to unroll only along the Y axis, with nothing along the X axis. I believe the issue comes from the constraints definition, but I also have the intuition that the problem dimensions (VGG) are not well suited to such a large PE array, which is why I try mapping more batches (a sketch of the kind of spatial constraint I mean is below, after the attachments).

Any input is appreciated.

arch_conv.txt
parsed-processed-input-large-pe-array-multi-batch.txt
timeloop-mapper.stats.txt
timeloop-mapper.map.txt
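To make the constraints part concrete, here is roughly the shape of spatial constraint I have been experimenting with. This is only an illustrative sketch, not the exact contents of the attached files: the target name and factors are placeholders, and the `split`/`permutation` semantics should be double-checked against the Timeloop constraints documentation.

```yaml
mapspace_constraints:
  targets:
  # Spatial fanout at the storage level that feeds the 200x200 PE array.
  # The architecture spec must give this level a meshX attribute so the
  # mapper knows the fanout is two-dimensional.
  - target: PE_buffer            # placeholder name for the innermost shared buffer
    type: spatial
    factors: N=1 R=1 S=1 P=1 Q=1 # pin the other dims; leave C and M free to unroll
                                 # (N could also be left free to map batches spatially)
    permutation: CM NRSPQ        # C and M are the candidate spatial dims
    split: 1                     # index that divides the permutation between the two
                                 # mesh dimensions (X vs. Y) -- verify which side is X
```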
There's something odd: your spec appears to be creating a 200x200 array, but the stats.txt reports 16x16 instances at all inner levels of the hierarchy. Are you sure the stat dump is from this arch?
Overall, a 200x200 array is hard to fill spatially. Most mappings will be underutilized, so I suspect the mapper search is just giving up too quickly. Try tweaking the search hyperparameters to make it try harder. Also, in your innermost buffer constraints you should add a min parallelism constraint (e.g., 0.5). This will early-reject any mappings that don't reach at least 50% utilization. It won't prevent the search heuristic from visiting such mappings, but it will skip the expensive evaluation cost for them.
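For concreteness, the search hyperparameters being referred to live in the mapper spec. Below is a sketch in the style of the exercises' mapper.yaml; the values are illustrative starting points, and the exact termination semantics of each knob are worth confirming in the mapper documentation.

```yaml
mapper:
  algorithm: random-pruned             # heuristic used by the exercise templates
  optimization-metrics: [ delay, energy ]
  num-threads: 8
  # Knobs that control how quickly each search thread gives up; raising
  # them (or setting them to 0 where 0 means "no limit") makes the
  # mapper try harder before terminating.
  timeout: 30000                       # consecutive invalid mappings tolerated per thread
  victory-condition: 10000             # consecutive non-improving valid mappings before a thread declares victory
  search-size: 0                       # total valid mappings to visit; 0 = unbounded
```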
Hello,
I have a question about how to add a min parallelism constraint (e.g., 0.5) to my innermost buffer constraints. Could you give me an example?
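For reference, a minimal sketch of what such a constraint might look like, assuming the `utilization` constraint type with a `min` attribute (worth verifying against the current Timeloop constraint syntax); the target name below is a placeholder for the innermost buffer level:

```yaml
mapspace_constraints:
  targets:
  # Reject, before evaluation, any mapping whose spatial utilization at
  # this level falls below `min` (e.g. 0.5 = at least 50% of the PEs used).
  - target: PE_registers    # placeholder: substitute the innermost buffer's name
    type: utilization       # assumed constraint type name -- verify in the docs
    min: 0.5
```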