
Maximizing PE array utilization in convolution runs #267

Open
DanP114 opened this issue Jun 13, 2024 · 2 comments
DanP114 commented Jun 13, 2024

Hello there,

I am currently designing a 200x200 PE convolution accelerator. I started from the base template provided in the exercise and have read through some of the documentation, but my mapping strategies come back with only about 1-2% utilization.

Attached are my input architecture file, parsed/processed input, generated mapping, and the statistics showing the utilization.

My inner PE spatial loop bounds seem to unroll only along the Y axis, with nothing along the X axis. I believe the issue comes from my constraints definition, but I also have the intuition that the problem dimensions (VGG) are not well suited to a large PE array, which is why I try mapping more batches.
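For what it's worth, that intuition can be checked with a quick back-of-the-envelope script (this is not Timeloop code; the VGG-style loop bounds below are illustrative assumptions, not taken from the attached files):

```python
# Rough sanity check (not Timeloop): Timeloop-style mappings use exact
# factorizations of each loop bound, so a spatial factor must divide
# the loop bound it is drawn from.
def best_single_dim_utilization(dim: int, lanes: int) -> float:
    """Utilization of `lanes` PEs when the largest divisor of `dim`
    that fits is unrolled across them."""
    best = max(d for d in range(1, min(dim, lanes) + 1) if dim % d == 0)
    return best / lanes

# Illustrative VGG-style conv bounds (assumed for this sketch):
# K = output channels, C = input channels, P = output rows, R = filter rows.
if __name__ == "__main__":
    for name, dim in [("K", 64), ("C", 64), ("P", 56), ("R", 3)]:
        u = best_single_dim_utilization(dim, 200)
        print(f"{name}={dim:3d} -> utilization of one 200-lane axis: {u:.1%}")
```

None of these bounds has a divisor anywhere near 200, so a single problem dimension can fill at most about a third of one 200-wide axis, which is consistent with trying to spread extra batches across the array.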

Any input is appreciated.

arch_conv.txt
parsed-processed-input-large-pe-array-multi-batch.txt

timeloop-mapper.stats.txt
timeloop-mapper.map.txt

@angshuman-parashar
Collaborator

There's something odd. Your spec appears to be creating a 200x200 array but the stats.txt reports 16x16 instances at all inner levels of the hierarchy. Are you sure the stat dump is from this arch?

Overall a 200x200 array is hard to fill spatially. Most mappings will be underutilized, so I suspect the mapper search is just giving up too quickly. Try tweaking the hyperparameters to make it try harder. Also, in your innermost buffer constraints you should add a min parallelism constraint (e.g., 0.5). This will early-reject any mappings that don't have at least 50% utilization. You won't prevent the search heuristic from visiting such mappings, but you will elide the expensive evaluation cost for these mappings.
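The early-reject behavior described above can be sketched as follows (a conceptual sketch only, not Timeloop internals; the mapping representation and cost function here are made up for illustration):

```python
MIN_PARALLELISM = 0.5  # reject mappings that use < 50% of the PE array

def spatial_utilization(mapping, total_pes=200 * 200):
    # Stand-in: fraction of PEs covered by the mapping's spatial factors.
    used = 1
    for f in mapping["spatial_factors"]:
        used *= f
    return used / total_pes

def search(candidates, evaluate):
    """Visit every candidate mapping, but only pay the evaluation
    cost for mappings that clear the parallelism floor."""
    best, evaluated, rejected = None, 0, 0
    for m in candidates:
        if spatial_utilization(m) < MIN_PARALLELISM:
            rejected += 1          # cheap check, no model evaluation
            continue
        evaluated += 1
        cost = evaluate(m)         # the expensive part
        if best is None or cost < best[0]:
            best = (cost, m)
    return best, evaluated, rejected
```

The point is that the search heuristic still enumerates low-utilization mappings; the constraint just skips their expensive evaluation, which is why the knob is cheap to apply.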

@chipletstu

> ...in your innermost buffer constraints you should add a min parallelism constraint (e.g., 0.5)...

Hello,

I have a question about how to add a min parallelism constraint (e.g., 0.5) in my innermost buffer constraints. Can you give me an example?

Thanks.
