Hi everyone,
I have an accelerator that the cores access in a streaming fashion. Because of this streaming access pattern, I want accesses to the accelerator's memory region to be cacheable in L1 and L2 but not in L3, so that they do not pay the L3 tag-lookup latency.
The current classic cache design assumes that a memory address is either cacheable at every cache level or, if the page table marks its memory region as uncacheable, at none of them. I found that simply excluding the address range of the non-L3-cacheable memory region from the L3 does not work. Multiple L2 caches may issue requests for the same block at the same time, and if the L3 does not coalesce them, multiple outstanding requests for that block leave the L3 and reach the membus, which results in a fault.
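To make the coalescing problem concrete, here is a toy Python model of an L3 that does not cache the accelerator range. It is not gem5 code: `ToyL3`, `block_addr`, and the 64-byte block size are hypothetical. Without coalescing, two L2 misses to the same block become two outstanding membus requests; with an MSHR-like table keyed by block address, the second miss waits on the first and only one request goes downstream.

```python
from collections import defaultdict

BLOCK_SIZE = 64  # hypothetical cache-block size in bytes


def block_addr(addr: int) -> int:
    """Align an address down to the start of its cache block."""
    return addr - (addr % BLOCK_SIZE)


class ToyL3:
    """Hypothetical, highly simplified L3 (not gem5 code) for an address
    range it does not cache: every request is passed down to the membus
    unless it can be merged with one already outstanding."""

    def __init__(self, coalesce: bool) -> None:
        self.coalesce = coalesce
        self.mshrs = defaultdict(list)  # block address -> waiting requesters
        self.membus_requests = []       # (requester, block) pairs sent downstream

    def request(self, requester: str, addr: int) -> None:
        blk = block_addr(addr)
        if self.coalesce:
            waiting = self.mshrs[blk]
            waiting.append(requester)
            if len(waiting) > 1:
                # A request for this block is already in flight; piggyback on it.
                return
        self.membus_requests.append((requester, blk))

    def outstanding_per_block(self) -> dict:
        """Count how many membus requests were issued for each block."""
        counts = defaultdict(int)
        for _, blk in self.membus_requests:
            counts[blk] += 1
        return dict(counts)


if __name__ == "__main__":
    for coalesce in (False, True):
        l3 = ToyL3(coalesce)
        # Two L2 caches miss on the same accelerator block at about the same time.
        l3.request("l2_0", 0x8000_0010)
        l3.request("l2_1", 0x8000_0020)
        # Prints "False {2147483648: 2}", then "True {2147483648: 1}".
        print(coalesce, l3.outstanding_per_block())
```

The model only isolates the request-count issue; a real L3 would also have to deliver the single response to every waiting L2 and handle ownership for these blocks.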
In addition, when the membus snoops the L3, the L3 does not forward the snoop to its upper-level caches, because it assumes that the address is also uncacheable in all the caches above it.
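One way to frame the snoop issue is as a per-level cacheability policy, sketched below as a toy model. The `AddrRange` class, the `ACCEL_RANGE` bounds, and all function names are hypothetical and are not gem5's API. The point it illustrates: whether the L3 forwards a snoop upward should depend on whether the address may be cached above the L3, not on whether the L3 itself caches it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddrRange:
    """Hypothetical half-open address range [start, end)."""
    start: int
    end: int

    def contains(self, addr: int) -> bool:
        return self.start <= addr < self.end


# Hypothetical accelerator region: cacheable in L1/L2, but bypasses the L3.
ACCEL_RANGE = AddrRange(0x8000_0000, 0x9000_0000)


def cacheable_at(level: int, addr: int, page_uncacheable: bool = False) -> bool:
    """Per-level cacheability policy for this toy model."""
    if page_uncacheable:
        return False  # uncacheable in the page table: no level may cache it
    if level == 3 and ACCEL_RANGE.contains(addr):
        return False  # accelerator region skips only the L3
    return True


def naive_l3_forwards_snoop(addr: int, page_uncacheable: bool = False) -> bool:
    # Models the symptom described above: the L3 uses its own cacheability
    # as a proxy for whether any cache above it could hold the block.
    return cacheable_at(3, addr, page_uncacheable)


def l3_should_forward_snoop(addr: int, page_uncacheable: bool = False) -> bool:
    # Forward whenever an upper level (L1 or L2) may hold the block,
    # regardless of whether the L3 itself caches it.
    return any(cacheable_at(level, addr, page_uncacheable) for level in (1, 2))


if __name__ == "__main__":
    addr = 0x8000_1000
    print(naive_l3_forwards_snoop(addr))  # False: L1/L2 copies would be missed
    print(l3_should_forward_snoop(addr))  # True
```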
What would be the best way to approach this problem? Any insights or discussions would be appreciated.
Best,