Team Members:
- Nguyen Dinh Cuong
- Nguyen Duy Anh Quan
- Nguyen Trong Nhan
- Tran Tuan Viet
- What is the smallest capacity that brings the miss rate to less than 10%?
- Answer: 2^12 B or 4 KB
- What is the smallest capacity that brings the miss rate to less than 5%?
- Answer: 2^14 B or 16 KB
- Today's processors generally have 32KB to 128KB first-level data caches. By what ratio does increasing the cache size from 16KB to 32KB reduce the miss rate? (2.0 would be halving the miss rate; 1.0 would be no change in miss rate; less than 1.0 would be an increase in misses).
- Answer: 1.62/1.34 = 81/67 ~ 1.209
- By what ratio does increasing the cache size from 32KB to 64KB reduce the miss rate?
- Answer: 1.34/1.17 = 134/117 ~ 1.145
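As a quick check of these two ratios, here is a minimal Python sketch; the miss-rate values are the ones read off the graph above:

```python
# Improvement ratio from doubling the cache, using the miss rates (in %)
# read from the graph: 16 KB -> 1.62, 32 KB -> 1.34, 64 KB -> 1.17.
pairs = [("16KB", "32KB", 1.62, 1.34), ("32KB", "64KB", 1.34, 1.17)]
for small, big, mr_small, mr_big in pairs:
    print(f"{small} -> {big}: ratio = {mr_small / mr_big:.3f}")
# 16KB -> 32KB: ratio = 1.209
# 32KB -> 64KB: ratio = 1.145
```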
- When deciding on an ideal cache size, engineers typically look for the "knee" of the curve. When considering various cache sizes, we want the point at which increasing to that size yields a great benefit, but increasing beyond that size yields far less benefit. What would you say is the ideal cache size for a direct mapped cache?
- Answer: From the graph, we can see that the knee of the curve is at x = 14. Therefore, the ideal cache size for a direct mapped cache is 2^14 B or 16 KB
- What is the smallest capacity that brings the miss rate of the 2-way set associative cache to less than 10%?
- Answer: 2^12 B or 4 KB
- What is the smallest capacity that brings the miss rate of the 2-way set associative cache to less than 5%?
- Answer: 2^13 B or 8 KB
- How large must the direct-mapped cache be before it equals or exceeds the performance of the 1 KB 2-way assoc?
- Answer: The miss rate of the 1 KB 2-way set associative cache is 11.18%. Thus, the direct-mapped cache must reach a capacity of 2^11 B or 2 KB (where its miss rate is 12.86%) before it equals or exceeds the performance of the 1 KB 2-way set associative cache
- At what cache size do the two write policies generate approximately the same amount of writes to the bus?
- Answer: From the graph, we can see that at x = 10, i.e., a cache size of 2^10 B = 1 KB, the two write policies generate approximately the same amount of write traffic on the bus
- Why does the difference between the two schemes diverge at small cache sizes?
- Answer: For the write-through cache, bus traffic is the average number of bytes written per store multiplied by the number of stores, so for a particular trace it is constant no matter the cache size. For the write-back cache, bus traffic is the number of dirty evictions multiplied by the block size. At most one eviction happens per miss, and small caches have higher miss rates, so small caches generate much more write-back traffic. This is why the two schemes diverge at small cache sizes; a sketch of both traffic models follows this answer.
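A minimal sketch of the two traffic models described above; the trace statistics (store and eviction counts) are hypothetical, chosen only to illustrate the divergence:

```python
def write_through_traffic(num_stores, bytes_per_store):
    # Every store goes on the bus, so traffic is independent of cache size.
    return num_stores * bytes_per_store

def write_back_traffic(dirty_evictions, block_size):
    # Only dirty evictions reach the bus, one whole block per eviction.
    return dirty_evictions * block_size

# Hypothetical trace: 10,000 4-byte stores, 32 B blocks.
print(write_through_traffic(10_000, 4))  # 40000 B, regardless of cache size
print(write_back_traffic(3_000, 32))     # 96000 B with a small cache (many dirty evictions)
print(write_back_traffic(100, 32))       #  3200 B with a large cache (few dirty evictions)
```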
- Why does the difference between the two schemes diverge at large cache sizes?
- Answer: The same reasoning as in the previous answer applies: write-through traffic is constant for a particular trace, while large cache sizes result in very little write-back traffic (in this case, 0)
- Explain the observed miss rate associated with a small block size.
- Answer: As the block size decreases, each fill brings in fewer neighboring bytes, so less spatial locality is exploited and the number of cold (compulsory) misses increases. Thus, we observe a higher miss rate with a small block size
- Explain the observed miss rate associated with a large block size.
- Answer: As the block size increases, each fill exploits more spatial locality, so the number of cold misses decreases and the observed miss rate is lower. The accompanying increase in conflict misses is insignificant here; see the toy model below
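A toy model of the cold-miss effect, under the simplifying assumptions that the trace is a single sequential pass over memory and the cache is large enough that only compulsory misses occur:

```python
def cold_miss_rate(num_byte_accesses, block_size):
    # A miss fills a whole block, so only every block_size-th sequential
    # byte access misses; the rest hit in the freshly filled block.
    cold_misses = num_byte_accesses // block_size
    return cold_misses / num_byte_accesses

for bs in (4, 16, 64, 256):
    print(f"block = {bs:>3} B -> cold miss rate = {cold_miss_rate(100_000, bs):.4%}")
```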
- What is the block size with the lowest miss rate?
- Answer: 512 B with 0.52% miss rate
- Why does the miss rate get worse with more cores?
- Answer: In a multiprocessor system or a multicore processor, each core has its own private cache. Each core therefore takes its own cold misses, and under the VI protocol an access by one core invalidates the only valid copy held in another core's cache, adding coherence misses on top. Thus, the miss rate gets worse with more cores.
- If the miss rate is so bad, why would one use the VI protocol over no protocol?
- Answer: Without any protocol, the cores' private caches could hold stale copies of the same address, and a shared-memory program would compute incorrect results. Although cache coherency limits the performance of multicore systems, the VI protocol guarantees coherence, so the higher miss rate is the price of correct multicore execution while the shift towards multicore still achieves net performance gains. It's a trade-off between CPU time and the miss rate.
- What is the block size with the lowest total write-back traffic (transferred in + write-back transferred out)?
- Answer: From the graph, we can see it's 2^2 B, or 4 B
- What are the two sources of additional traffic as the block size grows? Explain why each grows.
- Answer: The first source is fill traffic: every miss transfers a whole block into the cache (the number of misses multiplied by the block size), so the per-miss cost grows linearly with block size. The second source is write-back traffic: every dirty eviction writes a whole block back out (the number of dirty evictions multiplied by the block size), even when only a few bytes in the block were modified. Both sources therefore grow as the block size grows; see the sketch below.
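A sketch of how both sources scale with block size; the miss and eviction counts are hypothetical, chosen to reflect that doubling the block size rarely halves the miss count:

```python
def fill_traffic(misses, block_size):
    # Every miss transfers one whole block into the cache.
    return misses * block_size

def writeback_traffic(dirty_evictions, block_size):
    # Every dirty eviction writes one whole block back out,
    # even if only a few bytes in it were modified.
    return dirty_evictions * block_size

# Hypothetical counts: misses shrink sublinearly as blocks grow.
for bs, misses, dirty in ((16, 900, 300), (64, 400, 150), (256, 250, 100)):
    total = fill_traffic(misses, bs) + writeback_traffic(dirty, bs)
    print(f"block = {bs:>3} B -> total traffic = {total} B")
```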
- Given that current processors typically use, say, 64B blocks, which metric (miss rate or traffic) are today's caches designed to minimize?
- Answer: Average access time is the most important measure of cache performance: t_avg = t_hit + miss_rate * t_miss. This equation includes more than just the miss rate: a cache with a low miss rate but a large penalty per miss can perform worse than one with a higher miss rate but a significantly lower penalty, and a cache can have a low miss rate yet a long hit time (true for large fully associative caches, for instance). For this reason, multilevel caching is employed; not all performance aspects can be optimized in a single cache. With t_miss (memory delay) fixed, it is challenging to build one cache that is both fast in the common case (a hit) and keeps the costly uncommon case rare by having a low miss rate, so two caches split these design goals: the first-level cache minimizes hit time and is therefore generally small with low associativity, while the second-level cache minimizes misses and is typically large, with large blocks and high associativity. Since 64 B is a typical block size for a level-1 cache, and as we can see from the graph, today's caches are designed to minimize the miss rate.
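A small worked example of the average-access-time equation; all latencies and miss rates below are illustrative, not measured:

```python
def amat(t_hit, miss_rate, t_miss):
    # t_avg = t_hit + miss_rate * t_miss, in cycles.
    return t_hit + miss_rate * t_miss

# A cache with a better miss rate can still lose on average access time:
print(amat(t_hit=4, miss_rate=0.02, t_miss=100))  # 6.0 (low miss rate, slow hit)
print(amat(t_hit=1, miss_rate=0.04, t_miss=100))  # 5.0 (higher miss rate, fast hit)
```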
📉 Graph 5: Replicate Graph #3 (Miss Rate vs Block Size for 1, 2, & 4 cores) but this time with MSI Protocol
- Is there ever a scenario in which the VI protocol would be preferable to the MSI protocol? If so, provide that scenario. If not, explain why not.
- Answer: We don't think there is a scenario in which the VI protocol would be preferable to the MSI protocol, since VI allows only one cached copy in the entire system: multiple copies cannot coexist even when they are read-only, which is a serious limitation for real workloads. The sketch below contrasts how the two protocols react to a remote read.
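A minimal sketch of the key difference, modeling only each protocol's reaction to a remote read (states only, no bus or data; this is our illustration, not the simulator's implementation):

```python
from enum import Enum

class VI(Enum):
    VALID = 1
    INVALID = 2

class MSI(Enum):
    MODIFIED = 1
    SHARED = 2
    INVALID = 3

def vi_on_remote_read(state):
    # Under VI, even a remote *read* kills my copy: only one
    # valid copy may exist in the whole system.
    return VI.INVALID if state is VI.VALID else state

def msi_on_remote_read(state):
    # Under MSI, a remote read only downgrades MODIFIED to SHARED;
    # any number of read-only SHARED copies can coexist.
    return MSI.SHARED if state is MSI.MODIFIED else state

print(vi_on_remote_read(VI.VALID))     # VI.INVALID -- copy lost
print(msi_on_remote_read(MSI.SHARED))  # MSI.SHARED -- copy kept
```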
- For a 2-core trace, with a block size of 64 B, what fraction of bus snoops by Core 0 are "hits" (i.e., the LD_MISS or ST_MISS on the bus is for a cache line that is currently valid in Core 0's cache)?
- Answer: The fraction of bus snoops by Core 0 that are hits = 50/365 = 10/73 ≈ 0.137
- Answer: The fraction of bus snoops by Core 0 that are hits = 348/1110 = 58/185 ≈ 0.314
- What is one characteristic of the workload (visible in statistics in the output) that reconciles the high snoop hit rate and the low miss rates in Graph #5?
- Answer: It's the block size: with large 64 B blocks, a snooped miss address is more likely to fall within a line currently valid in Core 0's cache, giving a high snoop hit rate, while the same large blocks exploit spatial locality and keep the miss rates in Graph #5 low