The Taub Faculty of Computer Science Events and Talks
Ella Sheory (M.Sc. Thesis Seminar)
Tuesday, 04.04.2023, 14:30
Advisor: Prof. Dan Tsafrir
The translation lookaside buffer (``TLB’’) is a small cache that accelerates virtual to physical address translation, which processors typically manage with variants of the least recently used (``LRU’’) algorithm. Although LRU is simple, it is suboptimal for some workloads. Our analysis shows that if the processor uses the optimal---but impractical---Belady algorithm instead of LRU, runtime improves by up to 15% (and 5% on average) over LRU in single-thread (``ST’’) mode. Runtime further improves by up to 23% (yet still 5% on average) in simultaneous multithreading (``SMT’’) mode, where the TLB is competitively shared between two threads.
Given this background, we observe that while past research developed various practical caching algorithms to outperform LRU, such research was typically evaluated in the context of regular data caches rather than the TLB. Our goal in this study is therefore to investigate whether advanced practical caching algorithms improve TLB performance and, if so, to what extent they can approach Belady performance.
To this end, we classify existing caching algorithms into three groups based on their additional storage requirements: small (less than 10% of the LRU-managed TLB), medium (between 10% and 50%), and high (more than 50%). From each group, we select a representative algorithm. We find that the representatives of the small and medium groups improve performance by only 1%, on average (and no more than 11%). We also find that the third algorithm's complexity and sophistication are unjustified, as we can achieve the aforementioned optimal Belady improvement by handing the associated additional storage to the simpler LRU baseline. We thus conclude that no existing practical caching algorithm can meaningfully improve LRU TLB performance.