Evaluating Uniform Memory Access Mode on AMD's Turin

twoodfin an hour ago

I’m guessing this is less about average latency / throughput tradeoffs and more about providing predictable performance (all else being equal) for both.

There's plenty of legacy software out there that a) will never be optimized for NUMA, b) scales via more cores touching more shared memory, and c) needs to hit SLAs, where performance beyond that is effectively wasted.