Current technology has one cache and ALL data requires coherence.
This proposal has two caches.
The first cache holds private data, which does not require coherence.
Updates to this cache are faster because it does not write through to main memory, and it requires no invalidation. Invalidation is binomial, which limits the number of processors. Invalidation also requires dedicated physical connections so that its traffic stays off the memory bus.
The second cache is virtual: it passes writes straight to main memory. A snoopy cache is also write-through, but it is less efficient because it entails invalidation as well.
The second cache reads from main memory. Current systems also read from memory on the first access, which is a cache miss, so they are faster only on repeat reads. However, software does not perform repeat reads on shared data, because shared data is subject to change and the results would be unpredictable.
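The two-cache split described above can be sketched as a toy model. This is only an illustration under stated assumptions, not a description of real hardware: the names `MainMemory`, `Core`, `write_private`, and `write_shared` are hypothetical, private data lives in a per-core dictionary that never touches memory, and every shared access goes straight to main memory.

```python
class MainMemory:
    """Toy shared main memory: the only place shared data lives (assumption)."""

    def __init__(self):
        self.cells = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.writes += 1
        self.cells[addr] = value


class Core:
    """Hypothetical core with a private cache and a pass-through shared path."""

    def __init__(self, memory):
        self.memory = memory
        self.private = {}  # first cache: private data, never written through

    def write_private(self, addr, value):
        self.private[addr] = value      # stays local, nothing to invalidate

    def read_private(self, addr):
        return self.private.get(addr, 0)

    def write_shared(self, addr, value):
        self.memory.write(addr, value)  # second cache: passes the write on

    def read_shared(self, addr):
        return self.memory.read(addr)   # always reads the current value


memory = MainMemory()
a, b = Core(memory), Core(memory)

a.write_private("x", 1)        # no memory traffic at all
a.write_shared("flag", 7)      # exactly one memory write
seen = b.read_shared("flag")   # core b sees the update with no invalidation
```

In this model core `b` observes the new value of `flag` because no core ever holds a stale copy of shared data, which is the property the proposal relies on instead of invalidation.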
Repeat reads will occur only when different processes on the same core processor
read the same shared data address; software handles these repeat reads without
coherence by being thread-safe. The software ensures every update is current.
The proposed coherence-free system is faster than current systems except for the atomic swap. The swap currently occurs in the cache, which is what requires coherence. The proposed swap must be single-threaded in main memory, with no cache copies, eliminating cache coherence.
Only this one instruction is slightly slower, and the programmer minimizes its use.
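To make the role of the swap concrete, here is a minimal sketch of a spinlock built on a swap that is serialized in main memory. It is a software simulation under assumptions, not the proposed hardware: `MainMemory.swap`, `acquire`, `release`, and `worker` are hypothetical names, and a `threading.Lock` stands in for the single-threaded access to memory.

```python
import threading
import time


class MainMemory:
    """Toy main memory whose operations are serialized, one caller at a time."""

    def __init__(self):
        self.cells = {}
        self._bus = threading.Lock()  # models single-threaded memory access

    def swap(self, addr, new_value):
        # Atomically store new_value and return the previous contents.
        with self._bus:
            old = self.cells.get(addr, 0)
            self.cells[addr] = new_value
            return old

    def read(self, addr):
        with self._bus:
            return self.cells.get(addr, 0)

    def write(self, addr, value):
        with self._bus:
            self.cells[addr] = value


def acquire(memory, lock_addr):
    # Spin until the swap returns 0: the lock was free and is now ours.
    while memory.swap(lock_addr, 1) == 1:
        time.sleep(0)  # yield so the current holder can make progress


def release(memory, lock_addr):
    memory.write(lock_addr, 0)


def worker(memory, rounds):
    for _ in range(rounds):
        acquire(memory, "lock")
        # Read-modify-write of shared data, protected by the lock.
        memory.write("counter", memory.read("counter") + 1)
        release(memory, "lock")


memory = MainMemory()
threads = [threading.Thread(target=worker, args=(memory, 200)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = memory.read("counter")
```

The swap is the only operation that has to be atomic; the counter update itself uses plain reads and writes, and it stays correct because the software holds the lock while it runs.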
To compensate, merely add another core processor. Adding core processors reduces the stack or multitasking queue. This reduces elapsed time but not execution time. Execution becomes simultaneous instead of concurrent. Cores no longer require a physical connection.
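The difference between elapsed time and execution time can be shown with a little arithmetic. The sketch below assumes equal-length, independent tasks and no scheduling overhead; the function names and the sample numbers are made up for illustration.

```python
import math


def execution_time(tasks, seconds_per_task):
    # Total processor time consumed; it does not depend on the core count.
    return tasks * seconds_per_task


def elapsed_time(tasks, seconds_per_task, cores):
    # Wall-clock time when the tasks run in waves of `cores` at a time.
    return math.ceil(tasks / cores) * seconds_per_task


work = execution_time(8, 2)               # 16 seconds of processor time
one_core = elapsed_time(8, 2, cores=1)    # 16 seconds: tasks wait in the queue
four_cores = elapsed_time(8, 2, cores=4)  # 4 seconds: two waves of four tasks
eight_cores = elapsed_time(8, 2, cores=8) # 2 seconds: every task runs at once
```

Under these assumptions each added core shortens the queue and the elapsed time, while the total execution time stays at 16 seconds.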
Infinite cores > four cores
Different Algorithms for the Same Problem