On read, a cache hit reads from the cache.
On read, a cache miss reads from memory and stores the value in the cache.
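A minimal sketch of this read path in Java, assuming a hypothetical Cache class over a backing memory map (both names are illustrative, not part of any real API):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative read path: a hit returns the cached value; a miss loads from
// backing memory, stores the value in the cache, and then returns it.
class Cache {
    private final Map<Long, Long> lines = new HashMap<>();   // cached lines
    private final Map<Long, Long> memory;                     // backing store

    Cache(Map<Long, Long> memory) {
        this.memory = memory;
    }

    long read(long address) {
        Long cached = lines.get(address);
        if (cached != null) {
            return cached;                     // cache hit: read from cache
        }
        long value = memory.getOrDefault(address, 0L);
        lines.put(address, value);             // cache miss: store in cache
        return value;
    }
}
```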
Coherence overhead increases binomially with the number of caches: each pair of caches is a separate invalidation conversation.
Two algorithms that synergize:
Different Algorithms for the Same Problem
The coherence algorithm invalidates the other caches. Its overhead increases binomially with the number of caches. This algorithm resolves (4), which is the current definition of cache coherence.
The thread safety algorithm uses an atomic, uninterruptible swap to ensure update currency. Its contention increases linearly with the number of processes. This algorithm renders (4) irrelevant; the software resolves the data race in (3).
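As an illustration of the thread safety algorithm, here is a sketch of an atomic compare-and-set retry loop using java.util.concurrent.atomic.AtomicLong; the counter is only a stand-in for any shared record:

```java
import java.util.concurrent.atomic.AtomicLong;

// The swap (compareAndSet) either installs the new value against the current
// one or fails, forcing the thread to re-read and retry, so every successful
// update is based on the latest value (update currency).
class SharedCounter {
    private final AtomicLong value = new AtomicLong();

    long increment() {
        while (true) {
            long current = value.get();            // read the current value
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;                        // swap succeeded
            }
            // swap failed: another thread updated first; retry with a fresh read
        }
    }
}
```

If the swap fails, the thread retries against the new value, so each successful update is applied to the most recent state of the record.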
Thread-safe software provides update integrity for multiple records either through a single thread or through a mutex. Coherence synchronizes the caches with invalidation. However, if shared data bypasses the cache, then cache invalidation, which is coherence, is not needed. Thread-safe hardware requires bypassing the cache and performing atomic instructions in memory instead of in the cache; however, this removes the cache. To create a cache that does not require coherence, merely designate at allocation that the data IS private. The private cache does not require coherence, and it does not even require write-through.
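A sketch of update integrity for multiple records via a mutex, using java.util.concurrent.locks.ReentrantLock; the two-balance transfer is only an illustrative example, not part of the invention:

```java
import java.util.concurrent.locks.ReentrantLock;

// Both records are updated under one lock, so no other thread can observe
// or interleave a partial update (update integrity via mutex).
class Accounts {
    private final ReentrantLock lock = new ReentrantLock();
    private long balanceA;
    private long balanceB;

    void transfer(long amount) {
        lock.lock();
        try {
            balanceA -= amount;   // record 1
            balanceB += amount;   // record 2
        } finally {
            lock.unlock();
        }
    }
}
```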
Coherence has many issues that thread safety resolves. The most important is that coherence overhead is binomial with the number of processors, while thread safety has linear contention for one record. Binomial overhead is why current multi-core designs are restricted to four processors. (*The fifth processor increases coherence overhead by 67%.) Hardware thread safety cannot be implemented today because the uninterruptible swap occurs in a cache; it is only pseudo-atomic.
The thread safety algorithm requires that the uninterruptible swap be single-threaded. This is ensured by executing it at a single location, the shared resource location, to ensure update currency. A lock will suffice, because the lock is itself an uninterruptible swap. The software single-threads through one shared resource location, but the hardware cannot: all current instructions execute in the cache.
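To illustrate the point that a lock is an uninterruptible swap, here is a minimal spinlock built from a single atomic swap on one shared flag (a sketch, not production code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// The lock is one word at a single shared location; acquiring it is one
// uninterruptible swap (getAndSet), so only one thread at a time proceeds.
class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        // Swap true in; if the old value was already true, another thread
        // holds the lock, so spin until the swap returns false.
        while (held.getAndSet(true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        held.set(false);
    }
}
```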
The change is to the hardware implementation of instructions. The software does not change.
* Each conversation is a conduit for invalidation. The fifth processor adds 4 conversations to the system, while a 4-processor system has 6 conversations.
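The arithmetic behind the footnote, as a small worked example: n caches have n(n-1)/2 pairwise conversations, so going from 4 caches (6 conversations) to 5 caches (10 conversations) adds 4, an increase of about 67%.

```java
// Pairwise "conversations" among n caches: n * (n - 1) / 2.
public class CoherenceConversations {
    static long conversations(int caches) {
        return (long) caches * (caches - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 8; n++) {
            System.out.printf("%d caches -> %d conversations%n", n, conversations(n));
        }
        // 4 caches -> 6, 5 caches -> 10: the fifth cache adds 4 conversations,
        // an increase of 4/6, roughly 67%.
    }
}
```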
This is why any number of JVMs can execute concurrently, but multiprocessors are limited to four: JVMs are thread-safe; current hardware is not.
Thread-Safe Algorithm - A Single-Threaded Swap Ensures Currency
This article compares the software requirements for thread safety with current hardware design and shows how to implement thread-safe hardware.
Patent Pending
The invention redefines thread safety. For comparison, two current definitions are shown below.