Saturday, July 27, 2024

Thread Safe Hardware

Two algorithms that synergize:

First algorithm (software):
The thread safety algorithm enables multiple software processes to share data.
The same algorithm could enable multiple processors with caches to share memory.
There are three types of data in thread safety: private data, shared data, and swap data.
1 - Thread safety allocates private data so that it cannot be altered by other processes. This is data that can be changed without a mutex.
2 - Thread safety finalizes changes to shared data while it is under mutex control.
3 - Thread safety serializes updates through a swap address with an atomic swap that performs a mutex acquisition, a data swap (e.g., a counter), or a pointer swap.
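The three data classes can be illustrated in software terms. A minimal Java sketch (class and field names are hypothetical, chosen only for illustration):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of the three data classes named above.
class Worker {
    // 1. Private data: owned by one thread, updated without any mutex.
    private long scratch = 0;

    // 2. Shared data: finalized only while the mutex is held.
    static final ReentrantLock mutex = new ReentrantLock();
    static long sharedTotal = 0;

    void addWork(long amount) {
        scratch += amount;          // private update, no mutex needed
        mutex.lock();               // 3. the lock itself is the swap data:
        try {                       //    acquisition is an atomic swap on one address
            sharedTotal += scratch; // shared update finalized under the mutex
            scratch = 0;
        } finally {
            mutex.unlock();
        }
    }
}
```

The private field is touched freely; the shared total is only ever changed between lock and unlock; and the lock word is the single swap address through which every update serializes.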
 
First Approach:
1 - Thread safety requires the ability to recognize private data. This could be done at allocation. The data can be placed in a virtual cache for identification. This data is not shared and does not require coherence.
2 - The current software logic finalizes shared data prior to mutex release. However, finalizing must occur in main memory, not in a cache. Finalizing in a cache requires coherence.
3 - The current atomic swap is only pseudo-atomic because it occurs in a cache. The swap must serialize in main memory. This creates a true atomic swap.
This approach requires only one software change: an allocation instruction to identify private data, and that change is for performance only. The approach requires changes to hardware instructions, which must access cache memory or main memory depending on whether data is private or shared. Results must be stored at the corresponding output memory address. The atomic swap must be execute-through: atomic, and serialized by single-threading through the swap address.
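The pointer-swap form of the true atomic swap can be sketched in software. In this hypothetical Java example, a new version is finalized entirely in private data and then published by swapping a single pointer, so every update serializes through that one address:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the pointer-swap form of the atomic swap:
// a replacement is built in private data, then installed with one
// swap on a single address, which serializes all updates.
class VersionBox {
    static final class Snapshot {
        final int value;
        Snapshot(int value) { this.value = value; }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(0));

    // Retry loop: read the current snapshot, build a replacement
    // privately, and attempt to install it with one atomic swap.
    int increment() {
        while (true) {
            Snapshot old = current.get();
            Snapshot next = new Snapshot(old.value + 1); // private, unshared
            if (current.compareAndSet(old, next))        // the serializing swap
                return next.value;
        }
    }

    int read() { return current.get().value; }
}
```

The hardware argument above is that this swap must execute in main memory rather than in a cache to be truly atomic; the software shape of the operation is the same either way.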
 
Second algorithm (hardware):
Cache coherence provides the consistency of data; the software delivers logical consistency.
Cache coherence merely delivers the values that the software expects - hardware consistency.
Cache coherence provides no software logical consistency.
When the hardware is thread safe, it provides hardware consistency by itself, and cache coherence disappears.
 
Current Snoopy Protocol (performed on every memory access for ALL data, circa 1983)
on read - cache-hit - read from cache.
on read - cache miss - read from memory and store in cache.
on write - write-through and invalidate in EVERY cache.

Overhead increases binomially with the number of caches: each write may require an invalidation message to every other cache.
Invalidation occurs on writes for ALL data (because shared data is not identified).
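The growth in invalidation traffic can be checked with a small count. Under write-invalidate snooping, a write from any one of n caches may have to invalidate the other n - 1; when every cache writes, total traffic grows roughly as n(n - 1). A hypothetical sketch:

```java
// Rough count of invalidation messages under write-invalidate
// snooping: each of `writes` writes from one cache may invalidate
// the other (caches - 1) caches. When all n caches write at the
// same rate, total traffic grows roughly as n * (n - 1).
class SnoopCost {
    static long invalidations(int caches, long writes) {
        return writes * (caches - 1);
    }

    static long allCachesWriting(int caches, long writesPerCache) {
        return caches * invalidations(caches, writesPerCache);
    }
}
```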

Proposed Thread Safe Hardware Protocol:
Private data (updated without mutex) - private cache - no protocol.
Shared data (protected by mutex) - read-through and write-through - no protocol.
Swap data (used by mutex) - execute-through - serialized in memory.
 
A) The software serializes all shared updates through a mutex. B) Coherence synchronizes the multiple caches that could store the data. C) If the hardware stores all data in only one location, then the mutex will serialize all updates without coherence. 

All Java concurrency programmers know A. All computer architects know B. C requires 1) an allocation instruction that identifies shared data and 2) a mutex that serializes in main memory without coherence.
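Point C can be illustrated with a mutex reduced to a single swap address. In this hypothetical Java sketch, the entire lock is one word; the argument above is that if that word (and the data it guards) lives only in main memory, acquisition order alone serializes every shared update, with no coherence protocol:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a mutex that is nothing but one swap
// address: contenders single-thread through the atomic swap on
// `word`, which is the serialization the post relies on.
class SwapLock {
    private final AtomicBoolean word = new AtomicBoolean(false);

    void lock() {
        // Spin until the swap from false to true succeeds.
        while (!word.compareAndSet(false, true)) {
            Thread.onSpinWait();
        }
    }

    void unlock() {
        word.set(false);
    }
}
```

In software this spinlock relies on the JVM's memory model for visibility; the proposal is that the same serialization could come from executing the swap in main memory instead.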
 
No cache for shared data means no cache coherence.
All shared updates must serialize through a swap to ensure currency.
 