Thursday, July 18, 2024

An Alternative to Coherence

 

Thread-Safe Hardware
 
The current Thread-Safe software algorithm:
The atomic swap (a compare-and-swap such as x86 CMPXCHG) finalizes the associated changes.
Before the swap is executed, the software finalizes all data protected by that swap.
Thread-safe algorithm: Lock, finalize all changes, then unlock.
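The lock, finalize, unlock sequence above can be sketched in Python. This is only an illustration under stated assumptions, not a definitive implementation: `Word`, `compare_and_swap`, `update_record`, and `run` are hypothetical names, and the `threading.Lock` inside `Word` merely stands in for the atomicity that a hardware instruction such as CMPXCHG would provide.

```python
import threading
import time

FREE, HELD = 0, 1


class Word:
    """One memory word with a simulated atomic compare-and-swap.

    Assumption: the internal threading.Lock only stands in for the
    atomicity that the hardware swap instruction would provide.
    """

    def __init__(self, value=FREE):
        self._value = value
        self._atomic = threading.Lock()

    def compare_and_swap(self, expected, new):
        # Store `new` only if the word still holds `expected`.
        with self._atomic:
            if self._value == expected:
                self._value = new
                return True
            return False


def update_record(lock_word, record, times):
    for _ in range(times):
        # Lock: spin until the swap FREE -> HELD succeeds.
        while not lock_word.compare_and_swap(FREE, HELD):
            time.sleep(0)  # yield to other threads
        # Finalize all changes while the lock is held.
        record["debits"] += 1
        record["credits"] += 1
        # Unlock: the swap HELD -> FREE releases the finished change.
        lock_word.compare_and_swap(HELD, FREE)


def run(threads=4, times=200):
    lock_word = Word(FREE)
    record = {"debits": 0, "credits": 0}  # data protected by lock_word
    workers = [
        threading.Thread(target=update_record, args=(lock_word, record, times))
        for _ in range(threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return record
```

Calling `run(4, 200)` starts four threads that each update both fields 200 times. Because every update happens between a successful lock swap and the unlock swap, no thread observes or overwrites a partial change.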
 
Thread-safe hardware part 1 (update timing):
Problem 1: Write-through occurs prior to execution of the swap.
Explanation 1: Multitasking software is already thread-safe: it resolves the data race and all timing issues itself. Write-through can therefore occur before the atomic swap that finalizes the change, because the software prevents another process from overwriting a partial change.
Result: Current write-through of protected data preserves thread-safe software logic.
 
Thread-safe hardware part 2 (coherence):
Problem 2: Currently, a write in a multi-core processor requires binomial invalidation of all other caches.
Solution 2: This can be prevented if reads of protected shared data bypass the cache.
Result: Read-through of protected data eliminates coherence.
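A toy model can make Problem 2 and Solution 2 concrete. The sketch below is an assumption-laden simulation, not real hardware behavior: `MainMemory`, `Core`, `demo`, and the address `0x10` are hypothetical, and the cores deliberately exchange no invalidation messages. It compares a shared address that is cached with one that bypasses the cache.

```python
class MainMemory:
    """Toy main memory: a mapping from address to value."""

    def __init__(self):
        self.cells = {}

    def read(self, addr):
        return self.cells.get(addr, 0)

    def write(self, addr, value):
        self.cells[addr] = value


class Core:
    """A core with a private cache and NO invalidation traffic."""

    def __init__(self, memory, shared_addrs):
        self.memory = memory
        self.shared_addrs = shared_addrs  # addresses that bypass the cache
        self.cache = {}

    def read(self, addr):
        if addr in self.shared_addrs:
            return self.memory.read(addr)  # read-through: never cached
        if addr not in self.cache:
            self.cache[addr] = self.memory.read(addr)
        return self.cache[addr]  # cached copy, never invalidated

    def write(self, addr, value):
        if addr not in self.shared_addrs:
            self.cache[addr] = value
        self.memory.write(addr, value)  # write-through to main memory


def demo(bypass):
    memory = MainMemory()
    shared = {0x10} if bypass else set()
    core_a = Core(memory, shared)
    core_b = Core(memory, shared)
    core_b.read(0x10)       # B touches the address first
    core_a.write(0x10, 42)  # A updates it; no invalidation is sent
    return core_b.read(0x10)
```

With `bypass=False`, core B keeps returning its old cached copy because nothing invalidated it. With `bypass=True`, core B reads main memory directly and sees the value written by core A, without any message between the caches.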
 
Read-through eliminates coherence because thread-safe software resolves timing issues.
However, there are four hardware issues and one software issue.
Issue 1 - The computer cannot identify protected shared data, that is, data that thread safety protects from concurrent updates by multiple processes. It is data protected by an atomic swap.
Issue 2 - Currently all instructions, including the swap, occur in the cache. The swap is pseudo-atomic.
Issue 3 - Cache line currency.
Issue 4 - Software update currency.
Issue 5 - Performance impact.

Solution to 1 - A) Allocate shared data separately. Make all shared data bypass the cache. This eliminates the need to invalidate other caches. Since re-accessing shared data has unpredictable results, storing shared data in a cache is of no benefit. It could only benefit another process using the same cache. Because that process is also thread-safe, it resolves any timing issues.
Solution to 2 - B) Modify atomic swaps to serialize in main memory. The combination of (A) and (B) eliminates cache invalidation, because no shared data resides in a cache.
Solution to 3 - Step 1 (A) enables a private cache that neither writes through nor transmits or receives invalidations. Step 2 (B) requires that cache lines write through only their changes. A read of shared data is fetched for one instruction only.
Solution to 4 - Current software is thread-safe. Thread safety resolves timing issues without coherence.
Solution to 5 - See (A). Because re-accessing shared data has unpredictable results, storing shared data in a cache is of no benefit to any process.

Thread-safe hardware part 3 (atomic):
Problem 3: Currently the pseudo-atomic swap is executed in the cache.
Solution 3: Serialize the swap in main memory. This makes the hardware thread-safe.
Result: Execute-through of the atomic swap preserves main memory integrity without coherence.
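Execute-through can be sketched as a swap that is carried out by main memory itself, one request at a time. In this minimal simulation, `MainMemory`, `Core`, `atomic_increment`, `run`, and the address `0x20` are hypothetical names, and a single `threading.Lock` plays the role of the serialization point in main memory. It is an illustration of the idea, not a hardware design.

```python
import threading


class MainMemory:
    """Toy main memory that executes every swap itself, one at a time."""

    def __init__(self):
        self._cells = {}
        self._serializer = threading.Lock()  # assumed serialization point
        self.swaps_executed = 0

    def read(self, addr):
        with self._serializer:
            return self._cells.get(addr, 0)

    def compare_and_swap(self, addr, expected, new):
        # The compare and the store happen in memory, never in a cache.
        with self._serializer:
            self.swaps_executed += 1
            if self._cells.get(addr, 0) == expected:
                self._cells[addr] = new
                return True
            return False


class Core:
    """A core that keeps no cached copy of the swap word."""

    def __init__(self, memory):
        self.memory = memory

    def atomic_increment(self, addr):
        while True:
            seen = self.memory.read(addr)
            if self.memory.compare_and_swap(addr, seen, seen + 1):
                return seen + 1
            # Another core won the swap; retry with a fresh read.


def run(cores=4, increments=250, addr=0x20):
    memory = MainMemory()

    def work():
        core = Core(memory)
        for _ in range(increments):
            core.atomic_increment(addr)

    threads = [threading.Thread(target=work) for _ in range(cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return memory.read(addr), memory.swaps_executed
```

`run()` returns the final counter value and the number of swaps that main memory executed. Failed swaps are retried, so the swap count can exceed the number of increments, but no increment is lost and no cache ever holds the word.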

Thread-Safe Computer:
1 - Differentiate shared from private data at allocation.
2 - Private data cache uses neither write-through nor invalidation.
3 - Shared data cache is pass-through.
4 - Serialize the atomic swap in main memory. It is execute-through.
 
If you bypass the cache, there is no cache coherence.
Only place private data in the cache.
 
Benefit 1 - No coherence. No processor limit.
Benefit 2 - Private data requires no write-through and no cast-out, and it neither sends nor receives invalidations.
Benefit 3 - Shared data becomes contiguous, eliminating write-through for private data.
Benefit 4 - Runs existing multitasking software without any modification, albeit without a cache.
Benefit 5 - Enables simultaneous execution of the concurrent multitasking queue, reducing elapsed time.
Benefit 6 - Cores access only main memory, eliminating physical connections between caches.
Benefit 7 - GPUs can come with their own core processor. Cores are independent. The caches collaborate.
Benefit 8 - The operating system can delegate to multitasking multi-cores with a pointer swap.
Benefit 9 - Many applications contain only private data. (This should be the default allocation.)

Requirement 1 - Software allocation instruction that differentiates shared data from private data.
Requirement 2 - Either (A) place swap capability on each main memory chip or (B) serialize atomic swap instructions. Usage of atomic swaps is currently minimized: the software avoids contention for one resource, and the swap often eliminates the lock by serializing through one virtual data address. Current software permits the swap to occur at any main memory location because it may reside in a contiguous record.
 
Option (C) would be another instruction to allocate swap addresses and contiguous locations; however, this would require programming. Software currently does not distinguish swap areas from shared areas.
However, it already distinguishes shared from private. The software treats them differently; the hardware should too.
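Requirement 1 could look roughly like the following allocator sketch. Everything here is hypothetical: the `Allocator` class, the `shared` flag, and the address ranges are assumptions chosen for illustration. It shows private allocation as the default, shared data placed in one contiguous region, and an address check that hardware could use to decide whether an access bypasses the cache.

```python
class Allocator:
    """Hypothetical allocator: private by default, shared on request."""

    def __init__(self, private_base=0x0000, shared_base=0x8000,
                 shared_limit=0x10000):
        # Assumed layout: private region below shared_base,
        # shared region in [shared_base, shared_limit).
        self.shared_base = shared_base
        self.shared_limit = shared_limit
        self._next_private = private_base
        self._next_shared = shared_base

    def alloc(self, size, shared=False):
        if shared:
            addr = self._next_shared
            if addr + size > self.shared_limit:
                raise MemoryError("shared region exhausted")
            self._next_shared += size
        else:
            addr = self._next_private
            if addr + size > self.shared_base:
                raise MemoryError("private region exhausted")
            self._next_private += size
        return addr

    def is_shared(self, addr):
        # The check the hardware would need: does this access bypass the cache?
        return self.shared_base <= addr < self.shared_limit
```

For example, `Allocator().alloc(64)` returns a private address that may be cached freely, while `alloc(16, shared=True)` returns an address in the contiguous shared region that would always be read and written through to main memory.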

