Thread-Safe Algorithm - A Single-Threaded Swap Ensures Currency
CMPXCHG is NOT atomic. It is pseudo-atomic.
This post compares the software requirements for thread safety with current hardware design and shows how to implement thread-safe hardware.
Thread-Safe Software Rules:
1 - Private data is protected from other tasks. (re-entrant memory allocation)
2 - Shared data is protected by data swap, pointer swap, or lock (e.g. mutex, semaphore).
3 - Swap data is the memory location that ensures currency.
4 - Shared data is either a swap data address or protected by a swap data address.
5 - The swap must occur at the swap data address, or be single-threaded, in order to test currency.
6 - The swap tests the currency of the proposed update and finalizes if current.
Result - The result is no task limit.
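Rules 3, 5, and 6 can be sketched in software with a compare-and-swap retry loop. The following is a minimal illustration using C11 `<stdatomic.h>`, not the method of the journal article. The names `shared_counter`, `counter_add`, and `counter_try_update` are hypothetical and chosen only for this example. It shows the software pattern only and says nothing about how the hardware carries out the swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Rule 3: the swap data is the one location that decides currency.
   (Hypothetical type, for illustration only.) */
typedef struct {
    atomic_long swap_data;
} shared_counter;

/* Rules 5 and 6: build a proposed update from a snapshot, then let the
   swap test whether that snapshot is still current. If another task got
   there first, the swap fails, the snapshot is refreshed, and we retry.
   Returns the value that was finally stored. */
long counter_add(shared_counter *c, long delta)
{
    assert(c != NULL);
    long seen = atomic_load(&c->swap_data);      /* snapshot */
    long proposed;
    do {
        proposed = seen + delta;                 /* computed privately */
        /* On failure, 'seen' is overwritten with the current value. */
    } while (!atomic_compare_exchange_weak(&c->swap_data, &seen, proposed));
    return proposed;
}

/* A single attempt: finalize only if 'expected' is still current.
   Returns false when the proposed update was based on stale data. */
bool counter_try_update(shared_counter *c, long expected, long proposed)
{
    assert(c != NULL);
    return atomic_compare_exchange_strong(&c->swap_data, &expected, proposed);
}
```

A caller would initialize the counter with `atomic_init(&c.swap_data, 0)` and then call `counter_add(&c, 1)` from any number of tasks. A stale proposal is rejected rather than silently overwriting a newer value, which is the currency test described in rule 6.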
Current Hardware scorecard:
1 - Currently not identified.
2 - Currently not identified.
3 - Identified by the swap instruction.
4 - Currently not identified.
5 - The swap currently single-threads with a cache copy and not in main memory. (pseudo-atomic)
6 - The swap currently requires coherence to keep the cache copy it uses current.
Result - Coherence imposes a processor limit.
Thread-Safe Hardware checklist:
1 - Identified by allocation instruction. (cache)
2 - Identified by allocation instruction. (memory)
3 - Identified by the swap instruction. (swap area)
4 - Identified by allocation. (memory and swap area)
5 - The swap single-threads in shared main memory. (atomic Coherence-Free Swap)
6 - The swap requires no coherence; it is atomic.
Result - The result is no processor limit.
#1, #2, and #4 enable a cache.
#5 and #6 are required for thread safety.
Software Phase 1: Fix the swap and run without a cache. (all allocation is for shared memory)
Software Phase 2: Modify allocation of private data to a cache that does not require coherence. Some allocations might perform faster as two allocation instructions.
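The private/shared split behind Phase 2 can be illustrated at the software level. The sketch below is an assumption-laden example in C11, not the allocation instructions proposed here (which do not exist in current instruction sets). `shared_total` and `publish_sum` are hypothetical names. The point it shows is that a task can do all of its work in private storage and touch the shared swap location only once, to publish the result.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Shared allocation (hypothetical): the only location other tasks touch. */
static atomic_long shared_total;

/* Sum one task's slice in private storage, then publish the result with
   a single compare-and-swap loop. Returns the shared total after this
   task's contribution has been stored. */
long publish_sum(const long *slice, size_t n)
{
    long private_sum = 0;   /* private data: never visible to other tasks */
    for (size_t i = 0; i < n; i++)
        private_sum += slice[i];

    long seen = atomic_load(&shared_total);
    long proposed;
    do {
        proposed = seen + private_sum;
        /* On failure, 'seen' is refreshed with the current shared value. */
    } while (!atomic_compare_exchange_weak(&shared_total, &seen, proposed));
    return proposed;
}
```

In this arrangement the loop over `slice` generates no traffic on the shared location, so only the final swap depends on other tasks.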
The journal article presents the allocation first because it immediately reaps a performance benefit by reducing coherence.
Speed Comparison - Why thread-safe computers are faster even at two cores.