CFP: Coherence-Free Processor: April 2024

Saturday, April 27, 2024

Offering a $1 Million Prize

Offering a $1 Million Prize for a Coherent Memory Model

$1 Million will be awarded for demonstrating the Coherent Main Memory model.

The model requires a hardware and software understanding of atomic instructions.

Store data in one place!

Coherent Memory maintains data integrity without cache coherence. Cache coherence limits the number of core processors.

Coherent Memory organizes main memory. Database managers store data in one place for performance. Computer memory should too.

Award:

$1 Million, contingent on and paid for by a licensing contract for any of the patents.

In addition, the winner will receive all future monetary payments awarded to the inventor that are based on one or more of the patents.

Terms and Conditions:

The inventor refers to the inventor and/or the companies owning the patents.

The patents refers to all patents, owned by the inventor and/or his companies, with a priority date prior to 4/27/2024.
The winner is the first working model received prior to the signing of the first licensing contract.
If no Step 2 model has been received prior to the signing of the first licensing contract, the first Step 1 model received will be the winner.
If the winner is also the licensee, the payments will be awarded to an unaffiliated university.
Any decision as to who should receive payments is up to the company acquiring the first license and is final.

A contingency contract is available.

Frank Yang

FrankYang43338@acm.org

FrankYang43338@gmail.com

Different Algorithms for the Same Problem

Wednesday, April 24, 2024

Software is Coherence-Free

Why?

Just like you need to add memory or drives, you need to add speed.

But you cannot. Why not? It is called coherence. Computer caches must talk.

However, software tasks do not talk. Any number of tasks can run concurrently on a uniprocessor.

The answer to why solves a hardware issue that predates 1965.

Since 1965 the question has been, "How do you make caches talk efficiently?"

Answer - Infinite computers do not have to talk at all.

Coherence goes poof.

Because...

In 1965, IBM announced a multiprocessor. It had two processors and two caches, and the caches talked to synchronize. IBM announced it as a quad, but a quad was never built. The algorithm has been optimized, but not redesigned.

In 1970 the creation of relational databases resulted in the recognition that data should be stored in one location. Store data in one place!

In 1973 the creation of the conditional compare and swap instruction enabled most software locks to be eliminated.

However, work on cache synchronization, now termed cache coherence, continued with an algorithm that was conceived before 1965.

This new computer design incorporates those two software changes into the hardware, resulting in processors that do not communicate; true parallel processing. The design requires a change to the hardware implementation of the conditional compare and swap instruction.

Coherent Memory maintains data integrity without cache coherence

How

Software recognizes that some data is shared, meaning that other processes can read and write it, and handles this data differently. The hardware would also perform better if it handled data in two ways. Step 1 is a new allocation instruction that allocates data for either shared or non-shared processing. Coherence is immediately reduced because non-shared data does not require coherence. (Load balancing can be handled by cast out, but is not needed given sufficient processors.) Having sufficient processors changes everything, because multitasking exists to compensate for insufficient processors. Multitasking has a queue, which adds to elapsed time.

Impact

Multi-core processors can finally exceed the 1965 design limit of four. The new limit is infinite.

Step 1 - Reduces coherence.

Step 2 - Eliminates all remaining coherence.

History

Cache coherence and an interlock prevent core processors from being connected solely to the main memory bus.

The IBM manual linked below explains the entire issue on page 104. The 2nd and 3rd paragraphs explain buffer (cache) invalidation. The next-to-last paragraph explains the processor interlock. The interlock is for the CS, CDS, and TS instructions. These are HSP instructions. Implementing an HSP that does not contain an interlock creates a coherence-free swap (CFS). However, removing this interlock has been of no benefit because shared data in cache memory requires cache coherence.

However, if shared data is not stored in cache memory, it does not require cache coherence. Without the interlock, processors can then connect directly to the main memory bus.

IBM 3033 Processor Complex April 1979

download IBM manual

Chronology

Chronology of disappearance of Coherence

Design Notes

Step 1 permits an exclusive cache, which requires neither coherence nor write-through.
Step 1, in conjunction with replacing the HSP with a CFS, allows the hardware to handle shared data without a cache and therefore without coherence.
Coherence is solely for hardware update integrity. Because of multitasking, software already protects against changes made by other tasks. Software update integrity only requires the interlock. The interlock can be eliminated either by serialization or with an uninterruptible swap in memory, but the swap cannot be implemented in the cache because that would require coherence.
Eliminating both the interlock and cache coherence enables core processors to be connected solely to the main memory bus.
The swap could be performed by a memory processor. One would be required for each memory bank.
It is more cost-efficient to have an instruction that allocates swap memory, so that only the swap memory bank would require a memory processor.
If swap memory is only altered with a CFS, then a processor could be dedicated to handling swap instructions. This processor would be able to keep all the swap areas in its cache.
Additional latency is restricted to the CFS instructions.
Implementing Step 1 alone will reduce both parts of coherence, which consists of write-through and invalidation. The benefit is expected to be great enough that Step 2 will not need to be modeled for licensing.
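
The serialization option in the notes above, where one processor is dedicated to handling swap instructions, can be sketched in software. This is only an illustrative model, not the hardware design: the class `SwapProcessor`, its method `cfs`, and the helper `add_one` are hypothetical names, a Python thread stands in for the dedicated processor, and a queue stands in for the path from the cores to that processor. Because only one thread ever touches the swap area, no lock or interlock appears in the model.

```python
import queue
import threading


class SwapProcessor:
    """Hypothetical dedicated processor: the only reader and writer of the swap area."""

    def __init__(self):
        self._swap_area = {}              # address -> value, private to this processor
        self._requests = queue.Queue()    # swap requests arrive here in a single order
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Requests are handled one at a time, so each swap is uninterruptible.
        while True:
            request = self._requests.get()
            if request is None:           # shutdown marker
                break
            address, expected, new, reply = request
            current = self._swap_area.get(address, 0)
            if current == expected:
                self._swap_area[address] = new
                reply.put((True, new))
            else:
                reply.put((False, current))

    def cfs(self, address, expected, new):
        """Conditional swap: returns (succeeded, value now stored at address)."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((address, expected, new, reply))
        return reply.get()

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


def add_one(processor, address):
    """Increment a counter in the swap area, retrying when another task got there first."""
    seen = 0
    while True:
        succeeded, value = processor.cfs(address, seen, seen + 1)
        if succeeded:
            return value
        seen = value                      # retry with the value the processor reported
```

In this model the extra latency of the round trip to the swap processor is paid only by `cfs` calls, which matches the note that additional latency is restricted to the CFS instructions.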

Coherent Memory Management

There are three types of main memory.

1 - Static Common Memory

2 - Protected Shared Memory

3 - Protected Swap Area

These are described in detail in the JCST article on CFPs. (Fig. 2)

Static Common Memory would consist of data that does not change dynamically, such as programs.

Protected Shared Memory is protected by software logic because it is updated by multiple processes. The software uses the CFS either as a lock/unlock or as a pointer swap. The software already provides this protection because of multitasking.

Protected Swap Areas are data areas used by a CFS in a conditional swap. An example would be a conditional swap to increment a counter.

All the above areas are protected by the software. The issue is that the current HSP does the swap in the cache and therefore requires coherence. All instructions are currently implemented in the cache. Doing the swap in main memory bypasses the cache and eliminates coherence, provided all other memory is protected by software logic.
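
The counter example mentioned for Protected Swap Areas can be sketched as a conditional-swap retry loop. This is a software simulation under stated assumptions, not the hardware instruction: `MainMemoryWord`, `compare_and_swap`, `increment`, and `run_demo` are hypothetical names, and the internal lock exists only to model the uninterruptible swap that the memory hardware would provide.

```python
import threading


class MainMemoryWord:
    """One word in a swap area; models a swap performed in main memory, not in a cache."""

    def __init__(self, value=0):
        self._value = value
        # Assumption: this lock only simulates the atomicity of the hardware swap.
        self._atomic = threading.Lock()

    def load(self):
        return self._value

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`; return True on success."""
        with self._atomic:
            if self._value == expected:
                self._value = new
                return True
            return False


def increment(word):
    """Read the counter, then swap in value + 1; retry if another task changed it first."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + 1):
            return old + 1


def run_demo(tasks=4, per_task=1000):
    """Several tasks increment one shared counter without any software lock."""
    counter = MainMemoryWord(0)

    def worker():
        for _ in range(per_task):
            increment(counter)

    threads = [threading.Thread(target=worker) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load()
```

Calling `run_demo(4, 1000)` returns 4000: a failed swap changes nothing and is retried, so no increment is lost even though the tasks never coordinate with each other directly.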

Summary:
The software currently uses the HSP to eliminate coherence from the software. However, the hardware instruction was implemented in the cache, so the implementation requires coherence. If the instruction bypassed the cache, there would be no cache coherence.

Protected Shared Memory is also updated in main memory. Therefore it has no coherence.

Saturday, April 20, 2024

Parallel AI Processing

A computer with parallel cores can process multiple AI prompts simultaneously.

It could process multiple AI branches simultaneously.

It enables devices to contain their own core processor.

Currently parallel cores are prevented by coherence. Cores have to talk.

This design was introduced before 1965.

The new Coherence-Free Processor permits cores with no coherence.

They can share a multitasking queue without talking.

The new design allows unlimited cores to run simultaneously while sharing data.

In contrast, a uniprocessor is only concurrent, because its tasks must share one core through multitasking.

Current multiprocessors have binomial overhead and are typically limited to four cores.

Parallel cores reduce time waiting in a multitasking queue, reduce swapping, permit simultaneous execution, allow multiple dedicated database managers that run without locks, and permit additional cores to be connected solely to the main memory bus.

The March 2024 Journal of Computer Science and Technology article, CFP: A Coherence-Free Processor Design, explains how to reduce coherence in two steps; the second step eliminates it entirely.

Ethical AI cannot be privately owned

Ownership creates a Conflict Of Interest

Ethical AI must be public and secure - Open Source is required for both

We can create an Open Source Ethical Constitution, for We the People

Includes the goal of eliminating hate and lies.

Compassion can measure intent. The ultimate goal is to enable a world of good intentions.

It must be modifiable by consensus, by vote

This is computers interpreting our laws written logically.
It is not creative AI, but logical AI.

It does not interpret laws; it determines which laws apply.

A system or systems that will:

1 - Create computer-understandable rules to evaluate legality and ethics.

2 - Record factual circumstances for historical purposes.

3 - Evaluate truthfulness in journalism. (Separation of Fact vs Opinion)

4 - Evaluate the impact of events based on compassion.

5 - Create a platform that facilitates informed debate.

6 - Create a consumer-oriented, honest internet.

7 - Provide freedom of speech that excludes hate and lies, such as accusatory opinions and blame not based on fact.

The entire system changes our ability to evaluate ethics and nothing else.

People will still make and enforce laws. The AI merely determines violations.

AI can also determine the effect of proposed laws.

It can find contradictions and loopholes.

Our military must be able to operate on an open source system.

Private code could potentially contain a hidden off switch.

All private software is a trojan horse.

Military programs must be coded on an open base.

What if a potential attacker gained control of Microsoft (or any company)?

Paradox - who will fund it?

Answer - It will be funded by a new computer with a new design.

All current computers are obsolete. The new design runs current software.

A car has better traction when you have tires that match the weather.

Similarly a computer can perform faster if it has two ways of handling data.

Computer programs recognize shared data and local data.

Execution can be optimized for each type of data.

Current computers are one core, though attempts have been made to run multiple cores.

The new design permits infinite cores because cores are coherence-free.

AI thought creates branches that require multiple cores.

The issue with using multiple computers is that they require access to the same data, then must coordinate results. Data ends up in different workstations.

One computer with multiple cores is the solution.

Multitasking masks the problem.

Multitasking allows tasks to share a core.

Multitasking is concurrent, because tasks wait but do not appear to wait.

Tasks don’t run slower, but they wait longer.

Multiple cores enable simultaneous execution.

Ten cores can handle ten tasks simultaneously.

Tasks do not wait.

One multitasking core takes ten times longer.
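
The ten-task comparison can be checked with simple arithmetic. This is a minimal sketch under assumptions that are not in the post: all tasks take the same time, task switching is free, and a task never moves between cores. The function name `elapsed_time` and its parameters are hypothetical.

```python
from math import ceil


def elapsed_time(tasks, cores, seconds_per_task=1.0):
    """Time until the last task finishes when equal tasks are spread evenly over cores."""
    rounds = ceil(tasks / cores)          # how many tasks the busiest core must run
    return rounds * seconds_per_task
```

With ten one-second tasks, `elapsed_time(10, 10)` gives 1.0 and `elapsed_time(10, 1)` gives 10.0. Each task does the same amount of work in both cases; the difference is time spent waiting in the queue.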

The Problem (since 1965):

Multiprocessor cores talk. This is cache coherence.

Talking increases as a binomial. This is the birthday problem. The problem balloons at 3 cores.
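
One way to read "increases as a binomial" is that every pair of cores is a potential conversation, so the count is the binomial coefficient "n choose 2", the same pair counting used in the birthday problem. That reading is an assumption, and the function name `core_pairs` is hypothetical.

```python
from math import comb


def core_pairs(cores):
    """Number of distinct pairs of cores, n * (n - 1) / 2, that could need to talk."""
    return comb(cores, 2)
```

Under this reading, 2 cores form 1 pair, 3 cores form 3 pairs, 4 cores form 6, and 16 cores form 120, so the number of pairs grows much faster than the number of cores.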

Breakthrough:

Solves the binomial problem by designing computers that do not talk.

Runs any current multitasking software.

The March 2024 Journal of Computer Science and Technology article, CFP: A Coherence-Free Processor Design, explains how to reduce coherence in two steps; the second step eliminates it entirely.

Two-Step Algorithm for Coherence Reduction

A multiprocessor requires coherence in order to synchronize the caches. However, not all data requires coherence. If data is not shared with another task, it does not require coherence.

If we create an instruction that identifies data that is not shared with other tasks, then we have identified data that does not require coherence.

Step 1 - We identify non-shared program data with a new memory allocation instruction. This data does not require coherence. This immediately reduces coherence. (Fig. 1 in JCST article below.) Coherence occurs on every write and consists of both cache invalidation and write-through.

Step 2 - The second step handles shared data in a manner that also does not require coherence. This completely eliminates coherence.
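
The effect of the two steps can be illustrated by counting which writes would trigger coherence (write-through plus invalidation) under each design. This is a toy model under stated assumptions, not the article's method: the `Write` record, its `shared` flag (standing in for the result of the new allocation instruction), and the design labels "conventional", "step1", and "step2" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Write:
    address: int
    shared: bool   # assumption: set when the data was allocated as shared


def coherence_events(writes, design):
    """Count the writes that would need write-through and invalidation."""
    if design == "conventional":
        return len(writes)                          # coherence occurs on every write
    if design == "step1":
        return sum(1 for w in writes if w.shared)   # non-shared data is exempt
    if design == "step2":
        return 0                                    # shared data is also handled without coherence
    raise ValueError(f"unknown design: {design}")
```

For a workload of three non-shared writes and one shared write, the counts are 4, 1, and 0 for the three designs. How much Step 1 saves in practice depends on the fraction of writes that go to shared data, which this sketch does not estimate.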

The March 2024 Journal of Computer Science and Technology article, CFP: A Coherence-Free Processor Design, explains how to reduce coherence in two steps; the second step eliminates it entirely.

Friday, April 19, 2024

CFP: A Coherence-Free Processor Design - JCST Article

Link to CFP: A Coherence-Free Processor Design
