I think it's time to design a new hardware architecture that can eventually replace x86 as the dominant instruction set architecture (ISA) for high performance computing. In this post I want to outline my reasoning for why this should happen, and why it should happen now.

Historically, x86 has won out for three main reasons: Intel's superior fabs, the scale of the x86 market, and Microsoft's reluctance to support other instruction sets. When someone came up with something better (like Alpha), the size of the x86 market and the huge investments made into it ensured that the advantage didn't last long. Intel, even with a worse instruction set, could simply clock its CPUs so much faster that any instructions-per-cycle advantage became irrelevant. This is no longer true.

A lot of attention has been given to Apple's M1 architecture. Apple has an advantage in using a newer ISA than x86. But the fact that a 30-year-old architecture (Arm) has advantages over a 40-year-old architecture should neither surprise nor impress anyone. (It would however surprise me greatly if Apple made the investments needed to make their architecture competitive on the high end, given how small they are in that market.) Arm, while newer than x86, is built under essentially the same basic constraint: a limited number of transistors. RISC-V, while it has gathered a lot of excitement because of its open nature, mirrors old architectures in that it aims to be simple rather than fast.

So why is it time to design a new ISA right now? I think it's time to redesign something when the constraints of the original design are markedly different from the current constraints, and you can see that the new constraints will remain for the foreseeable future. Design decisions were made at the time because of the limitations of the time. Today we are in a very different situation from when x86, Arm, and PowerPC were conceived:

-Single threaded performance has hit the ceiling. While computers as a whole are getting faster, gaining more cores and special hardware like GPUs, ML units, video en/decoders, and so on, the vast majority of software is single threaded and runs on the CPU. Many problems can't be parallelized effectively. Even when software makes use of multiple cores or the GPU, a single thread acting as a job dispatcher can often be the bottleneck. This means that increasing single core performance would have an outsized impact on how fast the computer is in practice. A computer with half as many cores but 50% more performance per core would be much more desirable in most cases, even though it has 25% lower theoretical performance.

-Most older designs were bound by transistor count, whereas today we have so many transistors available that spending more transistors on a single core has diminishing returns. That's why we go multi-core instead. If we designed an ISA today, we would do so with the assumption that we have a lot of transistors, and are likely to get more.

-Frequencies are no longer going up, mostly due to heat dissipation issues, so a design with better instructions-per-cycle would have a more permanent advantage.

-Memory access (especially latency) has become a limiting factor on real world performance. A design with memory access built from the ground up for a non-uniform memory access (NUMA) model, with caches, stacks in SRAM, more/different registers, memory synchronization, and prefetching at its core, would enable many new innovations.
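
Today, prefetching is bolted on through compiler builtins rather than being part of the ISA's memory model. A minimal sketch using the GCC/Clang `__builtin_prefetch` intrinsic on a hypothetical linked list (whether the hint actually helps depends entirely on the memory system):

```c
#include <stddef.h>

struct node { struct node *next; long value; };

/* Sum a linked list, hinting the cache to start fetching the
   next node while the current one is processed. Pointer chasing
   like this is dominated by memory latency, which is exactly the
   constraint described above. */
long sum_list(const struct node *n) {
    long total = 0;
    while (n) {
        if (n->next)
            __builtin_prefetch(n->next, 0 /* read */, 1 /* low temporal locality */);
        total += n->value;
        n = n->next;
    }
    return total;
}
```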

-A good ISA used to be one that was good for humans to write assembly for, but almost no one does that today. A good ISA today is one that a compiler can generate better code for. What is clean and simple for a human to make use of is not the same as what's good for a computer to make use of.

-A very large limiting factor is the CPU's ability to reason about out-of-order execution. Currently the ISA provides very little semantic information to aid in this. A new ISA, together with language constructs along the lines of "restrict", could help both compiler and CPU designers reach higher performance.
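
As a concrete example of the kind of semantic information meant here, C's `restrict` qualifier promises the compiler that two pointers never alias, which frees it to reorder, hoist, and vectorize memory operations (this sketch is my own illustration):

```c
/* Without restrict, the compiler must assume dst and src might
   overlap, so every store to dst[i] could change a later src
   element, blocking reordering and vectorization. With restrict,
   no aliasing is promised; a future ISA could carry the same
   guarantee down to the out-of-order hardware. */
void scale(float *restrict dst, const float *restrict src,
           float factor, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}
```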

-So much of the software and infrastructure we use today is open source that a new ISA would gain a working software stack very quickly. One could imagine a working GCC/LLVM compiler and a Linux port fairly soon. Microsoft has also shown its willingness to support ISAs other than x86, and its modern code base is designed for multiple ISAs.

-x86 carries a lot of old baggage that is kept only for backwards compatibility. (MMX!) Removing it would save transistors and "dark silicon".

-Modern CPUs have advanced branch prediction, pipelining, decoding, and a lot of other hardware designed to turn the existing ISA into something the CPU can use more effectively. The Itanium architecture tried to move much of this logic into the ISA. The problem is that such an ISA works for only one specific hardware implementation. What we need is the opposite: an ISA that unleashes the creativity of chip designers and gives them the tools they need to innovate further.

How would this happen?

I would prefer to see an organization set up and funded by the industry, mainly Intel, AMD, and Microsoft. It would assemble a small group of independent engineers (preferably led by an industry heavyweight like Jim Keller) who would go off and design the new ISA. Then each IHV could make their own hardware implementation and compete in the market for the best product. The ISA would be licensed for hardware implementation only to the participating companies for a few years, so that the investing companies could recoup their investment, and then be made freely available.

Eskil Steenberg