Computer architecture describes the attributes of a computer system that are visible to a programmer, that is, the attributes that have a direct impact on the logical execution of a program. Multi-core processors are fast becoming the rule rather than the exception in our industry. Convergent processors, which mix the standard MCU and DSP worlds, are also becoming prevalent. These are just the beginning of what our industry is turning to in order to deliver increasing levels of service in ever-shrinking packages.
While 8-bit and 16-bit processors still have a place in certain applications, the industry is increasingly moving to 32- and 64-bit processors. In some cases, it can be difficult to justify the transition to 64-bit processors other than to say that we do it because they are the best processors the industry has to offer. Of course, software always has a way of consuming as much CPU horsepower and memory as we are willing to throw at the problem.
On the large end of the embedded spectrum, we have the carrier-grade systems. Communications switching systems and the like are huge consumers of CPU and memory. In this market, it's not uncommon to see hyper-threaded, symmetric multiprocessing engines with 16GB or more of RAM and terabytes of disk space. This can be seen in products like Tyan's Thunder quad Opteron server board. Even battery-operated devices are becoming increasingly complex, as can easily be seen in portable game machines and cellular telephones. The convergence of the cell phone as camera, PDA, game machine, and streaming media device is forcing developers to resort to multiple cores to meet performance goals while maintaining battery life.
Intel is the world's oldest and most established microprocessor company, and it produces the world's most popular microprocessor chips. It is best known for its PC processors. Nowadays, Intel devices are used in many electronics-related fields, such as automotive, industrial, automation, robotics, image processing, networking, encryption, military, medical, energy, and other industries.
Intel Architecture consists of a combination of microprocessors and supporting hardware. Intel's first microprocessor chip, the 4004, was made in 1971, and since then Intel has produced an unbroken series of upgrades and improvements to the world's best-known microprocessor family. The following are some of the advancements in Intel microprocessor chips:
The Intel 4004 was designed to work with three other microchips: the 4001 ROM, the 4002 RAM, and the 4003 shift register. The 4004 performed the calculations and mostly resided inside calculators and similar computing devices, with a maximum clock speed of 740 kHz. An improved variation of the 4004 was the 4040, which performed a similar function with an extended instruction set and higher performance.
8008 and 8080
A new line of 8-bit processors began with the 8008 in 1972, followed by the 8080 in 1974 and the 8085 in 1976. The 8008 was faster than the 4004 and could process data in 8-bit chunks. It used 10-micrometer transistor technology but was clocked rather conservatively, between 200 and 800 kHz. The 8008's performance did not attract many system developers.
The 8080 expanded on the design of the 8008 by adding new instructions and moving to 6-micrometer transistors. It was far more successful than the 8008. Clock rates doubled, and the highest-performance 8080 chips in 1974 ran at 2 MHz. Because the 8080 was used in countless devices, several software developers, such as Microsoft, began to focus on software for Intel's processors.
When the 8086 was eventually released, it was made source compatible with the 8080 to maintain backward compatibility. As a result, key elements of the 8080's design have been present in all x86-based processors, and 8086 software can technically still run on any x86 processor. A less expensive and higher-clocked variant of the 8080 was the 8085, which was highly successful but less influential.
8086: The Beginning Of x86
Intel's first 16-bit processor was the 8086, which boosted performance considerably compared to earlier designs. It was clocked higher than the budget-oriented 8088, had a 16-bit external data bus and a longer 6-byte pre-fetch queue, and was able to run 16-bit tasks. The address bus was extended to 20 bits, enabling the 8086 to access up to 1MB of memory. All of this increased system performance. The 8086 also became the first x86 processor.
Intel also produced the 8088 around the same time. The 8088 was based on the 8086, but with half as many data lines and a 4-byte pre-fetch queue. This upset the balance of the design, as the narrower bus cut into the instruction fetch rate, forcing the execution unit to idle much of the time. It still had access to 1MB of RAM and ran at higher frequencies than earlier processors, but it was considerably slower than the 8086.
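The 1MB limit mentioned above follows directly from the 20-bit address bus: each extra address line doubles the number of bytes that can be addressed. The short Python sketch below works through that arithmetic. The function name `addressable_bytes` is purely illustrative and does not come from any Intel documentation.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given bus width."""
    return 2 ** address_bits


# 20 address lines, as on the 8086 and 8088
size = addressable_bytes(20)
print(size)                  # 1048576 bytes
print(size // (1024 * 1024)) # 1 (megabyte)
```

By the same arithmetic, a 16-bit address would reach only 65,536 bytes, so four extra address lines give sixteen times the reach.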
- iAPX 432
The iAPX 432 was developed as a departure from Intel's x86 line. It took CISC to a whole new level of complexity. Intel expected the iAPX 432 to be several times faster, but the processor ultimately failed due to major design flaws. It was quite data hungry and failed to perform well without extremely high amounts of bandwidth, and eventually it was abandoned.
- i960: Intel's First RISC
The i960 was a 32-bit superscalar architecture that used Berkeley RISC design concepts. It was clocked relatively low, with the slowest model running at 10 MHz, but it was improved over the years and supported 4GB of protected memory.
- 80486: Integrating The FPU
The 80486 was the first x86 CPU to contain L1 cache. The tighter integration of its components into the CPU was the key to its success. Intel also incorporated the FPU into the CPU, and thus latency dropped sharply.
- P5: The First Pentium
The Pentium used Intel's first x86 superscalar design, the P5 architecture. Its most prominent feature was a substantially improved FPU. L1 cache size was also increased on Pentium processors.
- P6: Celeron And Xeon
Intel introduced its Celeron and Xeon product lines. These products used the same core as the Pentium II or Pentium III, but with varying amounts of cache. The first Celeron-branded processors, based on the Pentium II, had no L2 cache at all, which resulted in poor performance. The Pentium III-based models had half of the L2 cache disabled compared to their Pentium III counterparts. As a result, Celeron processors using the Coppermine core contained just 128KB of L2 cache. Models based on Tualatin increased this to 256KB. The first Xeon processors contained more L2 cache. The Pentium II-based Xeon processors contained 512KB, whereas higher-end models could have up to 2MB.
- Core: Core 2 Duo
Core proved to be highly scalable, which allowed Intel to push it into service everywhere from mobile systems with TDPs as low as 5W to high-end servers with 130W ceilings. The dies were built with two CPU cores, and quad-core designs used two dual-core dies on an MCM. Single-core versions, meanwhile, had one core disabled. L2 cache size ranged from 512KB up to 12MB. With the improvements made to the Core architecture, Intel could again compete against AMD.
Skylake-based CPUs were Intel's fastest to date. Skylake was the first consumer-oriented CPU to use DDR4 memory, which is more energy-efficient than DDR3 and capable of greater throughput. The Skylake platform also contained a number of improvements, such as a new DMI interface, an upgraded PCIe controller, and support for a much wider array of connectivity devices. Skylake included a better on-die GPU as well. Iris Pro Graphics 580 was the highest-end model, and it was deployed in certain Skylake-R CPUs. The Iris Pro Graphics 580 engine featured 72 EUs and came paired with 128MB of L4 eDRAM. Most other Skylake-based chips included HD Graphics with 24 EUs, based on a design similar to Broadwell's.
Intel New Architecture for 2019
With the release of Core and Xeon chips built around a new architecture, Intel will add a number of new instructions to accelerate workloads such as cryptography and compression. Until now, Intel processors under the Core and Xeon brands have been based on the Skylake architecture. Skylake was originally intended for release on Intel's 14nm manufacturing process, with Cannon Lake to follow on the 10nm process. Sunny Cove is an enhanced microarchitecture built on the 10nm process, and it is still derived from Skylake. It is designed to execute more instructions in parallel and with lower latency, and certain buffers and caches have been enlarged. The level 1 data cache, for example, is 50 percent larger than Skylake's. Skylake has two reservation stations dispatching instructions across eight ports, with a maximum of four instructions dispatched per cycle, whereas Sunny Cove has four reservation stations, ten ports, and up to five instructions per cycle. Sunny Cove has two extra units capable of handling LEA instructions, which can perform various arithmetic operations as well as calculate memory addresses, and another unit for vector shuffles.
These changes help extract greater parallelism. The larger reorder buffer allows more out-of-order instructions in flight, and the larger store buffers enable more in-flight memory operations. Like the oddball Cannon Lake processor that's built on 10nm and shipping in limited quantities, Sunny Cove includes support for AVX-512 instructions. AVX-512 spans many different extensions and capabilities; some are general-purpose vector arithmetic, while others are specialized for workloads such as neural networks. In addition, Sunny Cove will include new instructions for accelerating encryption and data compression workloads, which are said to deliver a 75-percent performance improvement.
Sunny Cove also makes the first major change to x64 virtual memory support since its introduction in 2003. x64 virtual addresses are 64 bits wide, but they actually contain only 48 useful bits of information. Bits 0 through 47 are used, with the top 16 bits, 48 through 63, all copies of bit 47. This limits the virtual address space to 256TB. A page table structure with four levels is used to map virtual addresses to physical addresses, with physical memory addresses also limited to 48 bits.
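The rule that bits 48 through 63 must all copy bit 47 can be checked with a few bit operations. The following Python sketch is only an illustration of that rule. The function name `is_canonical` and the sample addresses are chosen for this example and are not taken from Intel's documentation.

```python
def is_canonical(addr: int, used_bits: int = 48) -> bool:
    """Return True if every bit above the used range copies the top used bit.

    With used_bits=48, bits 48..63 must all equal bit 47.
    """
    addr &= (1 << 64) - 1                      # treat addr as a 64-bit value
    top = addr >> (used_bits - 1)              # bit 47 and everything above it
    all_ones = (1 << (64 - used_bits + 1)) - 1
    return top == 0 or top == all_ones


print(is_canonical(0x00007FFFFFFFFFFF))  # True: bit 47 is 0, upper bits are 0
print(is_canonical(0xFFFF800000000000))  # True: bit 47 is 1, upper bits are 1
print(is_canonical(0x0000800000000000))  # False: bit 47 is 1, upper bits are 0

# 48 useful bits give 2**48 bytes, i.e. 256TB when 1TB is counted as 2**40 bytes
print(2**48 // 2**40)                    # 256
```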
Sunny Cove raises these limits by extending virtual addresses to 57 meaningful bits (with the top 7 bits again either all zeroes or all ones, copying bit 56) and physical memory addresses to up to 52 bits, with the extra virtual address bits handled by a fifth level in the page table. The new limits enable 128PB of virtual address space and 4PB of physical memory.
The various iterations of Skylake have given us improved clock speeds and larger core counts.
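The jump from 48 to 57 bits matches what one extra page-table level would add. The Python sketch below works through the numbers under two assumptions that the text above does not state: pages are 4KB (a 12-bit offset) and each page-table level is indexed by 9 bits (512 entries per table). The helper names are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # assumption: 4KB pages
BITS_PER_LEVEL = 9     # assumption: 512 entries per page table


def virtual_bits(levels: int) -> int:
    """Meaningful virtual address bits for a given number of table levels."""
    return PAGE_OFFSET_BITS + BITS_PER_LEVEL * levels


def split_address(addr: int, levels: int):
    """Split an address into per-level table indices (top level first) and a page offset."""
    offset = addr & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = [
        (addr >> (PAGE_OFFSET_BITS + BITS_PER_LEVEL * i)) & ((1 << BITS_PER_LEVEL) - 1)
        for i in reversed(range(levels))
    ]
    return indices, offset


print(virtual_bits(4))   # 48
print(virtual_bits(5))   # 57
print(2**57 // 2**50)    # 128 (PB of virtual address space)
print(2**52 // 2**50)    # 4 (PB of physical memory)

# A made-up 57-bit address whose five table indices are 1, 2, 3, 4, 5
addr = (1 << 48) | (2 << 39) | (3 << 30) | (4 << 21) | (5 << 12) | 6
print(split_address(addr, 5))  # ([1, 2, 3, 4, 5], 6)
```

Under these assumptions each added level contributes exactly nine address bits, which accounts for the step from 48 to 57.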
AMD was founded in Sunnyvale in 1969 with a budget of $100,000. Its first proprietary product was the Am2501. In 1975 it entered the RAM market and produced the Am2900 family. The AMD 386 microprocessor family was introduced in 1991, and in 1993 the first AMD 486 was produced and the AMD and Fujitsu joint venture was established.
The following are some of the advancements in AMD microprocessor chips:
- AMD K5
The K5 was AMD's first in-house x86 processor and was launched in 1996. The “K” was actually a reference to “Kryptonite”. The K5 was based on an internal, highly parallel 29k RISC processor architecture with an x86 decoding front-end. All K5 models had 4.3 million transistors, with five integer units that could process instructions out of order and one FPU. The K5 lacked MMX instructions, which Intel started offering in its Pentium processors launched in early 1997. The K5's FPU also delivered around 10% less performance per clock.
- AMD K6
The AMD K6 was introduced in 1997. It was based on NexGen's RISC core and the Nx586 design. It had an 84KB L1 cache and 8.8 million transistors and was built on a 0.25 micron process. It was much better than the K5: its RISC86 core translated complex x86 instructions into shorter ones, allowing AMD to reach higher frequencies, and it supported MMX instructions.
- Athlon K7
The first AMD Athlon (K7) was introduced in 1999. It was based on the K6 core with an improved FPU. It had a 128KB L1 cache and 22 million transistors, was built on a 0.25 micron process, and reached clock speeds of up to 1GHz. The Athlon XP was designed for desktops, the Athlon XP-M for laptops, and the Athlon MP for servers.
- Opteron-Phenom K10
On April 21st, 2005, AMD released its first dual-core Opteron, an x86-based server CPU. A month later came the first desktop dual-core processor family, the Athlon 64 X2. K10, the AMD microprocessor architecture, became the successor to the K8 microarchitecture, and the first processors were introduced on September 10th, 2007, consisting of nine quad-core Third Generation Opteron processors. K10 processors come in dual-core, triple-core, and quad-core versions with all cores on a single die.
AMD Launches 2nd Gen Ryzen Pro and Athlon Pro APUs
AMD introduced the new AMD Ryzen Pro 3000 series as well as the AMD Athlon Pro 300-series processors, which pack up to four x86 cores along with AMD's Radeon Vega integrated graphics. AMD said that laptops powered by its latest Ryzen Pro APUs will run for up to 12 hours because of their improved power efficiency.
AMD's new Pro-series processors are essentially the Ryzen Mobile 3000-series APUs and are made using GlobalFoundries' 12LP process technology. They support the numerous features of AMD's Pro-series products, such as a built-in TrustZone security processor, DASH manageability, Secure Boot, Content Protection, per-Application security, fTPM 2.0, Transparent Secure Memory Encryption (TSME), and some other technologies. These are the key features that differentiate AMD's Pro parts from the company's regular processors for client PCs.
AMD's 2nd generation Ryzen Pro Mobile processor family includes four models: the Ryzen 7 Pro 3700U, the Ryzen 5 Pro 3500U, the Ryzen 3 Pro 3300U, and the Athlon Pro 300U. The Ryzen Pro-branded parts, with or without SMT, feature four cores, whereas the Athlon Pro device has two cores. Radeon Vega GPUs are integrated in all of the APUs, which feature a TDP of up to 15 W and are therefore aimed at ultra-portable laptops.
Difference between Intel and AMD
The new Intel 9th generation has arrived in the consumer CPU space, and AMD's Ryzen 3 will also be launching this summer. Intel's 8th generation Coffee Lake offerings were not a massive improvement, with the removal of hyperthreading from the high mid-range and the inclusion of the 9th generation i9 as the highest tier of consumer desktop processor. AMD's Ryzen 2 series put in an extremely strong showing at the start of this year and still stacks up well against Intel's top offerings, although it is still being beaten on single-core performance. However, AMD offers a much better value proposition, helped by the very impressive air coolers that come boxed with its CPUs. AMD processors can easily handle extremely demanding workloads, but there is no doubt that Intel is the better performer, at a cost. Both AMD and Intel are rushing to produce stable chips on smaller process nodes. The next battleground will be 10nm or 7nm technology, and the first to get there will hold a massive advantage over the other.
- AMD Puma+
Puma+ is part of the 'Cat cores' family intended to go into tablets and small laptops. It came after Bobcat, Jaguar, and Puma and is probably the only design win AMD has had in a decade. Puma+ is a small core that can fetch 2 instructions per clock and execute them out of order. AMD's second biggest victory was beating Intel's Silvermont core in IPC and in absolute performance, and doing so on an older 28nm process versus Intel's 22nm FinFET process. AMD's Jaguar also made it into 25-30 million Xbox Ones and PS4s.
- Intel Silvermont
Intel Silvermont is a competent mobile core intended for smartphones and tablets. It has proved itself to be a worthy adversary in the mobile space, where it goes head to head with ARM's CPUs.
- AMD Steamroller
The IPC of this design family was poor, yet AMD stubbornly kept building on that mistake for 4 years (Bulldozer, Piledriver, Steamroller, Excavator). Users ended up with a CPU whose IPC was barely better than that of the 5-year-old K10 cores in Phenoms, and the thermal requirement of an 8-core CPU desperately clocked at 5.2 GHz was so high (220 watts) that no motherboard manufacturer would openly support it.
However, the estimates behind the design turned out to be true. Steamroller is a clever and very unusual design with 2 cores inside a core. AMD shared one floating-point unit and one front end between two integer cores, having guessed that the front end and the floating-point unit are not used as frequently as the integer cores. Clustered Multi-Threading (CMT) fetches alternately from two threads and feeds each thread into its own designated, smaller partition of execution resources, called a cluster, instead of fetching from two threads alternately every clock cycle and feeding them into the same large pool of execution resources. This resulted in one cluster for each thread versus Intel's one cluster for both threads. Eventually AMD started marketing the chips as 4 modules and 8 cores instead of 4 multi-threaded cores, after realizing that the design wasn't as good as expected and also because 8 'real' cores sounds better than 8 threads.
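The difference between the two dispatch schemes described above can be sketched with a toy model. The Python below is a deliberately simplified illustration, not a model of any real AMD or Intel pipeline: it assumes one instruction is fetched per cycle, alternating between two threads, and the function name `dispatch` and the instruction labels are made up for this example.

```python
from collections import deque


def dispatch(threads, clustered):
    """Alternate fetch between two threads, one instruction per cycle.

    clustered=True  -> each thread feeds its own cluster (CMT-style)
    clustered=False -> both threads feed one shared pool
    """
    queues = [deque(t) for t in threads]
    pools = {0: [], 1: []} if clustered else {"shared": []}
    cycle = 0
    while any(queues):
        tid = cycle % 2                    # threads take turns at the front end
        if queues[tid]:
            instr = queues[tid].popleft()
            key = tid if clustered else "shared"
            pools[key].append(instr)
        cycle += 1
    return pools


threads = [["a0", "a1"], ["b0", "b1"]]
print(dispatch(threads, clustered=True))   # {0: ['a0', 'a1'], 1: ['b0', 'b1']}
print(dispatch(threads, clustered=False))  # {'shared': ['a0', 'b0', 'a1', 'b1']}
```

In the clustered case each thread's instructions stay in their own partition, while in the shared case the two threads interleave in a single pool of execution resources.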
The Intel Skylake core is part of the winning streak Intel started with Nehalem back in 2009 and stands as testimony to the philosophy that 'bigger is better'. This time Intel went overboard with the idea, going wider and wider in anticipation of AMD Zen, which could possibly turn the heat up. Zen puts AMD on par with Intel in process technology, since both of them plan to make 14nm chips, and it will be more or less the same from then onwards. Skylake had 20% better IPC than its predecessor. To accommodate these changes in the given transistor budget, Intel even compromised on the cache hierarchy.
Intel 14nm vs AMD 7nm
Since Intel has announced that its 10nm processors will not be available until the end of 2019, the company plans to continue with 14nm processors even as it faces severe shortages due to unanticipated demand. Intel brought out its third-gen 14nm Coffee Lake processors to help plug the gap. Intel also has its new 'HK' chips in the hopper; these chips are based on the existing Coffee Lake processors but come without integrated graphics. AMD recently announced its Zen 2 microarchitecture paired with the 7nm process in the EPYC server chips, giving the company the first process lead over Intel in its history. The Zen 2 architecture is designed around this 7nm process. The benefit of the smaller process node could thus be chips with higher core counts or processors that generate less heat and draw less power.