Superscalar Processor: Architecture, Pipelining, Types & Its Applications

Everyone wants their work done fast, don't they? From cars to industrial and household machines, everyone wants them to work faster. Do you know what sits inside these machines and makes them work? Processors. Depending on the functionality, they range from small microprocessors to larger, more powerful processors. A basic (scalar) processor generally executes one instruction per clock cycle. To improve processing speed, the superscalar processor was developed. It combines pipelining with multiple execution units so that it can execute more than one instruction (for example, two) per clock cycle. Seymour Cray's CDC 6600 from 1964 is often cited as the first superscalar design, and the idea was later enhanced by Tjaden & Flynn in 1970.


The first commercial single-chip superscalar microprocessor, the MC88100, was developed by Motorola in 1988. Intel followed with its i960CA in 1989, and AMD with the 29000-series 29050 in 1990. Today, a typical superscalar processor is the Intel Core i7, which was first introduced on the Nehalem microarchitecture.

Superscalar implementations have become steadily more complex. The design of these processors generally refers to a set of techniques that allow a computer's CPU to achieve a throughput of more than one instruction per cycle while executing a single sequential program. This article looks at the superscalar processor architecture, how it reduces execution time, and its applications.

What is a Superscalar Processor?

A superscalar processor is a type of microprocessor that implements a form of parallelism called instruction-level parallelism within a single processor. It executes more than one instruction during a clock cycle by simultaneously dispatching several instructions to different execution units on the processor. A scalar processor executes one instruction per clock cycle; a superscalar processor can execute more than one.

Superscalar design techniques normally include parallel register renaming, parallel instruction decoding, out-of-order execution & speculative execution. In current microprocessor designs, these methods are normally used together with complementary techniques such as pipelining, branch prediction, caching & multi-core.
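Register renaming, one of the techniques listed above, can be illustrated in a few lines. This is a minimal, hypothetical sketch: it renames one instruction at a time (real hardware renames a whole group of instructions per cycle in parallel), and the three-operand instruction format, the register names and the `rename` function are assumptions for illustration only. Every write gets a fresh physical register, so only true read-after-write dependencies remain.

```python
def rename(instructions, num_arch_regs=8):
    """Rename (dest, src1, src2) instructions from architectural to physical registers.

    Hypothetical model: architectural register rN starts mapped to physical pN,
    and each new write is given the next unused physical register.
    """
    rat = {f"r{i}": f"p{i}" for i in range(num_arch_regs)}  # register alias table
    next_phys = num_arch_regs
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources use the current mapping, i.e. the latest producer of each value.
        s1, s2 = rat[src1], rat[src2]
        # The destination always gets a brand-new physical register.
        new_dest = f"p{next_phys}"
        next_phys += 1
        rat[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 = r2 op r3
    ("r4", "r1", "r5"),  # r4 = r1 op r5  (true dependency on r1)
    ("r1", "r6", "r7"),  # r1 = r6 op r7  (reuses r1: a false dependency before renaming)
]
for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

After renaming, the third instruction writes a different physical register from the first, so it no longer has to wait for the first two and could execute alongside them.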

Figure: Superscalar Processor

Features

The features of superscalar processors include the following.

  • Superscalar architecture is a parallel computing technique utilized in various processors.
  • In a superscalar computer, the CPU manages several instruction pipelines so that several instructions can execute simultaneously in one clock cycle.
  • Superscalar architectures include all the features of pipelining, but several instructions can be in the same stage at once because each occupies its own pipeline.
  • Superscalar design methods normally include parallel register renaming, parallel instruction decoding, speculative execution & out-of-order execution, combined in recent microprocessor designs with complementary methods such as caching, pipelining, branch prediction & multi-core.

Superscalar Processor Architecture

We know that a superscalar processor is a CPU that can execute more than one instruction per clock cycle. Because processing speed is measured in clock cycles per second, completing more instructions in each cycle makes this processor considerably faster than a scalar processor running at the same clock rate.

Superscalar processor architecture mainly consists of parallel execution units that can execute instructions simultaneously. This parallel architecture was first implemented in RISC processors, which use simple, short instructions to perform calculations. Because of their superscalar abilities, RISC processors normally performed better than CISC processors running at the same megahertz. However, most CISC processors today, such as the Intel Pentium, also include some RISC-like architecture, which allows them to execute instructions in parallel.

Figure: Superscalar Processor Architecture

The superscalar processor is equipped with several processing units that handle multiple instructions in parallel at every processing stage. With this architecture, a number of instructions can start execution in the same clock cycle, so these processors can achieve an instruction execution throughput of more than one instruction per cycle.

In the architecture diagram above, the processor has two execution units: one for integer operations and one for floating-point operations. The instruction fetch unit (IFU) can read several instructions at a time and stores them in the instruction queue. In every cycle, the dispatch unit fetches and decodes up to 2 instructions from the front of the queue. If one is an integer instruction, the other is a floating-point instruction, and there are no hazards, both instructions are dispatched in the same clock cycle.
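The dispatch rule just described can be sketched as a small simulation. This is a hypothetical model, assuming exactly one integer unit and one floating-point unit, in-order dispatch, and no data hazards (hazard checking is left out so the structural rule stays visible); the instruction format and the `dispatch_cycle` and `run` names are illustrative.

```python
from collections import deque


def dispatch_cycle(queue, max_width=2):
    """Dispatch up to `max_width` instructions from the front of `queue` in one cycle.

    Assumes one integer unit and one floating-point unit and in-order dispatch:
    when the next instruction needs a unit that is already taken this cycle,
    dispatch stops and that instruction waits for the next cycle.
    """
    issued = []
    busy_units = set()
    while queue and len(issued) < max_width:
        _, unit = queue[0]
        if unit in busy_units:
            break  # structural conflict: only one unit of each kind
        busy_units.add(unit)
        issued.append(queue.popleft()[0])
    return issued


def run(program):
    """Return the list of instruction names dispatched in each cycle."""
    queue = deque(program)
    schedule = []
    while queue:
        schedule.append(dispatch_cycle(queue))
    return schedule


# Hypothetical program: (name, unit needed).
program = [("I1", "int"), ("I2", "fp"), ("I3", "int"), ("I4", "int"), ("I5", "fp")]
for cycle, group in enumerate(run(program), start=1):
    print(f"cycle {cycle}: {group}")
```

I1 and I2 go out together, but I3 and I4 both need the integer unit, so I3 is dispatched alone and I4 pairs with I5 in the following cycle.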

Pipelining

Pipelining is the procedure of breaking down tasks into sub-steps & executing them within different processor parts. In the following superscalar pipeline, two instructions can be fetched and dispatched at a time to complete a maximum of 2 instructions per cycle. The pipelining architecture in the scalar processor and the superscalar processor is shown below.

The instructions in a superscalar processor are issued from a sequential instruction stream. The processor must be able to issue multiple instructions in each clock cycle, and the CPU must check dynamically for data dependencies between instructions.
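The dynamic dependency check can be sketched as follows for a 2-wide, in-order issue stage. This is a simplified, hypothetical model: it checks only read-after-write and write-after-write conflicts inside one issue group, ignores register renaming and forwarding, and the instruction format and function names are assumptions for illustration.

```python
# Hypothetical instruction format: (label, destination register, source registers).


def independent(earlier, later):
    """True if `later` can issue in the same cycle as `earlier`.

    Checks read-after-write (later reads what earlier writes) and
    write-after-write (both write the same register) only.
    """
    _, dest_earlier, _ = earlier
    _, dest_later, sources_later = later
    return dest_earlier not in sources_later and dest_earlier != dest_later


def issue_groups(program, width=2):
    """Form in-order issue groups of up to `width` mutually independent instructions."""
    groups, current = [], []
    for instr in program:
        fits = len(current) < width and all(independent(prev, instr) for prev in current)
        if not fits:
            groups.append(current)  # close this group; instr starts the next cycle
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return [[label for label, _, _ in group] for group in groups]


program = [
    ("I1", "r1", ["r2", "r3"]),  # r1 = r2 + r3
    ("I2", "r4", ["r1", "r5"]),  # reads r1, so it cannot issue with I1
    ("I3", "r6", ["r7", "r8"]),  # independent of I2
    ("I4", "r9", ["r4", "r6"]),  # reads results of I2 and I3
]
print(issue_groups(program))
```

Here I2 must wait for I1, I2 and I3 can then issue together, and I4 starts a new group because the 2-wide group is already full.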

In the pipeline architectures below, F is fetch, D is decode, E is execute and W is register write-back; I1, I2, I3 & I4 are instructions.

The scalar processor has a single pipeline with four stages: fetch, decode, execute & result write-back. In this single pipeline, instruction I1 is fetched in the first clock period. In the second clock period, I1 is decoded and I2 is fetched. In the third clock period, I3 is fetched, I2 is decoded and I1 is executed. In the fourth clock period, I4 is fetched, I3 is decoded, I2 is executed and I1 writes its result back to the register file. Each instruction finishes one clock period after the one before it, so the single pipeline needs seven clock periods to execute 4 instructions.

Figure: Scalar Pipelining
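The clock-by-clock walkthrough of the scalar pipeline can be reproduced with a short script. This is a sketch assuming an ideal pipeline with no stalls or hazards; the `issue_width` parameter and the function names are illustrative, not part of any real tool.

```python
STAGES = ["F", "D", "E", "W"]  # fetch, decode, execute, register write-back


def pipeline_table(num_instructions, issue_width=1):
    """Map each instruction to {clock cycle: stage} for an ideal, stall-free pipeline.

    With issue_width=1 this is the scalar pipeline; instruction i (0-based)
    is fetched in cycle i // issue_width + 1 and then moves one stage per cycle.
    """
    table = {}
    for i in range(num_instructions):
        first_cycle = i // issue_width + 1
        table[f"I{i + 1}"] = {first_cycle + s: stage for s, stage in enumerate(STAGES)}
    return table


def total_cycles(table):
    """Cycle in which the last instruction finishes write-back."""
    return max(max(cycles) for cycles in table.values())


def show(table):
    """Print a pipeline diagram: rows are instructions, columns are clock cycles."""
    n = total_cycles(table)
    print("    " + "".join(f"{c:>3}" for c in range(1, n + 1)))
    for name, cycles in table.items():
        print(f"{name:<4}" + "".join(f"{cycles.get(c, '.'):>3}" for c in range(1, n + 1)))


scalar = pipeline_table(4)
show(scalar)
print("clock cycles:", total_cycles(scalar))
```

For four instructions the scalar table ends in clock cycle 7. Calling `pipeline_table(4, issue_width=2)` models the 2-issue case, where I1 & I2 and then I3 & I4 enter together, and the count drops to 5.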

The superscalar processor has two pipelines, each with the same four stages: fetch, decode, execute & result write-back. It is a 2-issue superscalar processor, which means two instructions are fetched, decoded, executed and written back at a time. I1 & I2 pass through fetch, decode, execute and write-back together, one stage per clock period. One clock period later, I3 & I4 enter the two pipelines and follow the same stages. So, using two pipelines, it executes 4 instructions in five clock periods.

Figure: Superscalar Pipelining
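The cycle counts in the scalar and superscalar examples follow from a simple formula. Assuming an ideal pipeline with no stalls, with k stages, n instructions and an issue width of m (symbols introduced here for illustration), the first group of m instructions needs k clock periods and each following group finishes one clock period later:

```latex
T(n, m) = k + \left\lceil \frac{n}{m} \right\rceil - 1,
\qquad
T(4, 1) = 4 + 4 - 1 = 7,
\qquad
T(4, 2) = 4 + 2 - 1 = 5
```

For long instruction streams, the ratio T(n, 1) / T(n, m) approaches m, the ideal speedup of an m-issue design; dependencies and hazards keep real processors below this limit.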

Thus, a scalar processor issues one instruction per clock cycle, and each pipeline stage handles one instruction per cycle, whereas this superscalar processor issues two instructions per clock cycle and runs two instances of each stage in parallel. As a result, the same sequence of instructions takes fewer clock cycles on the superscalar processor than on the scalar one.

Types of Superscalar Processors

Several types of superscalar processors available in the market are discussed below.

Intel Core i7 processor

Intel Core i7 is a superscalar processor family whose first generation was based on the Nehalem microarchitecture. A Core i7 design contains several processor cores, and every core is a superscalar processor. It is one of the fastest Intel processor lines used in consumer computers & devices. Like the Intel Core i5, it supports Intel Turbo Boost Technology. Early versions were available with 2 to 6 cores, supporting up to 12 threads at once.

Figure: Intel Core i7 processor

Intel Pentium Processor

The Intel Pentium processor has a superscalar pipelined architecture, which means the CPU can execute two or more instructions per clock cycle. This processor is widely used in personal computers. Intel Pentium devices are normally built for online use, cloud computing & collaboration, so they suit tablets and Chromebooks that need solid local performance & efficient online interaction.

Figure: Intel Pentium Processor

IBM PowerPC 601

The IBM PowerPC 601 is a superscalar processor from the PowerPC family of RISC microprocessors. It can issue and retire up to three instructions per clock, one to each of its three execution units. Instructions may complete out of order for improved performance, but the 601 makes execution appear to have happened in program order.
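A common way to let instructions finish out of order while keeping execution looking sequential is in-order retirement, as done with a reorder buffer. The sketch below is a generic, hypothetical model of that idea, not a description of the PowerPC 601's actual completion hardware; only the retire width of 3 comes from the description above, and the function name and input format are illustrative.

```python
def retire_in_order(program_order, finished_per_cycle, retire_width=3):
    """Return the instructions retired in each cycle.

    finished_per_cycle[c] lists instructions whose execution completes in
    cycle c, possibly out of program order. An instruction retires only when
    it and every older instruction have finished, and at most `retire_width`
    instructions retire per cycle.
    """
    finished = set()
    head = 0  # position of the oldest instruction that has not retired yet
    retired_per_cycle = []
    for done in finished_per_cycle:
        finished.update(done)
        retired = []
        while (head < len(program_order)
               and program_order[head] in finished
               and len(retired) < retire_width):
            retired.append(program_order[head])
            head += 1
        retired_per_cycle.append(retired)
    return retired_per_cycle


program = ["I1", "I2", "I3", "I4"]
finished = [["I2"], ["I3", "I1"], ["I4"]]  # I2 and I3 finish before I1
for cycle, retired in enumerate(retire_in_order(program, finished), start=1):
    print(f"cycle {cycle}: retired {retired}")
```

Although I2 finishes first, nothing retires until I1 is done; then I1, I2 and I3 retire together in program order, so software observes sequential results.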

Figure: IBM PowerPC 601

The PowerPC 601 provides 32-bit logical addresses, 8-, 16- & 32-bit integer data types and 32- & 64-bit floating-point data types. For 64-bit PowerPC implementations, the PowerPC architecture provides 64-bit integer data types, 64-bit addressing and the other features needed to complete the 64-bit architecture.

MC 88110

The MC88110 is a single-chip, second-generation RISC microprocessor that uses advanced methods to exploit instruction-level parallelism. To achieve maximum performance, it uses multiple on-chip caches, superscalar instruction issue, limited dynamic instruction reordering and speculative execution, which makes it well suited as the central processor in low-cost PCs & workstations.

Figure: MC88000

Intel i960

Intel i960 is a superscalar processor capable of dispatching & executing several independent instructions in every processor clock cycle. It is a RISC-based microprocessor that became very popular as an embedded microcontroller during the early 1990s, and it is still used in some military applications.

Figure: Intel i960

MIPS R

The MIPS R is a dynamic, superscalar microprocessor that implements the 64-bit MIPS 4 (MIPS IV) instruction set architecture. It fetches & decodes 4 instructions per cycle and issues them to five fully pipelined, low-latency execution units. It is designed for high-performance, large, real-world applications with poor memory locality; with speculative execution, it calculates memory addresses early. MIPS processors are used in devices such as the Nintendo 64, SGI's product line, the Sony PlayStation 2, the PSP & Cisco routers.

Figure: MIPS R

Difference Between Superscalar and Pipelining

The differences between superscalar and pipelining are discussed below.

| Superscalar | Pipelining |
|---|---|
| A superscalar CPU implements a form of parallelism called instruction-level parallelism within a single processor. | Pipelining is an implementation technique in which the execution of several instructions is overlapped. |
| A superscalar architecture initiates several instructions simultaneously and executes them independently. | A pipelined architecture has only one instruction in each pipeline stage per clock cycle. |
| It depends on spatial parallelism. | It depends on temporal parallelism. |
| Several operations run concurrently on separate hardware. | Several operations overlap on common hardware. |
| It is achieved by duplicating hardware resources such as register file ports & execution units. | It is achieved by pipelining execution units more deeply, with very fast clock cycles. |

Characteristics

The superscalar processor characteristics include the following.

  • In a superscalar processor, independent instructions are executed in parallel on separate hardware, without waiting for one another.
  • A superscalar processor fetches & decodes several instructions of the incoming instruction stream at a time.
  • The architecture of superscalar processors exploits the potential of instruction-level parallelism.
  • Superscalar processors can issue more than one instruction per cycle.
  • The number of instructions issued in a cycle depends on the instructions in the instruction stream, for example on the dependencies between them.
  • Instructions are frequently reordered to fit the architecture of the processor better.
  • Instructions are normally issued from a sequential instruction stream.
  • The CPU checks dynamically for data dependencies between instructions at run time.
  • The CPU executes multiple instructions for each clock cycle.

Advantages and Disadvantages

The advantages of the superscalar processor include the following.

  • A superscalar processor implements instruction-level parallelism in a single processor.
  • Superscalar execution is a hardware technique that can be applied to any instruction set, RISC or CISC.
  • With out-of-order execution, branch prediction & speculative execution, a superscalar processor can find parallelism across several basic blocks & loop iterations.

The disadvantages of the superscalar processor include the following.

  • Superscalar processors are not widely used in small embedded systems because of their power consumption.
  • Instruction scheduling is difficult in this architecture, because dependencies limit how many instructions can issue together.
  • A superscalar processor increases the complexity of hardware design.
  • Instructions are fetched in sequential program order, which is not necessarily the best order in which to execute them.

Superscalar Processor Applications

The applications of a superscalar processor include the following.

  • Superscalar execution is used in the processors of most laptops & desktops. The processor scans the executing program to discover groups of instructions that can be executed together.
  • A superscalar processor includes multiple copies of data path hardware that execute several instructions at once.
  • This processor is mainly designed to achieve an execution rate of more than one instruction per clock cycle for a single sequential program.

Thus, this is an overview of the superscalar processor: its architecture, types, and applications. Here is a question for you: what is a scalar processor?