Arm Neoverse E1: Specifications, Architecture, Working, Differences & Its Applications

Arm Neoverse processors are 64-bit Arm processor cores designed for edge computing, high-performance computing, and data centers. They are available in several series according to performance target, such as the E-series, N-series, and V-series. Some cores in the family also include features such as SVE (Scalable Vector Extension) for faster ML, HPC, and AI tasks. Among them, the Neoverse E-series is derived from the Cortex-A65AE, which implements the Armv8.2-A instruction set. These processors are designed for edge computing and offer improved data throughput with reduced power consumption. This article covers the Arm Neoverse E1 processor, how it works, and its applications.


What is the Arm Neoverse E1 Processor?

The Arm Neoverse E1 is a highly efficient, power-saving processor core built on a scalable architecture. It supports SMT (simultaneous multithreading), executing two threads concurrently on each core to improve aggregate performance. It is designed specifically for high-throughput compute workloads in applications such as 4G/5G transport, networking, and storage, including SDN (software-defined networking) and SDS (software-defined storage).

The Neoverse E1 core includes performance monitors that collect a variety of statistics on the operation of the core and its memory system at runtime. These statistics give valuable insight into core behavior when profiling or debugging code.
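
As a rough illustration of how such counter statistics are typically used, the sketch below derives a few common ratios from raw event counts. The event names follow the Arm PMU architectural events (CPU_CYCLES, INST_RETIRED, L1D_CACHE, L1D_CACHE_REFILL). The counter values are made-up sample numbers rather than measurements from an E1, and the helper name derive_metrics is hypothetical.

```python
def derive_metrics(counters):
    """Turn raw PMU event counts into ratios that are easier to interpret."""
    cycles = counters["CPU_CYCLES"]
    instructions = counters["INST_RETIRED"]
    l1d_accesses = counters["L1D_CACHE"]
    l1d_refills = counters["L1D_CACHE_REFILL"]
    return {
        # Instructions retired per cycle.
        "ipc": instructions / cycles,
        # Fraction of L1 data cache accesses that needed a refill.
        "l1d_miss_ratio": l1d_refills / l1d_accesses,
        # L1 data cache refills per 1000 retired instructions.
        "l1d_refills_per_kilo_instr": 1000 * l1d_refills / instructions,
    }


# Illustrative sample values only (assumption, not real E1 data).
sample = {
    "CPU_CYCLES": 2_000_000,
    "INST_RETIRED": 1_500_000,
    "L1D_CACHE": 400_000,
    "L1D_CACHE_REFILL": 20_000,
}
metrics = derive_metrics(sample)
print(metrics)
```

A low IPC combined with a high refill rate would suggest that the code is waiting on memory, which is the kind of workload SMT is meant to help with.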

How does the Arm Neoverse E1 Processor Work?

The Arm Neoverse E1 runs two threads concurrently on every core using an SMT (simultaneous multithreading) design, which raises overall throughput, mainly for data-intensive workloads. Each core holds two architectural threads that can issue and execute instructions at the same time. In addition, it can use cache stashing together with an ACP (Accelerator Coherency Port) to reduce latency and integrate accelerators efficiently. The E1 is built on the Armv8.2-A architecture and supports the AArch64 instruction set, which makes it compatible with the existing software ecosystem.
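
The throughput benefit of SMT can be seen in a toy model. The sketch below is not a model of the real E1 pipeline: it assumes a single issue slot per cycle, fixed thread priority, and threads described only by the stall cycles that follow each instruction. The function name run_cycles and the workload are hypothetical.

```python
def run_cycles(threads, issue_width=1):
    """Return the cycle at which all threads have finished.

    Each thread is a list; entry i is the number of stall cycles
    (for example, memory latency) that follow instruction i.
    """
    pc = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle the thread may issue again
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        issued = 0
        for t in range(len(threads)):
            if issued == issue_width:
                break
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                # Issue takes one cycle, then the thread stalls.
                ready_at[t] = cycle + 1 + threads[t][pc[t]]
                pc[t] += 1
                issued += 1
        cycle += 1
    return max(ready_at)


# Hypothetical workload: every second instruction waits 3 cycles.
workload = [0, 3, 0, 3]
one_thread = run_cycles([workload])
back_to_back = 2 * one_thread             # two threads run one after the other
smt = run_cycles([workload, workload])    # two threads share the core
print(one_thread, back_to_back, smt)
```

In this model the second thread fills issue slots that the first thread leaves empty while it stalls, so both threads finish sooner than they would if run one after the other.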

Specifications

The specifications of the Arm Neoverse E1 processor include the following.

  • The architecture is Armv8-A (AArch64) with support for the Armv8.2-A extensions.
  • Simultaneous multithreading (SMT) allows two threads to run concurrently on each core.
  • The pipeline is superscalar and out-of-order.
  • Decode width is 6.
  • Rename or Dispatch width is 8.
  • Micro-op cache – 1536 entries.
  • Execution ports – 15.
  • ROB (Re-order Buffer) – 320 entries.
  • The L2 cache is configurable; some designs use a private 256 KB L2 per core.
  • An L3 cache is optional and configurable from 512 KB to 4 MB.
  • Cache coherency supports cache stashing into the L2 or L3 caches.
  • A low-latency ACP (Accelerator Coherency Port) is available for accelerators.
  • A cluster holds up to 8 cores (16 threads); larger systems connect multiple clusters through a mesh network.
  • It uses the CMN-600 mesh interconnect.
  • It has a GIC-600 generic interrupt controller.
  • Debug and trace use CoreSight SoC-400.
  • It supports ECC memory.
  • It provides high throughput efficiency.
  • Compared with the Cortex-A53, it provides 2.7x the throughput performance, 2.1x the compute performance, and 2.4x the throughput efficiency.
  • It consumes less than 4 W of CPU power and less than 15 W for the whole SoC.
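
A little arithmetic helps relate some of these figures to each other. The sketch below assumes that "throughput efficiency" means throughput per watt, which the list does not state explicitly. Under that assumption, the quoted gains imply how much more power the E1 draws than a Cortex-A53 at the quoted throughput. The variable names are illustrative.

```python
# Figures taken from the specification list above.
CORES_PER_CLUSTER = 8
THREADS_PER_CORE = 2          # SMT: two threads per core
THROUGHPUT_GAIN = 2.7         # versus Cortex-A53
EFFICIENCY_GAIN = 2.4         # versus Cortex-A53

# Hardware threads available in one full cluster.
threads_per_cluster = CORES_PER_CLUSTER * THREADS_PER_CORE

# Assumption: efficiency = throughput / power, so
# power ratio = throughput gain / efficiency gain.
implied_power_ratio = THROUGHPUT_GAIN / EFFICIENCY_GAIN

print(threads_per_cluster)
print(round(implied_power_ratio, 3))
```

Read this way, the two ratios together suggest a modest increase in power for a much larger increase in throughput.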

Arm Neoverse E1 Architecture

The Arm Neoverse E1 is a scalable, high-throughput, power-efficient CPU core built on the Armv8-A architecture, so it is compatible with the Arm software ecosystem. It uses SMT (simultaneous multithreading) to execute two threads concurrently for better efficiency and aggregate performance. Its design aims to improve throughput efficiency in a scalable and flexible way. Thus, it suits a wide range of applications such as network appliances, edge-to-core data transport systems, and 5G infrastructure.

Arm Neoverse E1 Architecture Components

The Arm Neoverse E1 architecture components include the Armv8-A instruction set with the Armv8.1-A and Armv8.2-A extensions plus selected Armv8.3-A, Armv8.4-A, and Armv8.5-A features, an SMT microarchitecture, and a configurable memory hierarchy with 64 KB to 128 KB of L1 instruction and data cache and optional L2 and L3 caches. In addition, it includes the Neon (Advanced SIMD) and floating-point units, a DSU (DynamIQ Shared Unit) for cluster integration, and system-level components such as the CoreSight debug and trace IP and the GIC-600 interrupt controller.

Instruction Set Architecture or ISA

The Arm Neoverse E1 is a throughput-oriented core that implements the Armv8.2-A architecture along with selected later extensions, such as the LDAPR instructions of Armv8.3-A and the SSBS bit of Armv8.5-A. In addition, it supports SMT (simultaneous multithreading) and is designed mainly for high-efficiency data plane workloads such as 4G/5G networking and software-defined storage.

The core features & instruction set include the following.

  • The base architecture is the AArch64 execution state of the Armv8-A architecture.
  • Armv8.2-A is the key extension level on which the E1 core is built.
  • From Armv8.3-A, it includes the Load-Acquire RCpc (LDAPR) instructions.
  • From Armv8.4-A, it supports the SDOT and UDOT instructions for dot-product operations.
  • From Armv8.5-A, it implements the Speculative Store Bypass Safe (SSBS) bit for Spectre Variant 4 mitigation.
  • The SMT core executes two threads at once to raise total throughput.
  • The out-of-order pipeline improves performance efficiency over earlier designs.
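
On Linux, the extensions listed above are normally visible as flags on the Features line of /proc/cpuinfo: lrcpc for LDAPR, asimddp for the dot-product instructions, and ssbs for SSBS. The sketch below parses such text. The helper name detect_features is hypothetical, and the sample string is an illustrative feature list, not output captured from an E1 system.

```python
# Linux arm64 hwcap names for the extensions discussed above.
E1_FEATURE_FLAGS = {
    "lrcpc": "Armv8.3-A LDAPR (Load-Acquire RCpc)",
    "asimddp": "Armv8.4-A SDOT/UDOT dot product",
    "ssbs": "Armv8.5-A Speculative Store Bypass Safe",
}


def detect_features(cpuinfo_text):
    """Return {flag: present} using the first 'Features' line found."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("features") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            return {flag: flag in flags for flag in E1_FEATURE_FLAGS}
    return {flag: False for flag in E1_FEATURE_FLAGS}


# Illustrative sample only (assumption, not captured from real hardware).
SAMPLE = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics "
    "fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs\n"
)
result = detect_features(SAMPLE)
for flag, present in result.items():
    print(f"{flag:8} {present}  {E1_FEATURE_FLAGS[flag]}")
```

On a real Arm Linux system, the same function can be given the contents of /proc/cpuinfo instead of the sample string.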

Memory and Cache

The Arm Neoverse E1 uses a conventional memory hierarchy with separate L1 (Level 1) instruction and data caches per core, along with an optional private L2 (Level 2) cache for each core. It can also use a shared SLC (System Level Cache) through the CMN-600 mesh interconnect, which provides a large cache shared across multiple cores and memory controllers. This configuration balances performance with efficiency for edge and networking workloads.

  • The L1 cache is normally 64 KiB each for instructions and data, consistent with the A76 architecture. It supports instruction cache coherency, which improves performance in virtualized environments.
  • The optional L2 cache is private to a single core. It is available in different sizes; a common configuration is 256 KiB per core.
  • Data is normally allocated into the L2 cache only when it is evicted from the L1 cache, although special rules can apply to some types of memory access that use read-allocate or write-allocate hints.
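
The allocate-on-eviction behavior described in the last point can be sketched with a toy two-level cache. This is a simplified model under stated assumptions (fully associative levels, LRU replacement, no allocation hints), not the E1's actual replacement policy. The class name TwoLevelCache and the tiny capacities are hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: a line enters L2 only when it is evicted from L1."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # oldest entry first (LRU order)
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)  # mark as most recently used
            return "L1 hit"
        result = "L2 hit" if line in self.l2 else "miss"
        self.l2.pop(line, None)        # the line moves up into L1
        self._fill_l1(line)
        return result

    def _fill_l1(self, line):
        self.l1[line] = True
        if len(self.l1) > self.l1_lines:
            victim, _ = self.l1.popitem(last=False)
            self.l2[victim] = True     # allocate in L2 on L1 eviction
            if len(self.l2) > self.l2_lines:
                self.l2.popitem(last=False)


cache = TwoLevelCache(l1_lines=2, l2_lines=4)
trace = [cache.access(line) for line in ["A", "B", "C", "A"]]
print(trace)
```

Line A misses on first use and is placed only in L1. It reaches L2 when C pushes it out of L1, so the second access to A is served from L2.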

Shared Cache & Interconnect

In a multi-core system, the shared system-level cache is provided through the CMN-600 mesh interconnect. It is shared among several cores and memory controllers, allowing coherent data access across the system.

The architecture includes other memory features such as the MMU (Memory Management Unit) for virtual memory management. The core has TLBs (Translation Lookaside Buffers) for instruction and data address translation, including specific implementations related to the A76. The core is also equipped with a PMU (Performance Monitor Unit) that can count events such as L1 cache refills.

Neon (Advanced SIMD)

Neon (Advanced SIMD) in the Arm Neoverse E1 core is a built-in architectural extension that substantially increases core throughput for data-intensive workloads such as video encoding, signal processing, and 5G data processing. It is a powerful SIMD unit that operates on 64-bit and 128-bit vectors.

Developers can use the Neon unit in several ways, including the following.

  • Neon intrinsics give programmers fine-grained control with the convenience of a high-level language.
  • Auto-vectorizing compilers can automatically turn standard loops and data structures into optimized Neon instructions.
  • Developers can use existing libraries, such as the Arm RAN Acceleration Library, that are already optimized with Neon instructions.
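
To make the idea of lane-wise SIMD work concrete, the sketch below is a scalar reference model of what one 128-bit signed dot-product (SDOT) operation computes. Each of the four 32-bit accumulator lanes adds the dot product of four signed 8-bit pairs. In C this operation is normally reached through the vdotq_s32 intrinsic from arm_neon.h. The Python function sdot_128 is a hypothetical emulation for illustration only and runs no Neon code.

```python
def wrap_int32(value):
    """Wrap an integer to signed 32-bit, as a 32-bit lane would."""
    return (value + 2**31) % 2**32 - 2**31


def sdot_128(acc, a, b):
    """Reference model of a 128-bit SDOT: 4 lanes, 4 int8 products each."""
    if len(acc) != 4 or len(a) != 16 or len(b) != 16:
        raise ValueError("expected 4 accumulators and 16 int8 values per input")
    if any(not -128 <= x <= 127 for x in list(a) + list(b)):
        raise ValueError("inputs must fit in signed 8 bits")
    out = []
    for lane in range(4):
        total = acc[lane]
        for k in range(4):
            total += a[4 * lane + k] * b[4 * lane + k]
        out.append(wrap_int32(total))
    return out


acc = [0, 0, 0, 0]
a = list(range(16))   # 0, 1, ..., 15
b = [1] * 16
lanes = sdot_128(acc, a, b)
print(lanes)
```

One such instruction performs sixteen multiplications and their additions at once, which is why dot-product support matters for signal processing and ML inference loops.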

Floating-Point Unit or FPU

The floating-point unit (FPU) in the Arm Neoverse E1 processor is a dedicated hardware component that performs mathematical operations on floating-point numbers, that is, numbers with fractional parts. It is designed to speed up these calculations and so improves performance in applications that need high numerical precision, such as scientific computing or machine learning.

DynamIQ Shared Unit or DSU

The DynamIQ Shared Unit (DSU) in the Neoverse E1 processor is the cluster-level component. It handles the shared L3 cache, shared resources, and snoop control for a group of cores, such as E1 cores. In addition, it forms the backbone of a DynamIQ cluster, enabling features such as scalability across several cores, heterogeneous core configurations, and the control logic needed for debugging and power management.

The key functions of DSU include the following.

  • It provides the shared Level 3 cache and snoop control for the cores within its cluster.
  • It acts as the central controller for the cluster, consolidating all external interfaces to the SoC.
  • It supports scalable core configurations of up to 8 cores, so clusters can be adapted to particular use cases.
  • It implements smart in-cluster power management features that reduce power consumption.
  • It provides common system control registers for L3 cache partitioning, quality of service (QoS) bus control, and power management.

CoreSight Debug & Trace IP

CoreSight is an on-chip infrastructure in the Arm Neoverse E1 platform that provides the means to trace and debug the processor and the whole SoC. It includes a modular library of components that lets designers build a flexible and scalable debug and trace system. This enables detailed performance analysis, optimization, and troubleshooting of complex designs.

GIC-600 Interrupt Controller

The GIC-600 is the Generic Interrupt Controller used with the Arm Neoverse E1. It manages interrupts for Armv8-A processors, handles signals from peripherals, and coordinates communication between cores. It is a configurable, distributed design that supports interrupt masking, prioritization, virtualization, and the GICv3 architecture to deliver interrupts efficiently to the right processor cores.
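
Prioritization and masking can be illustrated with a small sketch. In the GIC architecture a numerically lower priority value means a higher priority, and the priority mask lets through only interrupts whose value is below the mask. Everything else here is a simplification: the function name next_interrupt, the interrupt IDs, and the rule of breaking ties by lowest ID are assumptions for illustration, not GIC-600 behavior.

```python
def next_interrupt(pending, priority_mask):
    """Pick the pending interrupt to signal, or None if all are masked.

    pending maps interrupt ID -> priority value (lower value = higher priority).
    """
    eligible = [
        (priority, intid)
        for intid, priority in pending.items()
        if priority < priority_mask   # masked unless strictly below the mask
    ]
    if not eligible:
        return None
    # Lowest priority value wins; ties go to the lowest ID (assumption).
    return min(eligible)[1]


# Hypothetical pending interrupts.
pending = {32: 0xA0, 33: 0x40, 34: 0xC0}
chosen = next_interrupt(pending, priority_mask=0xB0)
all_masked = next_interrupt(pending, priority_mask=0x40)
print(chosen, all_masked)
```

With the mask at 0xB0, interrupt 34 is masked and interrupt 33 is chosen over 32 because its priority value is lower. With the mask at 0x40, nothing is signaled.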

Arm Neoverse E1 Software

On the software side, the Neoverse E1 implements the Armv8-A architecture with its extensions, which are supported by a diverse cloud-native ecosystem and infrastructure software. It is designed for data plane workloads whose software can be optimized for both power efficiency and performance. It is backed by a complete set of tools, training resources, and documentation. Its main software aspects are the AArch64 execution state, support for architecture extensions such as Armv8.2-A and Armv8.3-A, and a focus on efficient data movement for applications such as 5G transport and software-defined networking.

Difference between Neoverse E1 and Neoverse N1

The difference between Neoverse E1 and Neoverse N1 includes the following.

Neoverse E1 | Neoverse N1
It is an Arm-based CPU core. | It is a CPU platform.
Its primary target is high-throughput infrastructure such as edge devices and 5G. | Its primary target is high-performance data centers and servers.
It provides higher throughput efficiency and throughput performance than the Cortex-A53. | It provides higher raw compute performance than the E1.
It is designed for efficient data transport and throughput by incorporating SMT. | It is designed for competitive performance in server workloads.
Power consumption per core is lower, under 1 W. | Power consumption per core is higher, around 1 to 1.8 W.
It is smaller, more power-efficient, and scalable for high-density deployments. | It is designed from the ground up for server infrastructure with higher performance goals.
Its core is an out-of-order execution processor with simultaneous multithreading (SMT), handling two threads per core. | Its core is an out-of-order execution processor without SMT.
It is designed for tightly coupled accelerator integration via an Accelerator Coherency Port (ACP). | It can be connected to ML/AI accelerators.

Advantages

The advantages of the Arm Neoverse E1 processor include the following.

  • The Neoverse E1 delivers high throughput efficiency, making it well suited to applications such as software-defined networking and storage.
  • The processor is highly scalable and supports a wide range of performance requirements, from low-power designs to high-performance 100 Gb+ systems.
  • Its architecture is designed specifically to move data efficiently and to handle data transport tasks at high performance.
  • It is a flexible platform that can be configured for a wide range of applications.
  • It builds on the broad and varied Arm software ecosystem, including cloud-native software that is increasingly optimized for Arm.
  • It supports the Armv8-A architecture and a complete set of extensions, ensuring compatibility with a wide range of software such as the Data Plane Development Kit (DPDK), OpenSSL, and Linux kernel libraries.
  • It supports tightly coupled fixed-function hardware offload with cache coherency and low latency.
  • It scales within a low power budget from 25 Gbps to multi-100 Gbps systems, allowing it to adapt to different infrastructure requirements.
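
To give a feel for what 25 Gbps to multi-100 Gbps means for a data plane, the sketch below works out worst-case packet rates. It assumes Ethernet with minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap per frame. The core count and clock frequency used for the per-packet cycle budget are hypothetical values chosen for illustration, not E1 specifications.

```python
FRAME_BYTES = 64        # minimum Ethernet frame
OVERHEAD_BYTES = 20     # preamble (8) + inter-frame gap (12)


def packets_per_second(link_bits_per_second, frame_bytes=FRAME_BYTES):
    """Worst-case packet rate for back-to-back frames of the given size."""
    bits_on_wire = (frame_bytes + OVERHEAD_BYTES) * 8
    return link_bits_per_second / bits_on_wire


def cycles_per_packet(link_bits_per_second, cores, clock_hz):
    """Average CPU cycles available per packet across all cores."""
    return cores * clock_hz / packets_per_second(link_bits_per_second)


mpps_25g = packets_per_second(25e9) / 1e6
mpps_100g = packets_per_second(100e9) / 1e6

# Hypothetical system: 8 cores at 2.0 GHz (assumption for illustration).
budget_100g = cycles_per_packet(100e9, cores=8, clock_hz=2.0e9)

print(round(mpps_25g, 2), round(mpps_100g, 2), round(budget_100g, 2))
```

Budgets of this size leave little room for cache misses on each packet, which is why features such as SMT and cache stashing are aimed at this class of workload.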

Disadvantages

The disadvantages of the Arm Neoverse E1 processor include the following.

  • The Arm Neoverse E1 lacks x86 software compatibility, so it cannot run many legacy applications without emulation or modification.
  • It is not suited to the most compute-intensive tasks, such as complex computational jobs and demanding gaming, which are better served by other architectures.
  • Most Windows software compiled for the x86 architecture will not run natively on an Arm processor.
  • Its performance depends heavily on efficient instruction execution, so it needs highly skilled programmers to achieve optimal performance.
  • It provides lower raw performance than high-end x86 processors.

Applications

The applications of the Arm Neoverse E1 processor include the following.

  • It is used in network appliances and functions that need high-speed data transport and processing, such as SD-WAN and SDN.
  • Its architecture is designed to move and process data efficiently, which suits data plane workloads in a variety of networking and storage systems.
  • Its scalability allows it to be used across a variety of power envelopes, supporting the growing need for compute at the network edge.
  • It is used in SDS (software-defined storage) systems to handle high data throughput.
  • The Neoverse E1 architecture can be scaled to build high-performance DPUs that handle network traffic by offloading tasks from the main CPU.
  • It can be optimized for the higher throughput and efficiency demands of 5G, supporting the transition from 4G networks by delivering scalable compute for both core and edge network functions.

Thus, this is an overview of the Arm Neoverse E1 processor, its working, and its applications. It uses an architecture with features such as SMT (simultaneous multithreading) and an out-of-order execution engine, which suits workloads that are memory-latency-bound, such as packet processing. In addition, it is a highly efficient CPU designed to increase data throughput in infrastructure applications, mostly in networking and edge computing. Here is a question for you: What is the Arm Neoverse V1 processor?