Neoverse N3: Specifications, Architecture, Working, Differences and Its Applications

The Arm Neoverse N series is a family of infrastructure CPU cores that includes the N1, N2, and N3. It is designed for cloud, edge computing, and networking infrastructure, with a focus on performance-per-watt efficiency and scalability. Neoverse N3 is the third-generation general-purpose core in the series, launched in early 2024 alongside Neoverse E3 and V3. It targets workloads such as networking, edge infrastructure, and telecommunications. Compared with the previous N2, it offers about 20% better performance per watt, almost three times the ML performance, and more scalable core counts. This article describes Neoverse N3, how it works, and its applications.


What is Neoverse N3?

Neoverse N3 is a general-purpose Arm CPU core built on the Armv9.2-A architecture. It is designed for infrastructure such as data centers and 5G networks, where performance-per-watt efficiency matters most. It balances power efficiency and performance, and provides about a 20% improvement in performance per watt over its predecessor. Its key features include scalability and enhanced machine learning performance, which support a wide range of workloads from the edge to hyperscale computing. In addition, it supports advanced features such as a large L2 cache and SVE2.
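
To make the 20% performance-per-watt figure more concrete, the following minimal sketch shows the arithmetic behind it. The baseline numbers (a 120 W cluster serving 1000 requests per second) and both function names are hypothetical and are not taken from any Arm data.

```python
def power_for_same_throughput(baseline_watts: float, gain: float) -> float:
    """Power needed to match the baseline throughput after a perf-per-watt gain."""
    return baseline_watts / (1.0 + gain)


def throughput_for_same_power(baseline_throughput: float, gain: float) -> float:
    """Throughput available within the baseline power budget after the gain."""
    return baseline_throughput * (1.0 + gain)


if __name__ == "__main__":
    # Hypothetical baseline: a 120 W cluster handling 1000 requests per second.
    print(power_for_same_throughput(120.0, 0.20))    # roughly 100 W for the same work
    print(throughput_for_same_power(1000.0, 0.20))   # roughly 1200 requests per second
```

In other words, a 20% efficiency gain can be spent either as about 17% less power for the same work or as 20% more work in the same power budget.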

Neoverse N3 Specifications

The Arm Neoverse N3 is a general-purpose CPU core optimized for performance-per-watt efficiency in 5G, networking, infrastructure edge, and general server workloads. It is a power-efficient and highly scalable solution based on the Armv9.2-A architecture.

  • Architecture: Armv9.2-A with AArch64 ISA support.
  • Pipeline: out-of-order, superscalar execution.
  • L1 caches: 32KB L1 I-cache and 64KB L1 D-cache.
  • L2 cache: configurable for each core (128KB, 256KB, 512KB, 1MB, or 2MB).
  • Interconnect: AMBA 5 CHI.E with a 256-bit interface.
  • Memory and I/O: DDR5/LPDDR5 memory channels and PCIe Gen5 with CXL support when used in a CSS N3 (Compute Subsystem).
  • Scalability: from 8 cores to 192+ cores, typically using the CMN S3 mesh interconnect and Neoverse S3 platform IP.
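
The configurable parts of this specification can be captured in a small validation sketch. The class name `N3Config` and its fields are hypothetical; only the list of L2 sizes comes from the specification above, and the total is simply the per-core L2 size multiplied by the core count.

```python
from dataclasses import dataclass

# Per-core L2 sizes listed in the specification above, in KB.
L2_OPTIONS_KB = (128, 256, 512, 1024, 2048)


@dataclass(frozen=True)
class N3Config:
    """Hypothetical description of an N3-based design (illustration only)."""
    cores: int
    l2_kb: int

    def __post_init__(self) -> None:
        if self.cores < 1:
            raise ValueError("a design needs at least one core")
        if self.l2_kb not in L2_OPTIONS_KB:
            raise ValueError(f"unsupported L2 size: {self.l2_kb}KB")

    def total_l2_mb(self) -> float:
        """Total private L2 capacity across all cores, in MB."""
        return self.cores * self.l2_kb / 1024


if __name__ == "__main__":
    design = N3Config(cores=64, l2_kb=2048)
    print(design.total_l2_mb())
```

A frozen dataclass is used so that a configuration cannot be changed into an invalid one after it has been checked.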

How does Neoverse N3 Work?

The Arm Neoverse N3 core implements the Armv9.2-A architecture to provide high performance-per-watt efficiency, especially for infrastructure edge devices. It achieves this through mechanisms such as a scalable memory hierarchy with configurable L1 and L2 caches, efficient load/store units, and a connection to a shared DSU-120 cluster. Together, these allow systems to scale from 8 cores to 192+ cores.

Neoverse N3 Architecture

Neoverse N3 is a general-purpose, Armv9.2-A-based CPU core designed for infrastructure, edge, and cloud workloads, with an emphasis on performance-per-watt efficiency. It is a low-power core with balanced performance that improves efficiency by 20% over the Neoverse N2. In addition, it supports L2 cache sizes from 128KB to 2MB for each core, integration into a DSU-120 (DynamIQ Shared Unit-120) cluster, and telemetry features based on the Arm Telemetry framework.

Components

The Neoverse N3 architecture consists of components such as the Armv9.2-A core, the DSU-120 (DynamIQ Shared Unit-120), and configurable L1 and L2 caches. It supports a range of Armv9.2-A extensions such as SVE2 (Scalable Vector Extension 2) and is built on a system-level IP platform with the CMN S3 (coherent mesh network), an interrupt controller, and a system memory management unit. The core connects to other system components through the AMBA 5 CHI interface and sits inside a DynamIQ Shared Unit-120 cluster, which also supports direct-connect configurations. In addition, the platform supports the most recent CXL IO, PCIe, and UCIe chiplet standards.

Neoverse N3 Core

The Neoverse N3 core is a CPU core that implements the Armv9.2-A architecture. It is designed for performance-per-watt efficiency in edge and cloud infrastructure workloads. The core supports Arm’s DynamIQ technology and is characterized by its low-power design and balanced performance, which makes it well suited for use in SoCs (systems on chip).

Armv9.2-A Architecture

The Neoverse N3 core implements the Armv9.2-A architecture, which extends the earlier Armv8-A architecture up to the features of Armv8.7-A. This allows the core to support a wide range of architectural features, including all exception levels from EL0 to EL3, security extensions such as RME (Realm Management Extension), and performance enhancements. Although the core supports the Armv9.2-A standard, the particular set of implemented features may differ, and not all architectural features are necessarily included in the core.

L1 Cache

The L1 cache is a private, configurable cache that is split into two parts. The L1 instruction cache (L1I) stores instructions for the processor to fetch, and the L1 data cache (L1D) stores data for both load and store operations. It is the fastest level of cache, designed for high performance with a latency of just a few CPU cycles, and it connects to the core’s L2 cache. The size of each cache is configurable at implementation time, with 8KB, 16KB, 32KB, or 64KB options for each.
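
How a cache of a given size locates data can be sketched by splitting an address into tag, set index, and line offset. This is a generic set-associative model, not a description of the N3 design: the 64-byte line size and the 4-way associativity are assumptions made for illustration, and `split_address` is a hypothetical helper.

```python
from typing import NamedTuple

LINE_BYTES = 64  # assumed cache line size
WAYS = 4         # hypothetical associativity, for illustration only


class CacheAddress(NamedTuple):
    tag: int
    index: int
    offset: int


def split_address(addr: int, cache_kb: int) -> CacheAddress:
    """Split an address into tag, set index, and byte offset for a cache size."""
    sets = (cache_kb * 1024) // (LINE_BYTES * WAYS)
    offset_bits = LINE_BYTES.bit_length() - 1  # log2 of a power of two
    index_bits = sets.bit_length() - 1
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return CacheAddress(tag, index, offset)


if __name__ == "__main__":
    # The same address maps to different sets in a 64KB and a 32KB cache.
    print(split_address(0x12345, cache_kb=64))
    print(split_address(0x12345, cache_kb=32))
```

Halving the cache size under these assumptions halves the number of sets, so one more address bit moves from the index into the tag.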

L2 Cache

The Neoverse N3 architecture also supports a configurable L2 cache, which ranges from 128KB to 2MB for each core. It is a unified cache for both data and instructions. The larger 2MB L2 option is aimed at machine learning and cloud data analytics applications. These workloads benefit from the larger cache capacity, which contributes to almost 3x performance gains on ML workloads compared with the previous Neoverse N2.

Load/Store Unit

The Load/Store Unit (LSU) is a component of Arm’s Neoverse N3 architecture that handles all memory access instructions, such as loads, stores, and atomics. It therefore manages the flow of data between the memory subsystem and the CPU. It decouples memory operations from the arithmetic and other pipelines of the core, which allows them to run concurrently. The core uses dedicated queues, such as an LDQ (Load Queue) and an STQ (Store Queue), and several load/store units to process both read and write operations efficiently.
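
The role of a store queue can be illustrated with a toy model. This is a generic sketch of store buffering with store-to-load forwarding, a common technique in out-of-order cores, and it is not a description of the actual N3 microarchitecture. The class `ToyLSU` and its methods are hypothetical.

```python
class ToyLSU:
    """Toy load/store unit: stores wait in a queue, loads check it first."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory       # address -> value, stands in for the caches
        self.store_queue = []      # pending (address, value) pairs, oldest first

    def store(self, addr: int, value: int) -> None:
        # A store is buffered instead of being written to memory immediately.
        self.store_queue.append((addr, value))

    def load(self, addr: int) -> int:
        # Forward from the youngest matching store, if there is one.
        for queued_addr, value in reversed(self.store_queue):
            if queued_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def drain(self) -> None:
        # Commit buffered stores to memory in program order.
        for addr, value in self.store_queue:
            self.memory[addr] = value
        self.store_queue.clear()


if __name__ == "__main__":
    lsu = ToyLSU({0x100: 1})
    lsu.store(0x100, 42)
    print(lsu.load(0x100), lsu.memory[0x100])  # the load sees the queued store
    lsu.drain()
    print(lsu.memory[0x100])
```

Buffering stores this way lets later instructions continue while the write is still pending, which is the decoupling the paragraph above describes.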

AMBA 5 CHI Interface

AMBA 5 CHI (Coherent Hub Interface) is a high-performance, packet-based on-chip interconnect specification. It provides cache-coherent communication between processors, memory controllers, and accelerators in the SoC (system-on-chip). Neoverse N3 supports the CHI.E protocol, including Memory Tagging Extension (MTE) features. CHI is essential for current, scalable multicore designs, mainly for data center and infrastructure applications.

DynamIQ Shared Unit or DSU

The Neoverse N3 core is implemented inside a DSU-120 cluster. The DSU is the component that manages a cluster of cores by providing a shared L3 cache, control logic, and external interfaces. It combines the cores into a processing cluster, allows flexible configurations of different core types, and ensures coherence between them through an SCU (Snoop Control Unit).

DSU Direct Connect

DSU Direct Connect is a specific configuration option for Arm’s DynamIQ Shared Unit-120, which manages CPU clusters. In this simplified configuration, the L3 cache, snoop filter, and SCU (Snoop Control Unit) logic are removed to reduce latency and area, and the cluster relies on the interconnect for performance. It is therefore suitable for single-complex or single-core configurations that use a CHI interconnect.

Coherent Mesh Network or CMN S3

CMN S3 (Coherent Mesh Network S3) is a high-performance interconnect used in Arm Neoverse N3-based systems. It is a highly scalable and configurable mesh network, designed for low-latency, high-bandwidth communication between compute, memory, and I/O, and it supports advanced features such as chiplet-based and confidential compute designs.

The main function of CMN S3 is to provide memory sharing and coherent communication across the components of a system. It is a mesh network that uses routers called crosspoints to connect devices such as accelerators, memory controllers, and CPUs. Its configurability allows it to be customized to meet the specific requirements of different workloads and applications.
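
To give a feel for how traffic crosses a mesh of crosspoints, the sketch below computes a path between two mesh nodes. It assumes simple dimension-ordered (X first, then Y) routing on a 2D grid, which is a common textbook scheme; it is not taken from the CMN S3 documentation, and `xy_route` and `hop_count` are hypothetical helpers.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) position of a crosspoint in the mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Path from src to dst, moving along X first and then along Y."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


def hop_count(src: Node, dst: Node) -> int:
    """Number of links crossed, which equals the Manhattan distance."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
    print(hop_count((3, 3), (0, 1)))
```

Under this assumption the hop count grows with the distance between nodes, which is why the placement of cores, memory controllers, and I/O in the mesh affects latency.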

System Memory Management Unit or MMU S3

The System Memory Management Unit (SMMU) in a Neoverse N3 system provides I/O memory management to support device virtualization, a feature essential for current data centers and cloud environments. This hardware component translates addresses for peripherals so that they can access memory through virtual addresses, in a way that matches the memory model of the Arm architecture. Its main features include a register-based architecture, improved support for memory-mapped configuration, and support for the PCI Express ATS and PRI features.
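
The address translation step can be illustrated with a deliberately simplified model. A real SMMU walks multi-level translation tables held in memory; the single-level dictionary, the 4KB page size, and the names `translate` and `TranslationFault` below are assumptions made only for illustration.

```python
PAGE_SIZE = 4096  # assumed 4KB pages


class TranslationFault(Exception):
    """Raised when a device accesses an address with no mapping."""


def translate(device_va: int, page_table: dict) -> int:
    """Translate a device virtual address using a one-level toy page table.

    page_table maps virtual page numbers to physical page numbers.
    """
    vpn, offset = divmod(device_va, PAGE_SIZE)
    if vpn not in page_table:
        raise TranslationFault(f"no mapping for device address {device_va:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0x1: 0x80}  # hypothetical mapping: virtual page 1 -> physical page 0x80
    print(hex(translate(0x1234, table)))
```

Because unmapped pages raise a fault instead of reaching memory, the same mechanism that enables virtual devices also isolates devices from memory they were not given.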

Interrupt Controller

The Neoverse N3 core uses the Arm GICv3 architecture for its interrupt controller, which handles and routes interrupts to the processor. GIC system registers, such as ICV_CTLR_EL1 (Interrupt Controller Virtual Control Register), are used to manage and control interrupts at different levels, with support for both physical and virtual interrupts. At its most basic level, the interrupt controller routes events to the processor core for action. It identifies the source of the interrupt so that the processor can transfer control to the correct handling function, known as an ISR (Interrupt Service Routine).
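
The routing of an interrupt to its ISR can be sketched with a toy software model. It follows the GIC convention that a lower priority value means a higher priority, but everything else (the class `ToyInterruptController`, its methods, and the interrupt IDs used) is hypothetical and far simpler than real GIC behavior.

```python
from typing import Callable, Dict, Optional, Set, Tuple


class ToyInterruptController:
    """Toy model: pick the highest-priority pending interrupt and run its ISR."""

    def __init__(self) -> None:
        self.handlers: Dict[int, Tuple[int, Callable[[int], str]]] = {}
        self.pending: Set[int] = set()

    def register(self, intid: int, priority: int, isr: Callable[[int], str]) -> None:
        self.handlers[intid] = (priority, isr)

    def raise_irq(self, intid: int) -> None:
        # Interrupts without a registered handler are ignored in this toy model.
        if intid in self.handlers:
            self.pending.add(intid)

    def dispatch(self) -> Optional[str]:
        if not self.pending:
            return None
        # Lower priority value wins; ties are broken by the lower interrupt ID.
        intid = min(self.pending, key=lambda i: (self.handlers[i][0], i))
        self.pending.remove(intid)
        return self.handlers[intid][1](intid)


if __name__ == "__main__":
    gic = ToyInterruptController()
    gic.register(32, 0xA0, lambda i: f"uart handler for {i}")
    gic.register(33, 0x20, lambda i: f"timer handler for {i}")
    gic.raise_irq(32)
    gic.raise_irq(33)
    print(gic.dispatch())  # the interrupt with the lower priority value runs first
    print(gic.dispatch())
```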

System Management & Local Control Processors

The system management and local control processors in the Arm Neoverse N3 platform handle system-level functions. Overall system management is handled by an SCP (System Control Processor), whereas LCPs (Local Control Processors) manage power and other functions for individual cores. This split is designed for efficiency: the SCP controls the overall system, and the LCPs allow per-application-processor DVFS (Dynamic Voltage and Frequency Scaling) to react to workloads without overloading the main SCP.
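
Why DVFS saves power can be shown with the standard first-order model of CMOS dynamic power, P = C × V² × f. The sketch below uses that textbook formula only; the scaling factors are hypothetical, static (leakage) power is ignored, and the function names are not part of any Arm software.

```python
def dynamic_power(capacitance: float, voltage: float, frequency: float) -> float:
    """First-order CMOS dynamic power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(voltage_scale: float, frequency_scale: float) -> float:
    """Dynamic power after scaling V and f, relative to the starting point."""
    return voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # Hypothetical operating point change: voltage and frequency both reduced by 20%.
    print(relative_power(0.8, 0.8))  # about half of the original dynamic power
```

Because voltage enters the formula squared, lowering voltage together with frequency reduces power much faster than it reduces performance, which is the reason for controlling DVFS per core.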

Arm Neoverse N3 Software

The Arm Neoverse N3 is a general-purpose CPU core that implements the Armv9.2-A architecture. It is designed for workloads in enterprise networking, the infrastructure edge, and 5G, and is supported by a complete software and tools ecosystem. Software support for Neoverse N3 is broad and builds on the wider Arm ecosystem for infrastructure and servers.

The Neoverse N3 architecture includes numerous extensions that software can use for better security and performance. In general, Neoverse N3 is supported by a strong and evolving software ecosystem, which allows developers to design and deploy a wide range of infrastructure and cloud workloads with high power efficiency and performance.
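
One simple way for software to check which extensions are available is to read the feature list that Linux exposes on 64-bit Arm systems in the `Features` line of `/proc/cpuinfo`. The sketch below parses that line; the helper names are hypothetical, the sample text is made up for illustration, and on a non-Arm or non-Linux host the result is simply an empty set.

```python
from pathlib import Path
from typing import Set


def parse_cpu_features(cpuinfo_text: str) -> Set[str]:
    """Collect the feature names from every 'Features' line in cpuinfo text."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "features":
            features.update(value.split())
    return features


def host_cpu_features(path: str = "/proc/cpuinfo") -> Set[str]:
    """Features of the current host, or an empty set if the file is unavailable."""
    try:
        return parse_cpu_features(Path(path).read_text())
    except OSError:
        return set()


if __name__ == "__main__":
    # Made-up sample text in the style of an arm64 /proc/cpuinfo entry.
    sample = "processor\t: 0\nFeatures\t: fp asimd atomics sve sve2\n"
    print("sve2" in parse_cpu_features(sample))
    print("sve2" in host_cpu_features())
```

A program can use such a check to choose between an SVE2 code path and a fallback at startup.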

Neoverse N3 Vs. Neoverse N2

The difference between Neoverse N3 and Neoverse N2 includes the following.

| Neoverse N3 | Neoverse N2 |
| --- | --- |
| Arm Neoverse N3 is the latest N-series CPU core from Arm, optimized for performance-per-watt efficiency on enterprise networking, infrastructure edge, and 5G workloads. | Neoverse N2 is an Armv9-based CPU core, designed for infrastructure and cloud workloads. |
| Architecture is Armv9.2-A. | Architecture is Armv9.0-A. |
| ML performance is almost 3x higher than N2. | Its performance is the baseline for comparison. |
| Performance-per-watt is up to 20% better than N2. | It is less efficient than N3. |
| L2 cache is configurable up to 2MB for each core. | L2 cache is configurable, but generally smaller than on N3. |
| Performance uplift is ~1.1x on integer performance and higher on specific SQL and compression workloads. | Its performance is strong for its time. |
| It supports newer PCIe and CXL standards. | It supports PCIe 5.0. |

Advantages

The advantages of Neoverse N3 include the following.

  • It provides about 20% better performance per watt than the Neoverse N2.
  • Its energy-efficient design reduces power consumption while improving performance.
  • Neoverse N3 is part of Arm’s CSS roadmap, which allows partners to design custom silicon solutions more easily.
  • It is optimized for general-purpose workloads in enterprise networking, infrastructure edge, and 5G areas.
  • It provides a performance boost for vector processing tasks through SVE2 support.
  • It supports L2 cache sizes ranging from 128KB to 2MB for each core.

Disadvantages

The disadvantages of Neoverse N3 include the following.

  • Its absolute per-core performance is lower than Neoverse V3 & top-tier x86 server CPUs.
  • It is designed especially for power-sensitive environments like networking, general cloud workloads, and edge computing, so it is not aimed at workloads that need peak per-core performance.
  • It still has particular micro-architectural constraints that can impact performance on certain specialized ML/AI workloads unless the software is specially optimized.
  • Integrating it with other components to create a complete system-on-chip requires system integration effort.

Applications

The applications of Neoverse N3 include the following.

  • It suits the power and performance demands of current 5G base stations and core network components.
  • It powers high-performance enterprise networking equipment, such as routers and switches, where efficiency matters for managing large data volumes.
  • It is well suited to edge servers and devices that need significant processing power close to the data source but operate under power constraints.
  • It is designed for use where high performance-per-watt efficiency is essential, such as hyperscale, infrastructure, telecom, edge computing, and networking applications.
  • Its optimized performance-per-watt efficiency makes it appropriate for data center and general-purpose cloud workloads.
  • It is an essential component in enterprise and network infrastructure, such as 5G infrastructure and connected systems.
  • It suits performance-sensitive workloads at the edge of the network due to its scalability and efficiency.
  • It delivers significant performance gains for ML and AI workloads, helped by specific features and a large L2 cache.
  • It is designed to be a general starting-point CPU for a wide array of infrastructure applications.
  • This CPU is optimized for a broad range of workloads like high-performance computing, general-purpose data center tasks, artificial intelligence, and machine learning tasks.

Thus, this is an overview of the Arm Neoverse N3, which is a general-purpose and highly power-efficient CPU solution. It is designed to extend Arm’s leadership in infrastructure markets like telecom, hyperscale cloud, edge computing, and networking. Its main achievement is a significant 20% improvement in performance-per-watt efficiency over the earlier Neoverse N2 processor. Here is a question for you: What is Neoverse N2?