Cortex-A series processors are known for their high performance, so ARM targets them for high-performance markets like PCs, gaming, mobile, and enterprise. Therefore, they are mostly used to run advanced operating systems on smartphones, tablets, laptops, and similar devices.
The early processor cores in this series were modest performers. Over successive generations, however, the CPUs have become powerful enough to handle AI and ML workloads.
Let’s compare the performance of these processors to find out which are more suitable for SoCs running modern IoT infrastructure, especially gateways.
ARM Cortex-A Series Processor Performance Comparison
We’ll split these processor cores into the 32-bit and 64-bit execution environments for the ARM architecture.
Cortex-A Series 32-Bit Execution Environment Architectures
32-bit (AArch32) processors are backward compatible with legacy 32-bit applications, and they fall into these two architectures.
ARMv7-A
This traditional ARM architecture features multiple modes and supports the following.
- ARM (A32) and Thumb (T32) instruction sets
- VMSA (Virtual Memory System Architecture) based on a memory management unit
- Multiprocessing extensions
- Virtualization extensions
- Security extensions
- Generic timer extensions
- Large physical address extensions
- Performance monitors extensions
The architecture is implemented in these cores.
Cortex-A5
Launched in 2011, Cortex-A5 replaced ARM9 and ARM11 cores to power low-end devices. It has a maximum CPU clock rate of 1 GHz and can have 1–4 cores with 4–64 KB of L1 cache.
Some SoCs that implement this core include the quad-core Actions Semiconductor ATM7029, Atmel SAMA5Dxx, Amlogic (S805, M805, and A111), and Qualcomm Snapdragon S4 Play.
Cortex-A7
The Cortex-A7 was also launched in 2011, and it increases the clock speed to a maximum of 2.3 GHz, although the typical speed is 1.5 GHz. It provides 1–8 cores with an L1 cache ranging from 8–64 KB and an optional L2 cache that can be up to one megabyte.
Cortex-A7 was primarily designed to meet the demands of two target applications. The first is to be a simpler, smaller, and power-efficient alternative to the Cortex-A8. The second is to be used in the big.LITTLE architecture, where an SoC can have one or more Cortex-A7 cores and one or more Cortex-A15 cores to form a heterogeneous system.
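The heterogeneous pairing described above can be pictured with a toy placement rule. This is a minimal sketch, not how any real scheduler makes the decision: the `pick_cluster` function, the 0.0 to 1.0 load scale, and the 0.6 threshold are all assumptions made for illustration.

```python
def pick_cluster(load: float, threshold: float = 0.6) -> str:
    """Return which cluster a task should run on in a toy big.LITTLE model.

    load      -- hypothetical normalized demand of the task, 0.0 to 1.0
    threshold -- hypothetical cut-off above which the task counts as heavy
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    # Light work stays on the power-efficient core (e.g., Cortex-A7);
    # heavy work moves to the high-performance core (e.g., Cortex-A15).
    return "big" if load > threshold else "LITTLE"


if __name__ == "__main__":
    for task, load in [("sensor polling", 0.1), ("video decode", 0.9)]:
        print(f"{task}: {pick_cluster(load)}")
```

The point of the model is only that an SoC can spend most of its time on the small cores and wake the large ones for bursts of demanding work.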
Devices like the DSGW-081 NXP i.MX6 ULL industrial edge computing gateway run on this processor at 800 MHz, which is enough to handle applications like energy management, smart manufacturing, predictive maintenance, and smart cities.
Cortex-A8
This core was the first Cortex architecture design to be adopted on a large scale in consumer devices. It has relatively low clock speeds, ranging from 0.6 to 1 GHz, but the cache is sizable, with L1 being 32 KB and L2 ranging from 0–4 MB.
Cortex-A9
Cortex-A9 provides up to four cache-coherent cores and multicore processing with a maximum CPU clock rate of 2 GHz.
Cortex-A12
As the successor to the A9, the A12 also features four cache-coherent cores and was targeted at mid-range mobile devices. However, ARM rebranded it as a Cortex-A17 variant after its second revision in early 2014 because the two cores delivered similar performance figures.
Cortex-A15
According to ARM, the A15 is 40% more powerful than the A9 with the same number of cores at the same speed. The multicore processor features an out-of-order superscalar pipeline that runs at clock speeds of up to 2.5 GHz and can be grouped into 1–4 cores per cluster. Each SoC can have 1–2 clusters.
Cortex-A17
With a maximum clock speed of 2.75 GHz, the A17 provides 60% better performance than the A9, according to ARM, due to features like a deep integer instruction pipeline of 10–12 stages. It consumes 20% less power as well.
A cluster can have 1–4 A17 cores or be paired with an A7 core in a big.LITTLE configuration to form a heterogeneous chip.
ARMv8-A
ARMv8-A introduces the ability to use both 32-bit and 64-bit execution states, which are known as AArch32 and AArch64. We’ll focus on the 32-bit execution state, which is backward compatible with ARMv7-A and supports A32 and T32 instruction sets.
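To see which execution state a given system is using, one rough check is the machine string the operating system reports. The sketch below is an assumption-laden hint rather than a definitive test: the `execution_state` helper is hypothetical, and the strings it matches ("aarch64" and "armv7l" are commonly reported on Linux, "arm64" on macOS) depend on the OS and kernel configuration.

```python
import platform


def execution_state(machine: str) -> str:
    """Map a machine string (as returned by platform.machine()) to an ARM execution state."""
    name = machine.lower()
    if name in ("aarch64", "arm64"):
        return "AArch64"
    if name.startswith("arm"):  # e.g., "armv7l" or "armv8l"
        return "AArch32"
    return "not ARM"


if __name__ == "__main__":
    print(execution_state(platform.machine()))
```

Because the value comes from the OS, a 32-bit userland on a 64-bit kernel may not be classified the way you expect, so treat the result as a starting point.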
Cortex-A32
This core uses an efficient 8-stage in-order pipeline that is highly optimized to run the ARM 32-bit architecture instructions with the lowest power consumption and smallest footprint. It can have 1–4 cores per cluster with SMP (Symmetric Multiprocessing) support within the cluster, and SoCs can have multiple coherent SMP core clusters via AMBA 4 technology.
ARM focused on energy efficiency in this design, implementing features like idle power management to reduce leakage when the cores are idle. Therefore, this processor is ideal for diverse embedded devices, especially in IoT.
Cortex-A Series 64-Bit Execution Environment Architectures
Also known as AArch64, ARM’s 64-bit execution environment provides access to larger address spaces and a modern programming model.
ARMv8-A
The rest of the ARMv8-A architecture processors have a 64-bit execution environment and include these six.
Cortex-A34
This low-power SIP core is the smallest and most energy-efficient 64-bit ARMv8-A application processor because it uses the same 8-stage in-order pipeline as its 32-bit counterpart. The core is basically an A32 but with a 64-bit execution environment. It is suitable for industrial IoT, machine learning, AI, and home networking devices.
Cortex-A35
The A35 is a tiny, power-efficient processor that can run 32-bit and 64-bit applications. It can be scaled from 1–4 cores in a single cluster and configured as the little CPU in a big.LITTLE system in heterogeneous chips.
Combined with its power management features, this core is suitable for IoT nodes and gateways, home networking devices, ML, and AI. The DSGW-014 LoRaWAN outdoor gateway, for instance, employs this A35 in a quad-core cluster for IIoT.
Cortex-A53
The A53 is the most widely used low-power ARM processor because it provides high single-thread and Neon/FPU performance in power-constrained environments. You can find it in smartphones, digital cockpits, IoT, AI, and ML devices.
The DSGW-210 RK3328 multi-protocol smart gateway, for instance, features this processor in a quad-core cluster. Because A53 CPUs are highly energy efficient, the gateway can run on lithium battery backup power during a mains power outage.
Cortex-A57
The primary difference between the A57 and A53 is the cache. The A53 has an 8–64 KB L1 cache and a 128 KB to 2 MB L2 cache, while the A57 has 80 KB of L1 per core (a 48 KB I-cache with parity and a 32 KB D-cache with ECC). The L2 cache in A57 cores also starts at a higher minimum, ranging from 512 KB to 2 MB.
A larger L1 cache increases CPU and system-wide performance, making the A57 a more powerful option for similar applications.
Cortex-A72
Cortex-A72 has an L1 cache structure similar to that of the A57, but it provides 90% more processing capability while consuming 20% less power. Its L2 cache upper limit is also higher, going up to 4 MB. Typical applications are ADAS (Advanced Driver Assistance Systems) and data storage solutions.
Cortex-A73
As the successor to the A72, the Cortex-A73 is optimized to achieve peak, sustained performance at frequencies up to 2.8 GHz and features a 2-wide decode out-of-order superscalar pipeline.
Overall, these features provide 30% more processing power and 30% better power efficiency than its predecessor to increase battery life in mobile devices. It is more common in premium smartphones.
ARMv8.2-A
ARMv8.2-A is an improvement of ARMv8-A because it brings in these four enhancements.
- Half-precision floating point data processing
- RAS (Reliability, Availability, and Serviceability) support
- Memory model enhancements
- Statistical profiling
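To give a feel for what half precision trades away, the sketch below round-trips values through the IEEE 754 16-bit format using Python's standard `struct` module. It runs on any CPU and does not exercise the ARMv8.2-A instructions themselves; the `to_half_and_back` helper is a hypothetical name used only for this illustration.

```python
import struct


def to_half_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    packed = struct.pack("<e", value)  # "e" is the half-precision format code
    assert len(packed) == 2            # two bytes instead of eight for a double
    return struct.unpack("<e", packed)[0]


if __name__ == "__main__":
    print(to_half_and_back(1.0))  # exactly representable
    print(to_half_and_back(0.1))  # comes back slightly different from 0.1
```

The smaller storage size is why half precision is attractive for ML workloads, where a small loss of accuracy per value is often acceptable.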
The processor cores that implement these enhancements include:
Cortex-A55
ARM designed the A55 as the successor to the A53 to improve performance by 18% and power efficiency by 15%. The performance enhancement is courtesy of a high clock speed that maxes out at 2.31 GHz, a larger L1 cache (32–128 KB), and better branch prediction.
This processor also features an L3 cache that is shared across up to eight cores in a single cluster and a configurable L2 cache that reduces memory access latency.
Cortex-A65
The A65 is a multi-threaded processor that features an out-of-order execution pipeline and can execute two threads simultaneously (in parallel). Throughput efficiency is also enhanced for memory-intensive workloads. The core is built for non-safety applications, such as vision-based systems, navigation, and sensor fusion.
Cortex-A65AE
Unlike the A65, the A65AE is built for safety-critical applications, such as ADAS, because it features split lock (two operational modes).
In split mode, the processor provides the highest multicore performance. You can switch it to lock mode, which optimizes the cores for safety by providing advanced multicore fault tolerance that meets ASIL (Automotive Safety Integrity Level) D.
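The two modes can be pictured in software terms. In lock mode, paired cores execute the same work in lockstep and their results are compared so that a mismatch flags a fault; in split mode, each core works independently. The sketch below is a toy model under those assumptions: the function names and the `fault` hook are hypothetical and do not correspond to any ARM interface.

```python
from typing import Callable, List


def run_in_lock_mode(
    task: Callable[[int], int],
    value: int,
    fault: Callable[[int], int] = lambda result: result,
) -> int:
    """Run `task` twice, as two locked cores would, and compare the results.

    `fault` is a hypothetical hook that corrupts the second result so the
    mismatch path can be demonstrated.
    """
    primary = task(value)
    shadow = fault(task(value))  # redundant execution on the paired core
    if primary != shadow:
        raise RuntimeError("lockstep mismatch: possible hardware fault")
    return primary


def run_in_split_mode(task: Callable[[int], int], values: List[int]) -> List[int]:
    """Treat each core as independent: every value is a separate piece of work."""
    return [task(v) for v in values]
```

The trade-off is visible in the model: lock mode spends a second execution on checking, while split mode spends it on extra throughput.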
Cortex-A75
The A75 is the successor to the A73 and is the first high-performance CPU based on DynamIQ technology (heterogeneous CPUs). It has a max clock speed of 3 GHz and can execute up to three instructions in parallel per clock cycle, which increases throughput.
On the cache side, the A75 has a non-blocking, high throughput L1 cache, a private and size configurable L2 cache to reduce latency, and a shared L3 cache for the heterogeneous core clusters.
Cortex-A76
Cortex-A76 is specifically built for AI/ML to improve responsiveness at the edge. ARM states this processor increases the integer performance over the A75 by 25% and the floating point performance by 35%.
The core can fetch 4 instructions per cycle while renaming and dispatching 4 Mops (macro-operations) and 8 μops (micro-operations) in the same cycle. Memory bandwidth also increases by 90% compared to the A75.
This processor supports DynamIQ to provide high-performance computing when used with the more energy-efficient A55 cores.
Hubs like the DSGW-380 RK3588 industrial machine learning edge AI gateway benefit from this heterogeneous system, enabling them to provide a blend of high performance and power efficiency when handling complex edge AI tasks. This gateway also features a built-in NEON coprocessor in its 8-core CPU and a 6 TOPS NPU in its SoC to enhance its AI capabilities.
Cortex-A76AE
The A76AE is similar to the A76 but with the added advantage of split lock, making it suitable for automotive, aviation, robotics, and other autonomous applications.
Cortex-A77
Relative to the A76, the Cortex-A77 has a 23% and 35% increase in integer and floating point performance, respectively. Its memory bandwidth also increases by 15%. The core can fetch 4 instructions per cycle while simultaneously renaming and dispatching 6 Mops and 13 μops.
Overall, this performance enables intelligent 5G computing to support mobile phones and always-connected laptops.
Cortex-A78
The Cortex-A78 follows the standard Cortex-A roadmap and targets 5 nm (2.1 GHz) chipsets, where it provides 7% better performance and 4% lower power consumption than the A77. It is also 5% smaller than the A77, leaving more space for NPUs and GPUs in the SoC.
The core’s pipeline is one cycle longer (depth of 14 stages) than in the A77, which ensures the processor hits the 3 GHz clock frequency target. Also, the core can fetch 6 instructions per cycle, 2 more than its predecessor.
This impressive computing power is ideal for supporting new consumer device innovation in the fields of AI and 5G.
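Generation-over-generation percentages multiply rather than add, which is easy to overlook when reading the A76, A77, and A78 figures above in sequence. The sketch below compounds the 25%, 23%, and 7% gains quoted in this article. It is rough arithmetic only: the `compound_uplift` helper is hypothetical, and vendor figures are typically measured under different process and clock assumptions, so the result is not a benchmark.

```python
from math import prod
from typing import List


def compound_uplift(gains_percent: List[float]) -> float:
    """Combine successive generation-over-generation gains into one percentage."""
    factor = prod(1 + gain / 100 for gain in gains_percent)
    return (factor - 1) * 100


if __name__ == "__main__":
    # Gains quoted above: A75->A76 (25%), A76->A77 (23%), A77->A78 (7%)
    print(f"{compound_uplift([25, 23, 7]):.1f}%")  # prints 64.5%
```

Simply adding the three figures would give 55%, so the compounded estimate is noticeably higher.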
Cortex-A78AE
Built on the A78 platform, the A78AE introduces the split lock hybrid mode to support autonomous vehicles, IVI (In-Vehicle Infotainment), digital cockpits, and industrial automation.
Cortex-A78C
The A78C is also built on the A78 platform, but it introduces advanced security features to support on-the-go gaming and always-on, always-connected laptops. One of these security features is pointer authentication support, which reduces the attack surface available to malicious software.
ARMv9-A
ARMv9-A is backward compatible with ARMv8-A and builds on it by introducing these features.
- SVE2 (Scalable Vector Extension 2): Extends scalable vectors to more use cases
- BRBE (Branch Record Buffer Extension): Provides profiling information
- RME (Realm Management Extension): Extends confidential computing on ARM to developers
- ETE (Embedded Trace Extension) and TRBE (Trace Buffer Extension): To enhance trace capabilities
- TME (Transactional Memory Extension): Transactional memory hardware support for ARM
Cores implementing this architecture include:
Cortex-A510
The A510 succeeds the A55 as a “LITTLE” CPU in heterogeneous CPU clusters to work alongside the A710. It focuses on enhancing efficiency over the A55 by introducing:
- 3-wide in-order design (one more than the A55)
- 3-wide fetch and decode front-end
- 3-wide issue and execute on the back-end (3 ALUs)
- 35% better performance
- 20% more energy efficiency
- 3X ML performance
Some of the SoCs that have this core include the Qualcomm Snapdragon 8 Gen 1 and MediaTek Dimensity 9200+.
Cortex-A710
The Cortex-A710 serves as the “big” CPU and can be paired with the Cortex-A510 or Cortex-X2. It improves on the A78 in these areas.
- 10% better performance
- 30% better power efficiency
- 2X ML uplift
Cortex-A715
Cortex-A715 is the second-gen ARMv9-A “big” CPU that improves on the A710 by having 20% better power efficiency and 5% better performance. This performance is impressive enough to match the Cortex-X, making the processor suitable for laptops, smartphones, and 5G devices.
Cortex-A520
The Cortex-A500 series CPU cores focus on power efficiency, and the A520 takes it up a notch from the A510 as the “LITTLE” processor in a total compute solution. It is 22% more energy efficient, 8% more powerful, has more private L2 cache memory (up to 512 KB), and supports pointer authentication.
Cortex-A720
Cortex-A720 cores are premium-efficiency CPUs in the ARMv9.2-A architecture that provide industry-leading sustained processing performance in a constrained power envelope. They succeed A715 cores and provide 15% more peak performance, a 10% improvement in normal performance, and 20% better power efficiency.
The core comes with the DSU-120 and can be paired with a Cortex-X4 and/or Cortex-A520, where it can function as the “big” or “LITTLE” cluster.
Conclusion: Which Is the Most Powerful ARM Cortex-A Series Processor?
It is evident that the Cortex-A720 is the most powerful ARM Cortex-A series processor core, whether in a solo, big, or LITTLE configuration. It is also very new to the market, having launched in 2023.
But in most current IoT devices, the processing power provided by the A720 is more than required. Even if you include it in a gateway design, the cost will be too high. So by the current IoT requirements, the AArch64 ARMv8-A cores are sufficient, with the Cortex-A53 being the most common.
However, AI gateways require more computing power, which makes cores like the Cortex-A76 more suitable. You get this core in the DSGW-380 RK3588 industrial machine learning edge AI gateway.
Over time, the “things” in IoT will increase, and the solutions will require more computing power, especially as AI advances to provide edge analytics. Therefore, heterogeneous cores (DynamIQ) will become more prevalent in the future to provide a balance between performance and energy efficiency.
But even then, the most critical factor to consider when determining the most suitable ARM Cortex-A series core for your IoT devices will be cost vs. required performance/capabilities.
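One way to make the cost vs. performance trade-off concrete is to pick the cheapest core that still meets the workload's requirement. The sketch below assumes exactly that rule. Everything in it is hypothetical: the `CoreOption` type, the `cheapest_sufficient` helper, and especially the performance and cost numbers, which are placeholders rather than benchmarks or prices.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CoreOption:
    name: str
    relative_performance: float  # hypothetical score, higher is faster
    relative_cost: float         # hypothetical cost, higher is pricier


def cheapest_sufficient(options: List[CoreOption], required: float) -> Optional[CoreOption]:
    """Return the lowest-cost option that meets the required performance."""
    candidates = [o for o in options if o.relative_performance >= required]
    return min(candidates, key=lambda o: o.relative_cost, default=None)


# Placeholder numbers for illustration only; they are not benchmarks or prices.
OPTIONS = [
    CoreOption("Cortex-A53", 1.0, 1.0),
    CoreOption("Cortex-A76", 3.0, 3.0),
    CoreOption("Cortex-A720", 5.0, 6.0),
]

if __name__ == "__main__":
    choice = cheapest_sufficient(OPTIONS, required=2.0)
    print(choice.name if choice else "no core meets the requirement")
```

With real data in place of the placeholders, the same rule explains why a modest gateway lands on an A53 while an edge AI gateway lands on an A76.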
We consider this factor when developing our products but can customize them to deliver higher performance according to your project requirements. Send us your project details in this contact form, and we’ll be in touch to discuss the most suitable core to use.