1 The Development of ARM

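ARM is short for Advanced RISC Machine. It was earlier called the Acorn RISC Machine and was originally a 32-bit reduced instruction set computer (RISC) processor architecture. There are also derivative products based on ARM designs, most notably Marvell's XScale architecture and Texas Instruments' OMAP series. The ARM family accounts for about 75% of all 32-bit embedded processors, and thanks to its low power consumption, ARM is widely used in mobile communications and portable devices.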

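In 1983, Acorn Computers Ltd began developing the Acorn RISC Machine processor. A team led by Roger Wilson and Steve Furber set out to create a new architecture resembling an advanced MOS Technology 6502, the processor on which Acorn had built a large line of computers. The team produced ARM1 samples in 1985 and put ARM2 into volume production the following year. ARM2 has a 32-bit data bus and a 26-bit address space, giving a 64 MB addressable range, along with sixteen 32-bit registers.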
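The 64 MB figure follows directly from the 26-bit address width: each extra address bit doubles the number of byte addresses. A minimal Python sketch of that arithmetic, assuming a flat byte-addressed space (the function name `addressable_bytes` is hypothetical, chosen only for illustration):

```python
# Sketch: size of a flat, byte-addressed memory space for a given address width.
# Assumes every address names one byte, as on ARM2 (26-bit) and ARMv3+ (32-bit).

MIB = 1024 * 1024  # 1 MiB in bytes
GIB = 1024 * MIB   # 1 GiB in bytes


def addressable_bytes(address_bits: int) -> int:
    """Return how many distinct byte addresses `address_bits` bits can encode."""
    return 1 << address_bits  # equivalent to 2 ** address_bits


if __name__ == "__main__":
    for label, bits in [("ARM2, 26-bit", 26), ("ARMv3 and later, 32-bit", 32)]:
        size = addressable_bytes(bits)
        print(f"{label}: {size} bytes = {size // MIB} MiB")
```

So "64 MB" here means 64 MiB (2^26 bytes); the move to 32-bit addressing in ARMv3 raises the ceiling to 4 GiB.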

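In the late 1980s, Apple Computer began working with Acorn on a new version of the ARM core. In 1990, the design team was spun off into a new company named Advanced RISC Machines Ltd. The first ARM6 samples appeared in 1991, and Apple used the ARM6-based ARM610 as the basis of its Apple Newton PDA. In 1994, Acorn used the ARM610 as the CPU in its Risc PC computers.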

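ARM is a well-known company in the microprocessor industry. It has designed a large number of high-performance, low-cost, low-power RISC processors, but it only designs chips and does not manufacture them. Its business model is to sell intellectual property cores (IP cores): it licenses its technology to many of the world's leading semiconductor, software, and OEM companies and provides technical services.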

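ARM versions fall into two categories: core versions and processor versions. A core version is an ARM architecture, such as ARMv1, ARMv2, ARMv3, ARMv4, ARMv5, ARMv6, ARMv7, and ARMv8. A processor version is a concrete ARM processor, such as ARM1, ARM9, ARM11, ARM Cortex-A (A7, A9, A15), ARM Cortex-M (M1, M3, M4), and ARM Cortex-R; this is what people usually mean by "ARM version".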

2 ARM Versions

The following table gives a simplified overview of ARM versions.

Core (architecture) version	Processor version
ARMv1	ARM1
ARMv2	ARM2、ARM3
ARMv3	ARM6、ARM7
ARMv4	StrongARM、ARM7TDMI、ARM9TDMI
ARMv5	ARM7EJ、ARM9E、ARM10E、XScale
ARMv6	ARM11、ARM Cortex-M
ARMv7	ARM Cortex-A、ARM Cortex-M、ARM Cortex-R
ARMv8	ARM Cortex-A30、ARM Cortex-A50、ARM Cortex-A70
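Because one processor family can appear under more than one architecture (Cortex-M is listed under both ARMv6 and ARMv7 in the table above), a reverse lookup has to return a list rather than a single value. A minimal Python sketch, with the dictionary transcribed from the simplified table; the names `ARCH_TO_PROCESSORS` and `architectures_of` are hypothetical:

```python
# Simplified ARM architecture -> processor mapping, transcribed from the table above.
ARCH_TO_PROCESSORS: dict[str, list[str]] = {
    "ARMv1": ["ARM1"],
    "ARMv2": ["ARM2", "ARM3"],
    "ARMv3": ["ARM6", "ARM7"],
    "ARMv4": ["StrongARM", "ARM7TDMI", "ARM9TDMI"],
    "ARMv5": ["ARM7EJ", "ARM9E", "ARM10E", "XScale"],
    "ARMv6": ["ARM11", "ARM Cortex-M"],
    "ARMv7": ["ARM Cortex-A", "ARM Cortex-M", "ARM Cortex-R"],
    "ARMv8": ["ARM Cortex-A30", "ARM Cortex-A50", "ARM Cortex-A70"],
}


def architectures_of(processor: str) -> list[str]:
    """Return every architecture version that lists `processor`, in table order."""
    return [arch for arch, procs in ARCH_TO_PROCESSORS.items() if processor in procs]


if __name__ == "__main__":
    print(architectures_of("ARM Cortex-M"))  # ['ARMv6', 'ARMv7']
    print(architectures_of("ARM7TDMI"))      # ['ARMv4']
```

Keeping the forward direction (architecture to processors) as the stored data and deriving the reverse view avoids maintaining two tables that could drift apart.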

The following table gives detailed ARM version information (source: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures).

ARM family	ARM architecture	ARM core	Feature	Cache (I / D), MMU	Typical MIPS @ MHz	Reference
ARM1	ARMv1	ARM1	First implementation	None
ARM2	ARMv2	ARM2	ARMv2 added the MUL (multiply) instruction	None	4 MIPS @ 8 MHz 0.33 DMIPS/MHz
ARM2	ARMv2a	ARM250	Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions	None, MEMC1a	7 MIPS @ 12 MHz
ARM3	ARMv2a	ARM3	First integrated memory cache	4 KB unified	12 MIPS @ 25 MHz 0.50 DMIPS/MHz
ARM6	ARMv3	ARM60	ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64).	None	10 MIPS @ 12 MHz
		ARM600	As ARM60, cache and coprocessor bus (for FPA10 floating-point unit)	4 KB unified	28 MIPS @ 33 MHz
		ARM610	As ARM60, cache, no coprocessor bus	4 KB unified	17 MIPS @ 20 MHz 0.65 DMIPS/MHz	[4]
ARM7	ARMv3	ARM700		8 KB unified	40 MHz
		ARM710	As ARM700, no coprocessor bus	8 KB unified	40 MHz	[5]
		ARM710a	As ARM710	8 KB unified	40 MHz 0.68 DMIPS/MHz
ARM7T	ARMv4T	ARM7TDMI(-S)	3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing	None	15 MIPS @ 16.8 MHz 63 DMIPS @ 70 MHz
		ARM710T	As ARM7TDMI, cache	8 KB unified, MMU	36 MIPS @ 40 MHz
		ARM720T	As ARM7TDMI, cache	8 KB unified, MMU with FCSE (Fast Context Switch Extension)	60 MIPS @ 59.8 MHz
		ARM740T	As ARM7TDMI, cache	MPU
ARM7EJ	ARMv5TEJ	ARM7EJ-S	5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions	None
ARM8	ARMv4	ARM810	5-stage pipeline, static branch prediction, double-bandwidth memory	8 KB unified, MMU	84 MIPS @ 72 MHz 1.16 DMIPS/MHz	[6][7]
ARM9T	ARMv4T	ARM9TDMI	5-stage pipeline, Thumb	None
		ARM920T	As ARM9TDMI, cache	16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension)	200 MIPS @ 180 MHz	[8]
		ARM922T	As ARM9TDMI, caches	8 KB / 8 KB, MMU
		ARM940T	As ARM9TDMI, caches	4 KB / 4 KB, MPU
ARM9E	ARMv5TE	ARM946E-S	Thumb, enhanced DSP instructions, caches	Variable, tightly coupled memories, MPU
		ARM966E-S	Thumb, enhanced DSP instructions	No cache, TCMs
		ARM968E-S	As ARM966E-S	No cache, TCMs
	ARMv5TEJ	ARM926EJ-S	Thumb, Jazelle DBX, enhanced DSP instructions	Variable, TCMs, MMU	220 MIPS @ 200 MHz
	ARMv5TE	ARM996HS	Clockless processor, as ARM966E-S	No caches, TCMs, MPU
ARM10E	ARMv5TE	ARM1020E	6-stage pipeline, Thumb, enhanced DSP instructions, (VFP)	32 KB / 32 KB, MMU
	ARMv5TE	ARM1022E	As ARM1020E	16 KB / 16 KB, MMU
	ARMv5TEJ	ARM1026EJ-S	Thumb, Jazelle DBX, enhanced DSP instructions, (VFP)	Variable, MMU or MPU
ARM11	ARMv6	ARM1136J(F)-S	8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access	Variable, MMU	740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz	[9]
	ARMv6T2	ARM1156T2(F)-S	9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions	Variable, MPU		[10]
	ARMv6Z	ARM1176JZ(F)-S	As ARM1136EJ(F)-S	Variable, MMU + TrustZone	965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors	[11]
	ARMv6K	ARM11MPCore	As ARM1136EJ(F)-S, 1–4 core SMP	Variable, MMU
SecurCore	ARMv6-M	SC000			0.9 DMIPS/MHz
	ARMv4T	SC100
	ARMv7-M	SC300			1.25 DMIPS/MHz
Cortex-M	ARMv6-M	Cortex-M0[12]	Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory	Optional cache, no TCM, no MPU	0.84 DMIPS/MHz
		Cortex-M0+[14]	Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory	Optional cache, no TCM, optional MPU with 8 regions	0.93 DMIPS/MHz
		Cortex-M1[15]	Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory	Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU	136 DMIPS @ 170 MHz,[16] (0.8 DMIPS/MHz FPGA-dependent)[17]
	ARMv7-M	Cortex-M3[18]	Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory	Optional cache, no TCM, optional MPU with 8 regions	1.25 DMIPS/MHz
	ARMv7E-M	Cortex-M4[19]	Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory	Optional cache, no TCM, optional MPU with 8 regions	1.25 DMIPS/MHz (1.27 w/FPU)
	ARMv7E-M	Cortex-M7[20]	Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions	0−64 KB I-cache, 0−64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions	2.14 DMIPS/MHz
	ARMv8-M	Cortex-M23[21]	Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone	Optional cache, no TCM, optional MPU with 16 regions	0.99 DMIPS/MHz
		Cortex-M33[22]	Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor	Optional cache, no TCM, optional MPU with 16 regions	1.50 DMIPS/MHz
		Cortex-M35P[23]	Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor	Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions	1.50 DMIPS/MHz
Cortex-R	ARMv7-R	Cortex-R4[24]	Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic	0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions	1.67 DMIPS/MHz[25]
		Cortex-R5[26]	Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[27]	0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions	1.67 DMIPS/MHz[25]
		Cortex-R7[28]	Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[27]	0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions	2.50 DMIPS/MHz[25]
		Cortex-R8[29]	TBD	TBD	2.50 DMIPS/MHz[25]
	ARMv8-R	Cortex-R52[30]	TBD	TBD	2.16 DMIPS/MHz[31]
Cortex-A (32-bit)	ARMv7-A	Cortex-A5[32]	Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP)	4−64 KB / 4−64 KB L1, MMU + TrustZone	1.57 DMIPS/MHz per core
		Cortex-A7[33]	Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[34]	8−64 KB / 8−64 KB L1, 0–1 MB L2, MMU + TrustZone	1.9 DMIPS/MHz per core
		Cortex-A8[35]	Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline	16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone	Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz)
		Cortex-A9[36]	Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP)	16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone	2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core)
		Cortex-A12[37]	Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP)	32−64 KB	3.0 DMIPS/MHz per core
		Cortex-A15[38]	Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15-24 stage pipeline[34]	32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone	At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[39]
		Cortex-A17[40]	Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP	32 KB L1, 256 KB–8 MB L2 w/optional ECC	2.8 DMIPS/MHz
	ARMv8-A	Cortex-A32[41]	Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline	8–64 KB w/optional parity / 8−64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared
Cortex-A (64-bit)	ARMv8-A	ARM Cortex-A34[42]	Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline	8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses
		Cortex-A35[43]	Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline	8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses	1.78 DMIPS/MHz
		Cortex-A53[44]	Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline	8−64 KB w/parity / 8−64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses	2.3 DMIPS/MHz
		Cortex-A57[45]	Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline	48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses	4.1–4.5 DMIPS/MHz[46][47]
		Cortex-A72[48]	Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline	48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses	4.7 DMIPS/MHz
		Cortex-A73[49]	Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline	64 KB / 32−64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses	4.8 DMIPS/MHz[50]
	ARMv8.2-A	Cortex-A55[51]	Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[52]	16−64 KB / 16−64 KB L1, 256 KB L2 per core, 4 MB L3 shared
		Arm Cortex-A65AE[53]	Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT	64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared
		Cortex-A75[54]	Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[55]	64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared
		Cortex-A76[56]	Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[57]	64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared
		Cortex-A77[58]	Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[57]	1.5K L0 MOPs cache, 64 / 64 KB L1, 256−512 KB L2 per core, 512 KB−4 MB L3 shared
Neoverse		Neoverse N1[59]	Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[57]	64 / 64 KB L1, 512−1024 KB L2 per core, 2−128 MB L3 shared, 128 MB system level cache
Neoverse		Neoverse E1	Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT	32−64 KB / 32−64 KB L1, 256 KB L2 per core, 4 MB L3 shared
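The performance column mixes DMIPS/MHz ratings with absolute DMIPS figures. Converting between them is simple multiplication: total DMIPS ≈ DMIPS/MHz × clock (MHz) × cores. The sketch below checks a few rows of the table under the idealized assumption that throughput scales linearly with clock speed and core count, which real multi-core systems rarely achieve; the function names are hypothetical.

```python
# Sketch: converting between DMIPS/MHz ratings and absolute DMIPS.
# Assumes ideal linear scaling with clock and core count (an upper bound in practice).


def total_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Estimate aggregate Dhrystone MIPS from a per-core, per-MHz rating."""
    return dmips_per_mhz * clock_mhz * cores


def rating_from_dmips(dmips: float, clock_mhz: float) -> float:
    """Recover the DMIPS/MHz rating from an absolute DMIPS figure at a given clock."""
    return dmips / clock_mhz


if __name__ == "__main__":
    # Cortex-A9 row: 2.5 DMIPS/MHz per core, dual-core at 2 GHz (2000 MHz)
    print(total_dmips(2.5, 2000, cores=2))
    # ARM1176JZ(F)-S row: 965 DMIPS @ 772 MHz
    print(rating_from_dmips(965, 772))
    # Cortex-M1 row: 136 DMIPS @ 170 MHz
    print(rating_from_dmips(136, 170))
```

The Cortex-A9 check reproduces the table's 10,000 DMIPS figure, and the other two rows work out to 1.25 and 0.8 DMIPS/MHz, matching the ratings quoted for those cores.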

3 ARM Series Overview

3.1 ARM7 Series

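This series mainly targets simple 32-bit devices. As one of the older series, ARM7 processors are no longer recommended for new designs. It mainly includes the ARM7TDMI-S (ARMv4T architecture) and the ARM7EJ-S (ARMv5TEJ architecture).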

3.2 ARM9 Series

This series mainly targets embedded real-time applications. It mainly includes the ARM926EJ-S, ARM946E-S, and ARM968E-S.

3.3 ARM11 Series

This series is mainly used in high-reliability and real-time embedded applications. It mainly includes the ARM11MPCore, ARM1176, ARM1156, and ARM1136.

3.4 Cortex-R Series

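The R in Cortex-R stands for Real-Time: the series targets real-time task processing, with applications in automotive, cameras, industrial, and medical equipment.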

Processors in this series mainly include the Cortex-R4, Cortex-R5, Cortex-R7, Cortex-R8, and Cortex-R52.

3.5 Cortex-M Series

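The M in Cortex-M stands for Microcontrollers: the series targets the most energy-efficient embedded devices, with applications in automotive, smart grids, medical, embedded systems, smart cards, smart devices, sensor fusion, and wearables.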

Processors in this series mainly include the Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, and Cortex-M35P.

3.6 Cortex-A Series

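The A in Cortex-A stands for Application (the application profile): the series aims to deliver the highest performance at the best power efficiency, with applications in automotive, industrial, medical, modems, storage, and more. Cortex-A is also one of the most widely used ARM processor lines today.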

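Processors in this series mainly include the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72, and Cortex-A73. The Cortex-A8 supports single-core configurations only. The Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, and Cortex-A17 are based on the ARMv7-A architecture; the Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72, and Cortex-A73 are based on the ARMv8-A architecture. Except for the Cortex-A32, which is 32-bit only, these ARMv8-A cores all support 64-bit execution.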
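The architecture split described above can be captured as a small lookup table. This Python sketch covers only the cores that paragraph names, encodes "supports 64-bit" as AArch64 support, and uses hypothetical names (`CORTEX_A`, `cores_matching`); it is an illustration, not an exhaustive catalog.

```python
from typing import Optional

# Cortex-A cores named in the text: core -> (architecture, supports 64-bit AArch64)
CORTEX_A: dict[str, tuple[str, bool]] = {
    "Cortex-A5": ("ARMv7-A", False),
    "Cortex-A7": ("ARMv7-A", False),
    "Cortex-A8": ("ARMv7-A", False),
    "Cortex-A9": ("ARMv7-A", False),
    "Cortex-A15": ("ARMv7-A", False),
    "Cortex-A17": ("ARMv7-A", False),
    "Cortex-A32": ("ARMv8-A", False),  # ARMv8-A, but 32-bit (AArch32) only
    "Cortex-A35": ("ARMv8-A", True),
    "Cortex-A53": ("ARMv8-A", True),
    "Cortex-A57": ("ARMv8-A", True),
    "Cortex-A72": ("ARMv8-A", True),
    "Cortex-A73": ("ARMv8-A", True),
}


def cores_matching(architecture: Optional[str] = None,
                   aarch64: Optional[bool] = None) -> list[str]:
    """Return core names filtered by architecture and/or 64-bit support (None = any)."""
    return [
        name
        for name, (arch, is_64) in CORTEX_A.items()
        if (architecture is None or arch == architecture)
        and (aarch64 is None or is_64 == aarch64)
    ]


if __name__ == "__main__":
    print(cores_matching(aarch64=True))
    print(cores_matching(architecture="ARMv8-A", aarch64=False))  # ['Cortex-A32']
```

The second query makes the paragraph's caveat explicit: an ARMv8-A core is not automatically 64-bit, and the Cortex-A32 is the exception.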

Ranked from highest to lowest, the Cortex-A processors are: Cortex-A73, Cortex-A72, Cortex-A57, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A9, Cortex-A8, Cortex-A5.

Company	Core	Released	Revision	Decode	Pipeline depth	Out-of-order execution	Branch prediction	big.LITTLE role	Exec. ports	Fab (in nm)	Simult. MT	L0 cache	L1 cache Instr + Data (in KiB)	L2 cache	L3 cache	Core configurations	DMIPS/MHz
ARM Holdings	Cortex-A32 (32-bit)	2017	ARMv8.0-A (only 32-bit)	2-wide	8	No		LITTLE	?	28	No	No	8–64 + 8–64	0–1 MiB	No	1-4+
	Cortex-A34 (64-bit)	2019	ARMv8.0-A (only 64-bit)	2-wide	8	No		LITTLE	?		No	No	8–64 + 8–64	0–1 MiB	No	1-4+
	Cortex-A35	2017	ARMv8.0-A	2-wide	8	No	Yes	LITTLE	?	28 / 16 / 14 / 10	No	No	8–64 + 8–64	0 / 128 KiB–1 MiB	No	1–4+	1.78
	Cortex-A53	2014	ARMv8.0-A	2-wide	8	No	Conditional+ Indirect branch prediction	big/LITTLE	2	28 / 20 / 16 / 14 / 10	No	No	8–64 + 8–64	128 KiB–2 MiB	No	1–4+	2.24
	Cortex-A55	2017	ARMv8.2-A	2-wide	8	No	Conditional+ Indirect branch prediction	big/LITTLE	2	28 / 20 / 16 / 14 / 10	No	No	16–64 + 16–64	0–256 KiB/core	0–4 MiB	1–8+	2.65[8]
	Cortex-A57	2013	ARMv8.0-A	3-wide	15	Yes 3-wide dispatch	Two-level	big	8	28 / 20 / 16[10] / 14	No	No	48 + 32	0.5–2 MiB	No	1–4+	4.6
	Cortex-A65AE	2019	ARMv8.2-A	?	?	Yes	Two-level	?	2	?	SMT2	No	16-64 + 16-64	64-256 KiB	0-4 MB	1–8	?
	Cortex-A72	2015	ARMv8.0-A	3-wide	15	Yes 5-wide dispatch	Two-level	big	8	28 / 16	No	No	48 + 32	0.5–4 MiB	No	1–4+	4.72
	Cortex-A73	2016	ARMv8.0-A	2-wide	11–12	Yes 4-wide dispatch	Two-level	big	7	28 / 16 / 10	No	No	64 + 32/64	1–8 MiB	No	1–4+	~6.35
	Cortex-A75	2017	ARMv8.2-A	3-wide	11–13	Yes 6-wide dispatch	Two-level	big	8?	28 / 16 / 10	No	No	64 + 64	256–512 KiB/core	0–4 MiB	1–8+	?
	Cortex-A76	2018	ARMv8.2-A	4-wide	11–13	Yes 8-wide dispatch	Two-level	big	8	10 / 7	No	No	64 + 64	256–512 KiB/core	1–4 MiB	1–4	?
	Cortex-A77	2019	ARMv8.2-A	4-wide	11–13	Yes 10-wide dispatch	Two-level	big	12	7	No	1.5K entries	64 + 64	256–512 KiB/core	1–4 MiB	1-4	?
Apple Inc.	Cyclone	2013	ARMv8.0-A	6-wide	16	Yes	Yes	No	9	28	No	No	64 + 64	1 MiB	4 MiB	2	?
	Typhoon	2014	ARMv8.0‑A	6-wide	16	Yes	Yes	No	9	20	No	No	64 + 64	1 MiB	4 MiB	2, 3 (A8X)	?
	Twister	2015	ARMv8.0‑A	6-wide	16[20]	Yes	Yes	No	9	16 / 14	No	No	64 + 64	3 MiB	4 MiB No (A9X)	2	?
	Hurricane	2016	ARMv8.1‑A	6-wide	16	Yes	Yes	"big" (In A10/A10X paired with "LITTLE" Zephyr cores)	9	16 (A10) 10 (A10X)	No	No	64 + 64	3 MiB(A10) 8 MiB (A10X)	4 MiB(A10) No (A10X)	2x Hurricane + 2x Zephyr (A10) 3x Hurricane + 3x Zephyr (A10X)	?
	Zephyr	2016	ARMv8.1‑A	3-wide	12	Yes	Yes	LITTLE	5	16 (A10) 10 (A10X)	No	No	32 + 32	1 MiB	4 MiB[22] (A10) No (A10X)	2x Hurricane + 2x Zephyr (A10) 3x Hurricane + 3x Zephyr (A10X)	?
	Monsoon	2017	ARMv8.2‑A	7-wide	16	Yes	Yes	"big" (In Apple A11 paired with "LITTLE" Mistral cores)	13	10	No	No	64 + 64	8 MiB	No	2x Monsoon + 4× Mistral	?
	Mistral	2017	ARMv8.2‑A	3-wide	12	Yes	Yes	LITTLE	5	10	No	No	32 + 32	1 MiB	No	2x Monsoon + 4× Mistral	?
	Vortex	2018	ARMv8.3‑A	7-wide	16	Yes	Yes	"big" (In Apple A12/Apple A12X/Apple A12Z paired with "LITTLE" Tempest cores)	13	7	No	No	128 + 128	8 MiB	No	2x Vortex + 4x Tempest (A12) 4x Vortex + 4x Tempest (A12X/A12Z)	?
	Tempest	2018	ARMv8.3‑A	3-wide	12	Yes	Yes	LITTLE	5	7	No	No	32 + 32	2 MiB	No	2x Vortex + 4x Tempest (A12) 4x Vortex + 4x Tempest (A12X/A12Z)	?
	Lightning	2019	ARMv8.4‑A	7-wide	16	Yes	Yes	"big" (In Apple A13 paired with "LITTLE" Thunder cores)	13	7	No	No	128 + 128	8 MiB	No	2x Lightning + 4x Thunder	?
	Thunder	2019	ARMv8.4‑A	3-wide	12	Yes	Yes	LITTLE	5	7	No	No	32 + 48	4 MiB	No	2x Lightning + 4x Thunder	?
Nvidia	Denver	2014	ARMv8‑A	2-wide hardware decoder, up to 7-wide variable- length VLIW micro-ops	13	Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW.	Direct+ Indirect branch prediction	No	7	28	No	No	128 + 64	2 MiB	No	2	?
	Denver 2	2016	ARMv8‑A	?	13	Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW.	Direct+ Indirect branch prediction	"Super" Nvidia's own implementation	?	16	No	No	128 + 64	2 MiB	No	2	?
	Carmel	2018	ARMv8.2‑A	?			Direct+ Indirect branch prediction		?	12	No	No	128 + 64	2 MiB	(4 MiB @ 8 cores)	2 (+ 8)	?
Cavium	ThunderX		ARMv8-A	2-wide	?	No	Two-level		?	28	No	No	78 + 32	16 MiB	No	8–16, 24–48	?
Cavium	ThunderX2 (ex. Broadcom Vulcan)	May 2018	ARMv8.1-A	4-wide "4 μops"	?	Yes	Multi-level	?	?	16	SMT4	No	32 + 32 (data 8-way)	256KB per core	1MB per core	16-32	?
Applied Micro	Helix	?	?	?	?	?	?	?	?	40 / 28	No	No	32 + 32 (per core; write-through w/parity)	256 KiB shared per core pair (with ECC)	1 MiB/core	2, 4, 8	?
	X-Gene		?	4-wide	15	Yes	?	?	?	40	No	No			8 MiB	8	4.2
	X-Gene 2		?	4-wide	15	Yes	?	?	?	28	No	No			8 MiB	8	4.2
	X-Gene 3		?	?	?	?	?	?	?	16	No	No	?	?	32 MiB	32	?
Qualcomm	Kryo	2016	ARMv8-A	?	?	Yes	Two-level?	"big" or "LITTLE" Qualcomm's own similar implementation	?	14	No	No	32+24	0.5–1 MiB		2, 4	6.3
	Kryo 2XX	2017	ARMv8-A	2-wide	11–12	Yes 7-wide dispatch	Two-level	big	7	14 / 11 / 10 [51]	No	No	64 + 32/64?	512 KiB/Gold Core	No	4	?
	Kryo 2XX	2017	ARMv8-A	2-wide	8	No	Conditional+ Indirect branch prediction	?	2	14 / 11 / 10 [51]	No	No	8–64? + 8–64?	256 KiB/Silver Core	No	4	?
	Kryo 3XX	2018	ARMv8.2-A	3-wide	11–13	Yes 8-wide dispatch	Two-level	big	8	10[51]	No	No	64+64[51]	256 KiB/Gold Core	2 MiB	4	?
	Kryo 3XX	2018	ARMv8.2-A	2-wide	8	No	Conditional+ Indirect branch prediction	?	28	10[51]	No	No	16–64? + 16–64?	128 KiB/Silver	2 MiB	4	?
	Kryo 4XX	2019	ARMv8.2-A	4-wide	11–13	Yes 8-wide dispatch	Yes	big	8	11 / 8 / 7	No	No	64 + 64	512 KiB/Gold Prime 256 KiB/Gold	2 MiB	1+3	?
	Kryo 4XX	2019	ARMv8.2-A	2-wide	8	No	Conditional+ Indirect branch prediction	?	2	11 / 8 / 7	No	No	16–64? + 16–64?	128 KiB/Silver	2 MiB	4	?
	Falkor	11-8-2017	"ARMv8.1-A features"; AArch64 only (not 32-bit)	4-wide	10–15	Yes 8-wide dispatch	Yes	?	8	10	No	24 KiB	88[53] + 32	500KiB	1.25MiB	40-48	?
Samsung	M1/M2	2015	ARMv8-A	4-wide	13	Yes 9-wide dispatch	Two-level	big	8	14 / 10	No	No	64 + 32	2 MiB[59]	no	4	?
	M3	2018	ARMv8.2-A	6-wide	15	Yes 12-wide dispatch	Two-level	big	12	10	No	No	64 + 64	512 KiB per core	4096KB	4	?
	M4	2019	ARMv8.2-A	6-wide	15	Yes 12-wide dispatch	Two-level	big	12	8 / 7	No	No	64 + 64	512 KiB per core	4096KB	2	?
Fujitsu	A64fx	2019	ARMv8.2-A	4/2-wide	7+	Yes 5-way?	Yes	n/a	8+	7	No	No	64 + 64	8MiB per 12+1 cores	No	48+4	1.9GHz+; 15GF/W+.
HiSilicon	TaiShan V110	2019	ARMv8.2-A	4-wide	?	Yes	Yes	n/a	8	7	No	No	64 + 64	512 KiB per core	1 MiB per core	?	?

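Many Chinese-designed CPUs, including Huawei HiSilicon's Kirin smartphone chips, are based on the ARMv8 architecture and use Cortex-A series cores. In mobile and portable devices, ARM has achieved near-total coverage.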

4 ARM Core Timeline

Year	Classic cores					Cortex cores				Neoverse cores
Year	ARM7	ARM8	ARM9	ARM10	ARM11	Microcontroller	Real-time	Application (32-bit)	Application (64-bit)	Application (64-bit)
1993	ARM700
1994	ARM710 ARM7DI ARM7TDMI
1995	ARM710a
1996		ARM810
1997	ARM710T ARM720T ARM740T
1998			ARM9TDMI ARM940T
1999			ARM9E-S ARM966E-S
2000			ARM920T ARM922T ARM946E-S	ARM1020T
2001	ARM7TDMI-S ARM7EJ-S		ARM9EJ-S ARM926EJ-S	ARM1020E ARM1022E
2002				ARM1026EJ-S	ARM1136J(F)-S
2003			ARM968E-S		ARM1156T2(F)-S ARM1176JZ(F)-S
2004						Cortex-M3
2005					ARM11MPCore			Cortex-A8
2006			ARM996HS
2007						Cortex-M1		Cortex-A9
2008
2009						Cortex-M0		Cortex-A5
2010						Cortex-M4(F)		Cortex-A15
2011							Cortex-R4 Cortex-R5 Cortex-R7	Cortex-A7
2012						Cortex-M0+			Cortex-A53 Cortex-A57
2013								Cortex-A12
2014						Cortex-M7(F)		Cortex-A17
2015									Cortex-A35 Cortex-A72
2016						Cortex-M23 Cortex-M33(F)	Cortex-R8 Cortex-R52	Cortex-A32	Cortex-A73
2017									Cortex-A55 Cortex-A75
2018						Cortex-M35P(F)			Cortex-A65AE Cortex-A76 Cortex-A76AE
2019									Cortex-A77	Neoverse E1 Neoverse N1

5 Third-Party ARM Core Designers

Core Family	Instruction set	Microarchitecture	Feature	Cache (I / D), MMU	Typical MIPS @ MHz
StrongARM (Digital)	ARMv4	SA-110	5-stage pipeline	16 KB / 16 KB, MMU	100–233 MHz 1.0 DMIPS/MHz
StrongARM (Digital)	ARMv4	SA-1100	derivative of the SA-110	16 KB / 8 KB, MMU
Faraday[60] (Faraday Technology)	ARMv4	FA510	6-stage pipeline	Up to 32 KB / 32 KB cache, MPU	1.26 DMIPS/MHz 100–200 MHz
		FA526	6-stage pipeline	Up to 32 KB / 32 KB cache, MMU	1.26 MIPS/MHz 166–300 MHz
		FA626	8-stage pipeline	32 KB / 32 KB cache, MMU	1.35 DMIPS/MHz 500 MHz
	ARMv5TE	FA606TE	5-stage pipeline	No cache, no MMU	1.22 DMIPS/MHz 200 MHz
		FA626TE	8-stage pipeline	32 KB / 32 KB cache, MMU	1.43 MIPS/MHz 800 MHz
		FMP626TE	8-stage pipeline, SMP		1.43 MIPS/MHz 500 MHz
		FA726TE	13 stage pipeline, dual issue		2.4 DMIPS/MHz 1000 MHz
XScale (Intel / Marvell)	ARMv5TE	XScale	7-stage pipeline, Thumb, enhanced DSP instructions	32 KB / 32 KB, MMU	133–400 MHz
		Bulverde	Wireless MMX, wireless SpeedStep added	32 KB / 32 KB, MMU	312–624 MHz
		Monahans[61]	Wireless MMX2 added	32 KB / 32 KB L1, optional L2 cache up to 512 KB, MMU	Up to 1.25 GHz
Sheeva (Marvell)	ARMv5	Feroceon	5–8 stage pipeline, single-issue	16 KB / 16 KB, MMU	600–2000 MHz
		Jolteon	5–8 stage pipeline, dual-issue	32 KB / 32 KB, MMU	600–2000 MHz
		PJ1 (Mohawk)	5–8 stage pipeline, single-issue, Wireless MMX2	32 KB / 32 KB, MMU	1.46 DMIPS/MHz 1.06 GHz
	ARMv6 / ARMv7-A	PJ4	6–9 stage pipeline, dual-issue, Wireless MMX2, SMP	32 KB / 32 KB, MMU	2.41 DMIPS/MHz 1.6 GHz
Snapdragon (Qualcomm)	ARMv7-A	Scorpion[62]	1 or 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv3 FPU / NEON (128-bit wide)	256 KB L2 per core	2.1 DMIPS/MHz per core
	ARMv7-A	Krait[62]	1, 2, or 4 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON (128-bit wide)	4 KB / 4 KB L0, 16 KB / 16 KB L1, 512 KB L2 per core	3.3 DMIPS/MHz per core
	ARMv8-A	Kryo[63]	4 cores.	?	Up to 2.2 GHz (6.3 DMIPS/MHz)
Ax (Apple)	ARMv7-A	Swift[64]	2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON	L1: 32 KB / 32 KB, L2: 1 MB	3.5 DMIPS/MHz per core
	ARMv8-A	Cyclone[65]	2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64	L1: 64 KB / 64 KB, L2: 1 MB, L3: 4 MB	1.3 or 1.4 GHz
	ARMv8-A	Typhoon[65][66]	2 or 3 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64	L1: 64 KB / 64 KB, L2: 1 MB or 2 MB, L3: 4 MB	1.4 or 1.5 GHz
	ARMv8-A	Twister[67]	2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64	L1: 64 KB / 64 KB, L2: 2 MB, L3: 4 MB or 0 MB	1.85 or 2.26 GHz
	ARMv8.1-A	Hurricane[68]	2 or 3 cores. AArch64, 6-decode, 6-issue, 9-wide, superscalar, out-of-order	L1: 64 KB / 64 KB, L2: 3 MB or 8 MB, L3: 4 MB or 0 MB	2.34 or 2.38 GHz
	ARMv8.2-A	Monsoon[69]	2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order	L1I: 128 KB, L1D: 64 KB, L2: 8 MB, L3: 4 MB	2.39 GHz
	ARMv8.3-A	Vortex[70]	2 or 4 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order	L1: 128 KB / 128 KB, L2: 8 MB, L3: 8 MB	2.5 GHz
	ARMv8.4-A	Lightning[71]	2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order	L1: 128 KB / 128 KB, L2: 8 MB, L3: 16 MB	2.66 GHz
X-Gene (Applied Micro)	ARMv8-A	X-Gene	64-bit, quad issue, SMP, 64 cores[72]	Cache, MMU, virtualization	3 GHz (4.2 DMIPS/MHz per core)
Denver (Nvidia)	ARMv8-A	Denver[73][74]	2 cores. AArch64, 7-wide superscalar, in-order, dynamic code optimization, 128 MB optimization cache, Denver1: 28nm, Denver2:16nm	128 KB I-cache / 64 KB D-cache	Up to 2.5 GHz
Carmel (Nvidia)	ARMv8(t.b.d.)	Carmel[75][76]	2 cores. AArch64, 10-wide superscalar, in-order, dynamic code optimization, ? MB optimization cache, functional safety, dual execution, parity & ECC	? KB I-cache / ? KB D-cache	Up to ? GHz
ThunderX (Cavium)	ARMv8-A	ThunderX	64-bit, with two models with 8–16 or 24–48 cores (×2 w/two chips)	?	Up to 2.2 GHz
K12 (AMD)	ARMv8-A	K12[77]	?	?	?
Exynos (Samsung)	ARMv8-A	M1/M2 ("Mongoose")[78]	4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order	64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB	5.1 DMIPS/MHz (2.6 GHz)
	ARMv8-A	M3 ("Meerkat")[79]	4 cores, AArch64, 6-decode, 6-issue, 6-wide. superscalar, out-of-order	64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB	?
	ARMv8.2-A	M4 ("Cheetah")	2 cores, AArch64, 6-decode, 6-issue, 6-wide. superscalar, out-of-order	64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB	?