A Detailed Introduction to ARM Architecture Versions and Processor Series

2024/11/14 14:06:38

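Contents

1 The Development of ARM

2 ARM Versions

3 ARM Series Overview

3.1 ARM7 Series

3.2 ARM9 Series

3.3 ARM11 Series

3.4 Cortex-R Series

3.5 Cortex-M Series

3.6 Cortex-A Series

4 ARM Core Timeline

5 Third-Party ARM Design Companies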


1 The Development of ARM

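         ARM is short for Advanced RISC Machine. It was originally called the Acorn RISC Machine and is a 32-bit reduced instruction set computer (RISC) processor architecture (ARMv8 later added the 64-bit AArch64 execution state). There are also derivative products based on ARM designs, most notably Marvell's XScale architecture and Texas Instruments' OMAP series. The ARM family accounts for about 75% of all 32-bit embedded processors, and thanks to its low power consumption it is widely used in mobile communications, portable devices, and similar fields.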

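       In 1983, Acorn Computers Ltd began developing a new processor architecture. The team, led by Roger Wilson and Steve Furber, set out to build something like a more advanced MOS Technology 6502, since Acorn already sold a large range of computers built on the 6502. The team produced ARM1 samples in 1985 and put the ARM2 into volume production the following year. The ARM2 had a 32-bit data bus and a 26-bit address space, giving a 64 MB addressing range, along with sixteen 32-bit registers.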
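The 64 MB figure follows directly from the address width: a byte-addressed 26-bit space has 2^26 = 67,108,864 addresses, i.e. 64 MiB. The minimal Python sketch below (the helper name `addressable_bytes` is ours, chosen only for illustration) does the arithmetic for the ARM2 and for the 32-bit address space that, according to the detailed table in section 2, ARMv3 was the first to support:

```python
MIB = 1024 * 1024  # bytes in one mebibyte
GIB = 1024 * MIB   # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Distinct byte addresses reachable with an address of the given width (illustrative helper)."""
    return 1 << address_bits


arm2_space = addressable_bytes(26)   # ARM2 (ARMv2): 26-bit address space
armv3_space = addressable_bytes(32)  # ARMv3 and later: 32-bit address space

print(arm2_space // MIB, "MiB")   # 64 MiB
print(armv3_space // GIB, "GiB")  # 4 GiB
```

Each extra address bit doubles the reachable range, which is why the move from 26 to 32 bits multiplies it by 64.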

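        In the late 1980s, Apple began working with Acorn on a new version of the ARM core. In 1990 the design team was spun out into a new company, Advanced RISC Machines Ltd. The first ARM6 samples appeared in 1991, and Apple used the ARM610, a member of the ARM6 family, as the basis of its Apple Newton PDA. In 1994, Acorn used the ARM610 as the CPU of its Risc PC computers.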

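        ARM is a well-known company in the microprocessor industry. It has designed a large number of high-performance, low-cost, low-power RISC processors, but it only designs chips and does not manufacture them. ARM's business model is to sell intellectual property cores (IP cores): it licenses its technology to many well-known semiconductor, software, and OEM companies around the world and provides technical services.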

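        ARM versions fall into two categories: core versions and processor versions. The core version is the ARM architecture, such as ARMv1, ARMv2, ARMv3, ARMv4, ARMv5, ARMv6, ARMv7, and ARMv8. The processor version is the ARM processor itself, such as ARM1, ARM9, ARM11, ARM Cortex-A (A7, A9, A15), ARM Cortex-M (M1, M3, M4), and ARM Cortex-R; this is what people usually mean by "ARM version".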

2 ARM Versions

A simplified table of ARM versions is shown below.

Core (architecture) version | Processor version
ARMv1 | ARM1
ARMv2 | ARM2, ARM3
ARMv3 | ARM6, ARM7
ARMv4 | StrongARM, ARM7TDMI, ARM9TDMI
ARMv5 | ARM7EJ, ARM9E, ARM10E, XScale
ARMv6 | ARM11, ARM Cortex-M
ARMv7 | ARM Cortex-A, ARM Cortex-M, ARM Cortex-R
ARMv8 | ARM Cortex-A30, ARM Cortex-A50, ARM Cortex-A70
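Because one architecture version is implemented by many processors, the simplified table is a one-to-many mapping. The small Python sketch below copies the rows above into a dictionary (the names `ARCH_TO_PROCESSORS` and `architectures_for` are illustrative, not from any ARM tool) and inverts it to answer "which architecture versions does this processor version implement?":

```python
# Architecture version -> processor versions, copied from the simplified table above.
ARCH_TO_PROCESSORS = {
    "ARMv1": ["ARM1"],
    "ARMv2": ["ARM2", "ARM3"],
    "ARMv3": ["ARM6", "ARM7"],
    "ARMv4": ["StrongARM", "ARM7TDMI", "ARM9TDMI"],
    "ARMv5": ["ARM7EJ", "ARM9E", "ARM10E", "XScale"],
    "ARMv6": ["ARM11", "ARM Cortex-M"],
    "ARMv7": ["ARM Cortex-A", "ARM Cortex-M", "ARM Cortex-R"],
    "ARMv8": ["ARM Cortex-A30", "ARM Cortex-A50", "ARM Cortex-A70"],
}


def architectures_for(processor: str) -> list:
    """Return every architecture version the table lists for a processor version.

    A family such as Cortex-M appears under more than one architecture
    (ARMv6 and ARMv7 here), so the result is a list, in table order.
    """
    return [arch for arch, procs in ARCH_TO_PROCESSORS.items() if processor in procs]


print(architectures_for("ARM7TDMI"))      # ['ARMv4']
print(architectures_for("ARM Cortex-M"))  # ['ARMv6', 'ARMv7']
```

A processor name that is not in the table simply yields an empty list.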

A detailed table of ARM versions is shown below (source: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures).

ARM family | ARM architecture | ARM core | Feature | Cache (I / D), MMU | Typical MIPS @ MHz | Reference
ARM1 | ARMv1 | ARM1 | First implementation | None |  | 
ARM2 | ARMv2 | ARM2 | ARMv2 added the MUL (multiply) instruction | None | 4 MIPS @ 8 MHz; 0.33 DMIPS/MHz | 
ARM2 | ARMv2a | ARM250 | Integrated MEMC (MMU), graphics and I/O processor. ARMv2a added the SWP and SWPB (swap) instructions | None, MEMC1a | 7 MIPS @ 12 MHz | 
ARM3 | ARMv2a | ARM3 | First integrated memory cache | 4 KB unified | 12 MIPS @ 25 MHz; 0.50 DMIPS/MHz | 
ARM6 | ARMv3 | ARM60 | ARMv3 first to support 32-bit memory address space (previously 26-bit). ARMv3M first added long multiply instructions (32x32=64). | None | 10 MIPS @ 12 MHz | 
ARM6 | ARMv3 | ARM600 | As ARM60, cache and coprocessor bus (for FPA10 floating-point unit) | 4 KB unified | 28 MIPS @ 33 MHz | 
ARM6 | ARMv3 | ARM610 | As ARM60, cache, no coprocessor bus | 4 KB unified | 17 MIPS @ 20 MHz; 0.65 DMIPS/MHz | [4]
ARM7 | ARMv3 | ARM700 |  | 8 KB unified | 40 MHz | 
ARM7 | ARMv3 | ARM710 | As ARM700, no coprocessor bus | 8 KB unified | 40 MHz | [5]
ARM7 | ARMv3 | ARM710a | As ARM710 | 8 KB unified | 40 MHz; 0.68 DMIPS/MHz | 
ARM7T | ARMv4T | ARM7TDMI(-S) | 3-stage pipeline, Thumb, ARMv4 first to drop legacy ARM 26-bit addressing | None | 15 MIPS @ 16.8 MHz; 63 DMIPS @ 70 MHz | 
ARM7T | ARMv4T | ARM710T | As ARM7TDMI, cache | 8 KB unified, MMU | 36 MIPS @ 40 MHz | 
ARM7T | ARMv4T | ARM720T | As ARM7TDMI, cache | 8 KB unified, MMU with FCSE (Fast Context Switch Extension) | 60 MIPS @ 59.8 MHz | 
ARM7T | ARMv4T | ARM740T | As ARM7TDMI, cache | MPU |  | 
ARM7EJ | ARMv5TEJ | ARM7EJ-S | 5-stage pipeline, Thumb, Jazelle DBX, enhanced DSP instructions | None |  | 
ARM8 | ARMv4 | ARM810 | 5-stage pipeline, static branch prediction, double-bandwidth memory | 8 KB unified, MMU | 84 MIPS @ 72 MHz; 1.16 DMIPS/MHz | [6][7]
ARM9T | ARMv4T | ARM9TDMI | 5-stage pipeline, Thumb | None |  | 
ARM9T | ARMv4T | ARM920T | As ARM9TDMI, cache | 16 KB / 16 KB, MMU with FCSE (Fast Context Switch Extension) | 200 MIPS @ 180 MHz | [8]
ARM9T | ARMv4T | ARM922T | As ARM9TDMI, caches | 8 KB / 8 KB, MMU |  | 
ARM9T | ARMv4T | ARM940T | As ARM9TDMI, caches | 4 KB / 4 KB, MPU |  | 
ARM9E | ARMv5TE | ARM946E-S | Thumb, enhanced DSP instructions, caches | Variable, tightly coupled memories, MPU |  | 
ARM9E | ARMv5TE | ARM966E-S | Thumb, enhanced DSP instructions | No cache, TCMs |  | 
ARM9E | ARMv5TE | ARM968E-S | As ARM966E-S | No cache, TCMs |  | 
ARM9E | ARMv5TEJ | ARM926EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions | Variable, TCMs, MMU | 220 MIPS @ 200 MHz | 
ARM9E | ARMv5TE | ARM996HS | Clockless processor, as ARM966E-S | No caches, TCMs, MPU |  | 
ARM10E | ARMv5TE | ARM1020E | 6-stage pipeline, Thumb, enhanced DSP instructions, (VFP) | 32 KB / 32 KB, MMU |  | 
ARM10E | ARMv5TE | ARM1022E | As ARM1020E | 16 KB / 16 KB, MMU |  | 
ARM10E | ARMv5TEJ | ARM1026EJ-S | Thumb, Jazelle DBX, enhanced DSP instructions, (VFP) | Variable, MMU or MPU |  | 
ARM11 | ARMv6 | ARM1136J(F)-S | 8-stage pipeline, SIMD, Thumb, Jazelle DBX, (VFP), enhanced DSP instructions, unaligned memory access | Variable, MMU | 740 @ 532–665 MHz (i.MX31 SoC), 400–528 MHz | [9]
ARM11 | ARMv6T2 | ARM1156T2(F)-S | 9-stage pipeline, SIMD, Thumb-2, (VFP), enhanced DSP instructions | Variable, MPU |  | [10]
ARM11 | ARMv6Z | ARM1176JZ(F)-S | As ARM1136EJ(F)-S | Variable, MMU + TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors | [11]
ARM11 | ARMv6K | ARM11MPCore | As ARM1136EJ(F)-S, 1–4 core SMP | Variable, MMU |  | 
SecurCore | ARMv6-M | SC000 |  |  | 0.9 DMIPS/MHz | 
SecurCore | ARMv4T | SC100 |  |  |  | 
SecurCore | ARMv7-M | SC300 |  |  | 1.25 DMIPS/MHz | 
Cortex-M | ARMv6-M | Cortex-M0[12] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, no MPU | 0.84 DMIPS/MHz | 
Cortex-M | ARMv6-M | Cortex-M0+[14] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), optional system timer, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 0.93 DMIPS/MHz | 
Cortex-M | ARMv6-M | Cortex-M1[15] | Microcontroller profile, most Thumb + some Thumb-2,[13] hardware multiply instruction (optional small), OS option adds SVC / banked stack pointer, optional system timer, no bit-banding memory | Optional cache, 0–1024 KB I-TCM, 0–1024 KB D-TCM, no MPU | 136 DMIPS @ 170 MHz,[16] (0.8 DMIPS/MHz FPGA-dependent)[17] | 
Cortex-M | ARMv7-M | Cortex-M3[18] | Microcontroller profile, Thumb / Thumb-2, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz | 
Cortex-M | ARMv7E-M | Cortex-M4[19] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv4-SP single-precision FPU, hardware multiply and divide instructions, optional bit-banding memory | Optional cache, no TCM, optional MPU with 8 regions | 1.25 DMIPS/MHz (1.27 w/FPU) | 
Cortex-M | ARMv7E-M | Cortex-M7[20] | Microcontroller profile, Thumb / Thumb-2 / DSP / optional VFPv5 single and double precision FPU, hardware multiply and divide instructions | 0–64 KB I-cache, 0–64 KB D-cache, 0–16 MB I-TCM, 0–16 MB D-TCM (all these w/optional ECC), optional MPU with 8 or 16 regions | 2.14 DMIPS/MHz | 
Cortex-M | ARMv8-M | Cortex-M23[21] | Microcontroller profile, Thumb-1 (most), Thumb-2 (some), Divide, TrustZone | Optional cache, no TCM, optional MPU with 16 regions | 0.99 DMIPS/MHz | 
Cortex-M | ARMv8-M | Cortex-M33[22] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Optional cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | 
Cortex-M | ARMv8-M | Cortex-M35P[23] | Microcontroller profile, Thumb-1, Thumb-2, Saturated, DSP, Divide, FPU (SP), TrustZone, Co-processor | Built-in cache (with option 2–16 KB), I-cache, no TCM, optional MPU with 16 regions | 1.50 DMIPS/MHz | 
Cortex-R | ARMv7-R | Cortex-R4[24] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lockstep with fault logic | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 8/12 regions | 1.67 DMIPS/MHz[25] | 
Cortex-R | ARMv7-R | Cortex-R5[26] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 8-stage pipeline dual-core running lock-step with fault logic / optional as 2 independent cores, low-latency peripheral port (LLPP), accelerator coherency port (ACP)[27] | 0–64 KB / 0–64 KB, 0–2 of 0–8 MB TCM, opt. MPU with 12/16 regions | 1.67 DMIPS/MHz[25] | 
Cortex-R | ARMv7-R | Cortex-R7[28] | Real-time profile, Thumb / Thumb-2 / DSP / optional VFPv3 FPU and precision, hardware multiply and optional divide instructions, optional parity & ECC for internal buses / cache / TCM, 11-stage pipeline dual-core running lock-step with fault logic / out-of-order execution / dynamic register renaming / optional as 2 independent cores, low-latency peripheral port (LLPP), ACP[27] | 0–64 KB / 0–64 KB, ? of 0–128 KB TCM, opt. MPU with 16 regions | 2.50 DMIPS/MHz[25] | 
Cortex-R | ARMv7-R | Cortex-R8[29] | TBD | TBD | 2.50 DMIPS/MHz[25] | 
Cortex-R | ARMv8-R | Cortex-R52[30] | TBD | TBD | 2.16 DMIPS/MHz[31] | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A5[32] | Application profile, ARM / Thumb / Thumb-2 / DSP / SIMD / Optional VFPv4-D16 FPU / Optional NEON / Jazelle RCT and DBX, 1–4 cores / optional MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 4–64 KB / 4–64 KB L1, MMU + TrustZone | 1.57 DMIPS/MHz per core | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A7[33] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / Jazelle RCT and DBX / Hardware virtualization, in-order execution, superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), architecture and feature set are identical to A15, 8–10 stage pipeline, low-power design[34] | 8–64 KB / 8–64 KB L1, 0–1 MB L2, MMU + TrustZone | 1.9 DMIPS/MHz per core | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A8[35] | Application profile, ARM / Thumb / Thumb-2 / VFPv3 FPU / NEON / Jazelle RCT and DAC, 13-stage superscalar pipeline | 16–32 KB / 16–32 KB L1, 0–1 MB L2 opt. ECC, MMU + TrustZone | Up to 2000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz) | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A9[36] | Application profile, ARM / Thumb / Thumb-2 / DSP / Optional VFPv3 FPU / Optional NEON / Jazelle RCT and DBX, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 16–64 KB / 16–64 KB L1, 0–8 MB L2 opt. parity, MMU + TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual-core) | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A12[37] | Application profile, ARM / Thumb-2 / DSP / VFPv4 FPU / NEON / Hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), accelerator coherence port (ACP) | 32–64 KB | 3.0 DMIPS/MHz per core | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A15[38] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP, 15–24 stage pipeline[34] | 32 KB w/parity / 32 KB w/ECC L1, 0–4 MB L2, L2 has ECC, MMU + TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)[39] | 
Cortex-A (32-bit) | ARMv7-A | Cortex-A17[40] | Application profile, ARM / Thumb / Thumb-2 / DSP / VFPv4 FPU / NEON / integer divide / fused MAC / Jazelle RCT / hardware virtualization, out-of-order speculative issue superscalar, 1–4 SMP cores, MPCore, Large Physical Address Extensions (LPAE), snoop control unit (SCU), generic interrupt controller (GIC), ACP | 32 KB L1, 256 KB–8 MB L2 w/optional ECC | 2.8 DMIPS/MHz | 
Cortex-A (32-bit) | ARMv8-A | Cortex-A32[41] | Application profile, AArch32, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, dual issue, in-order pipeline | 8–64 KB w/optional parity / 8–64 KB w/optional ECC L1 per core, 128 KB–1 MB L2 w/optional ECC shared |  | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A34[42] | Application profile, AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses |  | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A35[43] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–1 MB L2 shared, 40-bit physical addresses | 1.78 DMIPS/MHz | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A53[44] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline | 8–64 KB w/parity / 8–64 KB w/ECC L1 per core, 128 KB–2 MB L2 shared, 40-bit physical addresses | 2.3 DMIPS/MHz | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A57[45] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.1–4.5 DMIPS/MHz[46][47] | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A72[48] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width superscalar, deeply out-of-order pipeline | 48 KB w/DED parity / 32 KB w/ECC L1 per core; 512 KB–2 MB L2 shared w/ECC; 44-bit physical addresses | 4.7 DMIPS/MHz | 
Cortex-A (64-bit) | ARMv8-A | Cortex-A73[49] | Application profile, AArch32 and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width superscalar, deeply out-of-order pipeline | 64 KB / 32–64 KB L1 per core, 256 KB–8 MB L2 shared w/ optional ECC, 44-bit physical addresses | 4.8 DMIPS/MHz[50] | 
Cortex-A (64-bit) | ARMv8.2-A | Cortex-A55[51] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-width decode, in-order pipeline[52] | 16–64 KB / 16–64 KB L1, 256 KB L2 per core, 4 MB L3 shared |  | 
Cortex-A (64-bit) | ARMv8.2-A | Cortex-A65AE[53] | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, out-of-order pipeline, SMT | 64 / 64 KB L1, 256 KB L2 per core, 4 MB L3 shared |  | 
Cortex-A (64-bit) | ARMv8.2-A | Cortex-A75[54] | Application profile, AArch32 and AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 3-width decode superscalar, deeply out-of-order pipeline[55] | 64 / 64 KB L1, 512 KB L2 per core, 4 MB L3 shared |  | 
Cortex-A (64-bit) | ARMv8.2-A | Cortex-A76[56] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 256–512 KB L2 per core, 512 KB–4 MB L3 shared |  | 
Cortex-A (64-bit) | ARMv8.2-A | Cortex-A77[58] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 6-width instruction fetch, 12-way issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 1.5K L0 MOPs cache, 64 / 64 KB L1, 256–512 KB L2 per core, 512 KB–4 MB L3 shared |  | 
Neoverse | ARMv8.2-A | Neoverse N1[59] | Application profile, AArch32 (non-privileged level or EL0 only) and AArch64, 1–4 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 4-width decode superscalar, 8-way dispatch/issue, 13 stage pipeline, deeply out-of-order pipeline[57] | 64 / 64 KB L1, 512–1024 KB L2 per core, 2–128 MB L3 shared, 128 MB system level cache |  | 
Neoverse | ARMv8.2-A | Neoverse E1 | Application profile, AArch64, 1–8 SMP cores, TrustZone, NEON advanced SIMD, VFPv4, hardware virtualization, 2-wide decode superscalar, 3-width issue, 10 stage pipeline, out-of-order pipeline, SMT | 32–64 KB / 32–64 KB L1, 256 KB L2 per core, 4 MB L3 shared |  | 
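The performance column mixes absolute scores (DMIPS at a given clock) with normalized ones (DMIPS/MHz). Assuming ideal scaling across cores, the two are related by DMIPS = (DMIPS/MHz) × MHz × cores. The short Python sketch below (the function names are illustrative, not from any benchmark suite) checks this relation against three rows of the table above: the ARM1176JZ(F)-S, the Cortex-M1, and the dual-core Cortex-A9 figure.

```python
def dmips_per_mhz(dmips: float, mhz: float, cores: int = 1) -> float:
    """Normalize an absolute Dhrystone score to DMIPS/MHz per core."""
    return dmips / (mhz * cores)


def total_dmips(per_mhz: float, mhz: float, cores: int = 1) -> float:
    """Scale a per-core DMIPS/MHz figure back up (assumes perfect multi-core scaling)."""
    return per_mhz * mhz * cores


# Figures quoted in the detailed table above.
print(dmips_per_mhz(965, 772))          # ARM1176JZ(F)-S: 1.25
print(dmips_per_mhz(136, 170))          # Cortex-M1: 0.8
print(total_dmips(2.5, 2000, cores=2))  # Cortex-A9, dual-core at 2 GHz: 10000.0
```

The perfect-scaling assumption is the weak point: real multi-core Dhrystone totals can fall short of the product, so the per-core DMIPS/MHz figure is the safer basis for comparing cores.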

 

3 ARM Series Overview

3.1 ARM7 Series

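         This series targets simple 32-bit devices. As one of the older series, ARM7 processors are no longer recommended for new designs. It mainly includes the ARM7TDMI-S (ARMv4T architecture) and the ARM7EJ-S (ARMv5TEJ architecture).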

3.2 ARM9 Series

         This series mainly targets embedded real-time applications and mainly includes the ARM926EJ-S, ARM946E-S, and ARM968E-S.

3.3 ARM11 Series

         This series is used mainly in high-reliability and real-time embedded applications and mainly includes the ARM11MPCore, ARM1176, ARM1156, and ARM1136.

3.4 Cortex-R Series

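         The "R" in Cortex-R stands for Real-Time. The series targets real-time task processing, and its main application areas include automotive, cameras, industrial, and medical equipment.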

Processors in this series mainly include the Cortex-R4, Cortex-R5, Cortex-R7, Cortex-R8, and Cortex-R52.

3.5 Cortex-M Series

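          The "M" in Cortex-M stands for Microcontroller. The series targets the most energy-efficient embedded devices, and its main application areas include automotive, energy grids, medical, general embedded systems, smart cards, smart devices, sensor fusion, and wearables.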

Processors in this series mainly include the Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, Cortex-M33, and Cortex-M35P.

3.6 Cortex-A Series

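         The "A" in Cortex-A stands for Application (the table above lists these cores as the "Application profile"). The series aims to deliver the highest performance at the best power efficiency, and its main application areas include automotive, industrial, medical, modems, and storage. Cortex-A is also currently the most widely used processor series.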

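         Processors in this series mainly include the Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, Cortex-A17, Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72, and Cortex-A73. The Cortex-A8 supports only single-core configurations. The Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, and Cortex-A17 are based on the ARMv7-A architecture; the Cortex-A32, Cortex-A35, Cortex-A53, Cortex-A57, Cortex-A72, and Cortex-A73 are based on the ARMv8-A architecture. Of the ARMv8-A cores, all except the 32-bit-only Cortex-A32 support 64-bit execution.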

         Ranked roughly from highest to lowest, the Cortex-A processors are: Cortex-A73, Cortex-A72, Cortex-A57, Cortex-A53, Cortex-A35, Cortex-A32, Cortex-A17, Cortex-A15, Cortex-A7, Cortex-A9, Cortex-A8, Cortex-A5.
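The ordering above is a rough one. As a point of comparison, the Python sketch below ranks the same cores by the per-core DMIPS/MHz figures quoted in the detailed table in section 2 alone. It takes the lower bound where the table gives a range or a minimum, and it omits the Cortex-A32, for which the table lists no figure. The result differs from the list above: for example, the Cortex-A15 (at least 3.5) ranks above the Cortex-A53 (2.3), and the Cortex-A35 (1.78) falls below the Cortex-A7 (1.9).

```python
# Per-core DMIPS/MHz, taken from the detailed table in section 2.
# Where the table gives a range or "at least", the lower bound is used.
# Cortex-A32 is omitted because the table gives no figure for it.
DMIPS_PER_MHZ = {
    "Cortex-A73": 4.8,
    "Cortex-A72": 4.7,
    "Cortex-A57": 4.1,   # table: 4.1–4.5
    "Cortex-A53": 2.3,
    "Cortex-A35": 1.78,
    "Cortex-A17": 2.8,
    "Cortex-A15": 3.5,   # table: at least 3.5, up to 4.01
    "Cortex-A7": 1.9,
    "Cortex-A9": 2.5,
    "Cortex-A8": 2.0,
    "Cortex-A5": 1.57,
}

# Sort core names by their DMIPS/MHz value, highest first.
ranking = sorted(DMIPS_PER_MHZ, key=DMIPS_PER_MHZ.get, reverse=True)
print(ranking)
```

This ranks only per-clock integer throughput; it ignores clock frequency, core count, 64-bit support, and power, so it complements rather than replaces the ordering above.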

Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution | Branch prediction | big.LITTLE role | Exec. ports | Fab (in nm) | Simult. MT | L0 cache | L1 cache Instr + Data (in KiB) | L2 cache | L3 cache | Core configurations | DMIPS/MHz
ARM Holdings | Cortex-A32 (32-bit) | 2017 | ARMv8.0-A (only 32-bit) | 2-wide | 8 | No |  | LITTLE | ? | 28 | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1–4+ | 
ARM Holdings | Cortex-A34 (64-bit) | 2019 | ARMv8.0-A (only 64-bit) | 2-wide | 8 | No |  | LITTLE | ? |  | No | No | 8–64 + 8–64 | 0–1 MiB | No | 1–4+ | 
ARM Holdings | Cortex-A35 | 2017 | ARMv8.0-A | 2-wide | 8 | No | Yes | LITTLE | ? | 28 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 0 / 128 KiB–1 MiB | No | 1–4+ | 1.78
ARM Holdings | Cortex-A53 | 2014 | ARMv8.0-A | 2-wide | 8 | No | Conditional + indirect branch prediction | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | No | No | 8–64 + 8–64 | 128 KiB–2 MiB | No | 1–4+ | 2.24
ARM Holdings | Cortex-A55 | 2017 | ARMv8.2-A | 2-wide | 8 | No |  | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | No | No | 16–64 + 16–64 | 0–256 KiB/core | 0–4 MiB | 1–8+ | 2.65[8]
ARM Holdings | Cortex-A57 | 2013 | ARMv8.0-A | 3-wide | 15 | Yes, 3-wide dispatch | Two-level | big | 8 | 28 / 20 / 16[10] / 14 | No | No | 48 + 32 | 0.5–2 MiB | No | 1–4+ | 4.6
ARM Holdings | Cortex-A65AE | 2019 | ARMv8.2-A | ? | ? | Yes | Two-level | ? | 2 | ? | SMT2 | No | 16–64 + 16–64 | 64–256 KiB | 0–4 MB | 1–8 | ?
ARM Holdings | Cortex-A72 | 2015 | ARMv8.0-A | 3-wide | 15 | Yes, 5-wide dispatch | Two-level | big | 8 | 28 / 16 | No | No | 48 + 32 | 0.5–4 MiB | No | 1–4+ | 4.72
ARM Holdings | Cortex-A73 | 2016 | ARMv8.0-A | 2-wide | 11–12 | Yes, 4-wide dispatch | Two-level | big | 7 | 28 / 16 / 10 | No | No | 64 + 32/64 | 1–8 MiB | No | 1–4+ | ~6.35
ARM Holdings | Cortex-A75 | 2017 | ARMv8.2-A | 3-wide | 11–13 | Yes, 6-wide dispatch | Two-level | big | 8? | 28 / 16 / 10 | No | No | 64 + 64 | 256–512 KiB/core | 0–4 MiB | 1–8+ | ?
ARM Holdings | Cortex-A76 | 2018 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10 / 7 | No | No | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ?
ARM Holdings | Cortex-A77 | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 10-wide dispatch | Two-level | big | 12 | 7 | No | 1.5K entries | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ?
Apple Inc. | Cyclone | 2013 | ARMv8.0-A | 6-wide | 16 | Yes | Yes | No | 9 | 28 | No | No | 64 + 64 | 1 MiB | 4 MiB | 2 | ?
Apple Inc. | Typhoon | 2014 | ARMv8.0-A | 6-wide | 16 | Yes | Yes | No | 9 | 20 | No | No | 64 + 64 | 1 MiB | 4 MiB | 2, 3 (A8X) | ?
Apple Inc. | Twister | 2015 | ARMv8.0-A | 6-wide | 16[20] | Yes | Yes | No | 9 | 16 / 14 | No | No | 64 + 64 | 3 MiB | 4 MiB; No (A9X) | 2 | ?
Apple Inc. | Hurricane | 2016 | ARMv8.1-A | 6-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Zephyr cores in A10/A10X) | 9 | 16 (A10); 10 (A10X) | No | No | 64 + 64 | 3 MiB (A10); 8 MiB (A10X) | 4 MiB (A10); No (A10X) | 2x Hurricane + 2x Zephyr (A10); 3x Hurricane + 3x Zephyr (A10X) | ?
Apple Inc. | Zephyr | 2016 | ARMv8.1-A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 16 (A10); 10 (A10X) | No | No | 32 + 32 | 1 MiB | 4 MiB[22] (A10); No (A10X) | 2x Hurricane + 2x Zephyr (A10); 3x Hurricane + 3x Zephyr (A10X) | ?
Apple Inc. | Monsoon | 2017 | ARMv8.2-A | 7-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Mistral cores in Apple A11) | 13 | 10 | No | No | 64 + 64 | 8 MiB | No | 2x Monsoon + 4x Mistral | ?
Apple Inc. | Mistral | 2017 | ARMv8.2-A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 10 | No | No | 32 + 32 | 1 MiB | No | 2x Monsoon + 4x Mistral | ?
Apple Inc. | Vortex | 2018 | ARMv8.3-A | 7-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Tempest cores in Apple A12/A12X/A12Z) | 13 | 7 | No | No | 128 + 128 | 8 MiB | No | 2x Vortex + 4x Tempest (A12); 4x Vortex + 4x Tempest (A12X/A12Z) | ?
Apple Inc. | Tempest | 2018 | ARMv8.3-A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 32 | 2 MiB | No | 2x Vortex + 4x Tempest (A12); 4x Vortex + 4x Tempest (A12X/A12Z) | ?
Apple Inc. | Lightning | 2019 | ARMv8.4-A | 7-wide | 16 | Yes | Yes | "big" (paired with "LITTLE" Thunder cores in Apple A13) | 13 | 7 | No | No | 128 + 128 | 8 MiB | No | 2x Lightning + 4x Thunder | ?
Apple Inc. | Thunder | 2019 | ARMv8.4-A | 3-wide | 12 | Yes | Yes | LITTLE | 5 | 7 | No | No | 32 + 48 | 4 MiB | No | 2x Lightning + 4x Thunder | ?
Nvidia | Denver | 2014 | ARMv8-A | 2-wide hardware decoder, up to 7-wide variable-length VLIW micro-ops | 13 | Not if the hardware decoder is in use; can be provided by dynamic software translation into VLIW | Direct + indirect branch prediction | No | 7 | 28 | No | No | 128 + 64 | 2 MiB | No | 2 | ?
Nvidia | Denver 2 | 2016 | ARMv8-A | ? | 13 | Not if the hardware decoder is in use; can be provided by dynamic software translation into VLIW | Direct + indirect branch prediction | "Super" (Nvidia's own implementation) | ? | 16 | No | No | 128 + 64 | 2 MiB | No | 2 | ?
Nvidia | Carmel | 2018 | ARMv8.2-A | ? |  |  | Direct + indirect branch prediction |  | ? | 12 | No | No | 128 + 64 | 2 MiB | 4 MiB @ 8 cores | 2 (+ 8) | ?
Cavium | ThunderX |  | ARMv8-A | 2-wide | ? | No | Two-level |  | ? | 28 | No | No | 78 + 32 | 16 MiB | No | 8–16, 24–48 | ?
Cavium | ThunderX2 (ex. Broadcom Vulcan) | May 2018 | ARMv8.1-A | 4-wide ("4 μops") | ? | Yes | Multi-level | ? | ? | 16 | SMT4 | No | 32 + 32 (data 8-way) | 256 KB per core | 1 MB per core | 16–32 | ?
Applied Micro | Helix | ? | ? | ? | ? | ? | ? | ? | ? | 40 / 28 | No | No | 32 + 32 (per core; write-through w/parity) | 256 KiB shared per core pair (with ECC) | 1 MiB/core | 2, 4, 8 | ?
Applied Micro | X-Gene |  | ? | 4-wide | 15 | Yes | ? | ? | ? | 40 | No | No |  |  | 8 MiB | 8 | 4.2
Applied Micro | X-Gene 2 |  | ? | 4-wide | 15 | Yes | ? | ? | ? | 28 | No | No |  |  | 8 MiB | 8 | 4.2
Applied Micro | X-Gene 3 |  | ? | ? | ? | ? | ? | ? | ? | 16 | No | No | ? | ? | 32 MiB | 32 | ?
Qualcomm | Kryo | 2016 | ARMv8-A | ? | ? | Yes | Two-level? | "big" or "LITTLE" (Qualcomm's own similar implementation) | ? | 14 | No | No | 32 + 24 | 0.5–1 MiB |  | 2, 4 | 6.3
Qualcomm | Kryo 2XX (Gold) | 2017 | ARMv8-A | 2-wide | 11–12 | Yes, 7-wide dispatch | Two-level | big | 7 | 14 / 11 / 10[51] | No | No | 64 + 32/64? | 512 KiB/Gold Core | No | 4 | ?
Qualcomm | Kryo 2XX (Silver) | 2017 | ARMv8-A | 2-wide | 8 | No | Conditional + indirect branch prediction | ? | 2 |  | No | No | 8–64? + 8–64? | 256 KiB/Silver Core |  | 4 | ?
Qualcomm | Kryo 3XX (Gold) | 2018 | ARMv8.2-A | 3-wide | 11–13 | Yes, 8-wide dispatch | Two-level | big | 8 | 10[51] | No | No | 64 + 64[51] | 256 KiB/Gold Core | 2 MiB | 4 | ?
Qualcomm | Kryo 3XX (Silver) | 2018 | ARMv8.2-A | 2-wide | 8 | No | Conditional + indirect branch prediction | ? | 2 | 8 | No | No | 16–64? + 16–64? | 128 KiB/Silver |  | 4 | ?
Qualcomm | Kryo 4XX (Gold Prime / Gold) | 2019 | ARMv8.2-A | 4-wide | 11–13 | Yes, 8-wide dispatch | Yes | big | 8 | 11 / 8 / 7 | No | No | 64 + 64 | 512 KiB/Gold Prime; 256 KiB/Gold | 2 MiB | 1+3 | ?
Qualcomm | Kryo 4XX (Silver) | 2019 | ARMv8.2-A | 2-wide | 8 | No | Conditional + indirect branch prediction | ? | 2 |  | No | No | 16–64? + 16–64? | 128 KiB/Silver |  | 4 | ?
Qualcomm | Falkor | 11-8-2017 | "ARMv8.1-A features"; AArch64 only (not 32-bit) | 4-wide | 10–15 | Yes, 8-wide dispatch | Yes | ? | 8 | 10 | No | 24 KiB | 88[53] + 32 | 500 KiB | 1.25 MiB | 40–48 | ?
Samsung | M1/M2 | 2015 | ARMv8-A | 4-wide | 13 | Yes, 9-wide dispatch | Two-level | big | 8 | 14 / 10 | No | No | 64 + 32 | 2 MiB[59] | No | 4 | ?
Samsung | M3 | 2018 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 10 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 4 | ?
Samsung | M4 | 2019 | ARMv8.2-A | 6-wide | 15 | Yes, 12-wide dispatch | Two-level | big | 12 | 8 / 7 | No | No | 64 + 64 | 512 KiB per core | 4096 KB | 2 | ?
Fujitsu | A64fx | 2019 | ARMv8.2-A | 4/2-wide | 7+ | Yes, 5-way? | Yes | n/a | 8+ | 7 | No | No | 64 + 64 | 8 MiB per 12+1 cores | No | 48+4 | 1.9 GHz+; 15 GF/W+
HiSilicon | TaiShan V110 | 2019 | ARMv8.2-A | 4-wide | ? | Yes | Yes | n/a | 8 | 7 | No | No | 64 + 64 | 512 KiB per core | 1 MiB per core | ? | ?

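         Many domestically produced Chinese CPUs, as well as Huawei's Kirin smartphone chips and other HiSilicon chips, are based on the ARMv8 architecture, the same architecture as the 64-bit Cortex-A series. In mobile and portable devices, ARM has almost complete coverage.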

4 ARM Core Timeline

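Year | Classic: ARM7 | Classic: ARM8 | Classic: ARM9 | Classic: ARM10 | Classic: ARM11 | Cortex: Microcontroller | Cortex: Real-time | Cortex: Application (32-bit) | Cortex: Application (64-bit) | Neoverse: Application (64-bit)
1993 | ARM700 |  |  |  |  |  |  |  |  | 
1994 | ARM710, ARM7DI, ARM7TDMI |  |  |  |  |  |  |  |  | 
1995 | ARM710a |  |  |  |  |  |  |  |  | 
1996 |  | ARM810 |  |  |  |  |  |  |  | 
1997 | ARM710T, ARM720T, ARM740T |  |  |  |  |  |  |  |  | 
1998 |  |  | ARM9TDMI, ARM940T |  |  |  |  |  |  | 
1999 |  |  | ARM9E-S, ARM966E-S |  |  |  |  |  |  | 
2000 |  |  | ARM920T, ARM922T, ARM946E-S | ARM1020T |  |  |  |  |  | 
2001 | ARM7TDMI-S, ARM7EJ-S |  | ARM9EJ-S, ARM926EJ-S | ARM1020E, ARM1022E |  |  |  |  |  | 
2002 |  |  |  | ARM1026EJ-S | ARM1136J(F)-S |  |  |  |  | 
2003 |  |  | ARM968E-S |  | ARM1156T2(F)-S, ARM1176JZ(F)-S |  |  |  |  | 
2004 |  |  |  |  |  | Cortex-M3 |  |  |  | 
2005 |  |  |  |  | ARM11MPCore |  |  | Cortex-A8 |  | 
2006 |  |  | ARM996HS |  |  |  |  |  |  | 
2007 |  |  |  |  |  | Cortex-M1 |  | Cortex-A9 |  | 
2008 |  |  |  |  |  |  |  |  |  | 
2009 |  |  |  |  |  | Cortex-M0 |  | Cortex-A5 |  | 
2010 |  |  |  |  |  | Cortex-M4(F) |  | Cortex-A15 |  | 
2011 |  |  |  |  |  |  | Cortex-R4, Cortex-R5, Cortex-R7 | Cortex-A7 |  | 
2012 |  |  |  |  |  | Cortex-M0+ |  |  | Cortex-A53, Cortex-A57 | 
2013 |  |  |  |  |  |  |  | Cortex-A12 |  | 
2014 |  |  |  |  |  | Cortex-M7(F) |  | Cortex-A17 |  | 
2015 |  |  |  |  |  |  |  |  | Cortex-A35, Cortex-A72 | 
2016 |  |  |  |  |  | Cortex-M23, Cortex-M33(F) | Cortex-R8, Cortex-R52 | Cortex-A32 | Cortex-A73 | 
2017 |  |  |  |  |  |  |  |  | Cortex-A55, Cortex-A75 | 
2018 |  |  |  |  |  | Cortex-M35P(F) |  |  | Cortex-A65AE, Cortex-A76, Cortex-A76AE | 
2019 |  |  |  |  |  |  |  |  | Cortex-A77 | Neoverse E1, Neoverse N1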

5 Third-Party ARM Design Companies

Core family | Instruction set | Microarchitecture | Feature | Cache (I / D), MMU | Typical MIPS @ MHz
StrongARM (Digital) | ARMv4 | SA-110 | 5-stage pipeline | 16 KB / 16 KB, MMU | 100–233 MHz; 1.0 DMIPS/MHz
StrongARM (Digital) | ARMv4 | SA-1100 | Derivative of the SA-110 | 16 KB / 8 KB, MMU | 
Faraday[60] (Faraday Technology) | ARMv4 | FA510 | 6-stage pipeline | Up to 32 KB / 32 KB cache, MPU | 1.26 DMIPS/MHz; 100–200 MHz
Faraday (Faraday Technology) | ARMv4 | FA526 |  | Up to 32 KB / 32 KB cache, MMU | 1.26 MIPS/MHz; 166–300 MHz
Faraday (Faraday Technology) | ARMv4 | FA626 | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.35 DMIPS/MHz; 500 MHz
Faraday (Faraday Technology) | ARMv5TE | FA606TE | 5-stage pipeline | No cache, no MMU | 1.22 DMIPS/MHz; 200 MHz
Faraday (Faraday Technology) | ARMv5TE | FA626TE | 8-stage pipeline | 32 KB / 32 KB cache, MMU | 1.43 MIPS/MHz; 800 MHz
Faraday (Faraday Technology) | ARMv5TE | FMP626TE | 8-stage pipeline, SMP |  | 1.43 MIPS/MHz; 500 MHz
Faraday (Faraday Technology) | ARMv5TE | FA726TE | 13-stage pipeline, dual issue |  | 2.4 DMIPS/MHz; 1000 MHz
XScale (Intel / Marvell) | ARMv5TE | XScale | 7-stage pipeline, Thumb, enhanced DSP instructions | 32 KB / 32 KB, MMU | 133–400 MHz
XScale (Intel / Marvell) | ARMv5TE | Bulverde | Wireless MMX, wireless SpeedStep added | 32 KB / 32 KB, MMU | 312–624 MHz
XScale (Intel / Marvell) | ARMv5TE | Monahans[61] | Wireless MMX2 added | 32 KB / 32 KB L1, optional L2 cache up to 512 KB, MMU | Up to 1.25 GHz
Sheeva (Marvell) | ARMv5 | Feroceon | 5–8 stage pipeline, single-issue | 16 KB / 16 KB, MMU | 600–2000 MHz
Sheeva (Marvell) | ARMv5 | Jolteon | 5–8 stage pipeline, dual-issue | 32 KB / 32 KB, MMU | 
Sheeva (Marvell) | ARMv5 | PJ1 (Mohawk) | 5–8 stage pipeline, single-issue, Wireless MMX2 | 32 KB / 32 KB, MMU | 1.46 DMIPS/MHz; 1.06 GHz
Sheeva (Marvell) | ARMv6 / ARMv7-A | PJ4 | 6–9 stage pipeline, dual-issue, Wireless MMX2, SMP | 32 KB / 32 KB, MMU | 2.41 DMIPS/MHz; 1.6 GHz
Snapdragon (Qualcomm) | ARMv7-A | Scorpion[62] | 1 or 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv3 FPU / NEON (128-bit wide) | 256 KB L2 per core | 2.1 DMIPS/MHz per core
Snapdragon (Qualcomm) | ARMv7-A | Krait[62] | 1, 2, or 4 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON (128-bit wide) | 4 KB / 4 KB L0, 16 KB / 16 KB L1, 512 KB L2 per core | 3.3 DMIPS/MHz per core
Snapdragon (Qualcomm) | ARMv8-A | Kryo[63] | 4 cores. | ? | Up to 2.2 GHz (6.3 DMIPS/MHz)
Ax (Apple) | ARMv7-A | Swift[64] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON | L1: 32 KB / 32 KB, L2: 1 MB | 3.5 DMIPS/MHz per core
Ax (Apple) | ARMv8-A | Cyclone[65] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 1 MB, L3: 4 MB | 1.3 or 1.4 GHz
Ax (Apple) | ARMv8-A | Typhoon[65][66] | 2 or 3 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 1 MB or 2 MB, L3: 4 MB | 1.4 or 1.5 GHz
Ax (Apple) | ARMv8-A | Twister[67] | 2 cores. ARM / Thumb / Thumb-2 / DSP / SIMD / VFPv4 FPU / NEON / TrustZone / AArch64 | L1: 64 KB / 64 KB, L2: 2 MB, L3: 4 MB or 0 MB | 1.85 or 2.26 GHz
Ax (Apple) | ARMv8.1-A | Hurricane[68] | 2 or 3 cores. AArch64, 6-decode, 6-issue, 9-wide, superscalar, out-of-order | L1: 64 KB / 64 KB, L2: 3 MB or 8 MB, L3: 4 MB or 0 MB | 2.34 or 2.38 GHz
Ax (Apple) | ARMv8.2-A | Monsoon[69] | 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1I: 128 KB, L1D: 64 KB, L2: 8 MB, L3: 4 MB | 2.39 GHz
Ax (Apple) | ARMv8.3-A | Vortex[70] | 2 or 4 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1: 128 KB / 128 KB, L2: 8 MB, L3: 8 MB | 2.5 GHz
Ax (Apple) | ARMv8.4-A | Lightning[71] | 2 cores. AArch64, 7-decode, ?-issue, 11-wide, superscalar, out-of-order | L1: 128 KB / 128 KB, L2: 8 MB, L3: 16 MB | 2.66 GHz
X-Gene (Applied Micro) | ARMv8-A | X-Gene | 64-bit, quad issue, SMP, 64 cores[72] | Cache, MMU, virtualization | 3 GHz (4.2 DMIPS/MHz per core)
Denver (Nvidia) | ARMv8-A | Denver[73][74] | 2 cores. AArch64, 7-wide superscalar, in-order, dynamic code optimization, 128 MB optimization cache, Denver1: 28 nm, Denver2: 16 nm | 128 KB I-cache / 64 KB D-cache | Up to 2.5 GHz
Carmel (Nvidia) | ARMv8 (t.b.d.) | Carmel[75][76] | 2 cores. AArch64, 10-wide superscalar, in-order, dynamic code optimization, ? MB optimization cache, functional safety, dual execution, parity & ECC | ? KB I-cache / ? KB D-cache | Up to ? GHz
ThunderX (Cavium) | ARMv8-A | ThunderX | 64-bit, with two models with 8–16 or 24–48 cores (×2 w/two chips) | ? | Up to 2.2 GHz
K12 (AMD) | ARMv8-A | K12[77] | ? | ? | ?
Exynos (Samsung) | ARMv8-A | M1/M2 ("Mongoose")[78] | 4 cores. AArch64, 4-wide, quad-issue, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 16-way shared 2 MB | 5.1 DMIPS/MHz (2.6 GHz)
Exynos (Samsung) | ARMv8-A | M3 ("Meerkat")[79] | 4 cores, AArch64, 6-decode, 6-issue, 6-wide, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB | ?
Exynos (Samsung) | ARMv8.2-A | M4 ("Cheetah") | 2 cores, AArch64, 6-decode, 6-issue, 6-wide, superscalar, out-of-order | 64 KB I-cache / 32 KB D-cache, L2: 8-way private 512 KB, L3: 16-way shared 4 MB | ?

Source: https://en.wikipedia.org/wiki/List_of_ARM_microarchitectures


