基础数据结构——数组（动态数组，二维数组，缓存与局部性原理）

1.概述

在计算机科学中，数组是由一组元素（值或变量）组成的数据结构，每个元素有至少一个索引或键来标识

因为数组内的元素是连续存储的，所以数组中元素的地址，可以通过其索引计算出来，例如：

int[] array = {1,2,3,4,5}

知道了数组的数据起始地址 $B a se A dd ress$ ，就可以由公式 $B a se A dd ress + i * s i ze$ 计算出索引 $i$ 元素的地址

$i$ 即索引，在 Java、C 等语言都是从 0 开始
$s i ze$ 是每个元素占用字节，例如 $in t$ 占 $4$ ， $d o u b l e$ 占 $8$

空间占用

Java 中数组结构为

8 字节 markword（记录了这个对象的 HashCode，分代年龄，锁信息等等）
4 字节 class 指针（压缩 class 指针的情况）
4 字节数组大小（决定了数组最大容量是 $2^{32}$ ）
数组元素 + 对齐字节（java 中所有对象大小都是 8 字节的整数倍[^12]，不足的要用对齐字节补足）

随机访问性能

即根据索引查找元素，时间复杂度是 $O (1)$

2.动态数组

java">package com.lemon.demo.array;import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.stream.IntStream;/*** @author 李猛* @datetime 2024/10/17 17:41* @description 动态数组*/
public class DynamicArray implements Iterable<Integer> {private int capacity = 8;//容量，初始容量8private int size = 0;//有效的元素个数，初始个数0private int[] array = new int[capacity];//数组信息/*** 向数组末尾添加元素** @param element*/public void addLast(int element) {add(size, element);}/*** 根据索引位置添加元素** @param index* @param element*/public void add(int index, int element) {if (index < 0 || index > size) {throw new IndexOutOfBoundsException("index:" + index + " error");}//扩容checkAddExpand();System.arraycopy(array, index, array, index + 1, size - index);array[index] = element;size++;}/*** 根据索引获取元素** @param index* @return*/public int get(int index) {if (index < 0 || index >= size) {throw new IndexOutOfBoundsException("index:" + index + " error");}return array[index];}public int remove(int index) {if (index < 0 || index >= size) {throw new IndexOutOfBoundsException("index:" + index + " error");}int element = array[index];if (index == size - 1) {//如果是删除最后一个元素/*** 数组copy* 1.原数组* 2.原数组起始位置* 3.目标数组* 4.目标数组起始位置* 5.要复制的数组元素的数量*/System.arraycopy(array, index + 1, array, index, size - index - 1);}size--;return element;}/*** 扩容*/private void checkAddExpand() {if (capacity == size) {/*** 扩容1.5倍*///capacity = capacity + capacity >> 1;capacity += capacity >> 1;int[] newArray = new int[capacity];System.arraycopy(array, 0, newArray, 0, size);array = newArray;}}/*** 循环遍历** @param consumer*/public void foreach(Consumer<Integer> consumer) {for (int i = 0; i < size; i++) {consumer.accept(array[i]);}}/*** 流遍历** @return*/public IntStream stream() {int[] range = Arrays.copyOfRange(array, 0, size - 1);return IntStream.of(range);}/*** 迭代器** @return*/@Overridepublic Iterator<Integer> iterator() {return new Iterator<>() {int pointer = 0;@Overridepublic boolean hasNext() {return pointer < size;}@Overridepublic Integer next() {return array[pointer++];}};}
}

3.二维数组

java">int[][] arr = {{11, 12, 13, 14, 15},{21, 22, 23, 24, 25},{31, 32, 33, 34, 35},
};

在这里插入图片描述

二维数组占 32 个字节，其中 array[0]，array[1]，array[2] 三个元素分别保存了指向三个一维数组的引用
三个一维数组各占 40 个字节
它们在内层布局上是连续的

更一般的，对一个二维数组 $A rr a y [m] [n]$

$m$ 是外层数组的长度，可以看作 row 行
$n$ 是内层数组的长度，可以看作 column 列
当访问 $A rr a y [i] [j]$ ， $0\leq i \lt m, 0\leq j \lt n$ 时，就相当于
- 先找到第 $i$ 个内层数组（行）
- 再找到此内层数组中第 $j$ 个元素（列）

4.缓存与局部性原理

这里只讨论空间局部性

cpu 读取内存（速度慢）数据后，会将其放入高速缓存（速度快）当中，如果后来的计算再用到此数据，在缓存中能读到的话，就不必读内存了
缓存的最小存储单位是缓存行（cache line），一般是 64 bytes，一次读的数据少了不划算啊，因此最少读 64 bytes 填满一个缓存行，因此读入某个数据时也会读取其临近的数据，这就是所谓空间局部性

定义两个求和方法

java">public static void sum1(int[][] arr, int rows, int columns) {long sum = 0;for (int i = 0; i < rows; i++) {for (int j = 0; j < columns; j++) {sum += arr[i][j];}}System.out.println("sum1:" + sum);
}

java">public static void sum2(int[][] arr, int rows, int columns) {long sum = 0;for (int j = 0; j < columns; j++) {for (int i = 0; i < rows; i++) {sum += arr[i][j];}}System.out.println("sum2:" + sum);
}

比较下面 $s u m 1$ 和 $s u m 2$ 两个方法的执行效率

java">int rows = 1000000;
int columns = 14;
int[][] a = new int[rows][columns];StopWatch sw = new StopWatch();sw.start("sum1");
sum1(a, rows, columns);
sw.stop();sw.start("sum2");
sum2(a, rows, columns);
sw.stop();System.out.println(sw.prettyPrint());