Advanced Java: Resolving Hash Collisions

1. What Is a Hash Collision

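Hash function: a function that maps input data (a key) to an output value (a hash value) within a fixed-size range. Hash functions are widely used for data storage and retrieval, for example in hash tables.
Hash table: a hash table, also called a hash map or dictionary, is a common data structure for implementing associative arrays; it maps keys to values.
Hash collision: the set of possible inputs is effectively unbounded while the range of outputs is finite, so distinct keys must sometimes share a result. A hash collision occurs when two or more keys are mapped by the hash function to the same index.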
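
As a concrete illustration (not part of the original text), the strings "Aa" and "BB" are a well-known pair that String.hashCode() maps to the same value, so they land in the same slot of any table indexed by that hash. The class name CollisionDemo and the helper slotFor are made up for this sketch.

```java
class CollisionDemo {
    // Maps a key to a slot index in a table with the given number of slots.
    static int slotFor(String key, int capacity) {
        return Math.floorMod(key.hashCode(), capacity);
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
        // Different keys, same hash code, therefore the same slot.
        System.out.println(slotFor("Aa", 16) == slotFor("BB", 16)); // true
    }
}
```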

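In Java, the hash function is usually provided by the hashCode() method. hashCode() is defined in the Object class, so every Java class has it. The default implementation returns an identity-based hash code (often described as derived from the object's memory address, although the exact mechanism depends on the JVM), so classes whose equality depends on field values usually need to override it with a more suitable hash function.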

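When overriding hashCode(), the following rules generally apply:

Equal objects must have equal hash codes. That is, if two objects are equal according to equals(), their hash codes must be equal.

If a.equals(b) == true, are their hash codes guaranteed to be the same?
A hash table first uses an object's hashCode() value to pick the bucket, and only then uses equals() to match the exact object within that bucket.
So the contract requires that a.equals(b) == true implies equal hash codes. The language does not enforce this: a class that overrides only one of equals() and hashCode(), or overrides them inconsistently, can break the rule, and hash-based collections may then fail to find equal objects.
For this reason, whenever you override equals() you should also override hashCode(), so that equal objects always produce the same hash code.

Keep collisions to a minimum. Different objects should produce different hash codes as far as possible, which improves the performance of hash tables and similar data structures.
For example, for a simple Person class, hashCode() can be overridden as follows: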
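```
public class Person {
    private String name;
    private int age;

    // Constructor, getters, setters, and other methods omitted

    @Override
    public int hashCode() {
        int result = 17;
        // Treat a null name as 0 instead of throwing NullPointerException
        result = 31 * result + (name == null ? 0 : name.hashCode());
        result = 31 * result + age;
        return result;
    }
}
```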
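This example uses a common recipe for computing a hash code: start from a non-zero seed (17 here), then for each field multiply the running result by a prime (31) and add that field's hash code. The recipe spreads values reasonably well, but it is not a cure-all; the right hash function depends on the object's fields and on how it will be used.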
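
To tie the two rules together, here is a minimal sketch of a class that derives equals() and hashCode() from the same fields, using java.util.Objects.equals and java.util.Objects.hash rather than the hand-written 17/31 formula. The class name PersonKey and the sample values are assumptions for illustration, not part of the original example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class PersonKey {
    private final String name;
    private final int age;

    PersonKey(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PersonKey)) return false;
        PersonKey that = (PersonKey) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields that equals() compares.
        return Objects.hash(name, age);
    }

    public static void main(String[] args) {
        Set<PersonKey> people = new HashSet<>();
        people.add(new PersonKey("Ann", 30));
        // A distinct but equal object is found because both methods agree.
        System.out.println(people.contains(new PersonKey("Ann", 30))); // true
        System.out.println(people.contains(new PersonKey("Ann", 31))); // false
    }
}
```

Removing the hashCode() override from this sketch would typically make the first lookup fail, because the two equal objects would then usually land in different buckets.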

2. How to Resolve Hash Collisions

Two common approaches to resolving hash collisions are open addressing and separate chaining (the linked-list method).

2.1 Open Addressing

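When a collision occurs, open addressing uses a probing strategy (such as linear probing, quadratic probing, or double hashing) to find the next empty slot in the table and stores the colliding key-value pair there. No extra data structure is needed for colliding entries, which saves space. However, performance degrades as the table's load factor gets high.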

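Linear probing: when a collision occurs, linear probing checks the following slots one by one until it finds a free one. Concretely, if slot i is occupied, it tries slots i+1, i+2, and so on (wrapping around at the end of the array) until it finds a free slot or has scanned the whole table. In practice a free slot is almost always found, because implementations normally resize the table once the number of elements reaches a threshold.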

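With linear probing, insertion and lookup work as follows:

On insertion, if the computed slot is already occupied by a different key, probe forward one slot at a time until a free slot is found.

On lookup, start at the computed slot. If it holds a different key, probe forward one slot at a time until the key is found; reaching an empty slot, or scanning the whole table, means the key is absent.
The following is a simple example of linear probing: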

    class LinearProbingHashTable {
        private static final int DEFAULT_CAPACITY = 10;
        private Entry[] table;
        private int size;

        public LinearProbingHashTable() {
            table = new Entry[DEFAULT_CAPACITY];
            size = 0;
        }

        public void put(int key, String value) {
            int index = hash(key);
            while (table[index] != null && table[index].getKey() != key) {
                index = (index + 1) % table.length; // linear probe to the next slot
            }
            if (table[index] == null) {
                // This simple example never resizes, so keep one slot free;
                // otherwise the probing loops could spin forever on a full table
                if (size == table.length - 1) {
                    throw new IllegalStateException("hash table is full");
                }
                table[index] = new Entry(key, value);
                size++;
            } else {
                table[index].setValue(value);
            }
        }

        public String get(int key) {
            int index = hash(key);
            while (table[index] != null && table[index].getKey() != key) {
                index = (index + 1) % table.length; // linear probe to the next slot
            }
            return table[index] != null ? table[index].getValue() : null;
        }

        private int hash(int key) {
            // floorMod keeps the index non-negative even for negative keys
            return Math.floorMod(key, table.length);
        }

        private static class Entry {
            private final int key;
            private String value;

            public Entry(int key, String value) {
                this.key = key;
                this.value = value;
            }

            public int getKey() { return key; }
            public String getValue() { return value; }
            public void setValue(String value) { this.value = value; }
        }
    }

    public class Main {
        public static void main(String[] args) {
            LinearProbingHashTable hashTable = new LinearProbingHashTable();
            // Keys 1, 11 and 21 all hash to slot 1 and end up in slots 1, 2 and 3
            hashTable.put(1, "A");
            hashTable.put(11, "B");
            hashTable.put(21, "C");
            System.out.println(hashTable.get(1));  // prints "A"
            System.out.println(hashTable.get(11)); // prints "B"
            System.out.println(hashTable.get(21)); // prints "C"
        }
    }

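Quadratic probing: quadratic probing is a refinement of linear probing that uses a quadratic function to compute the next probe position. Concretely, if slot i is occupied, it checks slots (i + 1^2), (i + 2^2), (i + 3^2), and so on, each taken modulo the table size, until it finds a free slot or has exhausted the probe sequence. Unlike linear probing, the sequence does not necessarily reach every slot, so the table should be kept well below full.

The following example uses quadratic probing to resolve collisions: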

    class QuadraticProbingHashTable {
        private static final int DEFAULT_CAPACITY = 10;
        private Entry[] table;
        private int size;

        public QuadraticProbingHashTable() {
            table = new Entry[DEFAULT_CAPACITY];
            size = 0;
        }

        public void put(int key, String value) {
            int index = probe(key);
            if (index < 0) {
                throw new IllegalStateException("no free slot on the probe sequence");
            }
            if (table[index] == null) {
                table[index] = new Entry(key, value);
                size++;
            } else {
                table[index].setValue(value);
            }
        }

        public String get(int key) {
            int index = probe(key);
            return index >= 0 && table[index] != null ? table[index].getValue() : null;
        }

        // Checks slots (home + i*i) % length for i = 0, 1, 2, ... and returns the
        // slot holding key or the first free slot, or -1 if neither is reached.
        // Offsets are measured from the home slot, not accumulated step by step.
        private int probe(int key) {
            int home = hash(key);
            for (int i = 0; i < table.length; i++) {
                int index = (home + i * i) % table.length; // quadratic probe
                if (table[index] == null || table[index].getKey() == key) {
                    return index;
                }
            }
            return -1;
        }

        private int hash(int key) {
            // floorMod keeps the index non-negative even for negative keys
            return Math.floorMod(key, table.length);
        }

        private static class Entry {
            private final int key;
            private String value;

            public Entry(int key, String value) {
                this.key = key;
                this.value = value;
            }

            public int getKey() { return key; }
            public String getValue() { return value; }
            public void setValue(String value) { this.value = value; }
        }
    }

    public class Main {
        public static void main(String[] args) {
            QuadraticProbingHashTable hashTable = new QuadraticProbingHashTable();
            // Keys 1, 11 and 21 all hash to slot 1 and end up in slots 1, 2 and 5
            hashTable.put(1, "A");
            hashTable.put(11, "B");
            hashTable.put(21, "C");
            System.out.println(hashTable.get(1));  // prints "A"
            System.out.println(hashTable.get(11)); // prints "B"
            System.out.println(hashTable.get(21)); // prints "C"
        }
    }
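
The quadratic probe sequence described above can be printed directly. This sketch (the class and method names are hypothetical) lists the distinct slots reached from a given home slot, which makes one caveat visible: the sequence is not guaranteed to reach every slot, so an implementation should cap the number of probes or resize well before the table fills.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class QuadraticProbeSequence {
    // Returns the distinct slots visited by (home + i*i) % capacity
    // for i = 0 .. capacity - 1, in the order they are first reached.
    static Set<Integer> visitedSlots(int home, int capacity) {
        Set<Integer> slots = new LinkedHashSet<>();
        for (int i = 0; i < capacity; i++) {
            slots.add((home + i * i) % capacity);
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(visitedSlots(1, 10)); // [1, 2, 5, 0, 7, 6]
        System.out.println(visitedSlots(1, 11)); // [1, 2, 5, 10, 6, 4]
    }
}
```

In both runs only 6 slots are reachable from home slot 1, even though the tables have 10 and 11 slots.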

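Double hashing: double hashing uses two hash functions. If slot i is occupied, the second hash function computes an offset (step) from the key, and the next probe position is i plus that offset. If that slot is also occupied, the same step is applied again, until a free slot is found or the whole table has been scanned. The offset must never be zero, and it should be coprime with the table size so that the probes can reach every slot.
The following example uses double hashing to resolve collisions: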

    class DoubleHashingHashTable {
        // A prime capacity guarantees that every step from hash2 (1..7) is
        // coprime with the table size, so probing can reach every slot
        private static final int DEFAULT_CAPACITY = 11;
        private Entry[] table;
        private int size;

        public DoubleHashingHashTable() {
            table = new Entry[DEFAULT_CAPACITY];
            size = 0;
        }

        public void put(int key, String value) {
            int index = hash1(key);
            int offset = hash2(key);
            while (table[index] != null && table[index].getKey() != key) {
                index = (index + offset) % table.length; // step by the second hash
            }
            if (table[index] == null) {
                // This simple example never resizes, so keep one slot free;
                // otherwise the probing loops could spin forever on a full table
                if (size == table.length - 1) {
                    throw new IllegalStateException("hash table is full");
                }
                table[index] = new Entry(key, value);
                size++;
            } else {
                table[index].setValue(value);
            }
        }

        public String get(int key) {
            int index = hash1(key);
            int offset = hash2(key);
            while (table[index] != null && table[index].getKey() != key) {
                index = (index + offset) % table.length; // step by the second hash
            }
            return table[index] != null ? table[index].getValue() : null;
        }

        private int hash1(int key) {
            return Math.floorMod(key, table.length);
        }

        private int hash2(int key) {
            // Always in 1..7, never zero; other second hash functions are possible
            return 7 - Math.floorMod(key, 7);
        }

        private static class Entry {
            private final int key;
            private String value;

            public Entry(int key, String value) {
                this.key = key;
                this.value = value;
            }

            public int getKey() { return key; }
            public String getValue() { return value; }
            public void setValue(String value) { this.value = value; }
        }
    }

    public class Main {
        public static void main(String[] args) {
            DoubleHashingHashTable hashTable = new DoubleHashingHashTable();
            // With capacity 11, keys 1, 12 and 23 all hash to slot 1
            hashTable.put(1, "A");
            hashTable.put(12, "B");
            hashTable.put(23, "C");
            System.out.println(hashTable.get(1));  // prints "A"
            System.out.println(hashTable.get(12)); // prints "B"
            System.out.println(hashTable.get(23)); // prints "C"
        }
    }
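
One detail worth checking in any double-hashing table is whether the step produced by the second hash function shares a common factor with the table size; if it does, the probe sequence cycles through only part of the table. The sketch below (names are made up for illustration) counts how many distinct slots a fixed step can reach.

```java
import java.util.HashSet;
import java.util.Set;

class DoubleHashStepDemo {
    // Counts the distinct slots reached by repeatedly adding step modulo capacity.
    static int reachableSlots(int home, int step, int capacity) {
        Set<Integer> seen = new HashSet<>();
        int index = home;
        for (int i = 0; i < capacity; i++) {
            seen.add(index);
            index = (index + step) % capacity;
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(reachableSlots(1, 5, 10)); // 2  (only slots 1 and 6)
        System.out.println(reachableSlots(1, 5, 11)); // 11 (every slot)
    }
}
```

A prime capacity is one simple way to make every step between 1 and capacity - 1 safe.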

2.2 Separate Chaining (Linked-List Method)

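Separate chaining is a common way to resolve hash collisions that stores colliding key-value pairs in linked lists. Each slot of the hash table holds a list; when a collision occurs, the new key-value pair is added to the list at that slot rather than overwriting the existing entry.
Separate chaining is well suited to workloads with frequent insertions and deletions.
For example, take the keys (32, 40, 36, 53, 16, 46, 71, 27, 42, 24, 49, 64), a hash table of length 13, and the hash function H(key) = key % 13. Appending each key to the end of its chain in the order given produces:

0
1  -> 40 -> 53 -> 27
2
3  -> 16 -> 42
4
5
6  -> 32 -> 71
7  -> 46
8
9
10 -> 36 -> 49
11 -> 24
12 -> 64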
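
The chains above can be reproduced with a few lines of code. This is a minimal sketch assuming each new key is appended to the tail of its chain in the order listed; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class ChainingExample {
    // Distributes non-negative keys into chains using H(key) = key % capacity,
    // appending each key at the tail of its chain.
    static List<List<Integer>> buildChains(int[] keys, int capacity) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            table.add(new LinkedList<>());
        }
        for (int key : keys) {
            table.get(key % capacity).add(key);
        }
        return table;
    }

    public static void main(String[] args) {
        int[] keys = {32, 40, 36, 53, 16, 46, 71, 27, 42, 24, 49, 64};
        List<List<Integer>> table = buildChains(keys, 13);
        for (int slot = 0; slot < table.size(); slot++) {
            System.out.println(slot + " " + table.get(slot));
        }
    }
}
```

Slot 1 comes out as [40, 53, 27] because 40, 53, and 27 appear in that order in the input and all leave remainder 1 when divided by 13.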

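In Java, separate chaining is also how HashMap resolves collisions. JDK 1.7 stores colliding keys purely in singly linked lists, while JDK 1.8 uses a hybrid scheme: once a bucket's list grows beyond 8 nodes it is converted to a red-black tree (provided the table has at least 64 buckets; smaller tables are resized instead).
The following example uses separate chaining to resolve collisions: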

    import java.util.LinkedList;
    import java.util.List;

    class SeparateChainingHashTable {
        private static final int DEFAULT_CAPACITY = 10;
        private List<Entry>[] table;
        private int size;

        @SuppressWarnings("unchecked") // Java cannot create a generic array directly
        public SeparateChainingHashTable() {
            table = new LinkedList[DEFAULT_CAPACITY];
            size = 0;
        }

        public void put(int key, String value) {
            int index = hash(key);
            if (table[index] == null) {
                table[index] = new LinkedList<>(); // create the chain lazily
            }
            for (Entry entry : table[index]) {
                if (entry.getKey() == key) {
                    entry.setValue(value); // key already present: update in place
                    return;
                }
            }
            table[index].add(new Entry(key, value));
            size++;
        }

        public String get(int key) {
            int index = hash(key);
            if (table[index] != null) {
                for (Entry entry : table[index]) {
                    if (entry.getKey() == key) {
                        return entry.getValue();
                    }
                }
            }
            return null;
        }

        private int hash(int key) {
            // floorMod keeps the index non-negative even for negative keys
            return Math.floorMod(key, table.length);
        }

        private static class Entry {
            private final int key;
            private String value;

            public Entry(int key, String value) {
                this.key = key;
                this.value = value;
            }

            public int getKey() { return key; }
            public String getValue() { return value; }
            public void setValue(String value) { this.value = value; }
        }
    }

    public class Main {
        public static void main(String[] args) {
            SeparateChainingHashTable hashTable = new SeparateChainingHashTable();
            // Keys 1, 11 and 21 all hash to slot 1 and share one chain
            hashTable.put(1, "A");
            hashTable.put(11, "B");
            hashTable.put(21, "C");
            System.out.println(hashTable.get(1));  // prints "A"
            System.out.println(hashTable.get(11)); // prints "B"
            System.out.println(hashTable.get(21)); // prints "C"
        }
    }

3. Recommendations and Caveats

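Choose a suitable hash function: the choice of hash function matters a great deal for reducing collisions. A good hash function distributes keys evenly across the table's slots, lowering the probability of collisions.
Mind the load factor: the load factor is the ratio of stored key-value pairs to the table's total capacity. As it rises, collisions become more likely and performance suffers, so it is important to resize the table when needed to keep the load factor in a reasonable range.
Choose a suitable collision-resolution strategy: pick the strategy that fits the application and its performance requirements. For example, open addressing suits situations where memory is tight, while separate chaining copes better with heavy collisions and with frequent insertions and deletions.
Consider concurrency: in a concurrent environment, multiple threads accessing a hash table at the same time can cause problems. Use a thread-safe implementation (such as java.util.concurrent.ConcurrentHashMap) or synchronize access to the table appropriately.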
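
As a sketch of the load-factor advice above, the table below doubles its capacity and reinserts every entry before the load factor would exceed 0.75. The class name, the 0.75 threshold, and the doubling policy are assumptions chosen for illustration (0.75 happens to be the default load factor of java.util.HashMap). To stay short it uses linear probing and assumes values are never null.

```java
class ResizingProbingTable {
    private static final double MAX_LOAD_FACTOR = 0.75; // assumed threshold
    private int[] keys = new int[8];
    private String[] values = new String[8]; // a null value marks an empty slot
    private int size;

    public void put(int key, String value) {
        // Grow first, so there is always at least one empty slot to stop probing.
        if (size + 1 > MAX_LOAD_FACTOR * keys.length) {
            resize(keys.length * 2);
        }
        int index = find(key);
        if (values[index] == null) {
            size++;
        }
        keys[index] = key;
        values[index] = value;
    }

    public String get(int key) {
        return values[find(key)];
    }

    public int capacity() {
        return keys.length;
    }

    // Returns the slot holding key, or the first empty slot on its probe path.
    private int find(int key) {
        int index = Math.floorMod(key, keys.length);
        while (values[index] != null && keys[index] != key) {
            index = (index + 1) % keys.length;
        }
        return index;
    }

    // Slot indices depend on the capacity, so every entry must be reinserted.
    private void resize(int newCapacity) {
        int[] oldKeys = keys;
        String[] oldValues = values;
        keys = new int[newCapacity];
        values = new String[newCapacity];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldValues[i] != null) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    public static void main(String[] args) {
        ResizingProbingTable table = new ResizingProbingTable();
        for (int k = 0; k < 20; k++) {
            table.put(k * 8, "v" + k); // multiples of 8 all collide in a table of 8
        }
        System.out.println(table.get(152));   // v19
        System.out.println(table.capacity()); // 32
    }
}
```

With 20 entries the table grows from 8 to 16 and then to 32 slots, which keeps probe chains short even though the keys were chosen to collide in the initial table.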