c/c++蓝桥杯经典编程题100道（15）字符串匹配

字符串匹配

->返回c/c++蓝桥杯经典编程题100道-目录

字符串匹配

一、题型解释

二、例题问题描述

三、C语言实现

解法1：暴力匹配（难度★）

解法2：KMP算法（难度★★★）

解法3：Boyer-Moore算法（难度★★★★）

四、C++实现

解法1：STL的find方法（难度★）

解法2：正则表达式（难度★★☆）

五、总结对比表

六、特殊方法与内置函数补充

1. C语言 strstr 函数

2. C++ std::regex 库

3. 多模式匹配算法

一、题型解释

字符串匹配是在一个 主字符串（文本） 中查找 子字符串（模式） 出现位置的问题。常见题型：

基础匹配：判断子串是否存在于主串中，返回首个匹配位置。
多模式匹配：同时查找多个子串的出现位置。
通配符匹配：支持 ?（匹配任意单字符）或 *（匹配任意字符序列）的特殊匹配。
正则表达式匹配：支持正则语法（如 . 匹配任意字符，* 匹配前导字符零次或多次）。

二、例题问题描述

例题1：主串 "ABABDABACDABABCABAB"，子串 "ABABC"，输出匹配位置 10。
例题2：主串 "hello world"，子串 "world"，输出匹配位置 6。
例题3：主串 "abcababcabc"，子串 "abc"，输出所有匹配位置 [0, 4, 7]。
例题4：主串 "aab"，模式 "c*a*b"（正则表达式），输出 true（匹配成功）。

三、C语言实现

解法1：暴力匹配（难度★）

通俗解释：

像逐页翻书一样，逐个字符比较主串和子串，发现不匹配时回退主串指针。

#include <stdio.h>
#include <string.h>int bruteForce(const char *text, const char *pattern) {int textLen = strlen(text);int patternLen = strlen(pattern);// 遍历主串所有可能的起始位置for (int i = 0; i <= textLen - patternLen; i++) { int j;// 逐个字符比较子串for (j = 0; j < patternLen; j++) { if (text[i + j] != pattern[j]) break; // 不匹配时跳出}if (j == patternLen) return i; // 完全匹配时返回起始位置}return -1; // 未找到
}int main() {const char *text = "ABABDABACDABABCABAB";const char *pattern = "ABABC";printf("匹配位置：%d\n", bruteForce(text, pattern)); // 输出 10return 0;
}

代码逻辑：

外层循环：遍历主串的每个可能的起始位置，范围是 0 到 textLen - patternLen（确保子串不越界）。
内层循环：从当前位置 i 开始，逐个字符比较主串和子串。
匹配判断：若内层循环完整遍历子串（j == patternLen），说明找到匹配，返回起始位置 i。
时间复杂度：O(mn)，其中 m 是主串长度，n 是子串长度。

解法2：KMP算法（难度★★★）

通俗解释：

利用已匹配的信息跳过不必要的比较，像查字典时快速翻页。

#include <stdio.h>
#include <string.h>// 构建next数组，记录最长公共前后缀长度
void buildNext(const char *pattern, int *next) {int len = strlen(pattern);next[0] = -1; // 初始值，表示无前缀匹配int i = 0, j = -1;while (i < len - 1) {if (j == -1 || pattern[i] == pattern[j]) {i++;j++;next[i] = j; // 记录当前位置的最长公共前后缀长度} else {j = next[j]; // 回退到前一个匹配位置}}
}int kmpSearch(const char *text, const char *pattern) {int textLen = strlen(text);int patternLen = strlen(pattern);int next[patternLen];buildNext(pattern, next); // 预处理生成next数组int i = 0, j = 0; // i为主串指针，j为子串指针while (i < textLen && j < patternLen) {if (j == -1 || text[i] == pattern[j]) { // 字符匹配时，指针后移i++;j++;} else { // 不匹配时，子串指针跳转到next[j]位置j = next[j];}}return (j == patternLen) ? i - j : -1; // 返回匹配起始位置
}int main() {const char *text = "ABABDABACDABABCABAB";const char *pattern = "ABABC";printf("匹配位置：%d\n", kmpSearch(text, pattern)); // 输出 10return 0;
}

代码逻辑：

next数组：存储子串每个位置的最长公共前后缀长度，用于跳过不必要的比较。
- 例如，子串 "ABABC" 的 next 数组为 [-1, 0, 0, 1, 2]。
匹配过程：
- 主串指针 i 不回溯，子串指针 j 根据 next 数组跳转。
- 当 j == -1 时，表示子串需从头开始匹配。
时间复杂度：O(m + n)，预处理 O(n)，匹配 O(m)。

解法3：Boyer-Moore算法（难度★★★★）

通俗解释：

从右向左比较字符，利用坏字符规则跳过大量不匹配区域。

#include <stdio.h>
#include <string.h>
#define CHAR_SET_SIZE 256 // 假设字符集为ASCII// 构建坏字符表，记录每个字符在子串中最后出现的位置
void buildBadCharTable(const char *pattern, int *badChar) {int len = strlen(pattern);for (int i = 0; i < CHAR_SET_SIZE; i++) {badChar[i] = -1; // 初始化所有字符位置为-1（未出现）}for (int i = 0; i < len; i++) {badChar[(int)pattern[i]] = i; // 更新字符最后出现的位置}
}int boyerMoore(const char *text, const char *pattern) {int textLen = strlen(text);int patternLen = strlen(pattern);int badChar[CHAR_SET_SIZE];buildBadCharTable(pattern, badChar);int shift = 0; // 主串的滑动偏移量while (shift <= textLen - patternLen) {int j = patternLen - 1; // 从子串末尾开始比较while (j >= 0 && pattern[j] == text[shift + j]) j--;if (j < 0) { // 完全匹配return shift;} else { // 计算滑动距离：坏字符在子串中的位置 - 坏字符表中的位置int slide = j - badChar[(int)text[shift + j]];shift += (slide > 0) ? slide : 1; // 至少滑动1位}}return -1;
}int main() {const char *text = "ABABDABACDABABCABAB";const char *pattern = "ABABC";printf("匹配位置：%d\n", boyerMoore(text, pattern)); // 输出 10return 0;
}

代码逻辑：

坏字符表：记录子串中每个字符最后出现的位置。
- 例如，子串 "ABABC" 的坏字符表中 'A' 的位置为 2，'B' 为 3。
滑动规则：
- 当字符不匹配时，根据坏字符在主串中的位置计算滑动距离，跳过无效比较。
时间复杂度：最坏 O(mn)，但实际效率通常优于暴力法。

四、C++实现

解法1：STL的find方法（难度★）

通俗解释：

直接使用 string::find 函数，无需手动实现算法。

cpp

#include <iostream>
#include <string>
using namespace std;int main() {string text = "ABABDABACDABABCABAB";string pattern = "ABABC";size_t pos = text.find(pattern); // 查找子串位置if (pos != string::npos) { // npos表示未找到cout << "匹配位置：" << pos << endl; // 输出 10} else {cout << "未找到" << endl;}return 0;
}

代码逻辑：

string::find：返回子串在主串中首次出现的位置，若未找到返回 string::npos。
优点：代码极简，适合快速开发。
底层实现：通常为暴力匹配或优化算法（依赖编译器和标准库实现）。

解法2：正则表达式（难度★★☆）

通俗解释：

使用C++11的正则表达式库支持复杂模式匹配。

cpp

#include <iostream>
#include <regex>
using namespace std;int main() {string text = "aab";regex pattern("c*a*b"); // 正则表达式：c出现0次或多次，a出现1次，b出现1次if (regex_match(text, pattern)) { // 全字符串匹配cout << "匹配成功" << endl; // 输出 "匹配成功"} else {cout << "匹配失败" << endl;}return 0;
}

代码逻辑：

regex_match：检查整个字符串是否完全匹配正则表达式。
正则语法：
- . 匹配任意字符。
- * 匹配前导字符零次或多次。
- + 匹配前导字符一次或多次。
应用场景：适合复杂模式匹配（如邮箱、URL验证）。

五、总结对比表

方法	时间复杂度	空间复杂度	优点	缺点
暴力匹配	O(mn)	O(1)	简单直观	效率低
KMP算法	O(m + n)	O(n)	高效，适合重复子串	理解难度大
Boyer-Moore	O(mn)（实际高效）	O(n)	实际应用中最快	实现复杂
STL find	O(mn)	O(1)	代码极简	不支持复杂模式
正则表达式	取决于实现	O(1)	支持复杂模式	性能较低，语法复杂

六、特殊方法与内置函数补充

1. C语言 `strstr` 函数

作用：标准库函数，用于查找子串位置。
示例：char *pos = strstr(text, pattern);
底层实现：通常为暴力匹配，效率较低。

2. C++ `std::regex` 库

功能：支持正则表达式匹配、搜索和替换。
常用函数：
- regex_match：全字符串匹配。
- regex_search：查找子串匹配。
- regex_replace：替换匹配内容。

3. 多模式匹配算法

扩展思路：使用 Trie树 或 AC自动机 高效匹配多个子串（适合敏感词过滤）。

->返回c/c++蓝桥杯经典编程题100道-目录