字符串-KMP

news/2024/9/22 18:25:23/

文章目录

  • KMP
    • 原理
    • 模板
    • 例题
      • HDU-1686Oulipo
      • HDU-2087剪花布条
      • POJ-2752Seek the Name, Seek the Fame
      • POJ-2406Power Strings
  • 后记

KMP


KMP是单模式匹配算法,即在一个长度为 n n n的文本串S中查找一个长度 m m m的模式串P。它的复杂度是 O ( n + m ) O(n+m) O(n+m),差不多是此类算法能达到的最优复杂度。
它是如何做到的?简单说,它通过分析P的特征对P进行预处理,从而在与S匹配的时候能够跳过一些字符串,达到快速匹配的目的。

原理


S [ ] = " a b c a b c a b c d " , P [ ] = “ a b c d ” S[]="abcabcabcd",P[]=“abcd” S[]="abcabcabcd",P[]=abcd为例, i i i指向 S [ i ] S[i] S[i] j j j指向 P [ j ] P[j] P[j]
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
图(c)说明KMP算法,指向 S S S i i i指针不会回溯,而是一直往后走到底;
同图(b)的朴素算法相比,大大减少了匹配次数。
在这里插入图片描述

那么KMP是如何做到 i i i不回溯,只回溯 j j j呢? j j j应该怎么回溯?这就是KMP的核心—— n e x t [ ] next[] next[]数组,当匹配失败后,用 n e x t [ ] next[] next[]数组指出 j j j回溯的位置。

next
n e x t [ ] next[] next[]数组是对串P预处理得到的, n e x t next next数组的值是除当前字符外的字符串的前缀与后缀相同的最大长度。

  • 前缀:以 j j j为起点的一段字符串,终点不限(别越界)
  • 后缀:以 j j j为终点的一段字符串,起点不限(别越界)

比如求串“abab”的next数组:初值赋-1;
第3个字符a之前的字符串ab中有长度为0的相同前缀后缀,所以第3个字符a对应的next值为0;
第4个字符b之前的字符串aba中有长度为1的相同前缀后缀a,所以第4个字符b对应的next值为1。

abab
-1001

如果还是很难消化next的原理,就先知道它的作用即可:指出 j j j匹配失败后回溯的位置。
知道原理后,对于代码编程实现求 n e x t [ ] next[] next[]数组其实还是不好理解,虽然代码不长,自惭形秽,免得误人子弟,给出一篇大牛的详解参考,小伙伴们可以刨根究底:从头到尾彻底理解KMP

模板

void getnext(char* p, int lp) {nex[0] = nex[1] = 0;for (int i = 1; i < lp; i++) {int j = nex[i];while (j && p[i] != p[j])j = nex[j];nex[i + 1] = p[i] == p[j] ? j + 1 : 0;}
}
int kmp(char* s, char* p) {	//统计s中有多少个pint ans = 0;int ls = strlen(s), lp = strlen(p);getnext(p, lp);for (int i = 0, j = 0; i < ls; i++) {while (j && s[i] != p[j])j = nex[j];//失配则回溯jif (s[i] == p[j])j++;//匹配则继续if (j >= lp)ans++;//统计}return ans;
}

例题


HDU-1686Oulipo

HDU-1686Oulipo

Problem Description
The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter ‘e’. He was a member of the Oulipo group. A quote from the book:
Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T’s is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {‘A’, ‘B’, ‘C’, …, ‘Z’} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {‘A’, ‘B’, ‘C’, …, ‘Z’}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0

分析:先来个模板题

#include<bits/stdc++.h>
using namespace std;
const int maxn = 10004;
int nex[maxn];
char s[maxn * 100], p[maxn];
void getnext() {int lp = strlen(p);nex[0] = nex[1] = 0;for (int i = 1; i < lp; i++) {int j = nex[i];while (j && p[i] != p[j])j = nex[j];nex[i + 1] = p[i] == p[j] ? j + 1 : 0;}
}
int kmp() {int ans = 0;int ls = strlen(s), lp = strlen(p);getnext();for (int i = 0, j = 0; i < ls; i++) {while (j && s[i] != p[j])j = nex[j];if (s[i] == p[j])j++;if (j >= lp)ans++;}return ans;
}
int main() {int t;scanf("%d", &t);while (t--) {scanf("%s%s", p, s);printf("%d\n", kmp());}return 0;
}

HDU-2087剪花布条

HDU-2087剪花布条

Problem Description
一块花布条,里面有些图案,另有一块直接可用的小饰条,里面也有一些图案。对于给定的花布条和小饰条,计算一下能从花布条中尽可能剪出几块小饰条来呢?
Input
输入中含有一些数据,分别是成对出现的花布条和小饰条,其布条都是用可见ASCII字符表示的,可见的ASCII字符有多少个,布条的花纹也有多少种花样。花纹条和小饰条不会超过1000个字符长。如果遇见#字符,则不再进行工作。
Output
输出能从花纹布中剪出的最多小饰条个数,如果一块都没有,那就老老实实输出0,每个结果之间应换行。
Sample Input
abcde a3
aaaaaa aa
#
Sample Output
0
3

分析:找到能分开的子串数量,套用KMP,统计的时候判断与上一个是否重合即可。

#include<bits/stdc++.h>
using namespace std;
const int maxn = 1003;
int nex[maxn];
char p[maxn], s[maxn];
void getnext() {nex[0] = nex[1] = 0;int lp = strlen(p);for (int i = 1; i < lp; i++) {int j = nex[i];while (j && p[i] != p[j])j = nex[j];nex[i + 1] = p[i] == p[j] ? j + 1 : 0;}
}
int kmp() { int ls = strlen(s), lp = strlen(p);getnext();int last = -1; //指向上一个匹配的末尾int ans = 0;for (int i = 0, j = 0; i < ls; i++) {while (j && s[i] != p[j])j = nex[j];if (s[i] == p[j])j++;if (j >= lp) { //若完全匹配if (i - last >= lp) { //且与上一个不重合ans++;last = i;}}}return ans;
}
int main() {while (~scanf("%s", s)) {if (s[0] == '#')break;scanf("%s", p);printf("%d\n", kmp());}return 0;
}

POJ-2752Seek the Name, Seek the Fame

POJ-2752Seek the Name, Seek the Fame

Description
The little cat is so famous, that many couples tramp over hill and dale to Byteland, and asked the little cat to give names to their newly-born babies. They seek the name, and at the same time seek the fame. In order to escape from such boring job, the innovative little cat works out an easy but fantastic algorithm:
Step1. Connect the father’s name and the mother’s name, to a new string S.
Step2. Find a proper prefix-suffix string of S (which is not only the prefix, but also the suffix of S).
Example: Father=‘ala’, Mother=‘la’, we have S = ‘ala’+‘la’ = ‘alala’. Potential prefix-suffix strings of S are {‘a’, ‘ala’, ‘alala’}. Given the string S, could you help the little cat to write a program to calculate the length of possible prefix-suffix strings of S? (He might thank you by giving your baby a name:)
Input
The input contains a number of test cases. Each test case occupies a single line that contains the string S described above.
Restrictions: Only lowercase letters may appear in the input. 1 <= Length of S <= 400000.
Output
For each test case, output a single line with integer numbers in increasing order, denoting the possible length of the new baby’s name.
Sample Input
ababcababababcabab
aaaaa
Sample Output
2 4 9 18
1 2 3 4 5

分析:问前缀和后缀相同的长度可能是多少?就是考察对 n e x t [ ] next[] next[]的理解,设 l e n len len为字符串长度,首先原字符串肯定是自己前缀和后缀,len是答案,然后 n e x [ l e n ] nex[len] nex[len]就是代表这个字符串的最大前缀后缀相同长度,然后将这个长度为 n e x [ l e n ] nex[len] nex[len]的子串继续分析前缀后缀最大相同长度,递归求解即可。
眼尖选手可以直接打表套样例找规律。

#include<cstdio>
#include<cstring>
#include<vector>
using namespace std;
const int maxn = 400005;
int nex[maxn], ls;
char s[maxn];
void getnext() {nex[0] = nex[1] = 0;for (int i = 1; i < ls; i++) {int j = nex[i];while (j && s[i] != s[j])j = nex[j];nex[i + 1] = s[i] == s[j] ? j + 1 : 0;}
}
int main() {while (~scanf("%s", s)) {ls = strlen(s);getnext();int idx = ls ;vector<int>ans;while (nex[idx]) {ans.push_back(nex[idx] );idx = nex[idx];}int len = ans.size();for (int i = len - 1; i >= 0; i--)printf("%d ", ans[i]);printf("%d\n", ls);}return 0;
}

POJ-2406Power Strings

POJ-2406Power Strings

Description
Given two strings a and b we define ab to be their concatenation. For example, if a = “abc” and b = “def” then ab = “abcdef”. If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = “” (the empty string) and a^(n+1) = a*(a^n).
Input
Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case.
Output
For each s you should print the largest n such that s = a^n for some string a.
Sample Input
abcd
aaaa
ababab
.
Sample Output
1
4
3
Hint
This problem has huge input, use scanf instead of cin to avoid time limit exceed.

分析:问字符串是由多少个子串循环组成 ?还是对 n e x t [ ] next[] next[]的考察, n e x t [ ] next[] next[]是前缀和后缀最大相同长度,那么 n − n e x t [ n ] n-next[n] nnext[n]就是可能的循环节长度,再判断下是否为0及是否整除,否则循环节长度为1。举图说明,也可以打表观察。
居然不给n范围,我测出来是1e6
在这里插入图片描述

#include<cstdio>
#include<cstring>
using namespace std;
const int maxn = 1000006;
int nex[maxn], ls;
char s[maxn];
void getnext() {nex[0] = nex[1] = 0;for (int i = 1; i < ls; i++) {int j = nex[i];while (j && s[i] != s[j])j = nex[j];nex[i + 1] = s[i] == s[j] ? j + 1 : 0;}
}
int main() {while (~scanf("%s", s)) {if (s[0] == '.')break;ls = strlen(s);getnext();int len = ls - nex[ls];if (len && ls % len == 0)printf("%d\n", ls / len);else puts("1");}return 0;
}

后记


小结一下,KMP是单模式匹配算法,复杂度是 O ( n + m ) O(n+m) O(n+m),多模匹配算法用AC自动机。其中需要重点理解的是 n e x t [ ] next[] next[],很多衍生题都是从 n e x [ ] t nex[]t nex[]t切入的,如果没思路可以试试打表套样例找规律,多观察 n e x t [ ] next[] next[]

原创不易,请勿转载本不富裕的访问量雪上加霜
博主首页:https://blog.csdn.net/qq_45034708
如果文章对你有帮助,记得一键三连❤


http://www.ppmy.cn/news/735265.html

相关文章

算法 | 03 字符串(KMP)

1.字符串 string 的定义string 的初始化string 的长度string 的元素的访问 数组迭代器 元素的操作 insert()erase()clear() 运算符 连接 比较运算符判断是否相等 常用函数 find()substr() /*** author: Qiuyue Zhang* date: 2021/2/10 14:14* description: string的基础*/ #…

一些KMP的题目

普通的KMP代码&#xff1a; #include<iostream> #include<vector> #include<cstring> #include<algorithm> using namespace std; int nexts[10000],k; void get_nexts(string t) {nexts[0] -1;int i 0, j -1;int len t.size();while(i < len)…

kmp算法模板题

剪花布条 一块花布条&#xff0c;里面有些图案&#xff0c;另有一块直接可用的小饰条&#xff0c;里面也有一些图案。对于给定的花布条和小饰条&#xff0c;计算一下能从花布条中尽可能剪出几块小饰条来呢&#xff1f; Input 输入中含有一些数据&#xff0c;分别是成对出现的花…

摄像机标定和图像径向畸变校正

这段时间断断续续在弄这个摄像头的标定和图像畸形矫正的问题&#xff0c;基本差不多了&#xff0c;乘休息时间&#xff0c;发点成果和大家分享吧&#xff01; 一、标定 关于摄像头的标定&#xff0c;说实话网上的资料太多了&#xff0c;方法也很多的。个人觉得想要研究这个的话…

ESP8266类库的使用——总体概述

在文章《ArduinoESP8266连接WiFi》与文章《ESP8266联网测试》中&#xff0c;我们通过查询ESP8266的AT指令&#xff0c;编写相应的函数实现ESP8266连接WiFi的功能。但是连接WiFi只是芯片的功能之一&#xff0c;为了实现物联网的目的&#xff0c;我们还有更远的路要走。 君子善假…

未检测到扫描仪Win10解决 WIA服务1061

之前正常使用的扫描仪&#xff0c;突然不能用了&#xff0c;出现未检测到扫描仪错误 Windows画图板的文件菜单里从扫描仪或相机获取而已是灰色的。 网上搜索解决方案&#xff0c;一堆垃圾文章&#xff0c;毫无帮助。在设备管理器里查看图像设备是在的&#xff0c;确定驱动没有…

keil4与proteus的联调

联调 顾名思义即是联合调试的意思&#xff0c;是指keil4与proteus联合调试&#xff0c;在进行联调是需要做好以下配置的。 1.第一步是将.DLL文复制到安装keil路径的BIN目录&#xff0c;所以一定要记住自己把keil装在哪儿了,博文下面有需要的.DLL文件。 2.万一忘记了自己装在哪…

ActiveX控件v7.2.0.1,Viscom Scanner ActiveX控件

ActiveX控件能够使用带有进纸器的扫描仪扫描多页&#xff0c;在最后一页扫描时自动保存为多页PDF或TIFF。 ActiveX控件有能力检测卡纸事件。q2315702359 ActiveX控件从所有TWAIN兼容扫描仪和网络摄像头设备捕获图像。 支持将扫描的图像保存到 Microsoft Word(docx)。 ActiveX控…