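A Summary of Regular Expressions in Python

Regular Expressions

While studying regular expressions I collected some material from around the web and organized it here; I hope it helps.

A regular expression defines a matching pattern for strings: it lets us check whether a string contains a part that matches the pattern, extract the matching parts, or replace them.

Overview

A regular expression (abbreviated regex or RE) is a single string that describes a whole family of strings sharing a particular format.

Functions:
        a. Searching
        b. Replacing
        c. Matching
Typical uses:
        Web scraping; validating phone numbers, email addresses, passwords, and user names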

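import re

# re.match()
# Checks whether the string starts with the given pattern;
# returns a match object on success and None on failure.
# First argument: the regular expression
# Second argument: the string to check
# Third argument (optional): regular-expression flags

# \d: a digit, 0-9
# +: one or more occurrences
print(re.match(r"\d+", "12345esd"))
# <re.Match object; span=(0, 5), match='12345'>
print(re.match(r"\d+", "as12345esd"))
# None

# re.search()
# Checks whether the string contains the given pattern anywhere;
# returns a match object on success and None on failure.
# First argument: the regular expression
# Second argument: the string to check
# Third argument (optional): regular-expression flags
search_result_1 = re.search(r"\d+", "12345esd")
if search_result_1:
    print("re.search() - matched:", search_result_1.group())
else:
    print("re.search() - no match")

search_result_2 = re.search(r"\d+", "as12345esd")
if search_result_2:
    print("re.search() - matched:", search_result_2.group())
else:
    print("re.search() - no match")

# re.findall()
# Collects every match and returns them as a list.
# First argument: the regular expression
# Second argument: the string to check
findall_result = re.findall(r"\d+", "12abc34def56")
print("re.findall() result:", findall_result)  # ['12', '34', '56']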
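Because re.match() and re.search() return either a match object or None, it is worth guarding against None before calling methods on the result. The following is a minimal sketch of the most commonly used match-object methods; the helper name `describe` is made up for illustration and is not part of the re module.

```python
import re

def describe(pattern, text):
    """Return a short summary of the first match, or None if nothing matches."""
    m = re.search(pattern, text)
    if m is None:            # always guard against the no-match case
        return None
    return {
        "text": m.group(),   # the matched substring
        "start": m.start(),  # index where the match begins
        "end": m.end(),      # index just past the last matched character
        "span": m.span(),    # (start, end) as a tuple
    }

print(describe(r"\d+", "as12345esd"))
# {'text': '12345', 'start': 2, 'end': 7, 'span': (2, 7)}
print(describe(r"\d+", "no digits here"))
# None
```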

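Expression	Meaning	Example
.	Matches any character except a newline	-
[0123456789]	A character set: matches any single character listed inside the brackets	Matches 1, 2, 3 in "123abc"
[good]	Matches any one of the characters g, o, d	Matches any one of g, o, o, d in "good"
[a-z]	Matches any lowercase letter	Matches a, b, c in "abc"
[A-Z]	Matches any uppercase letter	Matches A, B, C in "ABC"
[0-9]	Matches any digit; same as [0123456789]	Matches 1, 2, 3 in "123abc"
[0-9a-zA-Z]	Matches any digit or letter	Matches every character in "123abcABC"
[0-9a-zA-Z_]	Matches any digit, letter, or underscore	Matches every character in "123abc_ABC"
[^good]	Matches any character other than g, o, d; the ^ inside the brackets is called a caret and negates the set	Matches h, e, l, l in "hello"
[^0-9]	Matches any non-digit character	Matches a, b, c in "abc"
\d	Matches a digit; same as [0-9] for ASCII text (in str patterns it also matches other Unicode decimal digits unless re.ASCII is set)	Matches 1, 2, 3 in "123abc"
\D	Matches a non-digit character; same as [^\d]	Matches a, b, c in "abc"
\w	Matches a digit, letter, or underscore; same as [0-9a-zA-Z_] for ASCII text (in str patterns it also matches other Unicode word characters unless re.ASCII is set)	Matches every character in "123abc_ABC"
\W	Matches anything that is not a digit, letter, or underscore; same as [^\w]	Matches !, @, # in "!@#"
\s	Matches any whitespace character (space, carriage return, newline, tab, vertical tab, form feed); same as [ \t\n\r\f\v] for ASCII text	Matches spaces, line breaks, and other whitespace in a text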
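The table equates \d with [0-9] and \w with [0-9a-zA-Z_]. For str patterns in Python 3 this holds exactly only when the re.ASCII flag is set; by default both classes follow Unicode. A small sketch of the difference (the sample strings are chosen only for illustration):

```python
import re

fullwidth = "１２３"  # full-width digits, common in CJK text

# Default (Unicode) mode: \d accepts any Unicode decimal digit.
unicode_digits = re.fullmatch(r"\d+", fullwidth) is not None
# re.ASCII restricts \d to [0-9] and \w to [a-zA-Z0-9_].
ascii_digits = re.fullmatch(r"\d+", fullwidth, re.ASCII) is not None
print(unicode_digits, ascii_digits)  # True False

# Likewise, \w matches Chinese characters unless re.ASCII is set.
unicode_words = re.findall(r"\w+", "hello 世界")
ascii_words = re.findall(r"\w+", "hello 世界", re.ASCII)
print(unicode_words)  # ['hello', '世界']
print(ascii_words)    # ['hello']
```

When validating input such as phone numbers, writing [0-9] explicitly or passing re.ASCII avoids accepting digits from other scripts.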

import re

# [ ]: matches exactly one character from the set
# - : denotes a range
print(re.search("he[0-9]11o", "he911o"))
# <re.Match object; span=(0, 6), match='he911o'>
print(re.search(r"go[zxc]od", "goxod"))
# <re.Match object; span=(0, 5), match='goxod'>
print(re.search("he[a-z]llo", "hepllo"))
# <re.Match object; span=(0, 6), match='hepllo'>
print(re.search("hello[0-9a-zA-Z_]", "hello9"))
# <re.Match object; span=(0, 6), match='hello9'>
print(re.search(r"hello\d", "hello2"))
# <re.Match object; span=(0, 6), match='hello2'>
print(re.search(r"hello\D", "hellowklo_"))
# <re.Match object; span=(0, 6), match='hellow'>
print(re.search(r"hello\w", "hello1"))
# <re.Match object; span=(0, 6), match='hello1'>
print(re.search(r"hello\W", "hello!"))
# <re.Match object; span=(0, 6), match='hello!'>
print(re.search(r"mone\sy", "mone\ny"))
# <re.Match object; span=(0, 6), match='mone\ny'>
print(re.search(r"money[^0-9]", "money!"))
# <re.Match object; span=(0, 6), match='money!'>

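Pattern Modifiers

Pattern modifiers (flags) adjust how the regular expression we wrote is interpreted.

.: matches any single character except a newline (\n denotes a newline)
re.S: lets . match a newline (\n) as well
re.I: ignores letter case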

import re

# "shenzhen" plus any one character except a newline: matches shenzhen9
result1 = re.search("shenzhen.", "shenzhen9")
print(result1)  # <re.Match object; span=(0, 9), match='shenzhen9'>

# "shenzhen" plus any one character except a newline: cannot match shenzhen\n
result2 = re.search("shenzhen.", "shenzhen\n")
print(result2)  # None

# "shenzhen" plus any one character (re.S lets . match a newline): matches shenzhen\n
result3 = re.search("shenzhen.", "shenzhen\n", re.S)
print(result3)  # <re.Match object; span=(0, 9), match='shenzhen\n'>

# "shenzhen" plus one lowercase letter: cannot match the uppercase S in shenzhenS
result4 = re.search("shenzhen[a-z]", "shenzhenS")
print(result4)  # None

# "shenzhen" plus one letter, ignoring case: matches shenzhenS
result5 = re.search("shenzhen[a-z]", "shenzhenS", re.I)
print(result5)  # <re.Match object; span=(0, 9), match='shenzhenS'>
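Flags can be combined with the | operator, and re.I also has an inline form, (?i), written at the start of the pattern. The sketch below combines re.I with re.M (multiline mode, listed in the flag table of this article); the sample text is invented for illustration.

```python
import re

text = "Shenzhen\nSHENZHEN\nbeijing"

# re.I | re.M: ignore case, and let ^ / $ match at every line boundary.
cities = re.findall(r"^shenzhen$", text, re.I | re.M)
print(cities)  # ['Shenzhen', 'SHENZHEN']

# Without re.M, ^ and $ anchor to the whole string, so nothing matches.
no_multiline = re.findall(r"^shenzhen$", text, re.I)
print(no_multiline)  # []

# The inline form (?i) at the start of the pattern is equivalent to re.I.
inline = re.search(r"(?i)shenzhen[a-z]", "shenzhenS").group()
print(inline)  # shenzhenS
```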

Matching Multiple Characters

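Note: x, y, and z below stand for ordinary characters, and n and m for non-negative integers; none of them is a regular-expression metacharacter.

(xyz): matches the xyz inside the parentheses as a single unit.
x?: matches 0 or 1 x.
x*: matches 0 or more x (.* matches 0 or more of any character except a newline).
x+: matches at least one x.
x{n}: matches exactly n x (n is a non-negative integer).
x{n,}: matches at least n x.
x{,n}: matches at most n x.
x{n,m}: matches at least n and at most m x. Note: n ≤ m.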
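
import re

# Matching multiple characters
# ?: the preceding character may appear 0 or 1 times (greedy)
# +: the preceding character may appear 1 or more times (greedy)
# *: the preceding character may appear 0 or more times (greedy)
# {}: the preceding character may appear a given number of times or within a given range (greedy)
# Appending ? to any of these quantifiers (??, +?, *?, {n,m}?) makes it non-greedy
# {3}: the preceding character must appear exactly 3 times
# {3,6}: the preceding character may appear 3 to 6 times
# {3,}: the preceding character must appear at least 3 times
# {,3}: the preceding character may appear at most 3 times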
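A quantifier applies to the single item immediately before it, which is why the (xyz) form listed above matters: wrapping several characters in parentheses lets the quantifier repeat them as a unit. A minimal sketch contrasting the two cases, using fullmatch so that the whole string must match:

```python
import re

def full(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ab+ : the + applies only to "b"
print(full(r"ab+", "abbb"))      # True
print(full(r"ab+", "ababab"))    # False

# (ab)+ : the + applies to the whole group "ab"
print(full(r"(ab)+", "ababab"))  # True
print(full(r"(ab)+", "abbb"))    # False

# {n,m} works the same way on a group
print(full(r"(ab){2,3}", "abab"))      # True
print(full(r"(ab){2,3}", "abababab"))  # False
```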

import re

# ? means the preceding character appears 0 or 1 times
# 0 times
result1 = re.search("goog?le", "goole")
print(result1)  # <re.Match object; span=(0, 5), match='goole'>
# 1 time
result2 = re.search("goog?le", "google")
print(result2)  # <re.Match object; span=(0, 6), match='google'>
# g appears several times (violates the ? rule)
result3 = re.search("goog?le", "googggggle")
print(result3)  # None

# + means the preceding character appears 1 or more times
# 0 times violates the + rule
result4 = re.search("goog+le", "goole")
print(result4)  # None
# 1 time
result5 = re.search("goog+le", "google")
print(result5)  # <re.Match object; span=(0, 6), match='google'>
# Several times
result6 = re.search("goog+le", "googgggggggggggle")
print(result6)  # <re.Match object; span=(0, 17), match='googgggggggggggle'>

# * means the preceding character appears 0 or more times
# 0 times
result7 = re.search("goog*le", "goole")
print(result7)  # <re.Match object; span=(0, 5), match='goole'>
# Several times
result8 = re.search("goog*le", "googgggggggggggle")
print(result8)  # <re.Match object; span=(0, 17), match='googgggggggggggle'>

# {3} means the preceding character appears exactly 3 times
# Fewer than 3
result9 = re.search("goog{3}le", "goole")
print(result9)  # None
# Fewer than 3
result10 = re.search("goog{3}le", "google")
print(result10)  # None
# More than 3
result11 = re.search("goog{3}le", "googgggggggggle")
print(result11)  # None
# Exactly 3
result12 = re.search("goog{3}le", "googggle")
print(result12)  # <re.Match object; span=(0, 8), match='googggle'>

# {3,6} means the preceding character appears 3 to 6 times
# Fewer than 3
result13 = re.search("goog{3,6}le", "goole")
print(result13)  # None
# Fewer than 3
result14 = re.search("goog{3,6}le", "googgle")
print(result14)  # None
# Within the range
result15 = re.search("goog{3,6}le", "googgggle")
print(result15)  # <re.Match object; span=(0, 9), match='googgggle'>

# {3,} means the preceding character appears at least 3 times
# Fewer than 3
result16 = re.search("goog{3,}le", "goole")
print(result16)  # None
# Fewer than 3
result17 = re.search("goog{3,}le", "google")
print(result17)  # None
# At least 3
result18 = re.search("goog{3,}le", "googggle")
print(result18)  # <re.Match object; span=(0, 8), match='googggle'>
# At least 3
result19 = re.search("goog{3,}le", "googgggggggggggggggle")
print(result19)  # <re.Match object; span=(0, 21), match='googgggggggggggggggle'>

# {,3} means the preceding character appears at most 3 times
# More than 3
result20 = re.search("goog{,3}le", "googgggle")
print(result20)  # None
# Within the range
result21 = re.search("goog{,3}le", "googgle")
print(result21)  # <re.Match object; span=(0, 7), match='googgle'>
# Within the range
result22 = re.search("goog{,3}le", "goole")
print(result22)  # <re.Match object; span=(0, 5), match='goole'>

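Matching Boundaries

import re

# ===== Boundary characters =====
# ^ matches at the start of the string (the string must begin with what follows);
#   this is different from ^ inside [ ]
# $ matches at the end of the string
# ^text$: matches the whole string
# (with re.M, ^ and $ match at the start and end of every line instead)
print(re.search("^world", "world"))  # <re.Match object; span=(0, 5), match='world'>
print(re.search("^world", "hworld"))  # None

print(re.search("world$", "12world"))  # <re.Match object; span=(2, 7), match='world'>
print(re.search("world$", "worlds"))  # None

print(re.search("^world$", "Iworlds"))  # None
print(re.search("^world$", "world"))  # <re.Match object; span=(0, 5), match='world'>
print(re.search("^world$", "worldworld"))  # None

print(re.search("^worl+ds$", "wor11111111d"))  # None

# Word boundaries
# \b matches at a word boundary: the position between a word character (\w)
#    and a non-word character, or the start/end of the string
# \B matches wherever there is no word boundary (good to know)
print(re.search(r"google\b", "abc google 123google xcvgoogle456"))  # <re.Match object; span=(4, 10), match='google'>
# Only the last "google" is followed by a word character ("4"), so it is the one matched
print(re.search(r"google\B", "abcgoogle 123google xcvgoogle456"))  # <re.Match object; span=(23, 29), match='google'>

# Escaping: makes a metacharacter lose its special meaning
# \. is a literal dot, not "any character except a newline"
print(re.search("goog\\.le", "goog.le"))  # <re.Match object; span=(0, 7), match='goog.le'>

# | means "or" (pattern1|pattern2 matches if either pattern matches)
print(re.search("ef|cd", "123ef567"))  # <re.Match object; span=(3, 5), match='ef'>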
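When the text to search for comes from a variable and may contain metacharacters, escaping each one by hand is error-prone. The standard library offers re.escape() for this; the sketch below uses an invented sample string to show the difference between the escaped and unescaped forms.

```python
import re

literal = "1+1=2 (approx.)"    # contains +, (, ), and .
pattern = re.escape(literal)    # metacharacters are backslash-escaped

escaped_hit = re.search(pattern, "note: 1+1=2 (approx.) ok")
print(escaped_hit.group())
# 1+1=2 (approx.)

# Unescaped, "+", "(" and "." act as metacharacters, so the literal text is not found.
raw_hit = re.search(literal, "note: 1+1=2 (approx.) ok")
print(raw_hit)
# None
```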

Matching Groups

( ): treats the enclosed pattern as a single unit (a group) and captures the text it matches.

import re

tel = "0755-88988888"
pattern = r'(\d{4})-(\d{8})'  # the r prefix marks a raw string
result = re.search(pattern, tel)
if result:
    print(result)  # <re.Match object; span=(0, 13), match='0755-88988888'>
    print(result.group(0))  # 0755-88988888 (the whole match)
    print(result.group(1))  # 0755
    print(result.group(2))  # 88988888
    print(result.groups())  # ('0755', '88988888')
else:
    print("No phone number in the expected format was found")
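Numbered groups become hard to read once a pattern has more than a couple of them. Python also supports named groups with the (?P<name>...) syntax. The sketch below reuses the phone number from the example above; the group names `area` and `number` are chosen here for illustration.

```python
import re

tel = "0755-88988888"
m = re.search(r"(?P<area>\d{4})-(?P<number>\d{8})", tel)
if m:
    print(m.group("area"))    # 0755
    print(m.group("number"))  # 88988888
    print(m.groupdict())      # {'area': '0755', 'number': '88988888'}
    print(m.group(1))         # 0755 (named groups are still numbered)
```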

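Greedy and Non-Greedy Matching

The concepts

In regular expressions, greedy versus non-greedy matching determines how many characters a quantifier consumes.

Greedy matching: matches as many characters as possible while the overall match still succeeds. For example, + is a greedy quantifier, so \d+ tries to match as many digits as it can.
Non-greedy matching: matches as few characters as possible while the overall match still succeeds. It is enabled by appending ? to a greedy quantifier (such as + or *). For example, \d+? matches as few digits as it can.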

import re

# Greedy and non-greedy matching
# Greedy example
result1 = re.findall(r"abc(\d+)", "abc2345678vf")
print("Greedy result:", result1)  # Greedy result: ['2345678']

# Non-greedy example
result2 = re.findall(r"abc(\d+?)", "abc2345678vf")
print("Non-greedy result:", result2)  # Non-greedy result: ['2']
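The difference shows up most often when a pattern contains a closing delimiter that occurs several times in the text. A small sketch with an invented HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last ">" in the string.
greedy = re.findall(r"<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first ">" that lets the match succeed.
lazy = re.findall(r"<.+?>", html)
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```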

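Commonly Used Functions in the re Module

Function	Description
compile(pattern, flags=0)	Compiles the regular expression pattern and returns a pattern object. flags selects matching options such as ignoring case.
match(pattern, string, flags=0)	Matches pattern at the start of string. Returns a match object on success, otherwise None.
search(pattern, string, flags=0)	Scans string for the first location where pattern matches. Returns a match object if one is found, otherwise None.
split(pattern, string, maxsplit=0, flags=0)	Splits string using pattern as the delimiter and returns the resulting list. maxsplit limits the number of splits (0 means no limit).
sub(pattern, repl, string, count=0, flags=0)	Replaces the parts of string that match pattern with repl and returns the new string. count limits the number of replacements (0 means replace all).
fullmatch(pattern, string, flags=0)	Returns a match object if the whole string, from start to end, matches pattern; otherwise None.
findall(pattern, string, flags=0)	Finds all non-overlapping matches of pattern in string and returns them as a list (if the pattern contains groups, the list holds the group contents instead).
finditer(pattern, string, flags=0)	Finds all non-overlapping matches of pattern in string and returns an iterator whose elements are match objects.
purge()	Clears the cache of implicitly compiled regular expressions.

Flag	Description
re.I or re.IGNORECASE	Ignores case when matching.
re.M or re.MULTILINE	Multiline mode: changes ^ and $ so that they match at the start and end of every line rather than only at the start and end of the whole string.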

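import re

# 1. re.match()
# Checks whether the string starts with the given pattern;
# returns a match object on success and None on failure.
# First argument: the regular expression
# Second argument: the string to check
# Third argument (optional): regular-expression flags
text1 = "abc123"
match_result1 = re.match(r"abc", text1)
if match_result1:
    print("re.match() matched:", match_result1.group())  # abc
else:
    print("re.match() did not match")

# 2. re.search()
# Checks whether the string contains the given pattern anywhere;
# returns a match object on success and None on failure.
# Arguments are the same as for re.match()
text2 = "hello abc world"
search_result = re.search(r"abc", text2)
if search_result:
    print("re.search() matched:", search_result.group())  # abc
else:
    print("re.search() did not match")

# 3. re.findall()
# Collects every match and returns them as a list.
text3 = "a1b2c3a4b5"
findall_result = re.findall(r"\d", text3)
print("re.findall() result:", findall_result)  # ['1', '2', '3', '4', '5']

# 4. re.compile() compiles the pattern once, which helps when it is reused many times
string = "0755-89787654"
com = re.compile(r'(\d{4})-(\d{8})')
print(com.findall(string))  # [('0755', '89787654')]

# 5. Splitting: re.split()
print(re.split(r"\d", "sdf1234mkj5431km"))
# ['sdf', '', '', '', 'mkj', '', '', '', 'km']

# 6. Replacing: re.sub() or re.subn()
# (spaces are included in the sample string here so that \s+ has something to replace)
str1 = "难以 掩盖 内心的 心情"
print(re.sub(r"\s+", "..", str1))   # 难以..掩盖..内心的..心情
print(re.subn(r"\s+", "..", str1))  # ('难以..掩盖..内心的..心情', 3)

# 7. Matching Chinese characters
chinese = "[\u4e00-\u9fa5]+"
print(re.search(chinese, "hello!世界 345"))  # <re.Match object; span=(6, 8), match='世界'>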
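The function table also lists finditer(), and sub() has a few features the walkthrough above does not exercise. The sketch below, on invented sample data, shows finditer() returning match objects, backreferences in the replacement string, a callable replacement, and the count parameter.

```python
import re

text = "a1b22c333"

# finditer yields match objects lazily, so positions are available too.
positions = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('1', (1, 2)), ('22', (3, 5)), ('333', (6, 9))]

# In sub(), \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{4})-(\d{8})", r"\2-\1", "0755-89787654")
print(swapped)  # 89787654-0755

# The replacement may also be a function that receives each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)
print(doubled)  # a2b44c666

# count limits how many replacements are made (0 means "all").
first_only = re.sub(r"\d+", "#", text, count=1)
print(first_only)  # a#b22c333
```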

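Worked Examples

Example 1:

"""
Requirements: the user name must consist of letters, digits, or underscores
and be 6 to 20 characters long; the QQ number must be 5 to 12 digits long
and must not start with 0.
"""
import re

username = input('Enter a user name: ')
qq = input('Enter a QQ number: ')
# The first argument of match is a regular-expression string or a compiled pattern object
# The second argument is the string to match against it
m1 = re.match(r'^[0-9a-zA-Z_]{6,20}$', username)
if not m1:
    print('Please enter a valid user name.')
# fullmatch requires the whole string to match the pattern,
# so the pattern needs no start or end anchors
m2 = re.fullmatch(r'[1-9]\d{4,11}', qq)
if not m2:
    print('Please enter a valid QQ number.')
if m1 and m2:
    print('The information you entered is valid!')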

Example 2:

import re

# Approach 1: findall keeps the punctuation with each clause
poem = '窗前明月光，疑是地上霜。举头望明月，低头思故乡。'
sentences_list = re.findall(r'([^，。]+[，。]?)', poem)
for sentence in sentences_list:
    print(sentence)

print()

# Approach 2: split drops the punctuation; empty strings are filtered out
poem = '窗前明月光，疑是地上霜。举头望明月，低头思故乡。'
sentences_list = re.split(r'[，。]', poem)
sentences_list = [sentence for sentence in sentences_list if sentence]
for sentence in sentences_list:
    print(sentence)
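A third option sits between the two approaches above: when the split pattern contains a capturing group, re.split() keeps the delimiters in the result, so the punctuation can be re-attached afterwards. A minimal sketch using the same poem (the variable names `parts` and `sentences` are illustrative):

```python
import re

poem = '窗前明月光，疑是地上霜。举头望明月，低头思故乡。'

# A capturing group in the split pattern keeps the delimiters in the result.
parts = re.split(r'([，。])', poem)
print(parts)
# ['窗前明月光', '，', '疑是地上霜', '。', '举头望明月', '，', '低头思故乡', '。', '']

# Re-join each clause with the punctuation mark that followed it.
sentences = [clause + mark for clause, mark in zip(parts[0::2], parts[1::2])]
print(sentences)
# ['窗前明月光，', '疑是地上霜。', '举头望明月，', '低头思故乡。']
```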

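Summary

"Most things come with practice, so go ahead and experiment boldly!"

Congratulations on learning regular expressions. Now go and try them out!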
