Python re Module: A Powerful Tool for Regular Expressions

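Table of contents

Importing the `re` module
Basic matching methods
- re.match()
- re.search()
- re.findall()
- re.finditer()
Substitution
- re.sub()
Splitting strings
- re.split()
Capturing and non-capturing groups
- Capturing groups
- Non-capturing groups
Common pattern symbols
Practical examples
- Validating an email address format
- Extracting URLs
Predefined character classes
- Example code
  - 1. Matching digits
  - 2. Matching non-digit characters
  - 3. Matching whitespace characters
  - 4. Matching alphanumeric characters
Performance considerations
Summary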

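Regular expressions are a powerful tool for processing strings, and Python's `re` module offers a flexible and efficient way to use them. This article walks through the module's most commonly used functions and some practical examples.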

Importing the `re` module

Before using regular expressions, import the `re` module:

```python
import re
```

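Basic matching methods

re.match()

`re.match()` tries to match the pattern only at the beginning of the string. It returns a match object on success and `None` otherwise.

```python
pattern = r'\d+'
string = '123abc'

match = re.match(pattern, string)
if match:
    print("Match succeeded:", match.group())
else:
    print("Match failed")
```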
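To make the "beginning of the string" rule concrete, here is a minimal sketch that contrasts `re.match()` with `re.fullmatch()`. The variable names are only illustrative.

```python
import re

pattern = r'\d+'

# re.match() succeeds only if the match starts at index 0.
at_start = re.match(pattern, '123abc')
not_at_start = re.match(pattern, 'abc123')

# re.fullmatch() additionally requires the match to reach the end of the string.
whole = re.fullmatch(pattern, '123')
partial = re.fullmatch(pattern, '123abc')

print(at_start.group())  # 123
print(not_at_start)      # None
print(whole.group())     # 123
print(partial)           # None
```

If the digits may appear anywhere in the string, `re.search()` is usually the better fit.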

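re.search()

`re.search()` scans the whole string and returns a match object for the first location where the pattern matches, or `None` if there is no match.

```python
pattern = r'\d+'
string = 'abc123xyz'

search = re.search(pattern, string)
if search:
    print("Match found:", search.group())
else:
    print("No match found")
```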
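As a small illustration, the match object returned by `re.search()` also reports where the match sits, and only the first match is returned even when several exist. The input string here is just sample data.

```python
import re

m = re.search(r'\d+', 'abc123xyz456')

# Only the first run of digits is reported; '456' is not visited.
first = m.group()
start, end = m.span()

print(first)       # 123
print(start, end)  # 3 6
```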

re.findall()

`re.findall()` returns all non-overlapping matches of the pattern in the string as a list.

```python
pattern = r'\d+'
string = 'abc123xyz456'

matches = re.findall(pattern, string)
print("All matches:", matches)  # ['123', '456']
```
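One detail that often surprises readers: when the pattern contains capturing groups, `re.findall()` returns the group contents rather than the whole match. The sketch below uses a made-up `key=value` string to show the three cases.

```python
import re

text = 'x=1, y=22, z=333'

# No groups: each item is the whole match.
no_group = re.findall(r'\w=\d+', text)

# One group: each item is that group's text.
one_group = re.findall(r'\w=(\d+)', text)

# Several groups: each item is a tuple of the groups.
two_groups = re.findall(r'(\w)=(\d+)', text)

print(no_group)    # ['x=1', 'y=22', 'z=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('x', '1'), ('y', '22'), ('z', '333')]
```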

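re.finditer()

`re.finditer()` returns an iterator that yields a match object for each match, which is useful when you need positions as well as the matched text.

```python
pattern = r'\d+'
string = 'abc123xyz456'

for match in re.finditer(pattern, string):
    print("Matched:", match.group(), "at position:", match.start())
```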
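Because `re.finditer()` is lazy, matches can be consumed one at a time. A minimal sketch, with an arbitrary sample string:

```python
import re

it = re.finditer(r'\d+', 'a1b22c333')

# Pull only the first match from the iterator.
first = next(it)

# The iterator continues where it left off.
rest = [(m.group(), m.span()) for m in it]

print(first.group(), first.span())  # 1 (1, 2)
print(rest)                         # [('22', (3, 5)), ('333', (6, 9))]
```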

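Substitution

re.sub()

`re.sub()` replaces every match of the pattern with the given replacement and returns the new string.

```python
pattern = r'\d+'
string = 'abc123xyz456'
result = re.sub(pattern, '#', string)
print("String after substitution:", result)  # abc#xyz#
```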
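Beyond a fixed replacement string, `re.sub()` also accepts a `count` limit, backreferences, and a replacement function. The following sketch shows each on the same sample string; the specific replacements are only examples.

```python
import re

text = 'abc123xyz456'

# count=1 replaces only the first match.
first_only = re.sub(r'\d+', '#', text, count=1)

# \1 in the replacement refers back to capturing group 1.
wrapped = re.sub(r'(\d+)', r'<\1>', text)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

# re.subn() also reports how many replacements were made.
replaced, n = re.subn(r'\d+', '#', text)

print(first_only)   # abc#xyz456
print(wrapped)      # abc<123>xyz<456>
print(doubled)      # abc246xyz912
print(replaced, n)  # abc#xyz# 2
```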

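Splitting strings

re.split()

`re.split()` splits the string at every match of the pattern. If a match occurs at the very end of the string, the result ends with an empty string.

```python
pattern = r'\d+'
string = 'abc123xyz456'
result = re.split(pattern, string)
print("Result after splitting:", result)  # ['abc', 'xyz', '']
```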
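A few `re.split()` behaviors are worth seeing side by side: the trailing empty string, keeping the delimiters with a capturing group, and limiting the number of splits with `maxsplit`. This is a minimal sketch on the same input.

```python
import re

text = 'abc123xyz456'

parts = re.split(r'\d+', text)

# Drop empty strings if they are not wanted.
non_empty = [p for p in parts if p]

# A capturing group in the pattern keeps the delimiters in the result.
with_delims = re.split(r'(\d+)', text)

# maxsplit limits how many splits are performed.
limited = re.split(r'\d+', text, maxsplit=1)

print(parts)        # ['abc', 'xyz', '']
print(non_empty)    # ['abc', 'xyz']
print(with_delims)  # ['abc', '123', 'xyz', '456', '']
print(limited)      # ['abc', 'xyz456']
```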

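Capturing and non-capturing groups

Capturing groups

Parentheses `()` define a capturing group. Groups are numbered from 1 in the order of their opening parentheses, and `match.group(n)` returns the text matched by group n.

```python
pattern = r'(\d+)-(\d+)-(\d+)'
string = '2024-10-23'

match = re.match(pattern, string)
if match:
    print("Year:", match.group(1), "Month:", match.group(2), "Day:", match.group(3))
```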
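Numbered groups can be hard to read once a pattern grows. As an alternative sketch, Python's `(?P<name>...)` syntax gives groups names; the names `year`, `month`, and `day` below are chosen for this example.

```python
import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.match(pattern, '2024-10-23')

year = match.group('year')  # access by name
fields = match.groupdict()  # all named groups as a dict
by_position = match.groups()  # named groups are still numbered

print(year)         # 2024
print(fields)       # {'year': '2024', 'month': '10', 'day': '23'}
print(by_position)  # ('2024', '10', '23')
```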

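Non-capturing groups

`(?:...)` defines a non-capturing group: it groups part of the pattern but is not numbered, so in the example below the month is group 1.

```python
pattern = r'(?:\d+)-(\d+)'
string = '2024-10'

match = re.match(pattern, string)
if match:
    print("Month:", match.group(1))
```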
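The difference between the two group types shows up most clearly in `groups()` and `re.findall()`. A short sketch, using made-up file names for the second half:

```python
import re

# Same input, with and without capturing the year.
capturing = re.match(r'(\d+)-(\d+)', '2024-10').groups()
non_capturing = re.match(r'(?:\d+)-(\d+)', '2024-10').groups()

# Non-capturing groups are handy for alternation inside findall(),
# because findall() returns group contents when groups are present.
files = 'a.jpg b.png c.gif'
whole_names = re.findall(r'\w+\.(?:jpg|png)', files)
only_ext = re.findall(r'\w+\.(jpg|png)', files)

print(capturing)      # ('2024', '10')
print(non_capturing)  # ('10',)
print(whole_names)    # ['a.jpg', 'b.png']
print(only_ext)       # ['jpg', 'png']
```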

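Common pattern symbols

Symbol	Description	Example
`.`	Matches any character except a newline	`a.b` matches `acb`, `a1b`
`^`	Matches the start of the string	`^abc` matches `abcde`
`$`	Matches the end of the string (or just before a trailing newline)	`xyz$` matches `abcxyz`
`*`	Matches the preceding item 0 or more times	`ab*c` matches `ac`, `abc`, `abbc`
`+`	Matches the preceding item 1 or more times	`ab+c` matches `abc`, `abbc` but not `ac`
`?`	Matches the preceding item 0 or 1 times	`ab?c` matches `ac` or `abc`
`{m,n}`	Matches the preceding item m to n times	`a{2,4}` matches `aa`, `aaa`, `aaaa`
`[]`	Matches any one character in the set	`[abc]` matches `a`, `b`, `c`
`|`	Alternation (OR)	`a|b` matches `a` or `b`
`()`	Capturing group	`(abc)` captures `abc`
`(?:...)`	Non-capturing group	`(?:abc)` matches `abc` without capturing it
`\d`	Matches a digit	`\d` matches `0` to `9`
`\D`	Matches a non-digit	`\D` matches letters, punctuation, spaces
`\w`	Matches a letter, digit, or underscore	`\w` matches `a-z`, `A-Z`, `0-9`, `_`
`\W`	Matches anything that is not a letter, digit, or underscore	`\W` matches spaces, punctuation, etc.
`\s`	Matches a whitespace character	`\s` matches spaces, tabs, newlines, etc.
`\S`	Matches a non-whitespace character	`\S` matches any character that is not whitespace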
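The rows of the table can be checked mechanically. The sketch below pairs each pattern with one string it should match in full and one it should reject. The test strings are chosen for this example, and `re.fullmatch()` is used so that partial matches do not count.

```python
import re

# (pattern, a string it should match in full, a string it should not)
cases = [
    (r'a.b', 'a1b', 'ab'),
    (r'ab*c', 'ac', 'abd'),
    (r'ab+c', 'abbc', 'ac'),
    (r'ab?c', 'abc', 'abbc'),
    (r'a{2,4}', 'aaa', 'a'),
    (r'[abc]', 'b', 'd'),
    (r'a|b', 'b', 'c'),
    (r'\d\D\w\W\s\S', '1a_- x', '1a_-xx'),
]

results = []
for pattern, should_match, should_fail in cases:
    ok = re.fullmatch(pattern, should_match) is not None
    rejected = re.fullmatch(pattern, should_fail) is None
    results.append((pattern, ok, rejected))
    print(f'{pattern!r:18} matches {should_match!r}: {ok}, rejects {should_fail!r}: {rejected}')
```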

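Practical examples

Validating an email address format

The pattern below is a simple format check (some characters, an `@`, a domain with at least one dot). It is not a full validation of every legal email address.

```python
def is_valid_email(email):
    pattern = r'^[\w\.-]+@[\w\.-]+\.\w+$'
    return re.match(pattern, email) is not None

email = 'test@example.com'
print("Is the email format valid:", is_valid_email(email))
```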
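One subtlety of the check above: `$` also matches just before a trailing newline, so `re.match()` with `$` accepts `'test@example.com\n'`. A possible stricter variant, sketched here under the assumption that the whole input must be the address, uses `fullmatch()`. The name `is_valid_email_strict` is hypothetical.

```python
import re

# Same character classes as the pattern above, without ^ and $:
# fullmatch() already requires the entire string to match.
EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+\.\w+')

def is_valid_email_strict(email):
    """Hypothetical stricter variant of is_valid_email()."""
    return EMAIL_RE.fullmatch(email) is not None

# The original match() + $ form accepts a trailing newline.
loose = re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', 'test@example.com\n') is not None
print(loose)  # True

for candidate in ['test@example.com', 'test@example', 'test@example.com\n']:
    print(repr(candidate), is_valid_email_strict(candidate))
```

This is still only a format check; it does not confirm that the domain or mailbox exists.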

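Extracting URLs

The pattern `https?://[^\s]+` matches `http://` or `https://` followed by everything up to the next whitespace character.

```python
def extract_urls(text):
    pattern = r'https?://[^\s]+'
    return re.findall(pattern, text)

text = "Visit our website https://example.com and http://test.com"
urls = extract_urls(text)
print("Extracted URLs:", urls)
```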
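Because the pattern stops only at whitespace, punctuation that directly follows a URL in running text is captured too. One simple way to handle that, sketched below with a hypothetical helper `extract_urls_trimmed`, is to strip common trailing punctuation after matching. Note that `\S` is equivalent to `[^\s]`.

```python
import re

def extract_urls_trimmed(text):
    """Hypothetical helper: find URLs, then drop trailing punctuation."""
    raw = re.findall(r'https?://\S+', text)
    return [url.rstrip('.,;:!?)') for url in raw]

text = "See https://example.com, or http://test.com."
raw_urls = re.findall(r'https?://\S+', text)
clean_urls = extract_urls_trimmed(text)

print(raw_urls)    # ['https://example.com,', 'http://test.com.']
print(clean_urls)  # ['https://example.com', 'http://test.com']
```

This trimming is a heuristic: a URL that legitimately ends in one of those characters would be shortened as well.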

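Predefined character classes

Symbol	Description	Equivalent to (ASCII matching)
`\d`	Matches any decimal digit	`[0-9]`
`\D`	Matches any non-digit character	`[^0-9]`
`\s`	Matches any whitespace character	`[ \t\n\r\f\v]`
`\S`	Matches any non-whitespace character	`[^ \t\n\r\f\v]`
`\w`	Matches any letter, digit, or underscore	`[a-zA-Z0-9_]`
`\W`	Matches any character that is not a letter, digit, or underscore	`[^a-zA-Z0-9_]`

For `str` patterns, Python 3 matches Unicode by default, so `\d`, `\w`, and `\s` also match non-ASCII digits, letters, and whitespace. The bracket expressions above are exact equivalents only when the `re.ASCII` flag is used.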
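To see how the predefined classes behave on non-ASCII text, the following sketch runs `\w` and `\d` with and without the `re.ASCII` flag. The sample string mixes ASCII letters and digits, two Chinese characters, and full-width digits, all chosen for illustration.

```python
import re

# '\uff11\uff12\uff13' are the full-width digits 1, 2, 3.
fullwidth = '\uff11\uff12\uff13'
text = 'abc 中文 ' + fullwidth + ' 123'

# Default (Unicode) matching for str patterns.
words_unicode = re.findall(r'\w+', text)
digits_unicode = re.findall(r'\d+', text)

# With re.ASCII, \w is [a-zA-Z0-9_] and \d is [0-9].
words_ascii = re.findall(r'\w+', text, flags=re.ASCII)
digits_ascii = re.findall(r'\d+', text, flags=re.ASCII)

print(words_unicode)   # four items, including the Chinese and full-width runs
print(words_ascii)     # ['abc', '123']
print(digits_unicode)  # two items: the full-width run and '123'
print(digits_ascii)    # ['123']
```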

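Example code

The sample strings in this section are Chinese text with digits mixed in.

1. Matching digits

Use `\d` to match the digits in a string:

```python
import re

text = "匹配规则这2个字符串3是否匹配规则5则则则7则"
matches = re.findall(r"\d", text)  # match any single digit
print(matches)  # Output: ['2', '3', '5', '7']
```

To match runs of one or more digits, use `\d+`:

```python
import re

text = "匹配规则这2个字符串134444是否匹配规则5则则则7则"
matches = re.findall(r"\d+", text)  # match one or more consecutive digits
print(matches)  # Output: ['2', '134444', '5', '7']
```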

2. Matching non-digit characters

Use `\D` to match non-digit characters:

```python
import re

text = "匹配规则这2个字符串3是否匹配规则5则则则7则"
matches = re.findall(r"\D", text)  # match any non-digit character
print(matches)  # Output: ['匹', '配', '规', '则', '这', '个', '字', '符', '串', '是', '否', '匹', '配', '规', '则', '则', '则', '则', '则']
```

3. Matching whitespace characters

Use `\s` to match the whitespace characters in a string:

```python
import re

text = "匹配规则   这2个字符串3是否匹\n配规则5则则则7则"
matches = re.findall(r"\s", text)  # match any whitespace character
print(matches)  # Output: [' ', ' ', ' ', '\n']
```

Use `\S` to match non-whitespace characters:

```python
import re

text = "匹配规则   这2个字符串3是否匹\n配规则5则则则7则"
matches = re.findall(r"\S", text)  # match any non-whitespace character
print(matches)  # Output: ['匹', '配', '规', '则', '这', '2', '个', '字', '符', '串', '3', '是', '否', '匹', '配', '规', '则', '5', '则', '则', '则', '7', '则']
```

4. Matching alphanumeric characters

Use `\w` to match letters, digits, and the underscore:

```python
import re

text = "https://www.cnblogs.com/"
matches = re.findall(r'\w', text)  # match word characters
print(matches)  # Output: ['h', 't', 't', 'p', 's', 'w', 'w', 'w', 'c', 'n', 'b', 'l', 'o', 'g', 's', 'c', 'o', 'm']
```

Use `\W` to match everything else:

```python
import re

text = "https://www.cnblogs.com/"
matches = re.findall(r'\W', text)  # match non-word characters
print(matches)  # Output: [':', '/', '/', '.', '.', '/']
```

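Performance considerations

When processing large amounts of data, regular expression performance can become a bottleneck. Consider the following practices:

- Write patterns as raw strings (`r''`). This does not make matching faster, but it avoids backslash-escaping mistakes.
- Avoid overly complex expressions and keep patterns as simple as possible.
- Compile a pattern once with `re.compile()` and reuse the compiled object when the pattern is applied many times.

```python
compiled_pattern = re.compile(r'\d+')
matches = compiled_pattern.findall('abc123xyz456')
print("Matches:", matches)
```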
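A compiled pattern pays off mainly when it is reused, for example across many input lines. The sketch below assumes a small list of sample lines and a module-level constant named `DIGITS`; both are illustrative.

```python
import re

# Compile once, reuse for every line.
DIGITS = re.compile(r'\d+')

lines = ['abc123', 'no digits', 'x1y22z333']
counts = [len(DIGITS.findall(line)) for line in lines]

# Pattern methods also accept a start position, which the
# module-level functions do not.
later = DIGITS.search('abc123xyz456', 6)

print(counts)         # [1, 0, 3]
print(later.group())  # 456
```

The `re` module keeps an internal cache of recently compiled patterns, so for a handful of patterns the speed difference is usually small; the compiled object mostly saves the cache lookup and keeps the pattern in one named place.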

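Summary

Python's `re` module provides powerful regular expression support for string processing. With the basic functions and the practical patterns shown here, you can handle complex matching, extraction, and replacement tasks efficiently.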
