Converting QQ Chat Logs to Word with Python, with Formatting

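My girlfriend and I have been together for five years, and I had kept several years of our QQ chat history. I came across it by chance, found it heartwarming, and decided to turn the text into a book of our own as a keepsake. The catch is that QQ exports chat backups as txt files. I searched for a while and found no suitable layout tool, and pasting hundreds of pages into Word by hand was not realistic, so I turned to Python to write the txt contents into Word and apply some simple formatting. The book is finished and turned out quite well. If you are a guy stuck on what gift to give, feel free to borrow the idea, haha~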

First, the text in the txt file is formatted like this (the content is something I made up):

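I wanted to write the text into Word and lay it out like the QQ chat window, so that reading it feels like chatting with each other back then. As shown in the figure below: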

Getting there takes just two steps:

1 Write the text from the txt files into Word

2 Format it in Word

For working with Word from Python, the python-docx library is the best choice:

http://python-docx.readthedocs.io/en/latest/user/install.html
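As a minimal sketch of the library in use, assuming python-docx is installed (`pip install python-docx`): the helper name `write_paragraphs` is made up for illustration and is not part of the script in this post. It writes a list of strings into a new document, one paragraph per string.

```python
def write_paragraphs(lines, target):
    """Save `lines` to a .docx file, one paragraph per string.

    `target` is a file path or a writable binary stream.
    """
    # Imported inside the function so this file can be loaded
    # even where python-docx is not installed.
    import docx

    document = docx.Document()       # new document from the default template
    for line in lines:
        document.add_paragraph(line)
    document.save(target)
```

Calling `write_paragraphs(['first line', 'second line'], 'demo.docx')` should produce a two-paragraph document in the current directory.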

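The library's documentation explains clearly how to work with Word documents, and it covers basic Word formatting. My layout is fairly simple, so it was more than enough. Here is how I did it.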

The first step is writing the text from the txt files into Word. I created two folders, one for all the txt files and one for the Word files:

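The code for writing txt into Word follows. Automatically exported chat logs always contain some junk text, so regular expressions are used here to remove it automatically.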

import os
import re

import docx


def chat():
    # Regular expressions for lines to discard
    pattern3 = re.compile(r'\[图片\]')    # image placeholders
    pattern4 = re.compile(r'\(来自手机QQ2012 \[Android\]:语音对讲，高效沟通！\)|\(来自手机QQ2012 \[Android\] \)')
    # Directories
    path_txt = r'G:\TEST\python_text\聊天界面\txt\\'
    path_word = r'G:\TEST\python_text\聊天界面\word\\'
    pathDir = os.listdir(path_txt)
    for childfile in pathDir:
        print(childfile)
        # Open the file and read it line by line
        with open(path_txt + childfile, 'r', encoding='utf-8') as f:
            newline1 = f.readlines()
        # Strip newlines, drop empty and junk lines, and collect the rest in s
        s = []
        for line in newline1:
            line = line.strip()             # remove the newline
            if line == '':                  # skip empty lines
                continue
            if re.findall(pattern3, line):  # skip junk lines
                continue
            if re.findall(pattern4, line):  # skip junk lines
                continue
            s.append(line)
        # Write to Word
        file_word = docx.Document()
        for line in s:
            file_word.add_paragraph(line)
        file_word.save(path_word + '{0}-word.docx'.format(childfile))
        # Apply the Word formatting (word_format is defined in the full listing)
        word_format(path_word, childfile)
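To see the filtering on its own, here is a small sketch that needs only the standard library. The function name `clean_lines` and the sample lines are invented for illustration, and the header format with a timestamp is an assumption about what an export looks like. The two patterns are the ones used in `chat()`.

```python
import re

# Lines matching any of these patterns are dropped entirely.
JUNK_PATTERNS = [
    re.compile(r'\[图片\]'),    # image placeholder
    re.compile(r'\(来自手机QQ2012 \[Android\]:语音对讲，高效沟通！\)'
               r'|\(来自手机QQ2012 \[Android\] \)'),   # mobile-client signature
]


def clean_lines(raw_lines):
    """Strip whitespace, then drop empty lines and lines matching a junk pattern."""
    cleaned = []
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue
        if any(pattern.search(line) for pattern in JUNK_PATTERNS):
            continue
        cleaned.append(line)
    return cleaned


# Made-up sample in an assumed "nickname + timestamp, then message" layout.
sample = [
    '老婆 2012-05-01 20:15:03\n',
    '在干嘛呢\n',
    '\n',
    '老公 2012-05-01 20:15:40\n',
    '[图片]\n',
    '老公 2012-05-01 20:16:02\n',
    '刚下班 (来自手机QQ2012 [Android] )\n',
]
for line in clean_lines(sample):
    print(line)
```

Note that dropping a message line leaves its header line without a message, which is why the formatting step has to cope with a header being followed directly by another header.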

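Next comes the formatting in Word, which is a matter of personal taste: font, size, spacing, indentation and so on can all be adjusted through python-docx. My approach was to define all the styles I needed up front (page, paragraph, and font styles), then use regular expressions again to restyle each paragraph containing the nickname '老婆' (wife) or '老公' (husband), along with the paragraph right after it. Once that is done, the document is saved under a new name. Because there was so much content, I set the page size to A5 and kept the font as small as I could, and it still came to several hundred pages.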
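The layout decision can be sketched separately from python-docx. The function below is a simplified illustration, not the code from this post: `plan_layout` and its tuple format are invented names, and unlike the script it treats any line containing either nickname as a header. It keeps the same idea: her messages stay on the left, his go to the right, and messages at or above a length threshold get an indent instead of right alignment.

```python
import re

WIFE = re.compile(r'老婆')
HUSBAND = re.compile(r'老公')
N = 20  # length threshold separating short messages from long ones


def plan_layout(texts):
    """Return one (style, align, indent) tuple per paragraph text.

    style  : 'Name' for a header, 'Conversation' for a message, None if untouched
    align  : 'left' or 'right'
    indent : None, or the side ('left' / 'right') on which to indent
    """
    plan = [(None, 'left', None)] * len(texts)
    for i, text in enumerate(texts):
        if WIFE.search(text):
            sender = 'wife'
        elif HUSBAND.search(text):
            sender = 'husband'
        else:
            continue                      # not a header line
        plan[i] = ('Name', 'left' if sender == 'wife' else 'right', None)
        if i + 1 >= len(texts):
            continue                      # header is the last line
        message = texts[i + 1]
        if WIFE.search(message) or HUSBAND.search(message):
            continue                      # next line is another header
        if sender == 'wife':
            indent = 'right' if len(message) > N else None
            plan[i + 1] = ('Conversation', 'left', indent)
        elif len(message) < N:
            plan[i + 1] = ('Conversation', 'right', None)
        else:
            plan[i + 1] = ('Conversation', 'left', 'left')
    return plan
```

Each tuple can then be applied to the matching paragraph: assign the named style to `paragraph.style`, set `paragraph.paragraph_format.alignment` to `WD_ALIGN_PARAGRAPH.RIGHT` where the plan says 'right', and set `left_indent` or `right_indent` where an indent is requested.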

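With the Word formatting done, to get as close as possible to a chat-window look, I drew the phone status bar and the QQ title bar in Visio and inserted them as the page header, and also added page numbers. These were all adjusted by hand, of course. (I'm terrible at drawing~)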
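The header here was added by hand, but a header can also be filled in from code. The following is only a sketch of that alternative, assuming a python-docx version that exposes `section.header`; the function name `add_header`, its parameters, and the default image width are hypothetical.

```python
def add_header(document, text=None, image_path=None, image_width_mm=120):
    """Put text and/or a picture into the header of every section of `document`."""
    # Imported inside the function so this file can be loaded
    # even where python-docx is not installed.
    from docx.shared import Mm

    for section in document.sections:
        header = section.header
        header.is_linked_to_previous = False   # give this section its own header
        paragraph = header.paragraphs[0]
        if text is not None:
            paragraph.text = text
        if image_path is not None:
            # e.g. an exported picture of the status bar
            paragraph.add_run().add_picture(image_path, width=Mm(image_width_mm))
```

Page numbers are a different matter: as far as I know python-docx has no direct call for inserting a page-number field, so adding those in Word itself remains the simplest route.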

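With the text finished, it is ready to print. You can slip a few photos between the pages to liven things up, then design a cover in Photoshop (I asked an expert to design mine, haha) and have a print shop bind it. That's it: a book with real sentimental value.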

The complete code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
Purpose:
Convert chat-log backup files from txt format to Word format.
'''
import os
import re

import docx
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.oxml.ns import qn
from docx.shared import Inches, Pt, Mm, RGBColor


def chat():
    # Regular expressions for lines to discard
    pattern3 = re.compile(r'\[图片\]')    # image placeholders
    pattern4 = re.compile(r'\(来自手机QQ2012 \[Android\]:语音对讲，高效沟通！\)|\(来自手机QQ2012 \[Android\] \)')
    # Directories
    path_txt = r'G:\TEST\python_text\聊天界面\txt\\'
    path_word = r'G:\TEST\python_text\聊天界面\word\\'
    pathDir = os.listdir(path_txt)
    for childfile in pathDir:
        print(childfile)
        # Open the file and read it line by line
        with open(path_txt + childfile, 'r', encoding='utf-8') as f:
            newline1 = f.readlines()
        # Strip newlines, drop empty and junk lines, and collect the rest in s
        s = []
        for line in newline1:
            line = line.strip()             # remove the newline
            if line == '':                  # skip empty lines
                continue
            if re.findall(pattern3, line):  # skip junk lines
                continue
            if re.findall(pattern4, line):  # skip junk lines
                continue
            s.append(line)
        # Write to Word
        file_word = docx.Document()
        for line in s:
            file_word.add_paragraph(line)
        file_word.save(path_word + '{0}-word.docx'.format(childfile))
        # Apply the Word formatting
        word_format(path_word, childfile)


################################################################################
def inverted_txt(pathDir, path_txt, pattern3, path_word):
    '''
    Purpose: restore the order of a txt file whose messages are reversed
    :param pathDir: list of file names
    :param path_txt: directory of the txt files
    :param pattern3: pattern for junk lines
    :param path_word: output directory
    :return:
    '''
    # Regular expression
    pattern1 = re.compile(r'老婆|老公')
    for childfile in pathDir:
        print(childfile)
        # Open the file and read it line by line
        with open(path_txt + childfile, 'r', encoding='utf-8') as f:
            newline1 = [line.rstrip('\r\n') for line in f]
        # Put each header and its message at the front of s, reversing the order
        s = []
        for i in range(len(newline1) - 1):
            if re.findall(pattern3, newline1[i]):  # skip junk lines
                continue
            if re.findall(pattern1, newline1[i]):  # line contains 老婆 or 老公
                s.insert(0, newline1[i])
                s.insert(1, newline1[i + 1])
        # Write to txt
        with open(path_word + '{0}.txt'.format(childfile), 'w', encoding='utf-8') as file_txt:
            for line in s:
                print(line)
                file_txt.write(line + '\n')


################################################################################
def word_format(path_word, childfile):
    '''
    Purpose: apply the Word formatting
    :param path_word: directory of the Word files
    :param childfile: file name
    :return:
    '''
    # Regular expressions
    pattern1 = re.compile(r'老婆')
    pattern2 = re.compile(r'老公')
    # Open the Word document
    document1 = docx.Document(path_word + '{0}-word.docx'.format(childfile))
    # Page setup
    sections = document1.sections
    for section in sections:
        print(section.start_type)
        # Page size (approximately A5)
        section.page_width = Mm(149)
        section.page_height = Mm(210)
        # Margins
        section.left_margin = Inches(0.8)
        section.right_margin = Inches(0.8)
        section.top_margin = Inches(0.4)
        section.bottom_margin = Inches(0.4)
        # Header distance
        section.header_distance = Inches(0.2)
    # Fonts
    font_name_1 = u'微软雅黑'
    font_name_2 = u'宋体'
    font_name_3 = u'方正宋刻本秀楷简体'
    font_name_4 = u'方正清刻本悦宋简体'
    # Colors
    color_gray = RGBColor(0x14, 0x14, 0x14)   # gray
    # Paragraph styles
    styles = document1.styles
    style_1 = styles.add_style('Name', WD_STYLE_TYPE.PARAGRAPH)          # first style: Name
    style_1.base_style = styles['Normal']                   # inherit from Normal
    style_1.font.name = font_name_2                         # font
    style_1._element.rPr.rFonts.set(qn('w:eastAsia'), font_name_2)
    style_1.font.size = Pt(6)                               # font size
    style_1.font.italic = True                              # italic
    style_1.paragraph_format.line_spacing = Pt(7)           # line spacing
    style_1.paragraph_format.space_before = Pt(3)           # space before paragraph
    style_1.paragraph_format.space_after = Pt(0)            # space after paragraph
    style_2 = styles.add_style('Conversation', WD_STYLE_TYPE.PARAGRAPH)  # second style: Conversation
    style_2.base_style = styles['Normal']                   # inherit from Normal
    style_2.font.name = font_name_3                         # font
    style_2._element.rPr.rFonts.set(qn('w:eastAsia'), font_name_3)
    style_2.font.size = Pt(9)                               # font size
    style_2.paragraph_format.line_spacing = Pt(9)           # line spacing
    style_2.paragraph_format.space_before = Pt(0)           # space before paragraph
    style_2.paragraph_format.space_after = Pt(0.5)          # space after paragraph
    # Walk through all paragraphs
    paragraphs = document1.paragraphs
    n = 20                                                  # short/long message threshold
    for i in range(len(paragraphs)):
        has_next = i + 1 < len(paragraphs)                  # guard against running off the end
        if re.findall(pattern1, paragraphs[i].text):        # paragraph contains '老婆'
            paragraphs[i].style = style_1                   # style of the current paragraph
            if not has_next or re.findall(pattern2, paragraphs[i + 1].text):  # next one contains '老公'
                continue
            if len(paragraphs[i + 1].text) > n:             # next one is long: indent on the right
                paragraphs[i + 1].paragraph_format.right_indent = Inches(2)
            paragraphs[i + 1].style = style_2               # style of the next paragraph
        if re.findall(pattern2, paragraphs[i].text):        # paragraph contains '老公'
            # after adjusting the page margins
            paragraphs[i].paragraph_format.alignment = WD_ALIGN_PARAGRAPH.RIGHT
            paragraphs[i].style = style_1                   # style of the current paragraph
            if not has_next or re.findall(pattern1, paragraphs[i + 1].text):  # next one contains '老婆'
                continue
            if len(paragraphs[i + 1].text) < n:             # next one is short: align right
                paragraphs[i + 1].paragraph_format.alignment = WD_ALIGN_PARAGRAPH.RIGHT
            else:                                           # next one is long: indent on the left
                paragraphs[i + 1].paragraph_format.left_indent = Inches(2)
            paragraphs[i + 1].style = style_2               # style of the next paragraph
    document1.save(path_word + 'newWord-{0}.docx'.format(childfile))


################################################################################
if __name__ == '__main__':
    chat()
