JAVA实现将PDF转换成word文档

ops/2024/11/28 15:54:05/

POM.xml

 

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"><modelVersion>4.0.0</modelVersion><parent><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-parent</artifactId>
<!--       <version>3.2.1</version>--><version>2.3.9.RELEASE</version><relativePath/> <!-- lookup parent from repository --></parent><groupId>com.jack</groupId><artifactId>jackDemo</artifactId><version>0.0.1-SNAPSHOT</version><name>jackDemo</name><description>jackDemo</description><properties><java.version>1.8</java.version></properties><dependencies><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-data-mongodb</artifactId></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-web</artifactId></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-devtools</artifactId><scope>runtime</scope><optional>true</optional></dependency><dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-test</artifactId><scope>test</scope></dependency><dependency><groupId>com.alibaba</groupId><artifactId>fastjson</artifactId><version>1.2.75</version></dependency><dependency><groupId>org.openpnp</groupId><artifactId>opencv</artifactId><version>4.5.3-4</version></dependency><!-- Apache POI for Excel files --><dependency><groupId>org.apache.poi</groupId><artifactId>poi-ooxml</artifactId><version>5.2.3</version> <!-- 请检查并使用最新版本 --></dependency><!-- Apache POI dependencies (these may be included automatically by Maven, but it's good to be explicit) --><dependency><groupId>org.apache.poi</groupId><artifactId>poi</artifactId><version>5.2.3</version> <!-- 与poi-ooxml版本保持一致 --></dependency><!-- Apache Commons Collections (required by POI) --><dependency><groupId>org.apache.commons</groupId><artifactId>commons-collections4</artifactId><version>4.4</version> <!-- 确保版本与你的项目兼容 --></dependency><!-- Apache Commons IO (optional, but useful for file handling) --><dependency><groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version> <!-- 确保版本与你的项目兼容 --></dependency><!-- PDFBox for reading PDF files --><dependency><groupId>org.apache.pdfbox</groupId><artifactId>pdfbox</artifactId><version>2.0.24</version></dependency><!-- docx4j for creating Word documents --><dependency><groupId>org.docx4j</groupId><artifactId>docx4j</artifactId><version>3.2.1</version></dependency><dependency><groupId>javax.xml.bind</groupId><artifactId>jaxb-api</artifactId><version>2.3.1</version></dependency><dependency><groupId>org.glassfish.jaxb</groupId><artifactId>jaxb-runtime</artifactId><version>2.3.1</version></dependency></dependencies><build><plugins><plugin><groupId>org.springframework.boot</groupId><artifactId>spring-boot-maven-plugin</artifactId></plugin></plugins></build></project>

java 文件:
 

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Body;
import org.docx4j.wml.P;
import org.docx4j.wml.R;
import org.docx4j.wml.Text;import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;public class PdfToWordConverter {public static void main(String[] args) throws Exception{String pdfFilePath = "D:\\word\\何以为父影响彼此一生的父子关系.pdf"; // 替换为你的PDF文件路径String wordFilePath = "D:\\word\\何以为父影响彼此一生的父子关系.docx"; // 生成的Word文件路径try {// 读取PDF文件内容String pdfText = extractTextFromPdf(pdfFilePath);// 将内容写入Word文档createWordDocument(wordFilePath, pdfText);System.out.println("PDF to Word conversion completed successfully!");} catch (IOException e) {e.printStackTrace();}}public static String extractTextFromPdf(String filePath) throws IOException {PDDocument document = PDDocument.load(new FileInputStream(filePath));PDFTextStripper pdfStripper = new PDFTextStripper();return pdfStripper.getText(document);}public static void createWordDocument(String filePath, String content) throws Exception {WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();Body body = mainDocumentPart.getContents().getBody();// 将内容按段落分割并添加到Word文档中String[] paragraphs = content.split("\\r?\\n");for (String paragraph : paragraphs) {P p = new P();R r = new R();Text text = new Text();text.setParent(paragraph);r.getContent().add(text);p.getContent().add(r);body.getContent().add(p);}// 保存Word文档try (FileOutputStream out = new FileOutputStream(new File(filePath))) {wordMLPackage.save(out);}}
}


http://www.ppmy.cn/ops/137396.html

相关文章

【论文笔记】LLaVA-o1: Let Vision Language Models Reason Step-by-Step

&#x1f34e;个人主页&#xff1a;小嗷犬的个人主页 &#x1f34a;个人网站&#xff1a;小嗷犬的技术小站 &#x1f96d;个人信条&#xff1a;为天地立心&#xff0c;为生民立命&#xff0c;为往圣继绝学&#xff0c;为万世开太平。 基本信息 标题: LLaVA-o1: Let Vision Lan…

《热带气象学报》

《热带气象学报》创刊于1984年&#xff0c;前身为《热带气象》&#xff0c;1993年更名为《热带气象学报》&#xff0c;是广东省气象局主管&#xff0c;中国气象局广州热带海洋气象研究所主办的中文学术期刊。 本刊坚持“热带气象”的办刊特色&#xff0c;主要刊登&#xff1a;…

Carla学习日志

车辆工程本科小白&#xff0c;如何在Carla中运行仿真&#xff0c;构建车辆行人地图场景模型并且掌握基本的环境感知算法&#xff1f; - Here-Kin的回答 - 知乎 https://www.zhihu.com/question/455478599/answer/2315207702 CARLA Documentation Python API reference 小飞自动…

作业3-基于pytorch的非线性模型设计

一、任务描述 使用BP神经网络和CNN实现对MNITS数据集的识别&#xff0c;并通过修改相关参数&#xff0c;比较各模型的识别准确率。 二、相关配置 pytorch&#xff1a;2.5.1 python&#xff1a;3.12 pycharm&#xff1a;2024.1.2&#xff08;这个影响不大&#xff0c;版本不要太…

模拟算法实例讲解:从理论到实践的编程之旅

目录 1、模拟算法简介 2、替换所有问号 3、提莫攻击 4、Z字形变换 5、外观数列 6、数青蛙 1、模拟算法简介 模拟算法是一种基本的算法设计方法&#xff0c;它的核心思想是按照问题描述的规则&#xff0c;逐步模拟问题的发展过程&#xff0c;从而得到问题的解决方案。这种…

【halcon】Metrology工具系列之 get_metrology_object_model_contour

get_metrology_object_model_contour (Operator) Name get_metrology_object_model_contour — 在图像坐标中查询测量对象的模型轮廓。 Signature get_metrology_object_model_contour( : Contour : MetrologyHandle, Index, Resolution : )Description get_metrology_obj…

【Flink-scala】DataStream编程模型之 窗口的划分-时间概念-窗口计算程序

DataStream编程模型之 窗口的划分-时间概念-窗口计算程序 1. 窗口的划分 1.1 窗口分为&#xff1a;基于时间的窗口 和 基于数量的窗口 基于时间的窗口&#xff1a;基于起始时间戳 和终止时间戳来决定窗口的大小 基于数量的窗口&#xff1a;根据固定的数量定义窗口 的大小 这…

RK3568平台开发系列讲解(DMA篇)DMA engine使用

🚀返回专栏总目录 文章目录 一、申请DMA channel二、配置DMA channel的参数三、获取传输描述(tx descriptor)四、启动传输沉淀、分享、成长,让自己和他人都能有所收获!😄 📢DMA子系统下有一个帮助测试的测试驱动(drivers/dma/dmatest.c), 从这个测试驱动入手我们了解…