java POI解析Excel大文件,获取表头

embedded/2024/9/23 6:33:30/

目录

  • 前言
  • 依赖
  • 代码
  • StreamingReader的open
  • WorkbookFactory的create
  • CSV解析首行

前言

poi解析大文件可能出现oom,通样大小文件,xlsx会oom,xls不会,所以使用流式的方式改造解析xlsx文件的代码。
我的需求是提取每一页的表头,所以值提取第一行,如果需要全部内容,把行数循环地方的if改成while就行。
产生Workbook给的入参都是给的File,没用Inputstream,原因见后两章

依赖

可能不全,按需引入更多poi相关依赖

		<dependency><groupId>org.apache.poi</groupId><artifactId>poi</artifactId><version>4.1.2</version></dependency><dependency><groupId>com.monitorjbl</groupId><artifactId>xlsx-streamer</artifactId><version>2.2.0</version></dependency>

代码

java">//newExcel true表示xlsx,false表示xls
private static String excel(File file, boolean newExcel) throws Exception {StringBuilder sb = new StringBuilder();Workbook workbook = null;try {if (newExcel) {//我只取第一行,你们按需调大workbook = StreamingReader.builder().rowCacheSize(1).bufferSize(10240).open(file);} else {//xls的还是普通方式workbook = WorkbookFactory.create(file);}getSheet(sb, workbook);} finally {if (workbook != null) {workbook.close();}}return sb.toString();}private static void getSheet(StringBuilder sb, Workbook workbook) throws IOException {try {sb.append(LEFT_MIDDLE_BRACKET);Iterator<Sheet> sheetIterator = workbook.iterator();while (sheetIterator.hasNext()) {Sheet sheet = sheetIterator.next();String sheetName = sheet.getSheetName();if (sheetName.startsWith(BTXA_MASK)) {continue;}sb.append(sheetName).append(C_COLON);Iterator<Row> rowIterator = sheet.iterator();//想取所有行的人把if改成whileif (rowIterator.hasNext()) {Row row = rowIterator.next();firstLine(sb, row);if (sheetIterator.hasNext()) {sb.append(C_SEMICOLON);}}}sb.append(RIGHT_MIDDLE_BRACKET);} finally {workbook.close();}}private static void firstLine(StringBuilder sb, Row row) {try {int numOfColumns = row.getLastCellNum();for (int j = 0; j < numOfColumns; j++) {Cell cell = row.getCell(j, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);sb.append(getCell(cell));if (j != numOfColumns - 1) {sb.append(C_COMMA);}}} catch (Exception e) {log.error("异常列", e);sb.append("异常列");}}private static String getCell(Cell cell) {// 获取单元格类型CellType cellType = cell.getCellType();// 根据单元格类型处理值String cellValue;switch (cellType) {case STRING:cellValue = cell.getStringCellValue();break;case NUMERIC:cellValue = String.valueOf(cell.getNumericCellValue());break;case BOOLEAN:cellValue = String.valueOf(cell.getBooleanCellValue());break;case FORMULA:try {DataFormatter formatter = new DataFormatter();cellValue = formatter.formatCellValue(cell);} catch (IllegalStateException | NumberFormatException e) {cellValue = cell.getCellFormula();}break;case BLANK:// 空单元格cellValue = "";break;case ERROR:// 错误值cellValue = "ERROR (" + ErrorEval.getText(cell.getErrorCellValue()) + ")";break;default:cellValue = "特殊类型";}return cellValue;}

StreamingReader的open

open方法使用InputStream还需要创建临时文件,如果有本地文件,直接给file

java">    /*** Reads a given {@code InputStream} and returns a new* instance of {@code Workbook}. Due to Apache POI* limitations, a temporary file must be written in order* to create a streaming iterator. This process will use* the same buffer size as specified in {@link #bufferSize(int)}.** @param is input stream to read in* @return A {@link Workbook} that can be read from* @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the stream*/public Workbook open(InputStream is) {StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);workbook.init(is);return new StreamingWorkbook(workbook);}/*** Reads a given {@code File} and returns a new instance* of {@code Workbook}.** @param file file to read in* @return built streaming reader instance* @throws com.monitorjbl.xlsx.exceptions.OpenException if there is an issue opening the file* @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the file*/public Workbook open(File file) {StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);workbook.init(file);return new StreamingWorkbook(workbook);}

WorkbookFactory的create

create方法说明InputStream比file更占内存

java">    /*** Creates the appropriate HSSFWorkbook / XSSFWorkbook from*  the given InputStream.** <p>Your input stream MUST either support mark/reset, or*  be wrapped as a {@link BufferedInputStream}!*  Note that using an {@link InputStream} has a higher memory footprint*  than using a {@link File}.</p>** <p>Note that in order to properly release resources the*  Workbook should be closed after use. Note also that loading*  from an InputStream requires more memory than loading*  from a File, so prefer {@link #create(File)} where possible.**  @param inp The {@link InputStream} to read data from.**  @return The created Workbook**  @throws IOException if an error occurs while reading the data*  @throws EncryptedDocumentException If the Workbook given is password protected*/public static Workbook create(InputStream inp) throws IOException, EncryptedDocumentException {return create(inp, null);}/*** Creates the appropriate HSSFWorkbook / XSSFWorkbook from*  the given File, which must exist and be readable.* <p>Note that in order to properly release resources the*  Workbook should be closed after use.**  @param file The file to read data from.**  @return The created Workbook**  @throws IOException if an error occurs while reading the data*  @throws EncryptedDocumentException If the Workbook given is password protected*/public static Workbook create(File file) throws IOException, EncryptedDocumentException {return create(file, null);}

CSV解析首行

赠送一个csv方法

java">    private static String csv(File file) throws Exception {String encode = KkFileUtils.getFileEncode(file);log.info("编码检测:{}", encode);if ("Big5".equalsIgnoreCase(encode)) {encode = "GB2312";}String res = "";try (FileInputStream fo = new FileInputStream(file);InputStreamReader isr = new InputStreamReader(fo, encode);BufferedReader br = new BufferedReader(isr)) {String line = br.readLine();if (StringUtils.isNotBlank(line)) {res = line;}}return res;}

http://www.ppmy.cn/embedded/17284.html

相关文章

怎么把图片调成a4大小?照片尺寸修改工具

在日常生活中制作各种文件、讲义、PPT文件等内容的时候&#xff0c;图片都会成为重要的一部分&#xff0c;不同的图片格式和大小各有特点&#xff0c;有些图片虽然比较大但画质清晰&#xff0c;有些则方便传输且占用内存较小但分辨率较低&#xff0c;在图片使用的过程中&#x…

大语言模型在专业领域的应用——医疗场景下的大语言模型

大语言模型在专业领域的应用——医疗场景下的大语言模型 构建面向医疗的大语言模型数据资源总结医疗是与人类生活密切相关的重要领域之一。由于具有较强的通用任务解决能力,大语言模型被广泛用于辅助医生处理各种相关医疗任务,例如医疗诊断、临床报告生成、医学语言翻译、心理…

Hive架构原理

Hive Hive 的架构是设计用于在大数据环境下进行数据仓库操作和分析的系统。它建立在 Hadoop 生态系统之上&#xff0c;利用 Hadoop 的存储&#xff08;HDFS&#xff09;和计算&#xff08;MapReduce、Tez、Spark 等&#xff09;能力。 1. 元数据存储&#xff08;Metastore&am…

书生·浦语 大模型(学习笔记-5)XTuner 微调 LLM:1.8B、多模态、Agent

目录 一&#xff1a;两种微调 二、数据的一生 三、微调方案 四、XTuner 五、InternLM2 1.8B模型&#xff08;相关知识&#xff09; 一&#xff1a;两种微调 增量与训练和指令微调的区别 二、数据的一生 原始数据转换为标准格式数据 添加对话模板&#xff0c;直接调用即可…

Ventus(承影):基于RISC V的开源GPGPU

Ventus&#xff08;承影&#xff09;&#xff1a;基于RVV的开源GPGPU 清华大学集成电路学院dsp-lab的承影RVV GPGPU设计文档。 整体目标 提供一个开源的基于RVV的GPGPU实现方案&#xff0c;并给出软件映射方案、指令集&#xff08;支持的指令及特性、添加的自定义指令&#xf…

CSS-vminvmax单位

vmin 和 vmax 单位 vmin 是相对于视口宽度和高度中较小值进行计算&#xff0c;它的值为视口宽度和高度中的较小值的百分比。 例如&#xff0c;如果视口宽度为 800px&#xff0c;高度为 1000px&#xff0c;那么 1vmin 等于 8px&#xff08;800px 的 1%&#xff09;。 vmax 是…

李彦宏:程序员将不复存在! 周鸿祎回怼!网友:先把百度程序员都开除了!

近日&#xff0c;百度创始人、董事长兼首席执行官李彦宏在央视《对话》•开年说的访谈中指出&#xff1a;“基本上说以后其实不会存在“程序员”这种职业了&#xff0c;因为只要会说话&#xff0c;人人都会具备程序员的能力”。 “未来的编程语言只会剩下两种&#xff0c;一种…

掌控基础设施,加速 DevOps 之旅:IaC 深度解析

在当今的 DevOps 世界中&#xff0c;基础设施即代码&#xff08;IaC&#xff09;是一个非常重要的概念。它在整个行业几乎无处不在&#xff0c;是现代工程角色的绝对关键。 本文将主要包含 IaC 的定义和它的好处&#xff0c;同时将 Walrus 作为最佳实践来进行详细讲解。 什么是…