目录
- 前言
- 依赖
- 代码
- StreamingReader的open
- WorkbookFactory的create
- CSV解析首行
前言
poi解析大文件可能出现oom,通样大小文件,xlsx会oom,xls不会,所以使用流式的方式改造解析xlsx文件的代码。
我的需求是提取每一页的表头,所以值提取第一行,如果需要全部内容,把行数循环地方的if改成while就行。
产生Workbook给的入参都是给的File,没用Inputstream,原因见后两章
依赖
可能不全,按需引入更多poi相关依赖
<dependency><groupId>org.apache.poi</groupId><artifactId>poi</artifactId><version>4.1.2</version></dependency><dependency><groupId>com.monitorjbl</groupId><artifactId>xlsx-streamer</artifactId><version>2.2.0</version></dependency>
代码
java">//newExcel true表示xlsx,false表示xls
private static String excel(File file, boolean newExcel) throws Exception {StringBuilder sb = new StringBuilder();Workbook workbook = null;try {if (newExcel) {//我只取第一行,你们按需调大workbook = StreamingReader.builder().rowCacheSize(1).bufferSize(10240).open(file);} else {//xls的还是普通方式workbook = WorkbookFactory.create(file);}getSheet(sb, workbook);} finally {if (workbook != null) {workbook.close();}}return sb.toString();}private static void getSheet(StringBuilder sb, Workbook workbook) throws IOException {try {sb.append(LEFT_MIDDLE_BRACKET);Iterator<Sheet> sheetIterator = workbook.iterator();while (sheetIterator.hasNext()) {Sheet sheet = sheetIterator.next();String sheetName = sheet.getSheetName();if (sheetName.startsWith(BTXA_MASK)) {continue;}sb.append(sheetName).append(C_COLON);Iterator<Row> rowIterator = sheet.iterator();//想取所有行的人把if改成whileif (rowIterator.hasNext()) {Row row = rowIterator.next();firstLine(sb, row);if (sheetIterator.hasNext()) {sb.append(C_SEMICOLON);}}}sb.append(RIGHT_MIDDLE_BRACKET);} finally {workbook.close();}}private static void firstLine(StringBuilder sb, Row row) {try {int numOfColumns = row.getLastCellNum();for (int j = 0; j < numOfColumns; j++) {Cell cell = row.getCell(j, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);sb.append(getCell(cell));if (j != numOfColumns - 1) {sb.append(C_COMMA);}}} catch (Exception e) {log.error("异常列", e);sb.append("异常列");}}private static String getCell(Cell cell) {// 获取单元格类型CellType cellType = cell.getCellType();// 根据单元格类型处理值String cellValue;switch (cellType) {case STRING:cellValue = cell.getStringCellValue();break;case NUMERIC:cellValue = String.valueOf(cell.getNumericCellValue());break;case BOOLEAN:cellValue = String.valueOf(cell.getBooleanCellValue());break;case FORMULA:try {DataFormatter formatter = new DataFormatter();cellValue = formatter.formatCellValue(cell);} catch (IllegalStateException | NumberFormatException e) {cellValue = cell.getCellFormula();}break;case BLANK:// 空单元格cellValue = "";break;case ERROR:// 错误值cellValue = "ERROR (" + ErrorEval.getText(cell.getErrorCellValue()) + ")";break;default:cellValue = "特殊类型";}return cellValue;}
StreamingReader的open
open方法使用InputStream还需要创建临时文件,如果有本地文件,直接给file
java"> /*** Reads a given {@code InputStream} and returns a new* instance of {@code Workbook}. Due to Apache POI* limitations, a temporary file must be written in order* to create a streaming iterator. This process will use* the same buffer size as specified in {@link #bufferSize(int)}.** @param is input stream to read in* @return A {@link Workbook} that can be read from* @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the stream*/public Workbook open(InputStream is) {StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);workbook.init(is);return new StreamingWorkbook(workbook);}/*** Reads a given {@code File} and returns a new instance* of {@code Workbook}.** @param file file to read in* @return built streaming reader instance* @throws com.monitorjbl.xlsx.exceptions.OpenException if there is an issue opening the file* @throws com.monitorjbl.xlsx.exceptions.ReadException if there is an issue reading the file*/public Workbook open(File file) {StreamingWorkbookReader workbook = new StreamingWorkbookReader(this);workbook.init(file);return new StreamingWorkbook(workbook);}
WorkbookFactory的create
create方法说明InputStream比file更占内存
java"> /*** Creates the appropriate HSSFWorkbook / XSSFWorkbook from* the given InputStream.** <p>Your input stream MUST either support mark/reset, or* be wrapped as a {@link BufferedInputStream}!* Note that using an {@link InputStream} has a higher memory footprint* than using a {@link File}.</p>** <p>Note that in order to properly release resources the* Workbook should be closed after use. Note also that loading* from an InputStream requires more memory than loading* from a File, so prefer {@link #create(File)} where possible.** @param inp The {@link InputStream} to read data from.** @return The created Workbook** @throws IOException if an error occurs while reading the data* @throws EncryptedDocumentException If the Workbook given is password protected*/public static Workbook create(InputStream inp) throws IOException, EncryptedDocumentException {return create(inp, null);}/*** Creates the appropriate HSSFWorkbook / XSSFWorkbook from* the given File, which must exist and be readable.* <p>Note that in order to properly release resources the* Workbook should be closed after use.** @param file The file to read data from.** @return The created Workbook** @throws IOException if an error occurs while reading the data* @throws EncryptedDocumentException If the Workbook given is password protected*/public static Workbook create(File file) throws IOException, EncryptedDocumentException {return create(file, null);}
CSV解析首行
赠送一个csv方法
java"> private static String csv(File file) throws Exception {String encode = KkFileUtils.getFileEncode(file);log.info("编码检测:{}", encode);if ("Big5".equalsIgnoreCase(encode)) {encode = "GB2312";}String res = "";try (FileInputStream fo = new FileInputStream(file);InputStreamReader isr = new InputStreamReader(fo, encode);BufferedReader br = new BufferedReader(isr)) {String line = br.readLine();if (StringUtils.isNotBlank(line)) {res = line;}}return res;}