SXSSFWorkbook
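Our report export used to rely on XSSFWorkbook, but it frequently ran out of memory (OOM) when the exported data set was large. Switching to SXSSFWorkbook relieves the memory pressure.
The official site describes it as follows: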
SXSSF (package: org.apache.poi.xssf.streaming) is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
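In other words:
SXSSF (package: org.apache.poi.xssf.streaming) is a streaming extension of XSSF with a compatible API, meant for producing very large spreadsheets when heap space is limited. It keeps memory usage low by only allowing access to rows inside a sliding window, whereas XSSF keeps every row of the document accessible. Rows that have left the window are written to disk and can no longer be accessed.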
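Put simply, when you create an SXSSFWorkbook you can set the size of the in-memory window through the constructor SXSSFWorkbook(int rowAccessWindowSize), where rowAccessWindowSize is the number of rows kept in memory. Once that number is exceeded, the oldest rows are written to a temporary file on disk, and the final file is assembled from those flushed rows, which keeps memory usage low.
A simple demo: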
import java.io.FileOutputStream;

import junit.framework.Assert;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class SxssfDemo {
    public static void main(String[] args) throws Throwable {
        // Keep at most 100 rows in memory (100 is also the default when no argument is given;
        // -1 disables automatic flushing, so every row stays in memory)
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        // Create the sheet
        Sheet sh = wb.createSheet();
        for (int rownum = 0; rownum < 1000; rownum++) {
            // Create a row
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                // Create a cell
                Cell cell = row.createCell(cellnum);
                String address = new CellReference(cell).formatAsString();
                // Set the cell value
                cell.setCellValue(address);
            }
        }
        // Rows with rownum < 900 have been flushed and are no longer accessible
        // (the window holds 100 rows; older rows are written to the temp file and dropped from memory)
        for (int rownum = 0; rownum < 900; rownum++) {
            // assertNull fails if sh.getRow(rownum) returns anything other than null
            Assert.assertNull(sh.getRow(rownum));
        }
        // The last 100 rows are still in memory
        for (int rownum = 900; rownum < 1000; rownum++) {
            // assertNotNull fails if sh.getRow(rownum) returns null
            Assert.assertNotNull(sh.getRow(rownum));
        }
        // Output stream
        FileOutputStream out = new FileOutputStream("/temp/sxssf.xlsx");
        // Write the workbook
        wb.write(out);
        out.close();
        // Delete the temporary files backing this workbook
        wb.dispose();
    }
}
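Source code analysis:
Constructor:
If you have an existing template and need to add data to it, first load the template into an XSSFWorkbook and then pass that workbook to SXSSFWorkbook.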
public SXSSFWorkbook(XSSFWorkbook workbook, int rowAccessWindowSize, boolean compressTmpFiles) {
    this._sxFromXHash = new HashMap();
    this._xFromSxHash = new HashMap();
    // Default window: 100 rows
    this._randomAccessWindowSize = 100;
    // Whether the temporary files are gzip-compressed (off by default)
    this._compressTmpFiles = false;
    this.setRandomAccessWindowSize(rowAccessWindowSize);
    this.setCompressTempFiles(compressTmpFiles);
    if (workbook == null) {
        this._wb = new XSSFWorkbook();
    } else {
        this._wb = workbook;
        // Register a streaming wrapper for every sheet already in the template
        for (int i = 0; i < this._wb.getNumberOfSheets(); ++i) {
            XSSFSheet sheet = this._wb.getSheetAt(i);
            this.createAndRegisterSXSSFSheet(sheet);
        }
    }
}
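The template case described above might look roughly like the following. This is a minimal sketch, assuming Apache POI (poi-ooxml) is on the classpath. The class name TemplateAppendSketch, the helper appendRows, and the header-only in-memory template are made up for illustration; in a real export the XSSFWorkbook would be loaded from the template file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

class TemplateAppendSketch {

    // Hypothetical helper: appends `count` data rows below the template's
    // existing rows and returns the finished .xlsx as bytes.
    static byte[] appendRows(XSSFWorkbook template, int count) throws IOException {
        // Rows already in the template are not part of the SXSSF window,
        // so read the last used row index from the XSSF sheet itself.
        int firstNewRow = template.getSheetAt(0).getLastRowNum() + 1;

        // Wrap the template; keep at most 100 rows in memory.
        SXSSFWorkbook wb = new SXSSFWorkbook(template, 100);
        try {
            Sheet sheet = wb.getSheetAt(0);
            for (int i = 0; i < count; i++) {
                Row row = sheet.createRow(firstNewRow + i);
                row.createCell(0).setCellValue("row " + i);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            wb.write(out);
            return out.toByteArray();
        } finally {
            wb.dispose(); // delete the temp files backing the streamed rows
        }
    }

    // Builds a stand-in template with a single header row (illustration only).
    static XSSFWorkbook headerOnlyTemplate() {
        XSSFWorkbook template = new XSSFWorkbook();
        template.createSheet("report").createRow(0).createCell(0).setCellValue("name");
        return template;
    }
}
```

New rows have to start below the template's last row, because createRow rejects row numbers that the underlying sheet already contains.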
Creating the temporary file: the temporary file is in fact created as soon as the sheet is created.
// Create the sheet's data writer
SheetDataWriter createSheetDataWriter() throws IOException {
    // _compressTmpFiles is set to false by default in the constructor,
    // so a plain SheetDataWriter is used unless compression is requested
    return (SheetDataWriter)(this._compressTmpFiles ? new GZIPSheetDataWriter() : new SheetDataWriter());
}

// This is where the temporary file is opened for writing
public SheetDataWriter() throws IOException {
    this._out = this.createWriter(this._fd);
}

public Writer createWriter(File fd) throws IOException {
    return new BufferedWriter(new FileWriter(fd));
}

// Storage location: the "poifiles" directory under the path given by
// the java.io.tmpdir JVM system property
private void createPOIFilesDirectory() throws IOException {
    if (this.dir == null) {
        String tmpDir = System.getProperty("java.io.tmpdir");
        if (tmpDir == null) {
            throw new IOException("Systems temporary directory not defined - set the -Djava.io.tmpdir jvm property!");
        }
        this.dir = new File(tmpDir, "poifiles");
    }
    this.createTempDirectory(this.dir);
}
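To make the two writer variants and the temp-file location more concrete, here is a small standalone sketch that uses only the JDK. It is not POI's code: TempWriterSketch, createWriter(File, boolean) and the "sketch-sheet" file prefix are invented for illustration, and the gzip branch is only an assumption about what a compressing writer such as GZIPSheetDataWriter amounts to.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class TempWriterSketch {

    // Same lookup as createPOIFilesDirectory: <java.io.tmpdir>/poifiles
    static File poiFilesDir() throws IOException {
        String tmpDir = System.getProperty("java.io.tmpdir");
        if (tmpDir == null) {
            throw new IOException("java.io.tmpdir is not set");
        }
        File dir = new File(tmpDir, "poifiles");
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create " + dir);
        }
        return dir;
    }

    // Plain buffered writer by default, gzip-compressed when compress is true
    static Writer createWriter(File fd, boolean compress) throws IOException {
        OutputStream out = new FileOutputStream(fd);
        if (compress) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Writes `rows` fake row elements to a new temp file and returns its size in bytes
    static long writeRows(int rows, boolean compress) throws IOException {
        File fd = File.createTempFile("sketch-sheet", ".xml", poiFilesDir());
        try (Writer w = createWriter(fd, compress)) {
            for (int i = 0; i < rows; i++) {
                w.write("<row r=\"" + (i + 1) + "\"><c><v>" + i + "</v></c></row>\n");
            }
        }
        long size = fd.length();
        fd.delete();
        return size;
    }
}
```

Compression trades CPU time for less disk usage. On SXSSFWorkbook itself the switch is setCompressTempFiles(true), the setter the constructor calls.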
Writing the file
public void write(OutputStream stream) throws IOException {
    // Flush the rows still held in memory for every sheet
    Iterator i$ = this._xFromSxHash.values().iterator();
    while (i$.hasNext()) {
        SXSSFSheet sheet = (SXSSFSheet)i$.next();
        sheet.flushRows();
    }
    File tmplFile = File.createTempFile("poi-sxssf-template", ".xlsx");
    try {
        FileOutputStream os = new FileOutputStream(tmplFile);
        try {
            this._wb.write(os);
        } finally {
            os.close();
        }
        this.injectData(tmplFile, stream);
    } finally {
        // The temporary template file is deleted here
        tmplFile.delete();
    }
}
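The shape of write() is: build an intermediate file, stream the final result from it, and delete it in a finally block whatever happens. Below is a JDK-only sketch of that pattern. The names StagedWriteSketch and writeVia are hypothetical, and the plain byte copy merely stands in for injectData, whose body is not shown here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

class StagedWriteSketch {

    // Stages `payload` in a temp file, copies it to `target`, and always
    // removes the temp file. Returns the temp file so that callers can
    // check that it is gone.
    static File writeVia(byte[] payload, OutputStream target) throws IOException {
        File tmp = File.createTempFile("sketch-template", ".bin");
        try {
            try (FileOutputStream os = new FileOutputStream(tmp)) {
                os.write(payload);
            }
            // Stand-in for injectData(tmplFile, stream)
            Files.copy(tmp.toPath(), target);
        } finally {
            // Runs on success and on failure alike
            tmp.delete();
        }
        return tmp;
    }
}
```

Because the delete sits in finally, a failure while copying does not leave the intermediate file behind.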
Rows are written to the temporary file as they are created
public Row createRow(int rownum) {
    // Highest valid row index: 1048575 (Excel 2007 allows 1048576 rows)
    int maxrow = SpreadsheetVersion.EXCEL2007.getLastRowIndex();
    if (rownum >= 0 && rownum <= maxrow) {
        if (rownum <= this._writer.getLastFlushedRow()) {
            throw new IllegalArgumentException("Attempting to write a row[" + rownum + "] " + "in the range [0," + this._writer.getLastFlushedRow() + "] that is already written to disk.");
        } else if (this._sh.getPhysicalNumberOfRows() > 0 && rownum <= this._sh.getLastRowNum()) {
            throw new IllegalArgumentException("Attempting to write a row[" + rownum + "] " + "in the range [0," + this._sh.getLastRowNum() + "] that is already written to disk.");
        } else {
            // Use the previous row's cell count to size the new row's initial cell allocation
            Row previousRow = rownum > 0 ? this.getRow(rownum - 1) : null;
            int initialAllocationSize = 0;
            if (previousRow != null) {
                initialAllocationSize = previousRow.getLastCellNum();
            }
            if (initialAllocationSize <= 0 && this._writer.getNumberOfFlushedRows() > 0) {
                initialAllocationSize = this._writer.getNumberOfCellsOfLastFlushedRow();
            }
            if (initialAllocationSize <= 0) {
                initialAllocationSize = 10;
            }
            SXSSFRow newRow = new SXSSFRow(this, initialAllocationSize);
            this._rows.put(new Integer(rownum), newRow);
            // If more rows are held in memory than randomAccessWindowSize allows,
            // flushRows writes the oldest ones to the temp file
            if (this._randomAccessWindowSize >= 0 && this._rows.size() > this._randomAccessWindowSize) {
                try {
                    this.flushRows(this._randomAccessWindowSize);
                } catch (IOException var7) {
                    throw new RuntimeException(var7);
                }
            }
            return newRow;
        }
    } else {
        throw new IllegalArgumentException("Invalid row number (" + rownum + ") outside allowable range (0.." + maxrow + ")");
    }
}
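According to the official documentation, this flush happens one row at a time:
Take a sheet with a window of 100 rows. When the row count reaches 101, the row with rownum=0 is flushed to disk and removed from memory; when the row count reaches 102, the row with rownum=1 is flushed, and so on.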
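That row-by-row behaviour can be modelled in a few lines without POI. The sketch below is a simplified stand-in, not the library's implementation: SlidingWindowSketch and its methods are invented names, rows are plain strings, and the "disk" is just a StringBuilder.

```java
import java.util.TreeMap;

class SlidingWindowSketch {
    private final int windowSize;
    // Rows still held in memory, ordered by row number
    private final TreeMap<Integer, String> rows = new TreeMap<>();
    // Stands in for the temp file on disk
    private final StringBuilder disk = new StringBuilder();
    private int lastFlushedRow = -1;

    SlidingWindowSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    void createRow(int rownum, String value) {
        // A row that has already been flushed cannot be written again
        if (rownum <= lastFlushedRow) {
            throw new IllegalArgumentException("Row " + rownum + " is already written to disk");
        }
        rows.put(rownum, value);
        // Same check as in createRow: flush once the window is exceeded
        if (windowSize >= 0 && rows.size() > windowSize) {
            flushRows(windowSize);
        }
    }

    // Flush the lowest-numbered rows until at most `remaining` rows are left in memory
    void flushRows(int remaining) {
        while (rows.size() > remaining) {
            Integer first = rows.firstKey();
            disk.append(first).append('=').append(rows.remove(first)).append('\n');
            lastFlushedRow = first;
        }
    }

    // Returns null once the row has been flushed
    String getRow(int rownum) {
        return rows.get(rownum);
    }

    int rowsInMemory() {
        return rows.size();
    }

    int getLastFlushedRow() {
        return lastFlushedRow;
    }
}
```

With a window of 100, creating rows 0 to 999 leaves rows 900 to 999 in memory, and getRow returns null for every row below 900, which matches the assertions in the demo.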
Summary
How it works:
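Buffering and segmented writes: an SXSSFWorkbook keeps a sliding window of rows in memory, backed by temporary files on disk. The data is never held in memory all at once; only the rows inside the current window stay in memory, which caps memory usage.
Using the sliding window: once the window is full, each new row pushes the oldest row out of the window and into the temporary file, so the window slides forward over the data. Processing and writing the data piece by piece in this way relieves memory pressure.
Flush and close: write() calls flushRows() on every sheet, so any rows still in memory are written to the temporary files before the final file is assembled. When you are done, call dispose() to delete the temporary files, as the demo does, and close the workbook to release its resources.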
In practice, a report with 250,000 rows exported without any problems.