Converting between HTML and Word in Java with POI

User submission 808 2023-01-16



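The project's backend uses Spring Boot and Maven, and the frontend uses the CKEditor rich text editor. At present, the Word file produced from HTML is in doc format, while image handling only supports docx, so the doc file has to be saved as docx by hand before the image placeholders can be replaced.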

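1. Add the Maven dependencies

The following POI-related dependencies are used. jsoup is also included to make it easier to extract the image elements from the HTML: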

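<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>xdocreport</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-schemas</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>ooxml-schemas</artifactId>
    <version>1.3</version>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.11.3</version>
</dependency>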

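2. Converting Word to HTML

Create a static folder under the resources directory of the Spring Boot project and copy the Word file to be converted into it (temp.docx in my case; the code below reads test.doc and test.docx, so name the files to match). Because static is one of Spring Boot's default static resource locations, no extra configuration is needed. If you use a different folder name, configure it in application.yml.

Converting doc format to HTML: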

public static String docToHtml() throws Exception {
    File path = new File(ResourceUtils.getURL("classpath:").getPath());
    String imagePathStr = path.getAbsolutePath() + "\\static\\image\\";
    String sourceFileName = path.getAbsolutePath() + "\\static\\test.doc";
    String targetFileName = path.getAbsolutePath() + "\\static\\test2.html";
    // Create the image output directory if it does not exist yet
    File file = new File(imagePathStr);
    if (!file.exists()) {
        file.mkdirs();
    }
    org.w3c.dom.Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
    // Save each picture to disk and return its path relative to the HTML file
    wordToHtmlConverter.setPicturesManager((content, pictureType, name, width, height) -> {
        try (FileOutputStream out = new FileOutputStream(imagePathStr + name)) {
            out.write(content);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return "image/" + name;
    });
    // Close the source stream once the document has been processed
    try (FileInputStream in = new FileInputStream(sourceFileName)) {
        HWPFDocument wordDocument = new HWPFDocument(in);
        wordToHtmlConverter.processDocument(wordDocument);
    }
    // Serialize the resulting DOM tree as an HTML file
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(new File(targetFileName));
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    return targetFileName;
}
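The paths in these methods are built with "\\static\\", which only resolves on Windows. A minimal sketch of a portable alternative using only the JDK follows; the class name StaticPaths and the method underStatic are hypothetical names for illustration, not part of the original project:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StaticPaths {
    // Joins the classpath root with "static" and any further segments,
    // letting the JDK choose the separator for the current platform.
    static Path underStatic(String classpathRoot, String... segments) {
        Path p = Paths.get(classpathRoot, "static");
        for (String segment : segments) {
            p = p.resolve(segment);
        }
        return p;
    }

    public static void main(String[] args) {
        // "target/classes" stands in for the directory returned by
        // ResourceUtils.getURL("classpath:") in the article's code.
        Path imageDir = underStatic("target/classes", "image");
        System.out.println(imageDir.resolve("demo.png"));
    }
}
```

With a Path in hand, imageDir.resolve(name) takes the place of string concatenation such as imagePathStr + name, so no trailing separator is needed.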

Converting docx format to HTML:

public static String docxToHtml() throws Exception {
    File path = new File(ResourceUtils.getURL("classpath:").getPath());
    String imagePath = path.getAbsolutePath() + "\\static\\image";
    String sourceFileName = path.getAbsolutePath() + "\\static\\test.docx";
    String targetFileName = path.getAbsolutePath() + "\\static\\test.html";
    // try-with-resources closes both streams even if the conversion fails
    try (FileInputStream in = new FileInputStream(sourceFileName);
         OutputStreamWriter outputStreamWriter = new OutputStreamWriter(new FileOutputStream(targetFileName), "utf-8")) {
        XWPFDocument document = new XWPFDocument(in);
        XHTMLOptions options = XHTMLOptions.create();
        // Folder where the extracted images are stored
        options.setExtractor(new FileImageExtractor(new File(imagePath)));
        // Image path used inside the generated HTML
        options.URIResolver(new BasicURIResolver("image"));
        XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
        xhtmlConverter.convert(document, outputStreamWriter, options);
    }
    return targetFileName;
}

After a successful conversion the corresponding HTML file is generated. To display it on the frontend, read the file into a String and return it:

// Requires java.nio.charset.StandardCharsets, java.nio.file.Files and java.nio.file.Paths
public static String readfile(String filePath) {
    try {
        // Read all bytes first and decode once, so a multi-byte UTF-8 character
        // is never split across two buffer reads
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        return new String(bytes, StandardCharsets.UTF_8);
    } catch (IOException e) {
        // Covers a missing file as well as read errors
        e.printStackTrace();
        return "";
    }
}

[Screenshot: the converted HTML displayed in the CKEditor rich text editor]

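3. Converting HTML to Word

The approach is to first extract every image element from the HTML and replace each one with a placeholder variable of the form "${imgReplace}"; with several images the placeholders are numbered in order (${imgReplace1}, ${imgReplace2}, and so on). A doc file is then generated from the result. (I tried generating a docx file directly, but it would not open, and I have not yet found a good fix for that.) Once we save the doc file as docx, the placeholders can be replaced with the images: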

public static String writeWordFile(String content) {
    String path = "D:/wordFile";
    Map<String, Object> param = new HashMap<String, Object>();
    if (!"".equals(path)) {
        File fileDir = new File(path);
        if (!fileDir.exists()) {
            fileDir.mkdirs();
        }
        content = HtmlUtils.htmlUnescape(content);
        List<HashMap<String, String>> imgs = getImgStr(content);
        int count = 0;
        for (HashMap<String, String> img : imgs) {
            count++;
            // Replace img tags that end with "/>"
            content = content.replace(img.get("img"), "${imgReplace" + count + "}");
            // Replace img tags that end with ">"
            content = content.replace(img.get("img1"), "${imgReplace" + count + "}");
            Map<String, Object> header = new HashMap<String, Object>();
            try {
                File filePath = new File(ResourceUtils.getURL("classpath:").getPath());
                String imagePath = filePath.getAbsolutePath() + "\\static\\";
                imagePath += img.get("src").replaceAll("/", "\\\\");
                // Default to 400*300 when the width or height attribute is missing
                if (img.get("width") == null || img.get("height") == null) {
                    header.put("width", 400);
                    header.put("height", 300);
                } else {
                    header.put("width", (int) (Double.parseDouble(img.get("width"))));
                    header.put("height", (int) (Double.parseDouble(img.get("height"))));
                }
                header.put("type", "jpg");
                header.put("content", OfficeUtil.inputStream2ByteArray(new FileInputStream(imagePath), true));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            param.put("${imgReplace" + count + "}", header);
        }
        try {
            // Generate a Word document in doc format; it has to be saved as docx by hand
            byte[] by = content.getBytes("UTF-8");
            ByteArrayInputStream bais = new ByteArrayInputStream(by);
            POIFSFileSystem poifs = new POIFSFileSystem();
            DirectoryEntry directory = poifs.getRoot();
            DocumentEntry documentEntry = directory.createDocument("WordDocument", bais);
            FileOutputStream ostream = new FileOutputStream("D:\\wordFile\\temp.doc");
            poifs.writeFilesystem(ostream);
            bais.close();
            ostream.close();
            // Temporary file (the docx saved by hand from temp.doc)
            CustomXWPFDocument doc = OfficeUtil.generateWord(param, "D:\\wordFile\\temp.docx");
            // The final Word file, with the images in place
            FileOutputStream fopts = new FileOutputStream("D:\\wordFile\\final.docx");
            doc.write(fopts);
            fopts.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    return "D:/wordFile/final.docx";
}
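To look at the placeholder idea in isolation, here is a minimal sketch that uses only JDK regular expressions instead of jsoup. The class name ImgPlaceholderSketch and the method replaceImages are hypothetical, and the pattern is a simplification that assumes no ">" appears inside an attribute value:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgPlaceholderSketch {
    // Matches both <img ...> and <img ... />; not a full HTML parser.
    private static final Pattern IMG = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Replaces each img tag with ${imgReplaceN} and collects the removed tags in order.
    static String replaceImages(String html, List<String> removedTags) {
        Matcher m = IMG.matcher(html);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (m.find()) {
            count++;
            removedTags.add(m.group());
            // quoteReplacement keeps "$" from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement("${imgReplace" + count + "}"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<String>();
        String html = "<p>a<img src=\"image/1.jpg\" width=\"10px\"/>b<img src=\"image/2.jpg\">c</p>";
        // prints <p>a${imgReplace1}b${imgReplace2}c</p>
        System.out.println(replaceImages(html, tags));
    }
}
```

One difference from writeWordFile is worth noting: String.replace substitutes every occurrence of an identical tag at once, so two identical img tags end up sharing one placeholder, whereas this sketch numbers each occurrence separately.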

// Collect information about the img elements in the HTML
public static List<HashMap<String, String>> getImgStr(String htmlStr) {
    List<HashMap<String, String>> pics = new ArrayList<HashMap<String, String>>();
    Document doc = Jsoup.parse(htmlStr);
    Elements imgs = doc.select("img");
    for (Element img : imgs) {
        HashMap<String, String> map = new HashMap<String, String>();
        if (!"".equals(img.attr("width"))) {
            // Strip the "px" unit if present, e.g. "400px" -> "400"
            map.put("width", img.attr("width").replace("px", ""));
        }
        if (!"".equals(img.attr("height"))) {
            map.put("height", img.attr("height").replace("px", ""));
        }
        // The tag in self-closing form, ending with "/>"
        map.put("img", img.toString().substring(0, img.toString().length() - 1) + "/>");
        // The tag as jsoup serializes it, ending with ">"
        map.put("img1", img.toString());
        map.put("src", img.attr("src"));
        pics.add(map);
    }
    return pics;
}
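Width and height attributes coming from a rich text editor may look like "400px", "400", or even "50%", and writeWordFile falls back to 400*300 only when an attribute is missing. A minimal sketch of a more tolerant parser follows; DimensionSketch and parsePixels are hypothetical names, and treating anything unparseable as the fallback is my own assumption:

```java
import java.util.Locale;

final class DimensionSketch {
    // Returns the pixel value of an attribute such as "400px" or "400",
    // or the fallback when the value is missing or not a plain number.
    static int parsePixels(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        String s = raw.trim().toLowerCase(Locale.ROOT);
        if (s.endsWith("px")) {
            s = s.substring(0, s.length() - 2).trim();
        }
        try {
            return (int) Double.parseDouble(s);
        } catch (NumberFormatException e) {
            // Values such as "50%" or an empty string end up here
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePixels("400px", 300));
        System.out.println(parsePixels("50%", 300));
    }
}
```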

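The OfficeUtil utility class. The versions I found online only handle replacing a single image and throw an error with several. The cause is that adding a picture changes the size of the runs list inside processParagraphs, which triggers an ArrayList exception (typically a ConcurrentModificationException), for the same reason that removing elements from a list while looping over it does. The fix is to copy the runs into a new ArrayList and loop over the copy: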

package com.example.demo.util;

import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

/**
 * For Word 2007 (docx) files
 */
public class OfficeUtil {

    /**
     * Generate a Word document from the given parameter values and template
     * @param param the variables to replace
     * @param template path of the template file
     */
    public static CustomXWPFDocument generateWord(Map<String, Object> param, String template) {
        CustomXWPFDocument doc = null;
        try {
            OPCPackage pack = POIXMLDocument.openPackage(template);
            doc = new CustomXWPFDocument(pack);
            if (param != null && param.size() > 0) {
                // Process the body paragraphs
                List<XWPFParagraph> paragraphList = doc.getParagraphs();
                processParagraphs(paragraphList, param, doc);
                // Process the paragraphs inside tables
                Iterator<XWPFTable> it = doc.getTablesIterator();
                while (it.hasNext()) {
                    XWPFTable table = it.next();
                    List<XWPFTableRow> rows = table.getRows();
                    for (XWPFTableRow row : rows) {
                        List<XWPFTableCell> cells = row.getTableCells();
                        for (XWPFTableCell cell : cells) {
                            List<XWPFParagraph> paragraphListTable = cell.getParagraphs();
                            processParagraphs(paragraphListTable, param, doc);
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return doc;
    }

    /**
     * Process paragraphs: replace the text and image placeholders
     * @param paragraphList the paragraphs to process
     */
    public static void processParagraphs(List<XWPFParagraph> paragraphList, Map<String, Object> param, CustomXWPFDocument doc) {
        if (paragraphList != null && paragraphList.size() > 0) {
            for (XWPFParagraph paragraph : paragraphList) {
                // The spacing produced by the POI conversion is too large and has to be reset
                if (paragraph.getSpacingBefore() >= 1000 || paragraph.getSpacingAfter() > 1000) {
                    paragraph.setSpacingBefore(0);
                    paragraph.setSpacingAfter(0);
                }
                // Set the left and right indentation in Word
                paragraph.setIndentationLeft(0);
                paragraph.setIndentationRight(0);
                List<XWPFRun> runs = paragraph.getRuns();
                // Adding a picture changes the size of the paragraph's runs, so loop over a copy instead
                List<XWPFRun> allRuns = new ArrayList<XWPFRun>(runs);
                for (XWPFRun run : allRuns) {
                    String text = run.getText(0);
                    if (text != null) {
                        boolean isSetText = false;
                        for (Entry<String, Object> entry : param.entrySet()) {
                            String key = entry.getKey();
                            if (text.indexOf(key) != -1) {
                                isSetText = true;
                                Object value = entry.getValue();
                                if (value instanceof String) { // text replacement
                                    text = text.replace(key, value.toString());
                                } else if (value instanceof Map) { // image replacement
                                    text = text.replace(key, "");
                                    Map<?, ?> pic = (Map<?, ?>) value;
                                    int width = Integer.parseInt(pic.get("width").toString());
                                    int height = Integer.parseInt(pic.get("height").toString());
                                    int picType = getPictureType(pic.get("type").toString());
                                    byte[] byteArray = (byte[]) pic.get("content");
                                    ByteArrayInputStream byteInputStream = new ByteArrayInputStream(byteArray);
                                    try {
                                        String blipId = doc.addPictureData(byteInputStream, picType);
                                        doc.createPicture(blipId, doc.getNextPicNameNumber(picType), width, height, paragraph);
                                    } catch (Exception e) {
                                        e.printStackTrace();
                                    }
                                }
                            }
                        }
                        if (isSetText) {
                            run.setText(text, 0);
                        }
                    }
                }
            }
        }
    }

    /**
     * Map a picture type name to the corresponding picture type code
     * @param picType picture type name, e.g. "png" or "jpg"
     * @return int the picture type code
     */
    private static int getPictureType(String picType) {
        int res = CustomXWPFDocument.PICTURE_TYPE_PICT;
        if (picType != null) {
            if (picType.equalsIgnoreCase("png")) {
                res = CustomXWPFDocument.PICTURE_TYPE_PNG;
            } else if (picType.equalsIgnoreCase("dib")) {
                res = CustomXWPFDocument.PICTURE_TYPE_DIB;
            } else if (picType.equalsIgnoreCase("emf")) {
                res = CustomXWPFDocument.PICTURE_TYPE_EMF;
            } else if (picType.equalsIgnoreCase("jpg") || picType.equalsIgnoreCase("jpeg")) {
                res = CustomXWPFDocument.PICTURE_TYPE_JPEG;
            } else if (picType.equalsIgnoreCase("wmf")) {
                res = CustomXWPFDocument.PICTURE_TYPE_WMF;
            }
        }
        return res;
    }

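    /**
     * Read all the data in the input stream into a byte array
     * @param in the stream to read
     * @param isClose whether to close the stream afterwards
     * @return the bytes read, or null if reading failed
     */
    public static byte[] inputStream2ByteArray(InputStream in, boolean isClose) {
        byte[] byteArray = null;
        try {
            // available() is only an estimate and a single read() may return early,
            // so keep reading until the end of the stream
            java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            byteArray = out.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (isClose) {
                try {
                    in.close();
                } catch (Exception e2) {
                    System.out.println("Failed to close the stream");
                }
            }
        }
        return byteArray;
    }

}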
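The reason processParagraphs loops over a copy of the runs can be seen without POI at all. The following is a minimal sketch with plain strings standing in for runs; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;

final class CopyIterationSketch {
    // Adds to the list while iterating over it directly; reports whether that failed.
    static boolean failsWhenIteratingLiveList() {
        List<String> runs = new ArrayList<String>(Arrays.asList("text", "${img}", "more"));
        try {
            for (String run : runs) {
                if (run.contains("${img}")) {
                    runs.add("picture"); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Iterates over a snapshot, so adding to the original list is safe.
    static List<String> addWhileIteratingCopy() {
        List<String> runs = new ArrayList<String>(Arrays.asList("text", "${img}", "more"));
        for (String run : new ArrayList<String>(runs)) {
            if (run.contains("${img}")) {
                runs.add("picture");
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(failsWhenIteratingLiveList());
        System.out.println(addWhileIteratingCopy());
    }
}
```

The snapshot costs one extra list per paragraph, which is negligible next to the document processing itself.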

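I believe the main reason image replacement does not work for Word 2003 files is that HWPFDocument, the class that handles the 2003 format, is declared final, so we cannot subclass it to change its behavior. The class for the 2007 format, XWPFDocument, can be extended, and a subclass of XWPFDocument with its own createPicture method is enough to make image replacement work. Here is the corresponding CustomXWPFDocument class: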

package com.example.demo.util;

import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.xmlbeans.XmlException;
import org.apache.xmlbeans.XmlToken;
import org.openxmlformats.schemas.drawingml.x2006.main.CTNonVisualDrawingProps;
import org.openxmlformats.schemas.drawingml.x2006.main.CTPositiveSize2D;
import org.openxmlformats.schemas.drawingml.x2006.wordprocessingDrawing.CTInline;

/**
 * Custom XWPFDocument with a createPicture() method
 */
public class CustomXWPFDocument extends XWPFDocument {

    public CustomXWPFDocument(InputStream in) throws IOException {
        super(in);
    }

    public CustomXWPFDocument() {
        super();
    }

    public CustomXWPFDocument(OPCPackage pkg) throws IOException {
        super(pkg);
    }

    /**
     * @param blipId relationship id returned by addPictureData
     * @param ind picture index
     * @param width width in pixels
     * @param height height in pixels
     * @param paragraph the paragraph that receives the picture
     */
    public void createPicture(String blipId, int ind, int width, int height, XWPFParagraph paragraph) {
        // Convert pixels to EMU, the unit DrawingML uses for sizes
        final int EMU = 9525;
        width *= EMU;
        height *= EMU;
        CTInline inline = paragraph.createRun().getCTR().addNewDrawing().addNewInline();
        String picXml = ""
                + "<a:graphic xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"
                + "   <a:graphicData uri=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "      <pic:pic xmlns:pic=\"http://schemas.openxmlformats.org/drawingml/2006/picture\">"
                + "         <pic:nvPicPr>" + "            <pic:cNvPr id=\""
                + ind
                + "\" name=\"Generated\"/>"
                + "            <pic:cNvPicPr/>"
                + "         </pic:nvPicPr>"
                + "         <pic:blipFill>"
                + "            <a:blip r:embed=\""
                + blipId
                + "\" xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\"/>"
                + "            <a:stretch>"
                + "               <a:fillRect/>"
                + "            </a:stretch>"
                + "         </pic:blipFill>"
                + "         <pic:spPr>"
                + "            <a:xfrm>"
                + "               <a:off x=\"0\" y=\"0\"/>"
                + "               <a:ext cx=\""
                + width
                + "\" cy=\""
                + height
                + "\"/>"
                + "            </a:xfrm>"
                + "            <a:prstGeom prst=\"rect\">"
                + "               <a:avLst/>"
                + "            </a:prstGeom>"
                + "         </pic:spPr>"
                + "      </pic:pic>"
                + "   </a:graphicData>" + "</a:graphic>";
        inline.addNewGraphic().addNewGraphicData();
        XmlToken xmlToken = null;
        try {
            xmlToken = XmlToken.Factory.parse(picXml);
        } catch (XmlException xe) {
            xe.printStackTrace();
        }
        inline.set(xmlToken);
        inline.setDistT(0);
        inline.setDistB(0);
        inline.setDistL(0);
        inline.setDistR(0);
        CTPositiveSize2D extent = inline.addNewExtent();
        extent.setCx(width);
        extent.setCy(height);
        CTNonVisualDrawingProps docPr = inline.addNewDocPr();
        docPr.setId(ind);
        docPr.setName("Picture " + ind);
        docPr.setDescr("Test");
    }

}
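The constant 9525 in createPicture is the number of EMUs (English Metric Units) per pixel when the image is assumed to be 96 dpi: one inch is 914400 EMU, and 914400 / 96 = 9525. A small sketch of that arithmetic follows; the class name EmuSketch and the 96 dpi assumption are mine, not something the original code states:

```java
final class EmuSketch {
    static final int EMU_PER_INCH = 914400;
    // Assumed screen resolution; other resolutions give a different factor
    static final int ASSUMED_DPI = 96;

    // Converts a pixel length to EMU under the 96 dpi assumption.
    static int pixelsToEmu(int pixels) {
        return pixels * (EMU_PER_INCH / ASSUMED_DPI);
    }

    public static void main(String[] args) {
        // The default 400*300 image from writeWordFile, expressed in EMU
        System.out.println(pixelsToEmu(400) + " x " + pixelsToEmu(300));
    }
}
```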

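That is how to convert between HTML and Word with POI. The problem that HTML cannot be converted directly into a readable docx file is still unsolved; if you have a good solution, feel free to share it.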


