How to Fix Garbled Text When Reading and Writing Files in Java

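Garbled text (mojibake) is a common problem when reading file streams. It can have more than one cause; this article covers the case where the file's character encoding is responsible. First, be clear about what text files and binary files are and how they differ.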

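A text file is a file based on a character encoding. Common encodings include ASCII, the Unicode encodings, "ANSI" (on Windows, the system's legacy code page, for example GBK on Simplified Chinese systems) and so on. A binary file is based on value encoding: the application decides what each value means, which amounts to defining a custom encoding.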

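A text file therefore follows one published character encoding from start to finish, whether that encoding is fixed-width (such as ASCII) or variable-width (such as UTF-8). A binary file's layout, by contrast, is defined by the application: because it is value-encoded, how many bits represent one value is entirely up to you.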
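To make the fixed-width versus variable-width distinction concrete, the sketch below counts how many bytes one character occupies under two encodings. The class and method names (`EncodedLengthDemo`, `encodedLength`) are made up for this illustration and are not part of the original article.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    /** Number of bytes the text occupies once encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String chinese = "\u4e2d"; // the Chinese character "zhong" (U+4E2D)

        // UTF-8 is variable-width: 1 byte for "A", 3 bytes for U+4E2D.
        System.out.println(encodedLength(latin, StandardCharsets.UTF_8));
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_8));

        // UTF-16BE uses 2 bytes for both of these characters.
        System.out.println(encodedLength(latin, StandardCharsets.UTF_16BE));
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_16BE));
    }
}
```

The same text produces different byte sequences depending on the encoding, which is why a reader has to know the encoding before it can turn bytes back into characters.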

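Never handle a binary file through strings. A String built from bytes without an explicit charset is decoded with the platform default charset, and a binary file's custom layout will inevitably clash with any fixed character encoding. Binary files must therefore be read, processed and written as byte streams only.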
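A minimal sketch of why this matters: pushing arbitrary bytes through a String and back can change them. The names `BinaryRoundTripDemo` and `roundTripThroughString` are hypothetical, and the sample bytes are invented for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BinaryRoundTripDemo {
    /** Decode bytes into a String and encode them again with the same charset. */
    static byte[] roundTripThroughString(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Made-up binary content; 0x89 and 0xFF are not valid UTF-8 on their own.
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF};
        byte[] result = roundTripThroughString(binary, StandardCharsets.UTF_8);

        // The decoder replaces each invalid byte with U+FFFD, so the bytes change.
        System.out.println(Arrays.equals(binary, result)); // prints "false"
        System.out.println(binary.length + " -> " + result.length);
    }
}
```

Once an invalid byte has been replaced during decoding, the original value cannot be recovered, so the damage is permanent by the time the data is written back.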

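For a text file the encoding is fixed, so the approach is: determine the file's own encoding before reading it, read the bytes, and then build strings from those bytes with that encoding. Text read this way will not be garbled. An encoding can be "detected" for a binary file too, but the result is meaningless, so the two cases cannot be treated alike.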
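The following sketch shows the effect of decoding the same bytes with the right and the wrong charset. `DecodeDemo` and `decode` are illustrative names, and ISO-8859-1 merely stands in for "some wrong encoding".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    /** Build a String from raw bytes using an explicitly named charset. */
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters (U+4E2D U+6587)
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8); // 6 bytes

        // Correct charset: the original two characters come back.
        String good = decode(utf8Bytes, StandardCharsets.UTF_8);
        System.out.println(good.equals(original)); // prints "true"

        // Wrong charset: every byte becomes its own character, i.e. garbled text.
        String bad = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(bad.length()); // prints "6"
    }
}
```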

The steps are as follows:

1) Detect the encoding of the text file

import java.io.BufferedInputStream;
import java.io.FileInputStream;

/**
 * Guess the character encoding of a text file from its byte order mark (BOM)
 * or, if there is none, from byte patterns typical of UTF-8.
 */
public static String getFileEncode(String path) {
    // "asci" is a sentinel meaning "no BOM and no UTF-8 sequence found".
    // It is not a valid Java charset name; callers map it to GBK.
    String charset = "asci";
    byte[] first3Bytes = new byte[3];
    try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(path))) {
        boolean checked = false;
        bis.mark(3); // keep the mark valid while the 3 BOM bytes are read
        int read = bis.read(first3Bytes, 0, 3);
        if (read == -1) {
            return charset; // empty file
        }
        if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
            charset = "UTF-16"; // UTF-16LE BOM; Java's UTF-16 decoder follows the BOM
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xFE && first3Bytes[1] == (byte) 0xFF) {
            charset = "UTF-16"; // UTF-16BE BOM
            checked = true;
        } else if (first3Bytes[0] == (byte) 0xEF && first3Bytes[1] == (byte) 0xBB
                && first3Bytes[2] == (byte) 0xBF) {
            charset = "UTF-8";
            checked = true;
        }
        bis.reset();
        if (!checked) {
            // No BOM: scan from the start for a 3-byte UTF-8 sequence.
            while ((read = bis.read()) != -1) {
                if (read >= 0xF0) {
                    break;
                }
                if (0x80 <= read && read <= 0xBF) { // a lone byte in 0x80-0xBF: treat as GBK
                    break;
                }
                if (0xC0 <= read && read <= 0xDF) {
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        // Two bytes (0xC0-0xDF)(0x80-0xBF): may also be valid GB text, keep scanning
                        continue;
                    }
                    break;
                } else if (0xE0 <= read && read <= 0xEF) { // can still misfire, but rarely
                    read = bis.read();
                    if (0x80 <= read && read <= 0xBF) {
                        read = bis.read();
                        if (0x80 <= read && read <= 0xBF) {
                            charset = "UTF-8";
                        }
                    }
                    break;
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return charset;
}

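/**
 * Map the first three bytes of a file (as unsigned values 0-255) to an
 * encoding name, based on the byte order mark (BOM) alone.
 */
private static String getEncode(int flag1, int flag2, int flag3) {
    String encode;
    // A txt file may start with a few extra bytes: FF FE (Unicode, little endian),
    // FE FF (Unicode big endian), or EF BB BF (UTF-8).
    if (flag1 == 0xFF && flag2 == 0xFE) {
        encode = "UTF-16"; // little endian; Java's UTF-16 decoder follows the BOM
    } else if (flag1 == 0xFE && flag2 == 0xFF) {
        encode = "UTF-16"; // big endian
    } else if (flag1 == 0xEF && flag2 == 0xBB && flag3 == 0xBF) {
        encode = "UTF-8";
    } else {
        encode = "asci"; // sentinel for ASCII or an unknown legacy encoding
    }
    return encode;
}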
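The byte-scanning heuristic in step 1 stops at the first multi-byte sequence it recognizes. As an alternative sketch (a different technique, not the article's method), the JDK's own `CharsetDecoder` can be asked to validate the whole input strictly as UTF-8. The class and method names `Utf8Check` and `isValidUtf8` are invented for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true only if every byte sequence in data is well-formed UTF-8. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // throw instead of replacing
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        byte[] gbkLike = {(byte) 0xD6, (byte) 0xD0}; // U+4E2D as encoded in GBK
        System.out.println(isValidUtf8(utf8));    // prints "true"
        System.out.println(isValidUtf8(gbkLike)); // prints "false"
    }
}
```

A caller could fall back to GBK when the check fails, in the same way the article maps its "asci" result to GBK. Pure ASCII input also passes this check, which is harmless because ASCII decodes identically under UTF-8 and GBK.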

2) Read the file stream using the file's encoding

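import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Read the content of the file at the given path. Because a String carries the
 * content, this method only reads text files correctly (without garbling).
 *
 * @param path file path, including the file name
 * @return the file content, or null if the file does not exist or cannot be read
 */
public static String readFile(String path) {
    String data = null;
    // Check whether the file exists
    File file = new File(path);
    if (!file.exists()) {
        return data;
    }
    // Detect the file's encoding
    String code = FileEncode.getFileEncode(path);
    InputStreamReader isr = null;
    try {
        // Decode the file according to its encoding
        if ("asci".equals(code)) {
            // Use GBK here instead of the JVM default, because the JVM default
            // charset is not necessarily the operating system's encoding.
            // code = System.getProperty("file.encoding");
            code = "GBK";
        }
        isr = new InputStreamReader(new FileInputStream(file), code);
        // Read the file content
        int length;
        char[] buffer = new char[1024];
        StringBuilder sb = new StringBuilder();
        while ((length = isr.read(buffer, 0, 1024)) != -1) {
            sb.append(buffer, 0, length);
        }
        data = sb.toString();
    } catch (Exception e) {
        e.printStackTrace();
        // log is the enclosing class's logger; its declaration is not shown in the original
        log.info("getFile IO Exception:" + e.getMessage());
    } finally {
        try {
            if (isr != null) {
                isr.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
            log.info("getFile IO Exception:" + e.getMessage());
        }
    }
    return data;
}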
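One detail worth knowing when reading files that start with the EF BB BF marker: Java's UTF-8 decoder keeps the BOM as a leading U+FEFF character instead of dropping it. The sketch below reads a file with an explicit charset through `java.nio.file.Files` and strips that character. `ReadTextDemo` and `readText` are hypothetical names, and using `Files.readAllBytes` assumes the file is small enough to hold in memory.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadTextDemo {
    /** Read a whole text file with an explicit charset and drop a leading BOM character. */
    static String readText(Path path, Charset charset) throws IOException {
        String text = new String(Files.readAllBytes(path), charset);
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1); // the decoded BOM is not part of the content
        }
        return text;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("bom-demo", ".txt");
        try {
            // UTF-8 BOM followed by the ASCII text "hi"
            Files.write(file, new byte[]{(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'});
            System.out.println(readText(file, StandardCharsets.UTF_8)); // prints "hi"
        } finally {
            Files.delete(file);
        }
    }
}
```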

3) Write the file using a specified encoding

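import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

/**
 * Save content to the given path with the given encoding. Because a String
 * carries the content, this method only writes text content correctly
 * (without garbling).
 *
 * @param data the bytes to write to the file; they are decoded with {@code code}
 * @param path file path, including the file name
 * @param code name of the character encoding; "asci" is mapped to GBK
 * @return true when the write has completed, false on error
 */
public static boolean writeFile(byte[] data, String path, String code) {
    boolean flag = true;
    OutputStreamWriter osw = null;
    try {
        File file = new File(path);
        if (!file.exists()) {
            // Create missing parent directories; getParentFile() is null for a bare file name
            File parent = file.getParentFile();
            if (parent != null && !parent.exists()) {
                parent.mkdirs();
            }
        }
        if ("asci".equals(code)) {
            code = "GBK";
        }
        osw = new OutputStreamWriter(new FileOutputStream(path), code);
        // Decode the bytes to text, then let the writer encode the text with the same charset
        osw.write(new String(data, code));
        osw.flush();
    } catch (Exception e) {
        e.printStackTrace();
        // log is the enclosing class's logger; its declaration is not shown in the original
        log.info("toFile IO Exception:" + e.getMessage());
        flag = false;
    } finally {
        try {
            if (osw != null) {
                osw.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
            log.info("toFile IO Exception:" + e.getMessage());
            flag = false;
        }
    }
    return flag;
}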
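When the content is already a String, the decode step can be skipped and the text written directly with an explicit charset. This is a sketch using `java.nio.file.Files`, not the article's method; `WriteTextDemo` and `writeText` are invented names.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class WriteTextDemo {
    /** Write text to a file with an explicit charset, creating parent directories first. */
    static void writeText(Path path, String text, Charset charset) throws IOException {
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op if the directories already exist
        }
        try (Writer writer = Files.newBufferedWriter(path, charset)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-demo");
        Path file = dir.resolve("sub").resolve("out.txt");
        writeText(file, "\u4e2d\u6587", StandardCharsets.UTF_8); // two Chinese characters
        System.out.println(Files.readAllBytes(file).length); // prints "6"
    }
}
```

Reading the file back with the same charset returns the original text. Reading it with a different one produces the garbling this article is about.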

4) For binary files that are small, such as Word documents, the file can be read and written as follows

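import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

/**
 * Read the file at the given path into a byte array. Use this method for
 * content that is not text.
 *
 * @param path file path, including the file name
 * @return byte[] the file's bytes
 */
public static byte[] getFile(String path) throws IOException {
    try (FileInputStream stream = new FileInputStream(path);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        // Read in a loop: available() is only an estimate, and a single
        // read() call may return fewer bytes than requested.
        byte[] buffer = new byte[8192];
        int n;
        while ((n = stream.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}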

import java.io.FileOutputStream;

/**
 * Write bytes to the given file. Use this method for files that are not text.
 *
 * @param data the bytes to write to the file
 * @param path file path, including the file name
 * @return boolean true when the write has completed
 * @throws Exception if the file cannot be opened or written
 */
public static boolean toFile(byte[] data, String path) throws Exception {
    try (FileOutputStream out = new FileOutputStream(path)) {
        out.write(data);
        out.flush();
    }
    return true;
}
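As a quick check that a byte-only path leaves binary content untouched, the sketch below copies a file through a byte array and compares the result. It uses `Files.readAllBytes` and `Files.write` from `java.nio.file` instead of the stream classes above; `BinaryCopyDemo` and `copyBytes` are made-up names, and the sample bytes are invented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BinaryCopyDemo {
    /** Copy a file byte for byte, with no String or charset involved. */
    static void copyBytes(Path source, Path target) throws IOException {
        byte[] data = Files.readAllBytes(source); // suitable for small files only
        Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("binary-src", ".bin");
        Path target = Files.createTempFile("binary-dst", ".bin");
        try {
            byte[] original = {(byte) 0x89, 0x00, (byte) 0xFF, (byte) 0xFE, 0x7F};
            Files.write(source, original);
            copyBytes(source, target);
            System.out.println(Arrays.equals(original, Files.readAllBytes(target))); // prints "true"
        } finally {
            Files.delete(source);
            Files.delete(target);
        }
    }
}
```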

That is all for this article. I hope it helps with your studies.
