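Part of the collection: I/O Streams Official Guide in Chinese, Series Overview and Index
Most of the content comes from the official The Java™ Tutorials. The rest comes from other sources such as ifeve's translations, imooc, and the book Android面试宝典 (Android Interview Handbook).
Author: @youyuge
Personal blog: https://youyuge.cn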
1. What Is a Character Stream?
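And what is a byte stream? How are the two related? Please read this earlier article of mine carefully first, so that you have a thorough understanding of character encoding.
After reading it, you should understand the topic fairly well. Here is my own summary: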
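- First, computers store files as bytes, that is, in binary, and the CPU can only process binary data: 0s and 1s (why? look it up). So how can 0s and 1s represent English letters and Chinese characters?
- To represent everyday characters in binary, people made up conventions: character sets and their encodings. Because these rules are man-made, there are many of them, and the same Chinese character is encoded differently under different rules: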
Chinese character “尤” --------> E5 B0 A4 (UTF-8; its Unicode code point is U+5C24, which is also its UTF-16 encoding)
Chinese character “尤” --------> D3 C8 (GBK)
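To see the actual bytes, here is a minimal sketch. The class name `EncodingBytes` and the helper `hex` are hypothetical. It encodes the character with the standard `String.getBytes(Charset)`. GBK is not one of the `StandardCharsets`, so the sketch assumes your JDK ships it and checks `Charset.isSupported` first. The character 尤 is written as the escape `\u5C24`, so the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats bytes as space-separated uppercase hex, e.g. "E5 B0 A4"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so a byte like 0xE5 prints as E5, not as a negative number
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u5C24"; // the character U+5C24
        System.out.printf("code point: U+%04X%n", s.codePointAt(0));                   // U+5C24
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // E5 B0 A4
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 5C 24
        // GBK is an extended charset: present in most full JDK builds, but not guaranteed
        if (Charset.isSupported("GBK")) {
            System.out.println("GBK:      " + hex(s.getBytes(Charset.forName("GBK"))));
        }
    }
}
```

The UTF-8 line shows three bytes, while 5C24 appears only as the code point and as the UTF-16 form. This is why a code point and its byte encoding should not be confused.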
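So when we write a .txt file in UTF-8, the computer actually encodes our characters into binary 0s and 1s and stores those. When we open the file as UTF-8, the text editor applies UTF-8's decoding rules to turn that pile of 0s and 1s back into characters we can read, and displays them.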
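This also explains the familiar garbled text (mojibake) you sometimes see when opening a .txt file. If we open a file written with GBK using the UTF-8 rules, the result is garbled. Put simply, two encodings are just two different translations of the same pile of binary 0s and 1s.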
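Here is a minimal sketch of that mismatch. It assumes the JDK ships GBK, which is checked at runtime. The hypothetical helper `roundTrip` encodes text with one charset and decodes the resulting bytes with another. That is essentially what an editor does when it opens a file with the wrong encoding. Another hypothetical helper, `codePoints`, prints code points instead of glyphs, so your console's own encoding cannot add a second layer of garbling.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class Mojibake {
    // Encode with one charset (writing the file), decode with another (opening it)
    static String roundTrip(String text, Charset writtenAs, Charset openedAs) {
        byte[] stored = text.getBytes(writtenAs);
        return new String(stored, openedAs);
    }

    // Render a string as code points, e.g. "U+5C24"
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        if (!Charset.isSupported("GBK")) {
            System.out.println("GBK is not available in this JDK");
            return;
        }
        Charset gbk = Charset.forName("GBK");
        String text = "\u5C24"; // the character U+5C24
        // Same rules for writing and reading: the original character comes back
        System.out.println("GBK -> GBK:   " + codePoints(roundTrip(text, gbk, gbk)));
        // Wrong rules: these GBK bytes are not valid UTF-8, so new String(...)
        // substitutes the replacement character U+FFFD for the malformed input
        System.out.println("GBK -> UTF-8: " + codePoints(roundTrip(text, gbk, StandardCharsets.UTF_8)));
    }
}
```

Nothing was lost from the bytes themselves. Decoding them again with GBK always recovers the text, which is why choosing the right encoding when opening a file fixes mojibake.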
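An even more everyday example: I have a Chinese sentence, “鱿鱼最好吃” (“squid is the tastiest”). I convert it into letters (encoding): you yu zui hao chi, store it, and send it to someone. The recipient opens the file, sees “you yu zui hao chi”, and tries to translate (decode) it as English. That fails, and he can't make sense of it. So he tries decoding it as Chinese instead, realizes it is pinyin, and roughly gets the meaning. In this analogy, Chinese is one encoding (like GBK), English is another (like UTF-8), and the letters are the bytes, the underlying binary.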
2. Official Definition: Character Streams
The Java platform stores character values using Unicode conventions. Character stream I/O automatically translates this internal format to and from the local character set. In Western locales, the local character set is usually an 8-bit superset of ASCII.
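In other words, Java stores character values internally as Unicode. Character stream I/O automatically converts between this internal form and the local character set, both when reading and when writing. In Western locales, the local character set is usually an 8-bit superset of ASCII. When no charset is specified, the "local character set" is the platform default charset. Since JDK 18 (JEP 400), that default is UTF-8 on all platforms.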
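A small sketch of the "internal format" side follows; the class name `UnicodeChars` is hypothetical. A Java `char` is a 16-bit UTF-16 code unit. For most characters, including 尤, its value is simply the Unicode code point. `Charset.defaultCharset()` reports the charset used when a `FileReader` or `FileWriter` is created without one.

```java
import java.nio.charset.Charset;

public class UnicodeChars {
    public static void main(String[] args) {
        // For characters in the Basic Multilingual Plane, a char's numeric
        // value equals the Unicode code point
        char c = '\u5C24';
        System.out.println((int) c == 0x5C24); // true

        // A character outside the BMP (U+1F600, an emoji) needs two chars,
        // a surrogate pair, but is still a single code point
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1

        // The charset character streams fall back to when none is given
        System.out.println(Charset.defaultCharset());
    }
}
```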
3. Using Character Streams
All character stream classes are descended from Reader and Writer. As with byte streams, there are character stream classes that specialize in file I/O: FileReader and FileWriter. The CopyCharacters example illustrates these classes.
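That is, every character stream class inherits from Reader or Writer. Just as with byte streams, FileReader and FileWriter are the character stream classes dedicated to reading and writing files. Here is a file copy using character streams: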
```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CopyCharacters {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes both streams, even if reading, writing,
        // or closing one of them throws (the original finally block could
        // skip closing the writer if closing the reader failed).
        // Without an explicit charset, both use the platform default charset.
        try (FileReader inputStream = new FileReader("xanadu.txt");
             FileWriter outputStream = new FileWriter("characteroutput.txt")) {
            int c;
            // read() returns one char (0..65535) as an int, or -1 at end of stream
            while ((c = inputStream.read()) != -1) {
                outputStream.write(c);
            }
        }
    }
}
```
CopyCharacters is very similar to CopyBytes. The most important difference is that CopyCharacters uses FileReader and FileWriter for input and output in place of FileInputStream and FileOutputStream. Notice that both CopyBytes and CopyCharacters use an int variable to read to and write from. However, in CopyCharacters, the int variable holds a character value in its last 16 bits; in CopyBytes, the int variable holds a byte value in its last 8 bits.
In short, CopyCharacters closely mirrors CopyBytes. The key difference is that it reads and writes through FileReader and FileWriter rather than FileInputStream and FileOutputStream.
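Note that both programs read and write through an int variable (4 bytes). An int is used instead of a char or byte so that -1 can signal end of stream without colliding with any real value. In CopyCharacters, only the low 2 bytes of the int carry data, and the high 2 bytes are 0. This is because a Java char is a 16-bit UTF-16 code unit, which is Java's internal representation regardless of the file's encoding. In CopyBytes, one byte is read at a time, so only the lowest byte of the int carries data.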
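The sketch below prints the raw int values, using in-memory streams instead of files. The class name `ReadValues` and its `readAll` helpers are hypothetical. A `StringReader` yields one UTF-16 code unit per `read()`. A `ByteArrayInputStream` over the UTF-8 bytes of the same character yields one byte per `read()`. Printing with `%08X` makes the zeroed high bytes of the int visible.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadValues {
    // All values Reader.read() returns before -1: one UTF-16 code unit each, 0..0xFFFF
    static List<Integer> readAll(Reader reader) throws IOException {
        List<Integer> values = new ArrayList<>();
        int v;
        while ((v = reader.read()) != -1) {
            values.add(v);
        }
        return values;
    }

    // All values InputStream.read() returns before -1: one byte each, 0..0xFF
    static List<Integer> readAll(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int v;
        while ((v = in.read()) != -1) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        String text = "\u5C24"; // one character
        for (int v : readAll(new StringReader(text))) {
            System.out.printf("char read: 0x%08X%n", v); // 0x00005C24
        }
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        for (int v : readAll(new ByteArrayInputStream(utf8))) {
            System.out.printf("byte read: 0x%08X%n", v); // 0x000000E5, 0x000000B0, 0x000000A4
        }
    }
}
```

One character becomes one `read()` on the character side but three on the byte side. This is the translation work the character stream does for you.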
4. Character Streams Use Byte Streams
Character streams are often "wrappers" for byte streams. The character stream uses the byte stream to perform the physical I/O, while the character stream handles translation between characters and bytes. FileReader, for example, uses FileInputStream, while FileWriter uses FileOutputStream.
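Put differently, a character stream wraps a byte stream. The byte stream performs the physical I/O, and the character stream translates between characters and bytes. FileReader, for instance, uses a FileInputStream, and FileWriter uses a FileOutputStream.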
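The wrapping can be made explicit, as in the following sketch. The class name `FileReaderWraps` and its helpers are hypothetical, and the file is a temporary one. In the JDK, `FileReader` is a subclass of `InputStreamReader` that opens a `FileInputStream` for you. Reading the same UTF-8 file through both routes should give the same text. The `FileReader(File, Charset)` constructor used here requires Java 11 or later.

```java
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileReaderWraps {
    // Reads everything from a Reader, then closes it
    static String drain(Reader reader) throws IOException {
        try (Reader r = reader) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }

    // Returns {text via FileReader, text via InputStreamReader over FileInputStream}
    static String[] readBothWays(Path file) throws IOException {
        String viaFileReader = drain(new FileReader(file.toFile(), StandardCharsets.UTF_8));
        // Spelled out: byte stream for the physical I/O, bridge for the decoding
        String viaBridge = drain(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8));
        return new String[] {viaFileReader, viaBridge};
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("wrap-demo", ".txt");
        try {
            Files.write(tmp, "\u5C24 abc".getBytes(StandardCharsets.UTF_8));
            String[] texts = readBothWays(tmp);
            System.out.println(texts[0].equals(texts[1]));                                  // true
            System.out.println(InputStreamReader.class.isAssignableFrom(FileReader.class)); // true
        } finally {
            Files.delete(tmp);
        }
    }
}
```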
There are two general-purpose byte-to-character "bridge" streams: InputStreamReader and OutputStreamWriter. Use them to create character streams when there are no prepackaged character stream classes that meet your needs. The sockets lesson in the networking trail shows how to create character streams from the byte streams provided by socket classes.
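So InputStreamReader and OutputStreamWriter are the two general-purpose byte-to-character bridges. When no ready-made character stream class fits your needs, build one with them. The sockets lesson in the networking trail shows how to turn the byte streams provided by the socket classes into character streams.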
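Here is a minimal sketch of the two bridges, using in-memory byte streams as stand-ins for something like a socket. With a real socket, you would wrap `socket.getInputStream()` and `socket.getOutputStream()` in the same way. The class name `BridgeStreams` and the helpers `encode` and `decode` are hypothetical. Passing the charset explicitly keeps both ends agreeing on the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BridgeStreams {
    // Characters -> bytes: OutputStreamWriter encodes into any OutputStream
    static byte[] encode(String text, Charset cs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, cs)) {
            writer.write(text);
        } // close() flushes the writer's internal buffer into 'bytes'
        return bytes.toByteArray();
    }

    // Bytes -> characters: InputStreamReader decodes any InputStream
    static String decode(byte[] data, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u5C24";
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length + " " + utf16.length);                  // 3 2
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Forgetting to flush or close the `OutputStreamWriter` is a common bug. Its encoded bytes may still be sitting in its buffer rather than in the underlying stream.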
5. Line-Oriented I/O
Character I/O usually occurs in bigger units than single characters. One common unit is the line: a string of characters with a line terminator at the end. A line terminator can be a carriage-return/line-feed sequence ("\r\n"), a single carriage-return ("\r"), or a single line-feed ("\n"). Supporting all possible line terminators allows programs to read text files created on any of the widely used operating systems.
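That is, we often read or write text one line at a time. A line is a string of characters that ends with a line terminator. The terminator can be a carriage return followed by a line feed ("\r\n", the Windows convention), a single carriage return ("\r", used by classic Mac OS before OS X), or a single line feed ("\n", used by Unix, Linux, and macOS). Accepting all three lets a program correctly split text files created on any of these systems into lines.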
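`BufferedReader.readLine()` accepts exactly these three terminators, as the following sketch shows. The class name `ReadLines` and the helper `lines` are hypothetical, and a `StringReader` stands in for a file. The input mixes Windows, classic Mac OS and Unix terminators in one string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ReadLines {
    // Splits text into lines; readLine() strips the terminator and returns null at the end
    static List<String> lines(String text) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lines("windows\r\nclassic-mac\runix\n")); // [windows, classic-mac, unix]
    }
}
```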
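Note: println appends the current platform's line separator. For portable output, don't hard-code "\r\n" as the line break in Java. Get the platform's separator instead: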
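```java
// Line separator of the current platform: "\r\n" (CRLF) on Windows, "\n" (LF) on Unix, Linux and macOS
String lineSeparator = System.getProperty("line.separator", "\n");
// Since Java 7, System.lineSeparator() returns the same value more directly
String sameSeparator = System.lineSeparator();
```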
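In practice, you rarely need to insert the separator by hand. The following sketch checks that `PrintWriter.println` and the `%n` format specifier both produce `System.lineSeparator()`. The class name `LineSeparators` and the helper `printlnOutput` are hypothetical, and a `StringWriter` stands in for a file.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class LineSeparators {
    // Captures what println writes for a given string
    static String printlnOutput(String s) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            pw.println(s); // appends the platform line separator
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        String sep = System.lineSeparator();
        System.out.println(printlnOutput("x").equals("x" + sep)); // true
        System.out.println(String.format("x%n").equals("x" + sep)); // true
        // Show this platform's separator as escaped text, e.g. \r\n or \n
        System.out.println(sep.replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```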