If we have a character in our text file which is in Unicode, mustn't it be 2 bytes of data? But the read() method reads one byte at a time as an int. So if we have an input stream fin and we invoke int x = fin.read() once, how do we get the full character back upon System.out.println(x) if only one byte has been read? (The fin.read() call is not in a while loop or anything; it is just called once.)
Good question! You're right that a Java char is two bytes, but that isn't necessarily true elsewhere (e.g. in the contents of a file).
A file is not encoded "in Unicode", because Unicode is a standard that assigns numbers (code points) to characters, not an encoding. Encodings map those code points to byte sequences, and not all encodings use two bytes per character. A Java char is a 16-bit UTF-16 code unit; most characters fit in one char, but characters outside the Basic Multilingual Plane take two (a surrogate pair), so even UTF-16 is not strictly fixed-width. Many files are stored as UTF-8, which is also variable-width: ASCII characters take one byte, and all others take two to four.
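To make the width differences concrete, here is a minimal sketch that encodes a few sample characters and counts the resulting bytes. The class name EncodingWidths, its helper methods, and the sample characters are made up for illustration; only String.getBytes, String.length, and StandardCharsets come from the standard library.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWidths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the text occupies as UTF-16 (big-endian, no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Illustrative samples, written as escapes so the source file stays ASCII:
        // U+0041 (A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println(String.format("U+%04X", s.codePointAt(0))
                    + ": UTF-8 bytes=" + utf8Length(s)
                    + ", UTF-16 bytes=" + utf16Length(s)
                    + ", Java chars=" + s.length());
        }
    }
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8, and the last one needs two Java chars (4 bytes in UTF-16), so neither encoding is "always two bytes".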
More to the point, however, InputStream is designed to read binary data, not characters, and its no-argument read() returns a single byte as an int from 0 to 255 (or -1 at end of stream). So printing x shows the numeric value of the first byte, not a character. If you want to read text, wrap your stream in a Reader such as InputStreamReader (preferably specifying the encoding explicitly) to convert the binary data into text. Internally it reads as many bytes from the stream as it needs to construct each character according to the encoding.
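As a rough illustration of the difference, the sketch below reads the same two UTF-8 bytes once through a plain InputStream and once through an InputStreamReader. It uses an in-memory ByteArrayInputStream in place of the question's fin so that it is self-contained; the class name ByteVsChar and the methods firstByte and firstChar are hypothetical helpers, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVsChar {
    // One call to InputStream.read(): the first raw byte (0-255), or -1 if empty.
    static int firstByte(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // One call to Reader.read(): the first decoded char (0-65535), or -1 if empty.
    static int firstChar(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) is stored in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] data = "\u00E9".getBytes(StandardCharsets.UTF_8);

        System.out.println(firstByte(data)); // prints 195 (0xC3, only half of the character)
        System.out.println(firstChar(data)); // prints 233 (0xE9, the whole character)
    }
}
```

For a real file, something like Files.newBufferedReader(path, StandardCharsets.UTF_8) gives you a Reader with an explicit encoding in one step.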