Why does read() read one byte at a time if a char is 2 bytes?


This question already has an answer here:

  • What is the difference between Reader and InputStream?

If a character in our text file is Unicode, mustn't it be 2 bytes of data? But the read() method reads one byte at a time and returns it as an int. So if we have a FileInputStream object fin and call int x = fin.read() once, how do we get the full character back from System.out.println(x) if only one byte has been read? (fin.read() is not in a while loop or anything; it is called just once.)


Good question! You're right that in Java a char is always two bytes, but that isn't true elsewhere (e.g. in the contents of a file).
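A quick sketch of what "two bytes" means for a Java char: the class name CharWidth is just for illustration, and U+1F600 is an arbitrary example of a character outside the Basic Multilingual Plane. It shows that a char has a fixed 16-bit width, while a single character can still need two chars.

```java
// Illustrative sketch: a Java char is a fixed 16-bit (2-byte) UTF-16 code unit.
public class CharWidth {
    // U+1F600 (grinning face), written as its UTF-16 surrogate pair
    static final String GRIN = "\uD83D\uDE00";

    public static void main(String[] args) {
        System.out.println(Character.BYTES);                        // 2
        System.out.println(GRIN.length());                          // 2 chars...
        System.out.println(GRIN.codePointCount(0, GRIN.length()));  // ...but 1 code point
    }
}
```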

A file is not encoded "in Unicode", because Unicode is a specification, not an encoding. Encodings map Unicode characters to particular byte sequences, and not all of them use two-byte characters. A Java char is a UTF-16 code unit, which is always two bytes wide (characters outside the Basic Multilingual Plane need two of them), but many files are stored as UTF-8, which is variable-width: ASCII characters take one byte, and others take two to four.
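To make the UTF-8 vs UTF-16 difference concrete, here is a small sketch that counts the bytes a few sample characters occupy in each encoding. The class and method names are hypothetical. UTF_16BE is used instead of UTF_16 because Java's UTF-16 encoder prepends a 2-byte byte-order mark, which would distort the counts.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch: the same characters take different numbers of bytes
// depending on the encoding used to store them.
public class EncodingWidths {
    // Bytes needed to store s in UTF-8 (variable width: 1 to 4 bytes per character)
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed in UTF-16 big-endian, without a byte-order mark
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // "A", e-acute, euro sign, and U+1F600 (as a surrogate pair)
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 = %d bytes, UTF-16 = %d bytes%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```

Running it should report 1, 2, 3 and 4 bytes in UTF-8, against 2, 2, 2 and 4 bytes in UTF-16.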

More to the point, however, InputStream is designed to read binary data, not characters, and its read() method returns one byte at a time. Calling fin.read() once therefore gives you only the first byte of the file, as an int from 0 to 255 (or -1 at end of stream), and System.out.println(x) prints that number, not a character. If you want to read text, wrap your stream in a Reader such as InputStreamReader (preferably specifying the encoding explicitly) to convert the binary data into text. Internally it reads as many bytes from the stream as it needs to decode each character according to that encoding.
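Here is a minimal sketch of the difference, assuming the "file" contains just the euro sign encoded as UTF-8 (bytes 0xE2 0x82 0xAC). A ByteArrayInputStream stands in for the FileInputStream fin from the question so the example runs without a real file. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: one read() on an InputStream vs one read() on a Reader.
public class ByteVsChar {
    // Stand-in for the file contents: U+20AC encoded as UTF-8 (0xE2 0x82 0xAC)
    static final byte[] EURO_UTF8 = "\u20AC".getBytes(StandardCharsets.UTF_8);

    // A single read() on a raw stream returns only the first byte (0-255)
    static int firstByte() {
        try (InputStream in = new ByteArrayInputStream(EURO_UTF8)) {
            return in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A single read() on a Reader decodes as many bytes as one char needs
    static int firstChar() {
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(EURO_UTF8), StandardCharsets.UTF_8)) {
            return r.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstByte());         // 226 (0xE2), the first of three bytes
        System.out.println(firstChar());         // 8364 (0x20AC), the whole character
        System.out.println((char) firstChar());  // the euro sign, if the console can show it
    }
}
```

Both calls invoke read() exactly once. The Reader version returns the complete character because the decoder consumed all three bytes behind the scenes.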