Is there a problem with saving all my source code files in UTF-8?


If that's relevant (it very well could be), they are PHP source code files.

There are a few pitfalls to take care of:

  1. PHP is not aware of the BOM character certain editors or IDEs like to put at the very beginning of UTF-8 files. This character indicates the file is UTF-8, but it is not necessary, and it is invisible. This can cause "headers already sent out" warnings from functions that deal with HTTP headers because PHP will output the BOM to the browser if it sees one, and that will prevent you from sending any header. Make sure your text editor has a UTF-8 (No BOM) encoding; if you're not sure, simply do the test. If <?php header('Content-Type: text/html') ?> at the beginning of an otherwise empty file doesn't trigger a warning, you're fine.
  2. Default string functions are not multibyte encodings-aware. This means that strlen really returns the number of bytes in the string, not the actual number of characters. This isn't too much of a problem until you start splicing strings of non-ASCII characters with functions like substr: when you do, indices you pass to it refer to byte indices rather than character indices, and this can cause your script to break non-ASCII characters in two. For instance, echo substr("é", 0, 1) will return an invalid UTF-8 character because in UTF-8, é actually takes two bytes and substr will return only the first one. (The solution is to use the mb_ string functions, which are aware of multibyte encodings.)
  3. You must ensure that your data sources (like external text files or databases) return UTF-8 strings too, because PHP makes no automagic conversion. To that end, you may use implementation-specific means (for instance, MySQL has a special query that lets you specify in which encoding you expect the result: SET CHARACTER SET UTF8 or something along these lines), or if you couldn't find a better way, mb_convert_encoding or iconv will convert one string into another encoding.