Is & ldquo; VARCHAR (255) CHARACTER SET utf8 & rdquo; 255 bytes or 255 characters

advertisements

I've declared a field in my INNODB/MySQL table as

VARCHAR(255) CHARACTER SET utf8 NOT NULL

however when inserting my data is truncated at 255 bytes not characters. This might chop the trailing two bite code point iemphasized textn two leaving an invalid character. Any ideas what I might be doing wrong

EDIT:

A sample session is like this

mysql> update channel set comment="ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬x" where id = 1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 1

mysql> select id, channelName, comment from channel;
+----+-------------+------------------------------------------------------------------------------------------
| id | channelName | comment                                                                                                                                                                                                                                                         |
+----+-------------+-----------------------------------------------------------------------------------------
|  1 | foo         | ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩ�� |
+----+-------------+-----------------------------------------------------------------------------------------
1 row in set (0.00 sec)

via mysql-admin I look at the comment field and see that it is indeed VARCHAR(255) and uses "UTF-8 Unicode"

from the command

show full columns from channel

I get

+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| Field                       | Type             | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment |
+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+
| id                          | int(11)          | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references |         |
| channelName                 | varchar(255)     | utf8_general_ci | NO   |     | NULL    |                | select,insert,update,references |         |
| comment                     | varchar(255)     | utf8_general_ci | NO   |     | NULL    |                | select,insert,update,references |         |
+-----------------------------+------------------+-----------------+------+-----+---------+----------------+---------------------------------+---------+

mysql> SHOW VARIABLES LIKE 'character_set%'

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+


This is a stab in the dark, but are you using UTF-8 as the connection and client character sets? Issue SHOW VARIABLES LIKE 'character_set%' and see whether it tells you UTF-8 or latin-1.

Perhaps if you are using the wrong connection/client character sets, the UTF-8 bytes are reinterpreted as single-byte characters and stored that way in the database.