I'm rewriting our database class (PDO based), and got stuck at this. I've been taught to use both
SET NAMES utf8 and
SET CHARACTER SET utf8 when working with UTF-8 in PHP and MySQL.
In PDO I now want to use the
PDO::MYSQL_ATTR_INIT_COMMAND parameter, but it only supports one query.
Is SET CHARACTER SET utf8 necessary?
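Using
SET CHARACTER SET utf8 after using
SET NAMES utf8 will actually reset
character_set_connection and collation_connection to @@character_set_database and @@collation_database, respectively.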
The manual states that
SET NAMES x is equivalent to
SET character_set_client = x; SET character_set_results = x; SET character_set_connection = x;
and
SET CHARACTER SET x is equivalent to
SET character_set_client = x; SET character_set_results = x; SET collation_connection = @@collation_database;
where
SET collation_connection = x also internally executes
SET character_set_connection = <<character_set_of_collation_x>> and
SET character_set_connection = x internally also executes
SET collation_connection = <<default_collation_of_character_set_x>>.
So essentially you're resetting
character_set_connection to @@character_set_database and collation_connection to
@@collation_database. The manual explains the usage of these variables:
What character set should the server translate a statement to after receiving it?
For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.
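The reset described above can be observed directly by dumping the session variables after each statement. The following is only a minimal sketch: the helper name charsetVariables() and the environment variables MYSQL_DSN, MYSQL_USER and MYSQL_PASSWORD are assumptions made for illustration, and the exact values printed depend on your server version and database defaults.

```php
<?php
/**
 * Return the session character set variables as name => value pairs.
 * (Hypothetical helper, not part of PDO.)
 */
function charsetVariables(PDO $pdo): array
{
    $sql = "SHOW SESSION VARIABLES WHERE Variable_name IN
            ('character_set_client', 'character_set_connection',
             'character_set_results', 'collation_connection')";

    // FETCH_KEY_PAIR maps the first column to keys and the second to values.
    return $pdo->query($sql)->fetchAll(PDO::FETCH_KEY_PAIR);
}

// Assumed environment variable, e.g. "mysql:host=localhost;dbname=test".
$dsn = getenv('MYSQL_DSN');

if ($dsn !== false) {
    $pdo = new PDO(
        $dsn,
        getenv('MYSQL_USER') ?: null,
        getenv('MYSQL_PASSWORD') ?: null,
        [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]
    );

    $pdo->exec('SET NAMES utf8');
    print_r(charsetVariables($pdo)); // client, connection and results agree

    $pdo->exec('SET CHARACTER SET utf8');
    print_r(charsetVariables($pdo)); // connection and collation follow the database defaults again
}
```

If the default database uses latin1, the second dump should show character_set_connection switching back to latin1 while character_set_client and character_set_results stay on utf8.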
To sum this up, the encoding/transcoding procedure MySQL uses to process the query and its results is a multi-step process:
- MySQL treats the incoming query as being encoded in character_set_client.
- MySQL transcodes the statement from character_set_client into character_set_connection.
- When comparing string values to column values, MySQL transcodes the string value from character_set_connection into the character set of the given database column and uses the column collation to do sorting and comparison.
- MySQL builds up the result set encoded in character_set_results (this includes result data as well as result metadata such as column names and so on).
So it could be the case that a
SET CHARACTER SET utf8 would not be sufficient to provide full UTF-8 support. Think of a default database character set of
latin1 and columns defined with
utf8-charset and go through the steps described above. As
latin1 cannot cover all the characters that UTF-8 can cover, you may lose character information in step 2, before the comparison in step 3 ever sees the value.
- Step 2: Given that your query is encoded in UTF-8 and contains characters that cannot be represented in
latin1, these characters will be lost on transcoding from utf8 to
latin1 (the default database character set, which SET CHARACTER SET utf8 assigns to character_set_connection), making your query fail.
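The loss described above can be reproduced outside MySQL. This is only a rough simulation, assuming the mbstring extension is available and using Windows-1252 as a stand-in for MySQL's latin1. The variable names are made up for illustration, and MySQL's own conversion may differ in detail.

```php
<?php
// Replace characters that cannot be converted with "?" (code point 0x3F).
mb_substitute_character(0x3F);

$clientCharset     = 'UTF-8';        // what SET CHARACTER SET utf8 gives the client
$connectionCharset = 'Windows-1252'; // stand-in for a latin1 database default
$columnCharset     = 'UTF-8';        // the column itself is defined as utf8

// Three Japanese characters, none of which exist in Windows-1252.
$literal = "\u{65E5}\u{672C}\u{8A9E}";

// Statement is transcoded from the client to the connection character set.
$onConnection = mb_convert_encoding($literal, $connectionCharset, $clientCharset);

// Literal is transcoded again to the column character set for comparison.
$forComparison = mb_convert_encoding($onConnection, $columnCharset, $connectionCharset);

echo $forComparison, PHP_EOL;          // ???
var_dump($forComparison === $literal); // bool(false)
```

Once the characters have been replaced on the way to the connection character set, converting back to the column's utf8 cannot restore them, so the comparison runs against the wrong value.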
So I think it's safe to say that
SET NAMES ... is the correct way to handle character set issues. I might add, though, that setting up your MySQL server variables correctly (all the required variables can be set statically in your
my.cnf) frees you from the performance overhead of the extra query required on every connect.
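For the PDO class from the question, a single SET NAMES in the init command should therefore be enough. A minimal sketch follows, in which the helper utf8InitOptions() and the environment variables MYSQL_DSN, MYSQL_USER and MYSQL_PASSWORD are hypothetical names chosen for illustration:

```php
<?php
/**
 * Build PDO driver options that run SET NAMES once, right after connecting.
 * (Hypothetical helper; the default charset argument is an assumption.)
 */
function utf8InitOptions(string $charset = 'utf8'): array
{
    // The name is interpolated into SQL, so only accept plain identifiers.
    if (!preg_match('/^[A-Za-z0-9_]+$/', $charset)) {
        throw new InvalidArgumentException("Invalid character set name: $charset");
    }

    return [
        PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
        PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES $charset",
    ];
}

// Assumed environment variable, e.g. "mysql:host=localhost;dbname=test".
$dsn = getenv('MYSQL_DSN');

if ($dsn !== false) {
    $pdo = new PDO(
        $dsn,
        getenv('MYSQL_USER') ?: null,
        getenv('MYSQL_PASSWORD') ?: null,
        utf8InitOptions()
    );
}
```

Because the init command is sent by the driver on connect, the class does not need a second query for SET CHARACTER SET.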
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki