I am detecting the encoding of some text from Word (saved as a CSV file) using:

$encoding = mb_detect_encoding($value, 'WINDOWS-1252, ISO-8859-1', true);
$value = iconv($encoding, 'UTF-8//IGNORE', $value);

If a string contains curly quotes, $encoding is set to ISO-8859-1 rather than WINDOWS-1252, which is what it should be. As a result, the UTF-8 output reads "self-motivated" with funny boxes around it instead of “self-motivated”.

Any ideas on how to resolve this other than replacing the curly quotes by hand? The same problem could affect other characters too.

Accepted Answer

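Windows-1252 and ISO-8859-1 differ only in bytes 80 to 9F (hex). In Windows-1252 most of these bytes are printable characters such as curly quotes, dashes and the euro sign; in ISO-8859-1 they are either unassigned or C1 control codes, so they should not appear in ordinary text. If you know your encoding is either Windows-1252 or ISO-8859-1, you can determine which it is by the presence of such bytes. If no such bytes are present, the two encodings decode the text identically, so you can convert from either.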
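
A minimal sketch of that byte check, assuming the input is already known to be one of these two encodings. The function name toUtf8FromLatin is hypothetical, and mb_convert_encoding is used here in place of the iconv call from the question; it is an illustration of the idea, not a general-purpose encoding detector.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
// Assumes $value is encoded as either Windows-1252 or ISO-8859-1.
function toUtf8FromLatin(string $value): string
{
    // Bytes 0x80-0x9F are printable in Windows-1252 (curly quotes, dashes, euro sign)
    // but are not printable text in ISO-8859-1, so finding one points to Windows-1252.
    $encoding = preg_match('/[\x80-\x9F]/', $value) === 1
        ? 'Windows-1252'
        : 'ISO-8859-1';

    // Outside 0x80-0x9F the two encodings are identical, so either source
    // encoding gives the same result when no such byte is present.
    return mb_convert_encoding($value, 'UTF-8', $encoding);
}

// Usage: 0x93 and 0x94 are the Windows-1252 curly double quotes.
echo toUtf8FromLatin("\x93self-motivated\x94"), PHP_EOL;
```

Since the encodings agree everywhere else, always converting from Windows-1252 would likely give the same result for printable text; the explicit check is kept only to mirror the reasoning above.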

Written by borrible
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki