When using PHP's pathinfo() function on a filename known to be UTF-8, it does not return the correct value, unless there are 'normal' characters in front of the special character.

Examples:
pathinfo('aä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => aä.pdf
[extension] => pdf
[filename] => aä
)  

which is fine and dandy, but pathinfo('äa.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => a.pdf
[extension] => pdf
[filename] => a
)  

Which is not quite what I was expecting. Even worse, pathinfo('ä.pdf') returns:

Array
(
[dirname] => [the dir]
[basename] => .pdf
[extension] => pdf
[filename] => 
)  

Why does it do this? This goes for all accented characters I have tested.

Comments

Most core PHP functions don't deal with character sets other than ISO-8859-1 (Latin-1). Your only real option is to re-implement the function yourself using multi-byte-safe functions (the mbstring functions).

Written by ircmaxell

Which version of PHP are you running?

Written by ajreal

@ajreal It's PHP 5.2.6-1+lenny9 on Debian lenny.

Written by Zsub

When printing to the screen or the terminal, are you using a terminal that supports UTF-8? And when printing to the screen (browser?), is the encoding set to UTF-8?

Written by Htbaa

Yes to both :) Actually, the documentation has now been updated to reflect that pathinfo() is locale aware. I am still unsure exactly what happens. The workaround I posted still works (as expected), but I strongly suspect it was the server's locale messing things up.

Written by Zsub
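
To illustrate the locale explanation in the comment above: since pathinfo() is locale aware, selecting a UTF-8 locale with setlocale() before the call is the usual way to make it parse multibyte filenames. This is only a sketch. The locale names below are assumptions and must be installed on the server (check with `locale -a`), and the path is a made-up example.

```php
<?php
// Candidate UTF-8 locale names. These are assumptions: which ones exist
// depends on the server, so several are offered.
$candidates = array('en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

// setlocale() tries each name in turn and returns the one it applied,
// or false if none of them is available.
$applied = setlocale(LC_CTYPE, $candidates);

if ($applied === false) {
    // Without a UTF-8 locale, pathinfo() may still drop leading bytes.
    echo "No UTF-8 locale available\n";
} else {
    // Hypothetical path, used only for illustration.
    $parts = pathinfo('/tmp/ä.pdf');
    echo $parts['filename'], "\n";
}
```

With one of these locales in effect, the filename part should come back with the accented character intact. If none can be applied, the workaround in the accepted answer remains an option.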

Accepted Answer

A temporary work-around for this problem appears to be to make sure there is a 'normal' character in front of the accented characters, like so:

function getFilename($path)
{
    // If there's no '/', we're probably dealing with just a filename,
    // so just put an 'a' in front of it.
    if (strpos($path, '/') === false)
    {
        $path_parts = pathinfo('a' . $path);
    }
    else
    {
        // Put an 'a' after every '/', so the filename part also starts with 'a'.
        $path = str_replace('/', '/a', $path);
        $path_parts = pathinfo($path);
    }
    // Drop the 'a' that was added in front of the filename.
    return substr($path_parts['filename'], 1);
}

Note that we replace all occurrences of '/' with '/a', but this is okay: only the filename part of the result is used, and we return it starting at offset 1, which drops the added 'a'. Interestingly enough, the dirname part of pathinfo() does seem to work, so no workaround is needed there.

Written by Zsub
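
As an alternative to prefixing an 'a', the re-implementation suggested in the comments can be sketched with plain byte-based string functions. This works for UTF-8 because the bytes for '/' and '.' never occur inside a multibyte sequence, so strrpos() and substr() cannot split a character. The function name utf8_pathinfo() is hypothetical, and the sketch makes simplifying assumptions: '/' is the only directory separator, and trailing slashes are not stripped the way pathinfo() strips them.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): a locale-independent stand-in
 * for pathinfo(). Assumes '/' is the only directory separator.
 */
function utf8_pathinfo($path)
{
    $parts = array();

    // Split directory and basename at the last '/'.
    $slash = strrpos($path, '/');
    if ($slash === false) {
        $parts['dirname']  = '.';
        $parts['basename'] = $path;
    } else {
        $parts['dirname']  = ($slash === 0) ? '/' : substr($path, 0, $slash);
        $parts['basename'] = (string) substr($path, $slash + 1);
    }

    // Split the basename at the last '.'; no dot means no extension key.
    $dot = strrpos($parts['basename'], '.');
    if ($dot === false) {
        $parts['filename'] = $parts['basename'];
    } else {
        $parts['extension'] = (string) substr($parts['basename'], $dot + 1);
        $parts['filename']  = (string) substr($parts['basename'], 0, $dot);
    }

    return $parts;
}

// Hypothetical path, used only for illustration.
$info = utf8_pathinfo('/srv/files/ä.pdf');
echo $info['filename'], "\n"; // prints "ä"
```

Because nothing here consults the locale, the result does not depend on the server configuration, at the cost of not matching pathinfo() in every edge case.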
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki