This is for the purpose of having a nice short URL which refers to an md5 hash in a database. I would like to convert something like this:

a7d2cd9e0e09bebb6a520af48205ced1

into something like this:

hW9lM5f27

Those both contain about the same amount of information. The method doesn't have to be direct and reversible but that would be nice (more flexible). At the least I would want a randomly generated string with the hex hash as the seed so it is reproducible. I'm sure there are many possible answers, I am curious to see how people would do it in an elegant way.

Oh, this doesn't have to have perfect 1:1 correspondence with the original hash but that would be a bonus (I guess I already implied that with the reversibility criteria). And I would like to avoid collisions if possible.

EDIT I realized my initial calculations were totally wrong (thanks to the people answering here but it took me awhile to clue in) and you can't really reduce the string length very much by throwing in all the lower case and uppercase letters into the mix. So I guess I will want something that doesn't directly convert from hex to base 62.

Comments

With base-64 encoding you will only be able to decrease the input to (4/8)/(6/8) -> 4/6 ~ 66% in size (and this is assuming you deal with the "ugly" base64 characters without adding anything new). I would likely consider a (secondary) look-up method to get truly "pretty" values.

Written by pst

Re "So I guess I will want something that doesn't directly convert from hex to base 62." -- If you want to encode 16 bytes in a URL-safe string, my answer below (22 chars) is probably the best you'll get. What are you actually trying to achieve?

Written by dkamins

Accepted Answer

Of course if I want a function to satisfy my needs perfectly I better make it myself. Here is what I came up with.

//takes a string input, int length and optionally a string charset
//returns a hash 'length' digits long made up of characters a-z,A-Z,0-9 or those specified by charset
function custom_hash($input, $length, $charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUFWXIZ0123456789'){
    $output = '';
    $input = md5($input); //this gives us a nice random hex string regardless of input 

    do{
        foreach (str_split($input,8) as $chunk){
            srand(hexdec($chunk));
            $output .= substr($charset, rand(0,strlen($charset)), 1);
        }
        $input = md5($input);

    } while(strlen($output) < $length);

    return substr($output,0,$length);
}

This is a very general purpose random string generator, however it is not just any old random string generator because the result is determined by the input string and any slight change to that input will produce a totally different result. You can do all sort of things with this:

custom_hash('1d34ecc818c4d50e788f0e7a9fd33662', 16); // 9FezqfFBIjbEWOdR
custom_hash('Bilbo Baggins', 5, '0123456789bcdfghjklmnpqrstvwxyz'); // lv4hb
custom_hash('', 100, '01'); 
// 1101011010110001100011111110100100101011001011010000101010010011000110000001010100111000100010101101

Anyone see any problems with it or any room for improvement?

Written by Moss
This page was build to provide you fast access to the question and the direct accepted answer.
The content is written by members of the stackoverflow.com community.
It is licensed under cc-wiki