Alex Hunt

Representing a UUID as a URL-safe Hash ID

UUIDs are awesome: they provide a non-sequential, practically unique identifier that can be generated independently of the source database. However, if you need to include one in a URL, it can look somewhat inelegant.

2b8b9396-0cdf-4b9c-a03d-b25e1d93601b

Recently, I was migrating a PHP application and database from auto-increment IDs to version 4 UUID strings. In several places, resource IDs had been referenced in URLs, obfuscated to short, clean hash IDs padded to six characters. Those URL identifiers still needed to exist, but they now had to represent UUIDs. Would the full UUID have to be displayed everywhere?

The Hashids package we used generates identifiers from the characters [0-9A-Za-z]. That is a base-62 encoding, using a subset of the characters valid in a URL path. Remove the fixed-position separators from the UUID above and you are left with a 32-digit hexadecimal number, which can be written far more compactly in this denser base.
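As a rough check on how much shorter the result can get: a hex digit carries 4 bits, while a base-62 character carries log2 62 bits, so the 128 bits held in a UUID's 32 hex digits need at most 22 base-62 characters.

```latex
\begin{aligned}
\text{bits per hex digit} &= \log_2 16 = 4 \\
\text{bits per base-62 character} &= \log_2 62 \approx 5.954 \\
\text{UUID size} &= 32 \times 4 = 128 \text{ bits} \\
\text{base-62 length} &= \left\lceil \frac{128}{\log_2 62} \right\rceil \approx \lceil 21.5 \rceil = 22
\end{aligned}
```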

First attempt: Convert to base 10 to pass an integer to Hashids

Initially, I looked to simply convert the de-hyphenated UUID to a decimal number, and leave the Hashids implementation unchanged.

// Not decodable
Hashids::encode((int) base_convert(str_replace('-', '', '2b8b9396-0cdf-4b9c-a03d-b25e1d93601b'), 16, 10));

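Sadly, this isn't decodable. base_convert works through PHP's float type internally, so a 128-bit value loses precision, and the resulting decimal string is far too large for PHP's int type when cast, so the number Hashids receives is no longer the original UUID. Furthermore, base_convert only supports bases from 2 to 36, so it can't convert to base 62 directly either.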
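To see the precision problem directly, the sketch below (assuming PHP with the gmp extension loaded) converts the example UUID to decimal twice: once with base_convert and once exactly with GMP. The two strings differ, and the exact value is far larger than PHP_INT_MAX.

```php
<?php
// Requires the gmp extension.
// Compare PHP's float-based base_convert() with an exact GMP conversion.
$hex = str_replace('-', '', '2b8b9396-0cdf-4b9c-a03d-b25e1d93601b');

$viaFloat = base_convert($hex, 16, 10);         // precision lost internally
$exact    = gmp_strval(gmp_init($hex, 16), 10); // arbitrary precision

echo $viaFloat, PHP_EOL;
echo $exact, PHP_EOL;

// A 128-bit value cannot fit in PHP's 64-bit int either
var_dump(gmp_cmp(gmp_init($hex, 16), PHP_INT_MAX) > 0); // bool(true)
```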

Fixing the base conversion

This called for arbitrary-precision arithmetic, so I used PHP's GMP extension to write a new helper function. GMP's gmp_init() and gmp_strval() both accept bases from 2 to 62, which covers the conversion we need:

// Convert a number written as a string from one base to another (2 to 62) using GMP
function gmp_base_convert(string $value, int $initialBase, int $newBase): string
{
    return gmp_strval(gmp_init($value, $initialBase), $newBase);
}
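As a small illustration of the helper, assuming the gmp extension is available: GMP's base-62 digits run 0-9, then A-Z for 10 to 35, then a-z for 36 to 61, and its hexadecimal output is lowercase. The values below are small enough to check by hand.

```php
<?php
// Requires the gmp extension. Helper as defined in the post.
function gmp_base_convert(string $value, int $initialBase, int $newBase): string
{
    return gmp_strval(gmp_init($value, $initialBase), $newBase);
}

echo gmp_base_convert('a', 16, 62), PHP_EOL;  // 10 -> "A"
echo gmp_base_convert('24', 16, 62), PHP_EOL; // 36 -> "a"
echo gmp_base_convert('3d', 16, 62), PHP_EOL; // 61 -> "z"
echo gmp_base_convert('ff', 16, 62), PHP_EOL; // 255 = 4 * 62 + 7 -> "47"

// And back again; GMP writes hex digits in lowercase
echo gmp_base_convert('47', 62, 16), PHP_EOL; // "ff"
```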

Now, the UUID string can be suitably reduced:

public static function encode(string $uuid): string
{
    // Strip the separators, then re-express the 32 hex digits in base 62
    return gmp_base_convert(str_replace('-', '', $uuid), 16, 62);
}

This returns an identifier of 1KASJZ7DWmusQjtvucYnoh. It's not as short as six characters, but at 22 characters it is 61% of the length of the original 36-character UUID. Even at the maximum value, with every hex digit an f, this representation won't exceed 22 characters. Some UUIDs will come out shorter, though: 62^21 is only about 2^125, so roughly one in eight version 4 UUIDs falls below it and encodes to 21 characters or fewer.

To decode, it’s just as simple, with a bit more code to reinsert the UUID separators.

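public static function decode(string $hashid): string
{
    // Convert back to hex and restore any leading zeros the conversion dropped
    $hex = str_pad(gmp_base_convert($hashid, 62, 16), 32, '0', STR_PAD_LEFT);

    // Reinsert the 8-4-4-4-12 separators, right to left so earlier offsets stay valid
    return array_reduce([20, 16, 12, 8], function ($uuid, $offset) {
        return substr_replace($uuid, '-', $offset, 0);
    }, $hex);
}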
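Putting the pieces together, here is a self-contained round-trip sketch. The class name UuidHashid is hypothetical (the post only shows the two methods), and it assumes PHP with the gmp extension. The second example uses a UUID with leading zeros to exercise the str_pad() step in decode().

```php
<?php
// Requires the gmp extension. UuidHashid is a hypothetical class name.

// Helper from the post: convert a digit string between bases 2 to 62
function gmp_base_convert(string $value, int $initialBase, int $newBase): string
{
    return gmp_strval(gmp_init($value, $initialBase), $newBase);
}

final class UuidHashid
{
    // Strip the separators and re-express the 32 hex digits in base 62
    public static function encode(string $uuid): string
    {
        return gmp_base_convert(str_replace('-', '', $uuid), 16, 62);
    }

    // Back to hex, restore leading zeros, then reinsert separators right to left
    public static function decode(string $hashid): string
    {
        $hex = str_pad(gmp_base_convert($hashid, 62, 16), 32, '0', STR_PAD_LEFT);

        return array_reduce([20, 16, 12, 8], function ($uuid, $offset) {
            return substr_replace($uuid, '-', $offset, 0);
        }, $hex);
    }
}

$uuid = '2b8b9396-0cdf-4b9c-a03d-b25e1d93601b';
$hashid = UuidHashid::encode($uuid);
echo $hashid, PHP_EOL;
var_dump(UuidHashid::decode($hashid) === $uuid); // bool(true)

// Leading zeros are dropped by the base conversion and restored by str_pad()
$small = '00000000-0000-4000-8000-000000000001';
var_dump(UuidHashid::decode(UuidHashid::encode($small)) === $small); // bool(true)
```

Note that decode() always returns lowercase hex, so an uppercase UUID will round-trip to its lowercase form.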

We now have a simple implementation that accurately converts UUIDs to and from a compact, obscured format.

Review

For now, this is a practical enough solution. The Hashids specification (hashids.org) does a bit more, notably supporting a salt value that shuffles the output alphabet. Setting a minimum padded length matters less here, because a UUID's random bits span the whole value and most encoded IDs already reach the 22-character maximum.
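If fixed-length or less recognisable IDs are wanted, both can be approximated on top of the GMP approach. The sketch below is an illustration under stated assumptions, not the Hashids algorithm: it left-pads every ID to 22 characters with GMP's zero digit, and remaps digits through a hypothetical custom alphabet (here just GMP's digits rotated by 31 places, standing in for a salted shuffle). The names encode_fixed(), decode_fixed() and custom_alphabet() are made up for this example.

```php
<?php
// Requires the gmp extension. Not the Hashids salt/shuffle algorithm:
// this only pads the output and remaps GMP's base-62 digits.
const GMP_BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Hypothetical alphabet: any permutation of the 62 characters would work.
// Here GMP's digits are rotated by 31 places as a stand-in for a salted shuffle.
function custom_alphabet(): string
{
    return substr(GMP_BASE62, 31) . substr(GMP_BASE62, 0, 31);
}

function encode_fixed(string $uuid): string
{
    $base62 = gmp_strval(gmp_init(str_replace('-', '', $uuid), 16), 62);
    // Left-pad with GMP's zero digit so every ID is 22 characters long
    $padded = str_pad($base62, 22, '0', STR_PAD_LEFT);
    // Swap each GMP digit for the character at the same position in the custom alphabet
    return strtr($padded, GMP_BASE62, custom_alphabet());
}

function decode_fixed(string $id): string
{
    // Undo the substitution; leading zero digits do not change the value
    $base62 = strtr($id, custom_alphabet(), GMP_BASE62);
    $hex = str_pad(gmp_strval(gmp_init($base62, 62), 16), 32, '0', STR_PAD_LEFT);

    return array_reduce([20, 16, 12, 8], function ($uuid, $offset) {
        return substr_replace($uuid, '-', $offset, 0);
    }, $hex);
}

$small = '00000000-0000-4000-8000-000000000001';
echo encode_fixed($small), PHP_EOL; // always 22 characters
var_dump(decode_fixed(encode_fixed($small)) === $small); // bool(true)
```

Because the remapping is a fixed substitution, it only disguises the output; anyone who knows the alphabet can decode it.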

The final string for our URLs isn’t as short as before, but it is certainly acceptable. Nothing shorter can be generated here without either giving up full-length UUIDs or building something more bespoke. If you’re considering that route, I found “How Long Does an ID Need To Be” a great read.
