How to make a URL-shortening service

    Posted on June 24th, 2010 by Nattl

    Recently I was a little bit bored. Whenever I'm bored, I start working on small projects just to drive the boredom away. This time I built a URL-shortening service. URL-shortening service? What's that? I'm pretty sure most of you have used one before. Everybody hates long addresses like http://www.example.com/this/is/a/horrible/long/url.php?id=laksdfjlkfadjoweiu: they are extremely inconvenient, especially when you want to copy them into an email or share them on a service like Twitter. That is where URL-shortening services like tinyurl.com come in handy, as they shorten an address like the one above to something like http://tinyurl.com/AB734 (this is just an example).

    These services map each long URL to a short code and store the pair. Although I used them regularly, I had never paid attention to how that is achieved. Until yesterday. As I already mentioned, I was bored, and suddenly it struck me: why not build a URL-shortening service myself? This can't be that difficult, I thought. So I googled a little and then started to code. To be honest, it was not that difficult. In this blog post I will show you how I wrote my URL shortener.

    I wanted to run the service on my own web server, so it was clear that I had to write it in PHP. I hadn't coded in PHP for quite some time, so it took me a while to get back into the syntax. For my example I used PHP 5, a MySQL database and a .htaccess file to change the behavior of my web server.

    The task is to shorten a URL like http://www.nattl.at/2010/06/automated-backup-jobs-on-mac-osx/ to something like http://s.nattl.at/A342 (example).

    The first step was to create a subdomain for the URL-shortening service, as I didn't want it to interfere with my blog software. I created the subdomain s.nattl.at, which is nice and short.

    Next I created the database. A NoSQL key-value store might have been a better and faster fit for simple lookups like these, but as I didn't want to spend too much time on it, I simply used my MySQL database. After creating a database called tinyurl (how ingenious), I added a table tinyurl (even more ingenious). It has three fields: hashval (the shortened part of the address that will be used afterwards), url (the original URL) and dt_added (the time the entry was added). I put an index on the hashval column, as this is the value that will be searched most often. Here is the SQL needed:

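    CREATE TABLE `tinyurl`.`tinyurl` (
    `hashval` varchar(6) NOT NULL,   -- base-36 short code, e.g. AB34
    `url` varchar(2048) NOT NULL,    -- original address; 256 characters is too short for some URLs
    `dt_added` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,   -- set once on insert, not changed by updates
    UNIQUE KEY `hash_index` (`hashval`) USING BTREE   -- each short code may exist only once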
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
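    The size of the hashval column caps how many URLs the service can ever store. The short codes are base-36 numbers (the encoding is explained in the next step), so a code of six characters can represent the values 0 through 36^6 - 1, and the service uses everything from 1 upward. A quick check of that arithmetic, as a sketch for PHP 7 or later; maxCodes() is a hypothetical helper, not part of the service:

```php
<?php
// Largest counter that still fits into a base-36 code of $len characters.
// maxCodes() is a hypothetical helper used only for this illustration.
function maxCodes(int $len): int
{
    return 36 ** $len - 1;
}

$limit = maxCodes(6);
echo number_format($limit), "\n";                                    // 2,176,782,335
echo strtoupper(base_convert((string) $limit, 10, 36)), "\n";       // ZZZZZZ
echo strtoupper(base_convert((string) ($limit + 1), 10, 36)), "\n"; // 1000000 (seven characters)
```

    Roughly 2.18 billion entries is plenty for a personal service; the counter value after that would need a seventh character and no longer fit into varchar(6).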

    After creating the database I needed a script that creates the hash value for a URL. I searched around a bit for the best way to shorten a URL using a hash. Methods like MD5 or SHA-1 produce long, nearly unreadable strings (32 and 40 hexadecimal characters respectively) that are of no use for a URL shortener. So hashing was not an option (I could have written my own hash function that generates short values, but that would most probably have been total overkill). The solution was simple: don't use a hash at all.

    But without a hash, how do you create the alphanumeric string that is typical for URL shorteners? Nothing simpler than that: I count the URLs stored in the database and increment that number whenever a new URL is added. So the first entry gets 1, the second 2, and so on. The maximum number of entries is limited by how many codes fit into the six characters of the hashval column. But wait, then the short codes would be purely numeric. What about the alphanumeric representation? Good question: I encode the number in a different number system. An alphanumeric representation offers 36 different symbols (the 26 letters A-Z plus the digits 0-9), so I use a number system based on 36 (i.e. base 36). All I have to do is convert the incremented number from decimal (base 10) to base 36; for example, 36 becomes 10 and 1295 becomes ZZ. The result is inserted together with the URL as a new record in the database.

    Here is the PHP code (it's just spaghetti code, so don't expect anything sophisticated; it's a proof of concept ;) ):

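    <?php
    // create.php: stores a URL and prints its short address.
    // Read the URL from the query string. Accept only well-formed http(s)
    // addresses; anything else (e.g. javascript: links) is rejected.
    $url = isset($_GET['url']) ? trim($_GET['url']) : '';
    if (!filter_var($url, FILTER_VALIDATE_URL) || !preg_match('#^https?://#i', $url))
        die('Dunno what to do, man!');
    
    // Convert a non-negative integer from base 10 to base 36 (digits 0-9, A-Z).
    function base36Encode($num) {
        $alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    
        if ($num < 36)
            return $alphabet[$num];
    
        $base36 = "";
    
        while ($num != 0) {
            $rem = $num % 36;               // lowest base-36 digit
            $num = (int) floor($num / 36);  // drop that digit
            $base36 = $alphabet[$rem] . $base36;
        }
        return $base36;
    }
    
    // new mysqli always returns an object, so check connect_errno instead
    // (on PHP 8.1+ a failed connection throws an exception).
    $mysqli = new mysqli('localhost', 'dbuser', 'dbpassword', 'tinyurl');
    if ($mysqli->connect_errno) {
        die('Service not available');
    }
    
    // The new entry's number is the current row count + 1. Two requests
    // arriving at the same moment can read the same count, so this is not
    // safe under concurrent use.
    $result = $mysqli->query("SELECT COUNT(*) AS cnt FROM tinyurl");
    $obj = $result->fetch_object();
    $hash = base36Encode((int) $obj->cnt + 1);
    $result->close();
    
    // Prepared statement: the URL comes from the request and must not be
    // pasted into the SQL string (SQL injection).
    $stmt = $mysqli->prepare("INSERT INTO tinyurl (hashval, url) VALUES (?, ?)");
    $stmt->bind_param("ss", $hash, $url);
    $stmt->execute();
    $stmt->close();
    $mysqli->close();
    
    // Escape the URL before echoing it back into the HTML page.
    printf("Your url %s was shortened to <a href=\"%s\">http://s.nattl.at/%s</a>\n",
        htmlspecialchars($url), $hash, $hash);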
    ?>
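    The hand-written base36Encode() does the same job as PHP's built-in base_convert(), which returns lowercase letters. A minimal sketch for PHP 7 or later that uses the built-in instead and also shows the reverse direction (turning a short code back into its counter) via intval() with base 36; encodeCounter() and decodeCode() are hypothetical names, not the original code:

```php
<?php
// encodeCounter()/decodeCode() are illustrative helpers, not the original code.
function encodeCounter(int $n): string
{
    // base_convert() works on strings and emits lowercase a-z; uppercase it
    // to match the 0-9A-Z alphabet used by base36Encode().
    return strtoupper(base_convert((string) $n, 10, 36));
}

function decodeCode(string $code): int
{
    // intval() with base 36 accepts both upper- and lowercase letters.
    return intval($code, 36);
}

foreach ([1, 35, 36, 1295, 1296] as $n) {
    printf("%d -> %s\n", $n, encodeCounter($n));
}
// 1 -> 1, 35 -> Z, 36 -> 10, 1295 -> ZZ, 1296 -> 100 (one line each)
```

    Note that base_convert() can lose precision on very large numbers because it works with floating point internally; for six-character codes that is not an issue.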
    

    I haven't added a form field, as I call the shortener directly from my browser: I have a button in Firefox (a so-called bookmarklet) that executes the following JavaScript:

    javascript:void(location.href='http://s.nattl.at/create.php?url='+encodeURIComponent(location.href))

    When I'm viewing a website and hit the button, the browser navigates to my URL shortener and passes it the URL of the current page as the url parameter.
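    The encodeURIComponent() call in the bookmarklet matters: a page URL that contains its own ? and & would otherwise be split into several query parameters, and create.php would receive a truncated address. This sketch reproduces the effect in PHP with parse_str(), which parses a query string much like PHP does when it fills $_GET; rawurlencode() stands in for encodeURIComponent() (they differ only for a few characters such as ! * ' ( ), none of which occur here). The example URL is made up:

```php
<?php
// Made-up page URL that carries its own query string.
$page = 'http://www.example.com/page.php?id=1&lang=en';

// Unencoded: "&lang=en" becomes a separate parameter, the URL is cut short.
parse_str('url=' . $page, $raw);
echo $raw['url'], "\n";      // http://www.example.com/page.php?id=1

// Encoded first (rawurlencode() is close to JavaScript's encodeURIComponent()):
parse_str('url=' . rawurlencode($page), $encoded);
echo $encoded['url'], "\n";  // http://www.example.com/page.php?id=1&lang=en
```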

    All I need now is a script that translates incoming requests and redirects to the correct URL. There is, however, a problem. When a request comes in for a URL like http://s.nattl.at/AB34, my web server is not happy, because it can't find a file called AB34… It will be so angry that it sends a 404 error response. To prevent this, I need to tell the web server how to handle these requests. As I'm using Apache, I'm on the safe side: with Apache's rewrite engine (mod_rewrite) I define a rule that hands those special requests to the correct script while leaving requests for the PHP scripts themselves alone. These rules have to be defined in a .htaccess file stored in the same directory as the PHP scripts.

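    Options +FollowSymlinks
    RewriteEngine on
    # Leave existing files such as create.php and redirect.php alone.
    RewriteCond %{REQUEST_FILENAME} !-f
    # Hand short codes like /AB34 to redirect.php internally, without an extra HTTP round trip.
    RewriteRule ^([0-9A-Za-z]+)$ redirect.php?hash=$1 [L]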
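    Whatever the rewrite rule captures reaches redirect.php unchanged as the hash parameter, so a small guard can turn away anything that cannot be a short code before it reaches MySQL. A minimal sketch for PHP 7.1 or later; normalizeShortCode() is a hypothetical helper, and the 1 to 6 character limit is an assumption taken from the varchar(6) hashval column:

```php
<?php
// Hypothetical helper: accept 1-6 base-36 characters (the size of the
// hashval column) and normalise them to the uppercase form that the
// encoder produces; return null for anything else.
function normalizeShortCode(string $hash): ?string
{
    if (!preg_match('/^[0-9A-Za-z]{1,6}$/', $hash)) {
        return null;   // cannot be a short code
    }
    return strtoupper($hash);
}

var_dump(normalizeShortCode('ab34'));        // string(4) "AB34"
var_dump(normalizeShortCode('favicon.ico')); // NULL
```

    A null result can then be answered with a 404 right away, without a database query.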

    Now that this problem is solved, we need the PHP script that redirects incoming requests to the correct web address. I call it redirect.php:

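    <?php
    // redirect.php: looks up a short code and redirects to the stored URL.
    // new mysqli always returns an object, so check connect_errno instead
    // (on PHP 8.1+ a failed connection throws an exception).
    $mysqli = new mysqli('localhost', 'dbuser', 'dbpassword', 'tinyurl');
    if ($mysqli->connect_errno) {
        die('Service not available');
    }
    
    $hash = isset($_GET['hash']) ? $_GET['hash'] : '';
    $gotoUrl = null;
    
    // Prepared statement: the hash comes straight from the request.
    if ($stmt = $mysqli->prepare("SELECT url FROM tinyurl WHERE hashval=?")) {
        $stmt->bind_param("s", $hash);
        $stmt->execute();
        $stmt->bind_result($gotoUrl);
        if (!$stmt->fetch()) {
            $gotoUrl = null;   // no row for this hash
        }
        $stmt->close();
    }
    $mysqli->close();
    
    if ($gotoUrl === null) {
        // Unknown short code: send a 404 instead of an empty Location header.
        http_response_code(404);
        die('Unknown short URL');
    }
    header('Location: ' . $gotoUrl);   // PHP sends a 302 redirect by default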
    ?>
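    To experiment with the whole flow without MySQL or Apache, the two scripts can be collapsed into an in-memory sketch: shorten() mirrors the count-plus-one and base-36 step of create.php, and resolve() mirrors the lookup in redirect.php, with a PHP array standing in for the tinyurl table. It needs PHP 7.4 or later, and InMemoryShortener is a hypothetical class, not part of the service:

```php
<?php
// Hypothetical in-memory stand-in for create.php + redirect.php.
class InMemoryShortener
{
    // short code => original URL. PHP stores numeric-looking codes such as
    // "1" as integer keys, but lookups with the string still find them.
    private array $table = [];

    public function shorten(string $url): string
    {
        // Same numbering as "SELECT COUNT(*) ... + 1" in create.php.
        $hash = strtoupper(base_convert((string) (count($this->table) + 1), 10, 36));
        $this->table[$hash] = $url;
        return $hash;
    }

    public function resolve(string $hash): ?string
    {
        // null corresponds to "no row found" for this code.
        return $this->table[$hash] ?? null;
    }
}

$s = new InMemoryShortener();
$code = $s->shorten('http://www.nattl.at/2010/06/automated-backup-jobs-on-mac-osx/');
echo $code, "\n";               // 1
echo $s->resolve($code), "\n";  // http://www.nattl.at/2010/06/automated-backup-jobs-on-mac-osx/
var_dump($s->resolve('ZZ'));    // NULL
```

    The sketch also exposes a weakness of numbering entries by COUNT(*) + 1: if an entry were ever deleted, the next URL would receive a code that is already taken. A common alternative is an AUTO_INCREMENT id column converted to base 36, which MySQL assigns uniquely even for concurrent inserts.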

    And that’s it. The URL-shortener works – try it out here: http://s.nattl.at/B
