This is part of a blog series in which we design, develop, optimize, deploy, and test a URL shortener service from scratch:
- Part 1: Overview
- Part 2: Design the write API
- Part 3: Read API, Load testing and Performance improvement
- Part 4: Deploy to AWS
- Part 5: Performance testing on AWS
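In this article, we’ll design the write API: how short the short URL should be, how to generate the short code, and how to save it.
The API signature is simple: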
shorten(longUrl) // Returns a unique short url
How short should the short url be?
We can use the following characters in our short code:
- a-z (26)
- A-Z (26)
- 0-9 (10)
- _, - (2)
A total of 64. With 6 characters, something like http://ad.com/abcdef, we can store 64⁶ unique URLs, which is more than 68 billion (68,719,476,736). At 100 write requests per second, it would take more than 21 years to exhaust this limit.
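The estimate is easy to check. Here is a quick sketch of the arithmetic; the constant names are ours, and the 100 writes per second is the assumed steady rate from the estimate above.

```javascript
// Characters allowed in a short code: a-z, A-Z, 0-9, plus "_" and "-".
const ALPHABET_SIZE = 26 + 26 + 10 + 2;
const CODE_LENGTH = 6;

// Number of distinct 6-character codes: 64^6.
const capacity = ALPHABET_SIZE ** CODE_LENGTH;

// Assumed steady write rate.
const WRITES_PER_SECOND = 100;
const SECONDS_PER_YEAR = 365 * 24 * 60 * 60;
const yearsToExhaust = capacity / WRITES_PER_SECOND / SECONDS_PER_YEAR;

console.log(capacity); // 68719476736
console.log(yearsToExhaust.toFixed(1)); // 21.8
```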
How to generate short code?
There are several ways to do it. One way would be to generate a random 6-character string. The problem with random codes is that if different people shorten the same longUrl, the system generates a different code each time and stores the URL multiple times, which wastes space. To solve this, let’s use hashing, specifically an MD5 hash, which always produces the same hash for the same input. We’ll Base64-encode the hash and take the first 6 characters. However, there is one catch: standard Base64 is not URL safe, as it contains / and +. So we’ll replace these two characters with _ and - in the Base64 output.
Here is the code snippet in Node.js
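A minimal sketch of this hashing step, using Node's built-in crypto module. The function names urlSafeHash and generateShortCode are ours, and mapping + to - and / to _ is an assumption (it is the conventional URL-safe Base64 mapping).

```javascript
const crypto = require('crypto');

const CODE_LENGTH = 6;

// MD5 the long URL, Base64-encode the digest, then swap the two
// characters that are not URL safe ("+" and "/") and drop "=" padding.
function urlSafeHash(longUrl) {
  return crypto
    .createHash('md5')
    .update(longUrl)
    .digest('base64')
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, '');
}

// The short code is the first 6 characters of the URL-safe hash.
function generateShortCode(longUrl) {
  return urlSafeHash(longUrl).slice(0, CODE_LENGTH);
}

console.log(generateShortCode('https://example.com/some/long/path'));
```

Because the hash depends only on the input, calling generateShortCode twice with the same URL always returns the same code.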
How to save the short code
As discussed earlier, we’ll use postgres. Other DBs are fine as well.
The database needs two fields: code, to save the 6-character hash, and the corresponding long URL.
The following migration will generate the above schema
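As a rough sketch, a migration in the Sequelize CLI format could look like the one below. We are assuming Sequelize here because of the findOrCreate call used later; the table name urls, the column name longUrl, and the extra id key are assumptions for illustration.

```javascript
// Sequelize CLI migration sketch. The table name "urls" and the
// "longUrl" column name are assumptions made for this example.
const migration = {
  up: (queryInterface, Sequelize) =>
    queryInterface.createTable('urls', {
      // Surrogate key that Sequelize models expect by default.
      id: { type: Sequelize.INTEGER, primaryKey: true, autoIncrement: true },
      // 6-character short code, declared unique.
      code: { type: Sequelize.STRING(6), allowNull: false, unique: true },
      // Original URL; TEXT because long URLs can exceed 255 characters.
      longUrl: { type: Sequelize.TEXT, allowNull: false },
    }),

  down: (queryInterface) => queryInterface.dropTable('urls'),
};

module.exports = migration;
```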
The unique constraint on the code column creates an index. The database part is done.
There can be several Node servers running, so we need to handle concurrent writes, which can cause a race condition. We’ll use findOrCreate to insert into Postgres. If you are using a NoSQL database, find out how to make this find-or-insert step atomic.
Model code should look like this
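Here is one way the model could be sketched, assuming Sequelize and the factory style used by sequelize-cli. The names defineUrl, findOrCreateUrl, Url, urls, and longUrl are ours; findOrCreate itself is the Sequelize model method, which resolves to an [instance, created] pair.

```javascript
// Sequelize model sketch. The model name "Url", table name "urls",
// and the "longUrl" field are assumptions made for this example.
function defineUrl(sequelize, DataTypes) {
  return sequelize.define(
    'Url',
    {
      code: { type: DataTypes.STRING(6), allowNull: false, unique: true },
      longUrl: { type: DataTypes.TEXT, allowNull: false },
    },
    { tableName: 'urls', timestamps: false }
  );
}

// Look up the row for `code`, inserting it with `longUrl` if it is missing.
// `created` is true only when this call inserted the row.
async function findOrCreateUrl(Url, code, longUrl) {
  const [row, created] = await Url.findOrCreate({
    where: { code },
    defaults: { longUrl },
  });
  return { row, created };
}

module.exports = { defineUrl, findOrCreateUrl };
```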
Since we are only using the first 6 characters of the MD5 hash, two different long URLs can end up with the same short code. To resolve this, we’ll take the next 6 characters from our Base64 string and use them instead.
Here is the code which, on a conflict, takes the next 6 characters from the hash and uses them for the short URL
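The fallback can be sketched as a loop over 6-character windows of the hash. This is our own sketch, not necessarily the original code: Url is assumed to be a Sequelize model with code and longUrl fields, the helper names are ours, and we stop after the three full windows that fit in the 22-character hash.

```javascript
const crypto = require('crypto');

const CODE_LENGTH = 6;

// URL-safe Base64 of the MD5 digest (22 characters once padding is removed).
function urlSafeHash(longUrl) {
  return crypto.createHash('md5').update(longUrl).digest('base64')
    .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// The n-th 6-character window of the hash: attempt 0 is characters 0-5,
// attempt 1 is characters 6-11, and so on.
function codeForAttempt(hash, attempt) {
  return hash.slice(attempt * CODE_LENGTH, (attempt + 1) * CODE_LENGTH);
}

// `Url` is assumed to be a Sequelize model with `code` and `longUrl` fields.
async function shorten(Url, longUrl) {
  const hash = urlSafeHash(longUrl);
  const maxAttempts = Math.floor(hash.length / CODE_LENGTH); // 3 full windows

  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const code = codeForAttempt(hash, attempt);
    const [row] = await Url.findOrCreate({ where: { code }, defaults: { longUrl } });
    // Either we just inserted the row, or this URL was shortened before.
    if (row.longUrl === longUrl) return code;
    // Otherwise the code belongs to a different URL: try the next window.
  }
  throw new Error('Could not find a free short code for this URL');
}
```

Comparing row.longUrl with the input is what separates a real collision from a repeat request for the same URL.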
The only thing remaining now is to call this function from the routes.
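A possible route definition, sketched as a plain Hapi route object. The path /shorten, the longUrl payload field, the status codes, and the injected shorten function (one that takes a long URL and resolves to a short code) are all assumptions; the base URL reuses the http://ad.com example from earlier.

```javascript
// Hapi route sketch. The path "/shorten", the payload field "longUrl",
// and the injected `shorten` function are assumptions for this example.
const BASE_URL = 'http://ad.com';

function createShortenRoute(shorten) {
  return {
    method: 'POST',
    path: '/shorten',
    handler: async (request, h) => {
      const { longUrl } = request.payload || {};
      if (typeof longUrl !== 'string' || longUrl.length === 0) {
        return h.response({ error: 'longUrl is required' }).code(400);
      }
      const code = await shorten(longUrl);
      return h.response({ shortUrl: `${BASE_URL}/${code}` }).code(201);
    },
  };
}

module.exports = createShortenRoute;
```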
If you are new to Hapi, here is how you can write your main server
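A bare-bones server could be sketched like this. We assume a recent Hapi release (the @hapi/hapi package), where Hapi.server(), server.route(), and server.start() are available; the port and host are placeholders. The Hapi module is passed in as an argument only to keep the sketch free of a hard dependency.

```javascript
// `Hapi` is the module exported by the @hapi/hapi package, and `routes`
// is an array of route objects. Port and host are assumptions.
async function startServer(Hapi, routes) {
  const server = Hapi.server({ port: 3000, host: 'localhost' });
  server.route(routes);
  await server.start();
  console.log(`Server running at ${server.info.uri}`);
  return server;
}

module.exports = startServer;
```

In a real project you would call it as startServer(require('@hapi/hapi'), routes).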
The complete code can be found on GitHub.