This is the first article in what will be a series on how to architect, code, and deploy a URL Shortener. So let's begin.
"What is a URL Shortener?" you ask. You have probably come across one more often than you think. In simple words, a URL Shortener converts long URLs into short ones. Upon clicking the short link, you are redirected to the original URL.
"Well then, why do we need them in the first place?" Because short links are easier to type, which reduces typos. They also save a lot of space when displaying, texting, or posting on social media.
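Let us see an example. A long link such as https://example.com/blog/2021/how-to-architect-a-url-shortener?utm_source=newsletter could be shortened to something like https://short.example/aB3dE9 (both URLs are made up for illustration).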
Some popular URL Shorteners are Bitly and TinyURL. Other use cases of such services include better link management and audience analysis.
At first glance, it sounds pretty simple, right? Let's think about the requirements before jumping the gun.
The points below can be treated as the fixed, high-level requirements of the system.
- Assign a unique short link for a given URL.
- Upon accessing a short link, redirect to the original URL with minimum latency.
- Use encoding to prevent short links from being guessed.
- Expire the short links after a certain timespan.
To keep things short and straightforward, we will keep the frontend and backend requirements to a minimum. FYI, I will be rendering the frontend using a template engine so that I can keep everything in a single backend-focused repository. The list below covers the frontend pages first, followed by the backend APIs.
- Home Page: To ask users to input a URL.
- Output Page: To display the generated short link and provide options to the user such as copying the short link or opening it in a new tab.
- Invalid Page: To display an appropriate message on receiving invalid requests.
- Expired Page: To display an appropriate message when a user tries to access a short link which has expired.
- Home Page Render API: To render the Home page.
- Assign Short Link API: To assign a short link from the database against the incoming URL and return an encoded string to the frontend.
- Display Short Link API: To decode the encoded string and return the assigned short link to display it to the user.
- Redirect API: To redirect the user to the original URL.
"Hmm, but why do we need to send an encoded string to the frontend in the first place, only to decode it back again to send the generated short link?"
Hang on, I will clarify this in an upcoming section when I explain the flow from a user perspective.
We need only one table to store the URL mappings.
+-----------------+---------+-----+-----------------+
| Field           | Type    | Key | Default         |
+-----------------+---------+-----+-----------------+
| _id             | String  | PRI | URL Safe String |
| is_active       | Boolean |     | true            |
| is_used         | Boolean |     | false           |
| original_url    | String  |     | null            |
| creation_date   | Date    |     | Date.now        |
| expiration_date | Date    |     | Date.now        |
+-----------------+---------+-----+-----------------+
Note: For our case, a URL Safe String will contain a combination of the following characters:
Since we do not need relationships between records and we might need to store a very large number of rows, a NoSQL database will be a better choice as it will be easier to scale. Hence, we will be using MongoDB.
_id: This will be the short link that is served. Its length can start at 6 and be increased as traffic grows. The number of unique strings of length 6 is the alphabet size raised to the 6th power, which is huge (for instance, a 64-character alphabet gives 64^6, roughly 68.7 billion). We will be using an npm package to generate it when a document is created.
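To make the idea concrete, here is a minimal sketch of generating such an ID using only Node's built-in crypto module. It is a stand-in for the npm package, which is not named here, and the 64-character alphabet is my assumption, not necessarily the one this series will use.

```javascript
const { randomInt } = require("node:crypto");

// Assumed alphabet: A-Z, a-z, 0-9, "-" and "_" (64 URL-safe characters).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Build a random string of `length` characters drawn from `alphabet`.
function generateShortId(length = 6, alphabet = ALPHABET) {
  let id = "";
  for (let i = 0; i < length; i++) {
    // randomInt(max) returns a uniformly distributed integer in [0, max).
    id += alphabet[randomInt(alphabet.length)];
  }
  return id;
}

// Number of distinct IDs of a given length: alphabetSize ^ length.
function idSpace(length = 6, alphabet = ALPHABET) {
  return alphabet.length ** length;
}

console.log(generateShortId()); // a random 6-character string
console.log(idSpace());         // 68719476736 for 64 characters and length 6
```

Random IDs can still collide, so whatever generates them has to cope with an _id that already exists.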
is_active: A flag to control whether a document is active or not. It might not be helpful early on, but it is always good to have a fallback, just in case you might need to disable certain short links on an urgent basis.
is_used: A flag to check whether a document, i.e., a short link, is in use or not.
original_url: The original URL from the user.
creation_date: The date of creation of the document.
expiration_date: The date of expiration of the document. When the document expires, the short link expires with it. While coding, we will leverage a built-in MongoDB feature, exposed through the Mongoose ODM (Object Document Mapper), to attach an expiry timer to each document.
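The fields and defaults described above can be sketched as a plain JavaScript object, with no database involved. The expiry check is an in-memory stand-in for the database-side expiry. My reading is that the built-in MongoDB feature is a TTL index, but that is an assumption on my part. The LINK_TTL_MS value, and the choice to expire only documents that are in use, are also hypothetical.

```javascript
// Create a plain object carrying the defaults from the table.
// `id` would come from the short-ID generator; here it is just a parameter.
function createUrlDocument(id, now = new Date()) {
  return {
    _id: id,
    is_active: true,
    is_used: false,
    original_url: null,
    creation_date: now,
    expiration_date: now,
  };
}

// Hypothetical lifetime of a short link once it has been assigned.
const LINK_TTL_MS = 24 * 60 * 60 * 1000; // 1 day, an arbitrary choice

// A document counts as expired once `ttlMs` has passed since its
// expiration_date. Assumption: unused (pre-generated) documents never expire.
function isExpired(doc, now = new Date(), ttlMs = LINK_TTL_MS) {
  return doc.is_used && now.getTime() - doc.expiration_date.getTime() >= ttlMs;
}

const doc = createUrlDocument("aB3dE9");
console.log(doc.is_used);    // false
console.log(isExpired(doc)); // false, the document is not in use yet
```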
The problem on our hands is how to generate a short and unique link for a given URL.
The first approach is to generate a new document in the database after receiving a URL from the user. However, some of the issues with this approach are as follows:
- Suppose the storage space allocated for the database is full. Now because the system won't be able to create new documents for incoming requests, the availability of your service goes for a toss.
- Thinking about scale, if and when the server serves considerable traffic, creating a document at request time will consume more resources, in turn slowing down the system and increasing the latency.
A better solution to overcome the limitations of the first approach is to create x number of documents beforehand using a cron job that runs every y minutes or hours, depending on the traffic.
Cron: A command to an operating system or server for a job that is to be executed at a specified time.
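Here is a rough sketch of what such a job could do on each run. A Map stands in for the MongoDB collection, and the function name replenishPool, the target size, and the alphabet are all assumptions made for illustration.

```javascript
const { randomInt } = require("node:crypto");

// Assumed 64-character URL-safe alphabet.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
const randomId = (length = 6) =>
  Array.from({ length }, () => ALPHABET[randomInt(ALPHABET.length)]).join("");

// In-memory stand-in for the MongoDB collection, keyed by _id.
const collection = new Map();

// Top up the pool so that at least `target` unused documents exist.
// Returns how many documents were created in this run.
function replenishPool(store, target) {
  let unused = 0;
  for (const doc of store.values()) {
    if (!doc.is_used) unused++;
  }
  let created = 0;
  while (unused + created < target) {
    const id = randomId();
    if (store.has(id)) continue; // _id is the primary key, so skip collisions
    const now = new Date();
    store.set(id, {
      _id: id,
      is_active: true,
      is_used: false,
      original_url: null,
      creation_date: now,
      expiration_date: now,
    });
    created++;
  }
  return created;
}

console.log(replenishPool(collection, 100)); // 100 on an empty store
console.log(replenishPool(collection, 100)); // 0, the pool is already full
```

In a real deployment this function would be triggered on a schedule (by cron or a timer inside the server) rather than called by hand, with x and y tuned to the traffic.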
Now, whenever a user asks for a short link, your system will do the following:
- Pick a document with is_used set to false.
- Update the original_url of the document with the user's URL, and the expiration_date to the current timestamp.
- Return the _id (short link) of the document.
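The steps above can be sketched as follows, again with a Map standing in for the collection. Marking the document as used and skipping inactive documents are my assumptions; the function name assignShortLink is hypothetical.

```javascript
// Claim one unused document for `url`. Returns the short link (_id),
// or null when the pool of pre-generated documents is empty.
function assignShortLink(store, url, now = new Date()) {
  for (const doc of store.values()) {
    if (doc.is_used || !doc.is_active) continue;
    doc.is_used = true;        // assumption: flag it so it is not handed out twice
    doc.original_url = url;
    doc.expiration_date = now; // the expiry timer starts from this moment
    return doc._id;
  }
  return null;
}

// Tiny demo pool with two pre-generated documents.
const blank = (id) => ({
  _id: id,
  is_active: true,
  is_used: false,
  original_url: null,
  creation_date: new Date(),
  expiration_date: new Date(),
});
const store = new Map([
  ["aB3dE9", blank("aB3dE9")],
  ["Zx_7-q", blank("Zx_7-q")],
]);

console.log(assignShortLink(store, "https://example.com/a")); // aB3dE9
console.log(assignShortLink(store, "https://example.com/a")); // Zx_7-q
console.log(assignShortLink(store, "https://example.com/b")); // null
```

With MongoDB, the pick and the update would need to happen as a single atomic operation (for example with findOneAndUpdate), so that two concurrent requests cannot claim the same document.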
To summarize, the user flow will be as follows:
- The user visits the Home Page. Home Page Render API is in action.
- User inputs a URL and clicks on the Submit button. Assign Short Link API is called, which returns an encoded version of the short link.
- Upon a successful response from Assign Short Link API, the frontend redirects to a new route that uses the encoded short link as a path parameter. Display Short Link API is triggered when this new route is visited.
- Display Short Link API decodes the encoded short link, finds the corresponding document against it in the database, and returns the short link in the rendered Output Page.
- Redirect API comes into action when a user clicks on a short link. It basically extracts the unique alias (which is actually the _id) and finds the document against it. If the document exists, it will redirect the user to the original_url stored against it, else it will display the Expired Page.
- Invalid Page will be rendered only in a particular scenario that we can address when writing the code.
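The encode, decode, and redirect steps of this flow could look roughly like the sketch below. The encoding scheme is not specified here, so base64url is purely an assumption (it is reversible, but it offers no secrecy on its own). The status codes, the is_used and is_active checks, and the function names are hypothetical too.

```javascript
// Assumption: base64url as the reversible encoding of the short link.
function encodeShortId(id) {
  return Buffer.from(id, "utf8").toString("base64url");
}

function decodeShortId(token) {
  return Buffer.from(token, "base64url").toString("utf8");
}

// Redirect API logic: look up the alias (the _id) and decide what to do.
// 302 and 410 are assumed status codes, not ones fixed by this article.
function resolveRedirect(store, shortId) {
  const doc = store.get(shortId);
  if (!doc || !doc.is_used || !doc.is_active) {
    return { status: 410, page: "expired" };
  }
  return { status: 302, location: doc.original_url };
}

// Demo: one assigned document in an in-memory store.
const demoStore = new Map([
  [
    "aB3dE9",
    {
      _id: "aB3dE9",
      is_active: true,
      is_used: true,
      original_url: "https://example.com/some/long/path",
    },
  ],
]);

const token = encodeShortId("aB3dE9");             // sent to the frontend
console.log(decodeShortId(token));                 // aB3dE9
console.log(resolveRedirect(demoStore, "aB3dE9")); // redirect to the original URL
console.log(resolveRedirect(demoStore, "nope00")); // falls back to the Expired Page
```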
Now, to address the question that was left unanswered earlier: there can be several reasons for the whole encoding-decoding process, some of which are:
- Increase monetization by showing ads on 2 different pages rather than one, one for inputting the URL and the other for displaying the short link.
- To rate-limit the routes with a distinct configuration.
- To prevent the short links from being easily predictable.
Here is a list of some corner cases and improvements that I would like you to brainstorm on:
- What if multiple users enter the same URL? They might get the same shortened URL, which is not acceptable.
- What if parts of the URL are URL-encoded?
- Users should optionally be able to pick a custom short link for their URL.
- Users should be able to specify the expiration time.
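As a starting point for two of these items, here is a small sketch that normalizes user input with Node's built-in URL class (which percent-encodes where needed) and checks a custom short link against a pattern. The http/https restriction and the 4 to 32 character rule for custom links are assumptions of mine, not requirements from this series.

```javascript
// Parse and normalize user input. Returns the normalized URL string,
// or null when the input is not a valid http(s) URL.
function normalizeUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // new URL() throws on input it cannot parse
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;
  return url.href;
}

// Hypothetical rule for custom short links: 4 to 32 URL-safe characters.
const CUSTOM_ALIAS = /^[A-Za-z0-9_-]{4,32}$/;

function isValidCustomAlias(alias) {
  return CUSTOM_ALIAS.test(alias);
}

console.log(normalizeUrl("https://example.com/a b"));   // https://example.com/a%20b
console.log(normalizeUrl("https://example.com/a%20b")); // https://example.com/a%20b
console.log(normalizeUrl("not a url"));                 // null
console.log(isValidCustomAlias("my-link"));             // true
console.log(isValidCustomAlias("a/b"));                 // false
```

Because both spellings of the same address normalize to one string, an already-encoded URL and its unencoded form can be treated as the same input.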
FYI, the solution we will implement in the upcoming articles might not solve all the above issues. These are for you to solve.
I realize that you might have some queries, but I'm sure we will resolve those when implementing the architecture. Till then, try to come up with some new and interesting corner cases and improvements that we can incorporate. And if you understood it well, go ahead and try to implement it yourself.