Today, we will be building a Pastebin clone: a web service that allows users to upload and share text through links known as ‘pastes’. What follows is my journey of creating a Pastebin clone using serverless functions on Cloudflare Workers. If you are not familiar with Pastebin, I’d highly recommend giving it a try before reading on.
“Why Pastebin?” you might ask. Well, sending a block of text (or code) more than 50 lines long through a chat app (looking at you, IRC) isn’t exactly the best way to communicate.
The design of this Pastebin clone is very similar to that of a TinyURL clone, except that we need to store the paste content instead of the original unshortened URL.
Before we begin, note that this is NOT a tutorial or a step-by-step guide. Rather, this is a proof of concept (POC) of how to build a simple paste tool using serverless computing with Cloudflare Workers. To follow along with this article, check out Steps 1 to 3 of this Get Started Guide.
Let’s go!
Let’s start by clarifying the use cases and constraints of our project.
For example, a paste URL would look something like this: paste.jerrynsh.com/aj7kLmN9
Like our previous attempt, the goal here is to host this service for free. With Cloudflare Workers’ pricing and platform limits in mind, our constraints are:
Similar to a URL shortener, our application is expected to have a high read-to-write ratio. With that in mind, we will be using Cloudflare KV (KV in the following), a low-latency key-value store, for this project.
At the time of writing, the free tier of KV comes with the following limits:
In this section, we are going to estimate how many pastes our Pastebin clone can possibly store, given the limitations above. Unlike storing a URL, storing blocks of text can consume much more space (relatively speaking). Here are the assumptions that we are going to make:
Do take note that the limits are applied on a per-account basis.
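As a rough back-of-the-envelope illustration (the average paste size here is purely my own assumption): with the free tier’s 1 GB of KV storage and an average paste of, say, 10 KB, we could hold roughly 100,000 pastes at any given time. Since we are limited to 1,000 writes per day and every paste expires after 24 hours, storage is unlikely to ever be the bottleneck.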
For this POC, we are going to use KV as our database of choice. Let’s dive a little bit deeper into what it does.
At present, the CAP theorem is often used to model distributed data stores. It states that a distributed system can only provide 2 of the following 3 guarantees (source):
In KV’s case, Cloudflare chooses to guarantee Availability and Partition tolerance — which fits our non-functional requirement. Even though this combination screams eventual consistency, that is a tradeoff that we are fine with.
Not to mention that KV supports exceptionally high read volumes with ultra-low latency, making it perfect for our high read-to-write ratio application.
Now that we understand the tradeoffs, let’s move on!
The paste URL UUID generation logic is going to be very similar to a URL shortener. Here’s a quick summary of the possible approaches:
However, we are going with another solution that is not mentioned above.
For this POC, we will pre-generate a list of UUIDs in a KV using a separate worker, which we shall refer to as the key generator service (KGS). Whenever we want to create a new paste, we will assign a pre-generated UUID to the new paste.
So, what are the advantages of doing things in such a way?
With this approach, we will not have to worry about key duplication or hash collisions (e.g. from approach 2 or 3) as our key generator will ensure that the keys inserted in our KV are unique.
Here, we will be using 2 KVs:
- KEY_KV: used by our KGS to store a pre-generated list of UUIDs
- PASTE_KV: used by our main app server to store key-value pairs, where the key is the UUID and the value is the content of a paste
To create the KV namespaces, simply run the following commands with the Wrangler CLI (source).
# Production namespace:
wrangler kv:namespace create "PASTE_DB"
wrangler kv:namespace create "KEY_DB"
# This namespace is used for `wrangler dev` local testing:
wrangler kv:namespace create "PASTE_DB" --preview
wrangler kv:namespace create "KEY_DB" --preview
After creating these KV namespaces, we will need to update our wrangler.toml files to include the namespace bindings accordingly. To view your KV dashboard, visit https://dash.cloudflare.com/<your_cloudflare_account_id>/workers/kv/namespaces.
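For reference, here is a minimal sketch of what those bindings might look like in wrangler.toml; the IDs are placeholders for the values printed by the commands above, and your actual file will also contain the worker’s name, account_id, and other settings. The kgs/ worker’s wrangler.toml needs the equivalent bindings too.
# /server/wrangler.toml (excerpt, placeholder IDs)
kv_namespaces = [
    { binding = "PASTE_DB", id = "<production_namespace_id>", preview_id = "<preview_namespace_id>" },
    { binding = "KEY_DB", id = "<production_namespace_id>", preview_id = "<preview_namespace_id>" }
]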
For KGS to generate new UUIDs, we will be using the nanoid package. In case you’re lost, you can always refer to the /kgs folder in the GitHub repository.
How does KGS know if there’s a duplicate key? Whenever KGS generates a key, it should always check that the UUID does not already exist in either KEY_DB or PASTE_DB. In addition, the UUID should be removed from KEY_DB and created in PASTE_DB whenever a new paste is generated. We will cover that code in the API section.
// /kgs/src/utils/keyGenerator.js
import { customAlphabet } from "nanoid";
import { ALPHABET } from "./constants";

/*
Generate a `uuid` using the `nanoid` package.
Keep retrying until a `uuid` that does not exist in either KV (`PASTE_DB` or `KEY_DB`) is generated.
This is how KGS guarantees that the pre-generated keys are always unique.
*/
export const generateUUIDKey = async () => {
    const nanoId = customAlphabet(ALPHABET, 8);

    let uuid = nanoId();
    // Retry while the candidate `uuid` already exists in either KV
    while (
        (await KEY_DB.get(uuid)) !== null ||
        (await PASTE_DB.get(uuid)) !== null
    ) {
        uuid = nanoId();
    }

    return uuid;
};
Another potential issue that we might run into is: what should we do when all the UUIDs in our KEY_KV are used up? For this, we will set up a Cron Trigger that replenishes our list of UUIDs daily. To respond to a Cron Trigger, we must add a "scheduled" event listener to the Workers script, as shown in the code below.
// /kgs/src/index.js
import { MAX_KEYS } from "./utils/constants";
import { generateUUIDKey } from "./utils/keyGenerator";

/*
Pre-generate a list of unique `uuid`s.
Ensures that the pre-generated `uuid` KV list always has `MAX_KEYS` number of keys.
*/
const handleRequest = async () => {
    const existingUUIDs = await KEY_DB.list();
    let keysToGenerate = MAX_KEYS - existingUUIDs.keys.length;

    console.log(`Existing # of keys: ${existingUUIDs.keys.length}.`);
    console.log(`Estimated # of keys to generate: ${keysToGenerate}.`);

    // Top up the KV until we are back to `MAX_KEYS` pre-generated keys
    while (keysToGenerate > 0) {
        const newKey = await generateUUIDKey();
        await KEY_DB.put(newKey, "");
        console.log(`Generated new key in KEY_DB: ${newKey}.`);
        keysToGenerate--;
    }

    const currentUUIDs = await KEY_DB.list();
    console.log(`Current # of keys: ${currentUUIDs.keys.length}.`);
};

addEventListener("scheduled", (event) => {
    event.waitUntil(handleRequest());
});
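The schedule itself is declared in the KGS worker’s wrangler.toml via a Cron Trigger. A minimal sketch (the exact cron expression is an assumption; here it runs once a day at midnight UTC):
# /kgs/wrangler.toml (excerpt)
[triggers]
crons = ["0 0 * * *"]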
As our POC can only support up to 1k writes/day, we will set MAX_KEYS to 1000. Feel free to tweak it according to your account limits.
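For completeness, the constants imported above live in /kgs/src/utils/constants.js. A minimal sketch of what that module might contain (the exact alphabet here is my own assumption, not copied from the repository):
// /kgs/src/utils/constants.js (sketch)
// Assumed: an alphanumeric alphabet for nanoid; MAX_KEYS matches the 1k writes/day budget
export const ALPHABET =
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
export const MAX_KEYS = 1000;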
At a high level, we need 2 APIs:
For this POC, we will be developing our API in GraphQL using the Apollo GraphQL server. Specifically, we will be using the itty-router worker template alongside workers-graphql-server.
Before we move along: if you are not familiar with GraphQL, you can directly interact with the GraphQL API of this POC via the GraphQL playground endpoint.
In case you’re lost, you can always refer to the /server folder.
To start, the entry point of our API server lies in src/index.js, where all the routing logic is handled by itty-router.
// server/src/index.js
const { missing, ThrowableRouter, withParams } = require("itty-router-extras");
const apollo = require("./handlers/apollo");
const index = require("./handlers/index");
const paste = require("./handlers/paste");
const playground = require("./handlers/playground");
const router = ThrowableRouter();
router.get("/", index);
router.all("/graphql", playground);
router.all("/__graphql", apollo);
router.get("/:uuid", withParams, paste);
router.all("*", () => missing("Not found"));
addEventListener("fetch", (event) => {
event.respondWith(router.handle(event.request));
});
Typically, to create any resource in GraphQL, we need a mutation. In the REST API world, a GraphQL create mutation would be very much like sending a request to a POST endpoint, e.g. /v1/api/paste. Here’s what our GraphQL mutation looks like:
mutation {
    createPaste(content: "Hello world!") {
        uuid
        content
        createdOn
        expireAt
    }
}
Under the hood, the handler (resolver) should call createPaste, which takes in content from the HTTP JSON body. This endpoint is expected to return the following:
{
    "data": {
        "createPaste": {
            "uuid": "0pZUDXzd",
            "content": "Hello world!",
            "createdOn": "2022-01-29T04:07:06+00:00",
            "expireAt": "2022-01-30T04:07:06+00:00"
        }
    }
}
You can check out the GraphQL schema here.
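In case you’d rather not leave the page, here is a rough sketch of what such a schema could look like, inferred from the queries and responses in this article rather than copied from the repository:
type Paste {
    uuid: String!
    content: String!
    createdOn: String!
    expireAt: String!
}

type Query {
    getPaste(uuid: String!): Paste
}

type Mutation {
    createPaste(content: String!): Paste
}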
Here’s the implementation of our resolvers:
// /server/src/resolvers.js
const { ApolloError } = require("apollo-server-cloudflare");

module.exports = {
    Query: {
        getPaste: async (_source, { uuid }, { dataSources }) => {
            return dataSources.pasteAPI.getPaste(uuid);
        },
    },
    Mutation: {
        createPaste: async (_source, { content }, { dataSources }) => {
            if (!content || /^\s*$/.test(content)) {
                throw new ApolloError("Paste content is empty");
            }
            return dataSources.pasteAPI.createPaste(content);
        },
    },
};
To mitigate spam, we also added a small check to prevent the creation of empty pastes.
We are keeping the API logic that interacts with our database (KV) within /datasources.
As mentioned previously, we need to remove the used key from our KGS KEY_DB KV to avoid the risk of assigning duplicate keys to new pastes. Here, we can also set our key to have an expirationTtl of one day upon paste creation:
// /server/src/datasources/paste.js
const { ApolloError } = require('apollo-server-cloudflare')
const moment = require('moment')

// Note: `ONE_DAY_FROM_NOW` (in seconds) is assumed to be defined elsewhere in the project,
// e.g. in a constants module.

/*
Create a new paste in `PASTE_DB`.
Fetch a new `uuid` key from `KEY_DB`.
The UUID is then removed from `KEY_DB` to avoid duplicates.
*/
async createPaste(content) {
    try {
        const { keys } = await KEY_DB.list({ limit: 1 })
        if (!keys.length) {
            throw new ApolloError('Ran out of keys')
        }
        const { name: uuid } = keys[0]

        const createdOn = moment().format()
        const expireAt = moment().add(ONE_DAY_FROM_NOW, 'seconds').format()

        await KEY_DB.delete(uuid) // Remove the used key from KGS
        await PASTE_DB.put(uuid, content, {
            metadata: { createdOn, expireAt },
            expirationTtl: ONE_DAY_FROM_NOW,
        })

        return {
            uuid,
            content,
            createdOn,
            expireAt,
        }
    } catch (error) {
        throw new ApolloError(`Failed to create paste. ${error.message}`)
    }
}
Similarly, I have also created a getPaste GraphQL query to retrieve the paste content via UUID. We won’t be covering it in this article, but feel free to check it out in the source code. To try it out on the playground:
query {
    getPaste(uuid: "0pZUDXzd") {
        uuid
        content
        createdOn
        expireAt
    }
}
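A successful response mirrors the shape of the createPaste response shown earlier, for example (values reused from the earlier example):
{
    "data": {
        "getPaste": {
            "uuid": "0pZUDXzd",
            "content": "Hello world!",
            "createdOn": "2022-01-29T04:07:06+00:00",
            "expireAt": "2022-01-30T04:07:06+00:00"
        }
    }
}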
In this POC, we won’t be supporting deletion of pastes, since pastes expire after 24 hours anyway.
Whenever a user visits a paste URL (GET /:uuid), the original content of the paste should be returned. If an invalid URL is entered, users should get a missing (404) error instead. View the full HTML here.
// /server/src/handlers/paste.js
const { missing } = require("itty-router-extras");
const moment = require("moment");

// `html` is the HTML template helper shown in the full source linked above.

const handler = async ({ uuid }) => {
    const { value: content, metadata } = await PASTE_DB.getWithMetadata(uuid);

    if (!content) {
        return missing("Invalid paste link");
    }

    const expiringIn = moment(metadata.expireAt).from(metadata.createdOn);

    return new Response(html(content, expiringIn), {
        headers: { "Content-Type": "text/html" },
    });
};

module.exports = handler;
Finally, to start the development API server locally, simply run wrangler dev.
Before publishing your code, you will need to edit the wrangler.toml files (within server/ and kgs/) and add your Cloudflare account_id inside. More information about configuring and publishing your code can be found in the official documentation.
Do make sure that the KV namespace bindings are added to your wrangler.toml files as well.
To publish any new changes to your Cloudflare Workers, simply run wrangler publish from the respective service’s folder.
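For example, assuming the server/ and kgs/ folder layout described above, from the root of the repository:
# Publish the main API server:
cd server && wrangler publish

# Publish the key generator service (KGS):
cd kgs && wrangler publish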
To deploy your application to a custom domain, check out this short clip.
In the GitHub repository, I have also set up a CI/CD workflow using GitHub Actions. To use Wrangler actions, add CF_API_TOKEN into your GitHub repository secrets.
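I won’t reproduce my exact workflow here, but a minimal sketch of such a workflow using cloudflare/wrangler-action could look like the following (the file name, trigger branch, and working directory are assumptions, so adjust them to your setup):
# .github/workflows/deploy.yml (sketch)
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # Publish the API server with Wrangler, authenticated via the CF_API_TOKEN secret
      - name: Publish server
        uses: cloudflare/wrangler-action@1.3.0
        with:
          apiToken: ${{ secrets.CF_API_TOKEN }}
          workingDirectory: "server"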
I did not expect this POC to take me this long to write and complete; I probably slacked off more than I should have.
As in my previous post, I would love to end with some potential improvements that could be made (or sucked into the backlog black hole for eternity) in the future:
Like URL shorteners, paste tools carry a certain stigma: both make URLs opaque, which spammers love to abuse. Well, at least the next time you ask “why doesn’t this code work?”, you’ll have your own paste tool to use, at least until you add in syntax highlighting.