Today, we will be building a Pastebin clone: a web service that allows users to upload and share text through links known as ‘pastes’. What follows is my journey of creating a Pastebin clone using serverless functions through Cloudflare Worker. If you are not familiar with Pastebin, I’d highly recommend giving it a try before reading on.

“Why Pastebin?” you might ask. Well, sending a block of text (or code) longer than 50 lines through a chat app (looking at you, IRC) isn’t exactly the best way to communicate.

TL;DR - Build a Serverless Pastebin

- Building a Pastebin clone using Cloudflare Worker and KV
- Project requirements and limitations planning
- Paste URL UUID generation logic with key generation service (KGS)
- GraphQL API design and implementation
- Live demo at paste.jerrynsh.com
- GitHub repository

The design of this Pastebin clone is very similar to building a TinyURL clone, except that we need to store the paste content instead of the original unshortened URL.

Before we begin, this is NOT a tutorial or guide on:

- How to tackle an actual system design interview
- Building a commercial-grade paste tool like Pastebin or GitHub Gist

Rather, this is a proof of concept (POC) of how to build a simple paste tool using serverless computing with Cloudflare Worker. To follow along with this article, check out Steps 1 to 3 of this Get Started Guide.

Let’s go!

Requirements for Your Serverless Pastebin

Let’s start by clarifying the use cases and constraints of our project.

Functional

- Whenever a user enters a block of text (or code), our web service should generate a URL with a random key (UUID), e.g. paste.jerrynsh.com/aj7kLmN9
- Whenever a user visits the generated URL, the user should be redirected to view the original paste content, i.e.
  the original block of text
- The link to the paste should expire after 24 hours
- The UUID should only contain alphanumeric characters (Base62)
- The length of our UUID should be 8 characters

Non-Functional

- Low latency
- Highly available

Budget, Capacity, & Limitations Planning

Like our previous attempt, the goal here is to host this service for free. With Cloudflare Worker’s pricing and platform limits in mind, our constraints are:

- 100k requests/day at 1k requests/min
- CPU runtime not exceeding 10ms

Similar to a URL shortener, our application is expected to have a high read-to-write ratio. That being said, we will be using Cloudflare KV (KV in the following), a low-latency key-value store, for this project.

At the time of writing, the free tier of KV comes with the following limits:

- 100k reads/day
- 1k writes/day
- 1 GB of stored data (key size of 512 bytes; value size of 25 MiB)

How Many Pastes Can We Store?

In this section, we are going to estimate how many pastes our Pastebin clone can possibly store, given the limitations above. Unlike a URL, a block of text can consume much more space (relatively speaking). Here are the assumptions that we are going to make:

- 1 character is 1 byte (using this byte counter)
- On average, a single paste (file) consists of about 200 lines of code (text) at roughly 50 characters per line, so each paste is about 10 KB
- With 1 GB of maximum storage, our Pastebin clone can only store up to about 100,000 pastes (1 GB / 10 KB)

Do take note that the limits are applied on a per-account basis.

Storage & Database

Cloudflare Worker KV

For this POC, we are going to use KV as our database of choice. Let’s dive a little bit deeper into what it does.

At present, the CAP theorem is often used to model distributed data stores. The CAP theorem states that a distributed system can only provide 2 of the following 3 guarantees (source):

- Consistency - is my data the same everywhere?
- Availability - is my data always accessible?
- Partition tolerance - is my data resilient to regional outages?

In KV’s case, Cloudflare chooses to guarantee Availability and Partition tolerance, which fits our non-functional requirements. Even though this combination screams eventual consistency, that is a tradeoff that we are fine with.

Not to mention, KV supports exceptionally high read volumes with ultra-low latency, which is perfect for our high read-to-write ratio application.

Now that we understand the tradeoffs, let’s move on!

How to Implement Your Serverless Pastebin

URL Generation Logic

The paste URL UUID generation logic is going to be very similar to that of a URL shortener. Here’s a quick summary of the possible approaches:

1. Use a UUID generator to generate a UUID on demand for every new request
2. Use the hash (MD5) of the paste content as our UUID, then use the first N characters of the hash as part of our URL
3. Use a combination of hashing + Base62 encoding
4. Use an auto-incremented integer as our UUID

However, we are going with another solution that is not mentioned above.

Pre-generate UUID Key

For this POC, we will pre-generate a list of UUIDs in a KV using a separate worker. We shall refer to this worker as the key generator service (KGS). Whenever we want to create a new paste, we will assign a pre-generated UUID to it.

So, what are the advantages of doing things this way?

With this approach, we do not have to worry about key duplication or hash collisions (e.g. from approach 2 or 3), as our key generator ensures that the keys inserted into our KV are unique.

Here, we will be using 2 KVs:

- KEY_DB: used by our KGS to store a pre-generated list of UUIDs
- PASTE_DB: used by our main app server to store key-value pairs, where the key is the UUID and the value is the content of a paste

Both KV namespaces are created with the Wrangler CLI (source).
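Before touching any real namespace, the pre-generated key idea can be sketched with in-memory stand-ins. This is a hypothetical sketch, not the repository's code: a Set plays the role of KEY_DB, a Map plays PASTE_DB, and Node’s crypto.randomInt is swapped in for nanoid to draw 8-character Base62 keys. The KGS side tops the pool up while skipping keys already used in either store; the app-server side takes one key, removes it from the pool, and stores the paste under it.

```javascript
const { randomInt } = require("crypto");

// Base62 alphabet (10 digits + 26 uppercase + 26 lowercase) and key length from the requirements.
const ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const KEY_LENGTH = 8;

// Hypothetical in-memory stand-ins for the two KV namespaces.
const keyPool = new Set(); // plays the role of KEY_DB
const pastes = new Map(); // plays the role of PASTE_DB

// Draw one random Base62 key (crypto.randomInt stands in for nanoid here).
const randomKey = () => {
  let key = "";
  for (let i = 0; i < KEY_LENGTH; i++) {
    key += ALPHABET[randomInt(ALPHABET.length)];
  }
  return key;
};

// KGS side: top the pool up to `maxKeys`, skipping any key already present
// in EITHER store, so every pooled key is unique.
const replenish = (maxKeys) => {
  while (keyPool.size < maxKeys) {
    const key = randomKey();
    if (!keyPool.has(key) && !pastes.has(key)) {
      keyPool.add(key);
    }
  }
};

// App-server side: take one pre-generated key, remove it from the pool,
// then store the paste under that key.
const createPaste = (content) => {
  const { value: key, done } = keyPool.values().next();
  if (done) {
    throw new Error("Ran out of keys");
  }
  keyPool.delete(key);
  pastes.set(key, content);
  return key;
};

module.exports = { ALPHABET, keyPool, pastes, replenish, createPaste };
```

Because a key leaves the pool the moment it is handed out, the paste path never needs a collision check of its own; all uniqueness work happens ahead of time in replenish, which is the property the KGS design relies on.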
```bash
# Create both KV namespaces with Wrangler CLI.
# Production namespaces:
wrangler kv:namespace create "PASTE_DB"
wrangler kv:namespace create "KEY_DB"

# These namespaces are used for `wrangler dev` local testing:
wrangler kv:namespace create "PASTE_DB" --preview
wrangler kv:namespace create "KEY_DB" --preview
```

After creating these KV namespaces, we will need to update our wrangler.toml files to include the namespace bindings accordingly. To view your KV’s dashboard, visit https://dash.cloudflare.com/<your_cloudflare_account_id>/workers/kv/namespaces.

How to Generate UUID

For KGS to generate new UUIDs, we will be using the nanoid package. In case you’re lost, you can always refer to the /kgs folder on the GitHub repository.

How does KGS know if there’s a duplicated key? Whenever KGS generates a key, it should always check whether the UUID already exists in KEY_DB or PASTE_DB.

In addition, the UUID should be removed from KEY_DB and created in PASTE_DB upon generating a new paste. We will cover the code in the API section.

```javascript
// /kgs/src/utils/keyGenerator.js
import { customAlphabet } from "nanoid";
import { ALPHABET } from "./constants";

// KEY_DB and PASTE_DB are KV namespace bindings declared in wrangler.toml;
// in a service-worker-format Worker they are available as globals.

/*
Generate a `uuid` using the `nanoid` package.

Keep retrying until a `uuid` that exists in neither KV (`PASTE_DB` nor `KEY_DB`) is generated.

KGS guarantees that the pre-generated keys are always unique.
*/
export const generateUUIDKey = async () => {
  const nanoId = customAlphabet(ALPHABET, 8);

  let uuid = nanoId();

  // Retry while the candidate is taken in EITHER namespace (`||`, not `&&`):
  // a key found in only one KV is still a duplicate.
  while (
    (await KEY_DB.get(uuid)) !== null ||
    (await PASTE_DB.get(uuid)) !== null
  ) {
    uuid = nanoId();
  }

  return uuid;
};
```

Running Out of Unique Keys to Generate

Another potential issue that we might run into: what should we do when all the UUIDs in our KEY_DB are used up?

For this, we will set up a Cron trigger that replenishes our list of UUIDs once a day. To respond to a Cron trigger, we must add a "scheduled" event listener to the Workers script, as shown in the code below.
```javascript
// /kgs/src/index.js
import { MAX_KEYS } from "./utils/constants";
import { generateUUIDKey } from "./utils/keyGenerator";

/*
Pre-generate a list of unique `uuid`s.

Ensures that the pre-generated `uuid` KV list always has `MAX_KEYS` number of keys.
*/
const handleRequest = async () => {
  // list() returns at most 1,000 keys per call, which covers MAX_KEYS = 1000.
  const existingUUIDs = await KEY_DB.list();

  let keysToGenerate = MAX_KEYS - existingUUIDs.keys.length;

  console.log(`Existing # of keys: ${existingUUIDs.keys.length}.`);
  console.log(`Estimated # of keys to generate: ${keysToGenerate}.`);

  // `> 0` rather than `!= 0`: if the pool already exceeds MAX_KEYS,
  // keysToGenerate is negative and `!= 0` would never terminate.
  while (keysToGenerate > 0) {
    const newKey = await generateUUIDKey();

    await KEY_DB.put(newKey, "");
    console.log(`Generated new key in KEY_DB: ${newKey}.`);

    keysToGenerate--;
  }

  const currentUUIDs = await KEY_DB.list();
  console.log(`Current # of keys: ${currentUUIDs.keys.length}.`);
};

// Runs on every Cron trigger configured for this Worker.
addEventListener("scheduled", (event) => {
  event.waitUntil(handleRequest());
});
```

As our POC can only support up to 1k writes/day, we will set MAX_KEYS to 1000. Feel free to tweak it according to your account limits.

API

At a high level, we probably need 2 APIs:

- Creating a URL for paste content
- Redirecting to the original paste content

For this POC, we will be developing our API in GraphQL using the Apollo GraphQL server. Specifically, we will be using the itty-router worker template alongside workers-graphql-server.

Before we move along, in case you are not familiar with GraphQL, you can directly interact with the GraphQL API of this POC via the GraphQL playground endpoint.

When lost, you can always refer to the /server folder.

Routing

To start, the entry point of our API server lies in src/index.js, where all the routing logic is handled by itty-router.
```javascript
// server/src/index.js
const { missing, ThrowableRouter, withParams } = require("itty-router-extras");
const apollo = require("./handlers/apollo");
const index = require("./handlers/index");
const paste = require("./handlers/paste");
const playground = require("./handlers/playground");

const router = ThrowableRouter();

router.get("/", index); // Index page
router.all("/graphql", playground); // GraphQL playground
router.all("/__graphql", apollo); // Apollo GraphQL endpoint
router.get("/:uuid", withParams, paste); // View a paste by its UUID
router.all("*", () => missing("Not found")); // Everything else: 404

addEventListener("fetch", (event) => {
  event.respondWith(router.handle(event.request));
});
```

Creating Paste

Typically, to create any resource in GraphQL, we need a mutation. In the REST API world, a GraphQL mutation that creates a resource is very similar to sending a POST request to an endpoint, e.g. /v1/api/paste. Here’s what our GraphQL mutation would look like:

```graphql
mutation {
  createPaste(content: "Hello world!") {
    uuid
    content
    createdOn
    expireAt
  }
}
```

Under the hood, the handler (resolver) should call createPaste, which takes in content from the HTTP JSON body. This endpoint is expected to return the following:

```json
{
  "data": {
    "createPaste": {
      "uuid": "0pZUDXzd",
      "content": "Hello world!",
      "createdOn": "2022-01-29T04:07:06+00:00",
      "expireAt": "2022-01-30T04:07:06+00:00"
    }
  }
}
```

You can check out the GraphQL schema in the GitHub repository.

Here’s the implementation of our resolvers:

```javascript
// /server/src/resolvers.js
const { ApolloError } = require("apollo-server-cloudflare");

module.exports = {
  Query: {
    getPaste: async (_source, { uuid }, { dataSources }) => {
      return dataSources.pasteAPI.getPaste(uuid);
    },
  },
  Mutation: {
    createPaste: async (_source, { content }, { dataSources }) => {
      // Reject missing or whitespace-only content.
      if (!content || /^\s*$/.test(content)) {
        throw new ApolloError("Paste content is empty");
      }

      return dataSources.pasteAPI.createPaste(content);
    },
  },
};
```

To mitigate spam, we also added a small check to prevent the creation of empty pastes.
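The empty-paste guard in the createPaste resolver can be pulled out into a small pure function so it can be tested outside the Worker runtime. The helper name isBlankPaste is hypothetical; the resolver inlines the same check, which rejects a missing value as well as text made up only of spaces, tabs, or newlines.

```javascript
// Hypothetical helper mirroring the resolver's guard: true when `content`
// is missing/empty or contains only whitespace characters.
const isBlankPaste = (content) => !content || /^\s*$/.test(content);

module.exports = { isBlankPaste };

// Usage
console.log(isBlankPaste("")); // true
console.log(isBlankPaste("  \n\t ")); // true
console.log(isBlankPaste("Hello world!")); // false
```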
Paste Creation Data Source

We are keeping the API logic that interacts with our database (KV) within /datasources.

As mentioned previously, we need to remove the used key from our KGS KV (KEY_DB) to avoid the risk of assigning duplicated keys to new pastes.

Here, we can also set our key to have an expirationTtl of one day upon paste creation:

```javascript
// /server/src/datasources/paste.js
const { ApolloError } = require('apollo-server-cloudflare')
const moment = require('moment')

// Paste lifetime in seconds (24 hours), used for both expireAt and the KV TTL.
// Defined elsewhere in the repository; the value here is assumed.
const ONE_DAY_FROM_NOW = 24 * 60 * 60

// Data source exposed to resolvers as `dataSources.pasteAPI`
// (class name assumed; getPaste and other methods omitted).
class PasteAPI {
  /*
  Create a new paste in `PASTE_DB`.

  Fetch a new `uuid` key from `KEY_DB`.

  UUID is then removed from `KEY_DB` to avoid duplicates.
  */
  async createPaste(content) {
    try {
      const { keys } = await KEY_DB.list({ limit: 1 })
      if (!keys.length) {
        throw new ApolloError('Ran out of keys')
      }
      const { name: uuid } = keys[0]

      // Derive both timestamps from the same instant.
      const now = moment()
      const createdOn = now.format()
      const expireAt = now.clone().add(ONE_DAY_FROM_NOW, 'seconds').format()

      await KEY_DB.delete(uuid) // Remove key from KGS
      await PASTE_DB.put(uuid, content, {
        metadata: { createdOn, expireAt },
        expirationTtl: ONE_DAY_FROM_NOW, // KV deletes the paste after 24 hours
      })

      return {
        uuid,
        content,
        createdOn,
        expireAt,
      }
    } catch (error) {
      throw new ApolloError(`Failed to create paste. ${error.message}`)
    }
  }
}

module.exports = PasteAPI
```

Similarly, I have also created a getPaste GraphQL query to retrieve the paste content via UUID. We won’t be covering it in this article, but feel free to check it out in the source code. To try it out on the playground:

```graphql
query {
  getPaste(uuid: "0pZUDXzd") {
    uuid
    content
    createdOn
    expireAt
  }
}
```

In this POC, we won’t be supporting deletion of pastes, since pastes expire after 24 hours anyway.

Getting Paste

Whenever a user visits a paste URL (GET /:uuid), the original content of the paste should be returned. If an invalid URL is entered, users should get a 404 Not Found error. The full HTML template can be found in the GitHub repository.
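The paste handler in /server/src/handlers/paste.js renders the page through an html(content, expiringIn) helper whose template is not reproduced in this article. Because paste content is arbitrary user input, a template like this has to HTML-escape it before embedding it in the page. Here is a minimal sketch under that assumption; escapeHtml and the bare page layout are hypothetical, not the repository’s actual template.

```javascript
// Hypothetical sketch of an `html(content, expiringIn)` helper; the real
// template in the repository differs. This shows only the escaping step.

// Replace the characters that are significant in HTML text and attributes.
// `&` must be handled first so entities added later are not double-escaped.
const escapeHtml = (text) =>
  String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");

// Render the paste inside <pre> so whitespace and line breaks are preserved.
const html = (content, expiringIn) => `<!DOCTYPE html>
<html>
  <head><meta charset="utf-8"><title>Paste</title></head>
  <body>
    <pre>${escapeHtml(content)}</pre>
    <p>This paste expires ${escapeHtml(expiringIn)}.</p>
  </body>
</html>`;

module.exports = { escapeHtml, html };
```

With this in place, a paste such as <script>alert("x")</script> is displayed as text instead of being executed by the browser.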
```javascript
// /server/src/handlers/paste.js
const { missing } = require("itty-router-extras");
const moment = require("moment");

// `html(content, expiringIn)` renders the paste page; its template lives in
// the repository and is not reproduced here.

const handler = async ({ uuid }) => {
  const { value: content, metadata } = await PASTE_DB.getWithMetadata(uuid);
  if (!content) {
    return missing("Invalid paste link"); // 404 Not Found
  }

  // Time remaining until expiry relative to now, e.g. "in 5 hours".
  // (Measuring from metadata.createdOn would always read "in a day".)
  const expiringIn = moment(metadata.expireAt).fromNow();

  return new Response(html(content, expiringIn), {
    headers: { "Content-Type": "text/html" },
  });
};

module.exports = handler;
```

Finally, to start the development API server locally, simply run wrangler dev.

Deploying Your Pastebin

Before publishing your code, you will need to edit the wrangler.toml files (within server/ and kgs/) and add your Cloudflare account_id inside. You can read more about configuring and publishing your code in the official documentation.

Do make sure that the KV namespace bindings are added to your wrangler.toml files as well.

To publish any new changes to your Cloudflare Worker, simply run wrangler publish in the respective service.

To deploy your application to a custom domain, check out this short clip.

CI/CD

In the GitHub repository, I have also set up a CI/CD workflow using GitHub Actions. To use Wrangler actions, add CF_API_TOKEN to your GitHub repository secrets.

Conclusion: Building a Pastebin

I did not expect this POC to take me this long to write and complete; I probably slacked more than I should have.

Like my previous post, I would love to end this with some potential improvements that can be made (or sucked into the backlog black hole for eternity) in the future:

- Allowing users to set custom expiry
- Paste editing and deletion
- Syntax highlighting
- Analytics
- Private pastes with password protection

Like URL shorteners, paste tools have a certain stigma about them: both make URLs opaque, which spammers love to abuse. Well, at least the next time you ask “why doesn’t this code work?”, you’ll have your own paste tool to use, at least until you add in syntax highlighting.