tldr: Using a basic monorepo (one repository for all services code) project design can make development easier when building many services.
I work for a learning company which has two main goals:
- Help individuals learn/discover skills relevant to their organization’s needs.
- Help organizations retain/onboard/train individuals.
To fulfill these two goals our company collects, processes, and analyzes tons of data. This data challenge is common among many companies working on the enterprise level. It requires a solid data-pipeline infrastructure which is fast
while still reliable
. There are design
considerations to consider while building. This blog post will focus only on a high level view of a small portion of this pipeline.
Why use Golang?
- Scalability: Go was designed with simplicity in mind. There are only 25 keywords, only one loop (the
for loop), all code has a common style thanks to
go fmt, etc… . This simplicity allows for scalability in a team sense. If new team members are to work on applications written in Go it is easier for them to read and understand the code. This increases the size of the pool of programmers available to help with development.
- Reliability: Since Go is a strongly typed language which also has great error handling mechanisms it makes great reliable code. The Go project also has a great race detector tool to help identify possible race conditions.
- Speed: Go is fast.
Why use services and a monorepo?
Scaling out a team is more than just choosing a language which everyone can work on. The code must be organized in ways which allow for easy lookup and modification. This often requires the code to live close to each other while having well defined responsibilities.
Well designed monoliths accomplish these tasks, but sometimes stall when design patterns diverge, responsibilities are incorrect, or specific parts of the application are too resource intensive and bring down the whole application. Parts which are resource intensive is why some projects choose to break off code into separate services.
Services (or whatever you want to call them) can run into the same problems as a monolith project; however, many services are also split up across multiple repositories which can make it hard for new team members to quickly navigate the project as a whole.
Data-processing has a variety of requirements for tasks. Some tasks might require quick turnaround time. Others might receive large batches of data which takes minutes to process. Since the requirements of these tasks vary greatly using services makes sense. There still might be issues, but scaling out individual processes horizontally/vertically based on their individual needs is much easier. The last issue is code proximity. The concept of a monorepo (one repository for all services code) can mitigate the proximity issue in certain cases. For the scenario of many Go services often sharing similar packages, the monorepo concept works out quite nice.
A monorepo project design will be different compared to a package or small project. It is important to note that these design suggestions are just suggestions.
Structures of projects are important for organization. The quicker someone can visualize the project as a whole the faster they can contribute. With Golang
the designers have not forced a folder convention so your team can use their ideals and preferences to find a solution. Below is one monorepo example folder structure.
When projects become large and more individuals start contributing code it becomes difficult to enforce everyone to use the same commands. A Makefile can help standardize a nice shortcut list for contributors to use. Here are some tasks you might want to include in your projects:
- build: Builds the binary for your go project and places it in the
- test: Runs tests using specific flags or tools. With Go’s testing package you can also specify test groups if your code base become very large.
- deploy: Runs a deploy script or trigger to deploy a specific project. It usually makes sense to keep your build and deploy tasks separate. Your Makefile can always specify a specific order to run them.
- clean: Cleans out old build objects and artifacts.
The cmd folder is meant to hold your application entry points (main funcs). These entry point files are responsible for setting up and configuring each application.
All internal packages which do not make sense as a standalone package should be located in a pkg folder. The organization of files within this folder will depend on your organization. Here are some general tips.
- Keep the KISS principle in mind while organizing.
- Teams will often be tempted to design projects in familiar language folder styles, but it is important to remember Go is the language and having a package for every data model might not make sense.
- Having generic packages named
misc are not clear to developers unfamiliar with your code base. If the code base becomes large enough developers might even forget what functions are held in these generic packages and end up adding duplicate functions.
- Splitting up a single package into separate files with good names often helps organization, but having 20 files which are only 20 lines long probably doesn’t make sense and will most likely annoy developers. Sometimes one file is enough.
- Designing a package which holds all error types might seem like a good organizational tactic, but it increases package entanglement and forces developers to jump file to file.
For many projects the
bin folder is responsible for holding binary files. So it makes sense to store any binary files within this folder. Especially our compiled projects.
Having a folder dedicated to scripts works well when paired with a Makefile. Teams can have complex deploy scripts which trigger multiple actions and keep the code separate from the Makefile. This keeps your Makefile clear and simple.
Every Go project should use Godoc’s
methods, but there are benefits to making documentation which is not contained in your Go files. Having higher level docs can be tailored to users who do not care to see the inner workings. They also can house deployment processes, contributing guidelines, and other information which wouldn’t make sense to contain within the packages documentation.
This article was written as a small opinion piece on project design. It was meant to give suggestions not force standards. Feel free to disagree or share opinions or thoughts.
Check out Pathgather
if you are interested in learning more about our company.