A monorepo is an approach to organizing the code of many projects inside one repository. It can go as far as keeping all the code a company maintains in a single centralized repository. Many big companies use the monorepo strategy, including Google, Meta, and Microsoft.
I will do my best to show you the pros and cons of monorepos, but I have a confession to make: I like monorepos. A few years ago, I was creating a separate repository for each application, but eventually I got tired of the overhead that generated. Currently, I’m in the process of migrating all those projects into one big repository.
A monorepo is a single repository that hosts multiple projects. It’s different from multi-repos (or poly-repos), where each project has its own repository. It’s also different from a monolithic application, because it contains several distinct applications.
So, a simple example would be a monorepo that contains:
All of that lives in one monorepo, no matter which programming languages are used where: you could have JS on the frontend, Python on the backend, and PHP for the website.
No matter how we organize our repositories, well-defined projects help with development and maintenance. For one thing, backend and frontend development are often done by separate teams in different technologies. The client/server split makes for a clear boundary between those parts, and with a bit of documentation, you can have a clearly defined relationship between them.
The same approach can be used for development that is done on the same end. By having multiple projects, you can use one technology stack for one application and use another one for another application. Separate projects provide more flexibility for your team. This gets more important as the team gets bigger and the solution you build becomes more complex.
So, we see reasons to leave monolithic applications behind, but what’s the problem with having different repositories for different projects? After all, we can include our own projects as dependencies wherever we need them and use the same workflow we use for third-party libraries.
For me, the problems start when we pretend that the backend is a third-party application to the frontend. It’s not: those two are usually developed in parallel. When I use a third-party library, I
In the case of the frontend–backend relationship, all three points are different:
So if the needs are so different, the workflow should reflect that as well.
Let’s start with the main advantage of a monorepo: you can make changes across all parts of the application in one commit. Imagine you rename a field on a data model. This simple change will require many changes in the codebase:
In poly-repos, this change touches many repositories, with commits that have to be developed, merged, and deployed in parallel. It’s a lot of manual work and mental overhead, even when everything goes smoothly. If you need to revert the changes, things get even uglier. Monorepos allow you to create one atomic commit that contains all the changes, and to merge (or revert) it when needed.
With atomic commits spanning across many projects, it’s easy to have integration tests that truly check everything together. In an ideal setup, you would have
I tried to achieve something like this in a poly-repo, and it was never easy. This and the atomic commits are the main reasons why I decided that monorepos are the way to go with code development.
So, after going through the main reasons in favor of monorepos, let’s take a look at the downsides. The biggest one is complexity. A poly-repo allows you to pretend that each of your projects is an independent, standalone thing, so you can tackle things in more bite-sized chunks. Let’s see what gets complicated as you move many projects into one repo.
The biggest thing that gets complicated is your continuous integration (CI). By moving projects, you introduce a trade-off:
With option #2, you save time and computational resources, but you introduce a risk that some changes will not be tested even though they should be. To address this issue, my solution is to run
This way, even if my optimization lets a regression go unnoticed for a while, the main branch will start failing, and I'll be able to resolve the issue.
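The selection logic behind such an optimization can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual CI setup: it assumes that each top-level folder is one project, that feature branches test only the projects with changed files, and that the main branch always tests everything. The folder names and the functions `affected_projects` and `projects_to_test` are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical layout: each top-level folder of the monorepo is one project.
PROJECTS = {"frontend", "backend", "website"}


def affected_projects(changed_files, projects=PROJECTS):
    """Return the projects whose folders contain at least one changed file."""
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if parts and parts[0] in projects:
            affected.add(parts[0])
        else:
            # A change outside any project folder (for example shared CI
            # config) could affect everything, so fall back to all projects.
            return set(projects)
    return affected


def projects_to_test(branch, changed_files, projects=PROJECTS, main_branch="main"):
    """Test everything on the main branch, only affected projects elsewhere."""
    if branch == main_branch:
        return set(projects)
    return affected_projects(changed_files, projects)
```

In a real pipeline, the list of changed files could come from something like `git diff --name-only origin/main...HEAD`. The fallback to all projects for files outside any project folder is a deliberately cautious choice: it costs CI time but lowers the risk of skipping a test that should have run.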
Optimizing the CI for such a scenario is not an easy task. For example, here you have a simple setup for CI for a monorepo in GitLab. As you can see, it’s much more than CI configurations for single projects.
A quiet advantage of using one repository per project is that you can use the repository itself as an artifact. For example, in your Node.js package, you could import a library directly from a remote Git repository by installing the dependency with something similar to:
$ npm i git+https://github.com/amcharts/amcharts3.git#3.18.3
added 1 package, and audited 2 packages in 9s
found 0 vulnerabilities
In any other place, if you need your code in a specific version, you could do something similar, effectively using your Git repository as both code and artifact repository.
As you move your projects inside the monorepo, you will need to replace this way of sharing code. Otherwise, you would be downloading the whole monorepo to use only a small library inside it. You will need a package or artifact repository. For Node.js libraries, you could use npm, which can host public or private packages. Or, if you use GitLab, it provides a package registry that you can use to publish packages. This registry can be used with npm or one of the other dependency managers.
For reusing code inside the monorepo, you could use the same approach, making sure that during the CI build the packages are published before you try to use them. Alternatively, you can use direct imports between applications: the relative paths between projects are tracked by Git, so they should work the same on different machines.
By housing more projects in one repository, more things will be happening there. No matter your Git workflow, each developer will have more remote changes to deal with—with rebases or with merges. Even though you shouldn’t be afraid of rebases, this can add a bit of overhead to your development—especially with bigger or more productive teams.
Another advantage to the monorepo approach is creating an obvious repository where almost any code belongs. Instead of defining many projects/repositories, each in its own location, you have one repository where you put all code in different folders. The question of which repository should host the code is replaced by what folder—you can use the same rules to decide, but the stakes are much lower due to a few factors.
git grep is a useful command for searching through your repository. By default, it searches inside your current folder, but you can easily run it at the topmost folder of your repository to search across all projects. You could simulate something similar by checking out all related projects from different repos next to each other, but the advantage of the monorepo is that you don’t need to keep track of which projects are being added or removed. Everything is in the repository, whether you pay attention to a given project or not.
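As a rough sketch, a repository-wide search can also be wrapped in a small Python helper. The function names `git_grep_command` and `search_repo` are made up for this illustration. The `-n` and `-e` options and the `:/` pathspec (Git's shorthand for the top of the working tree) are standard Git features.

```python
import subprocess


def git_grep_command(pattern, whole_repo=True):
    """Build the argument list for a `git grep` search."""
    command = ["git", "grep", "-n", "-e", pattern]
    if whole_repo:
        # ":/" is Git pathspec magic for the top of the working tree, so the
        # search covers every project even when started from a subfolder.
        command += ["--", ":/"]
    return command


def search_repo(pattern, whole_repo=True):
    """Run the search and return the matching lines (empty list if none)."""
    result = subprocess.run(
        git_grep_command(pattern, whole_repo),
        capture_output=True,
        text=True,
    )
    # git grep exits with 1 when nothing matches; anything higher is an error.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Calling `search_repo("userName")` from any folder inside the monorepo would then list every tracked file, in every project, that mentions that name.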
With multi-repo projects, it’s never clear where the relationship between different repositories should be documented. Should it be a frontend README that describes the relationship with the backend, the backend’s one, or some third place? We can even consider a company-wide wiki that everybody will forget about in two months. With the monorepo, there is a folder that contains every part of your solution, and it’s an obvious place to put documentation that spans multiple projects.