How to Overcome Scaling Challenges in Technical Architecture

by Tobias Schlottke, July 21st, 2022

Too Long; Didn't Read

Scaling isn’t easy. You will need to set up processes, effective development practices, and communication strategies for how things are going to work now that you are a big company with multiple teams. In this article, we discuss how to make design decisions, handle tech debt, and keep in touch with developers now that the company is so much bigger.


What worked when your company was founded will no longer work now that you are scaling and hiring at a rapid pace. How will people communicate? How will tech debt be tackled? The choices one engineer makes now affect many more people, so it's time to strategize about how your larger workforce works together.

Rachel Potvin, VP Engineering at GitHub, tackled many scaling challenges when the team she led grew threefold, to 500 people! Here is what she shared on the alphalist CTO podcast about how she had to adapt as the team grew.

Tackle Tech Debt to Keep Morale Up

If an engineer keeps hitting the same friction again and again because of tech debt, it is going to demoralize them. So as important as it is to launch new features, it is also important to keep your engineering investment portfolio healthy, which is where tackling tech debt comes in.

Create Processes for Fan-Out Work

Fan-out work is work that no single team can get done on its own. Maybe it's a large-scale codebase evolution to clean up some technical debt: rather than one team doing it, lots of different teams all have to be involved.

The way it used to work, an engineer somewhere would have a great idea, and they would write a discussion post or a team post internal to GitHub and say, “Hey, I realize this API/technology we're using is not good. We should refactor and remove it, and look, here's my PR where I removed it from my project. Now everyone, please go and do this.”

And you know, perhaps when there were a small number of engineers, working with influence to get stuff like that done could work, but we're not at that scale. Now we're a global company with people working all around the world. (Rachel Potvin, VP Engineering at GitHub, speaking on the alphalist CTO podcast)

In the past, one influential developer who wanted to get rid of some tech debt would open a PR refactoring one aspect of the project and ask others to do the same in their own work. But at a company this big, not everyone knows each other, and the effort likely ends up as a half-done refactor. And we all know that the only thing worse than code that needs refactoring is code that is half refactored!

But that doesn’t mean tech debt cleanups and refactorings shouldn’t be initiated; they just need to be done systematically.

Centrally, you need to decide:

  • What is the scope of the proposed fan-out project?

  • What is it going to cost?

  • What is the benefit?

All potential fan-out projects are compared in this way, and a select few are picked to work on each quarter, with the goal of getting a few impactful things completed. Each chosen fan-out project is properly tracked (with TPMs, project managers, etc.) to make sure it makes a distinct, measurable difference.
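The article doesn't describe how GitHub scores proposals. As a rough illustration only, the three questions above could feed a simple greedy heuristic: rank proposals by estimated benefit per engineer-week, drop the low-value ones, and fill a fixed quarterly capacity. The `FanOutProposal` fields, the `pick_for_quarter` function, its thresholds, and the project names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FanOutProposal:
    """One proposed cross-team project, described by scope, cost and benefit (hypothetical model)."""
    name: str
    teams_involved: int    # scope: how many teams must take part (kept for tracking, not ranking)
    cost_eng_weeks: float  # estimated total cost in engineer-weeks; assumed to be > 0
    benefit: float         # estimated benefit on a relative scale agreed centrally


def pick_for_quarter(
    proposals: List[FanOutProposal],
    capacity_eng_weeks: float,
    max_projects: int = 3,
    min_ratio: float = 1.0,
) -> List[FanOutProposal]:
    """Greedy heuristic: rank by benefit per engineer-week, skip low-value
    proposals, and take what fits in the quarter's capacity, up to max_projects."""
    ranked = sorted(proposals, key=lambda p: p.benefit / p.cost_eng_weeks, reverse=True)
    chosen: List[FanOutProposal] = []
    used = 0.0
    for p in ranked:
        if len(chosen) == max_projects:
            break
        if p.benefit / p.cost_eng_weeks < min_ratio:
            break  # list is ranked, so every remaining proposal scores even lower
        if used + p.cost_eng_weeks <= capacity_eng_weeks:
            chosen.append(p)
            used += p.cost_eng_weeks
    return chosen


proposals = [
    FanOutProposal("remove-stale-feature-flags", 12, 40, 80),
    FanOutProposal("migrate-legacy-api", 20, 120, 150),
    FanOutProposal("upgrade-test-runner", 8, 30, 36),
    FanOutProposal("rename-internal-package", 15, 10, 5),
]
print([p.name for p in pick_for_quarter(proposals, capacity_eng_weeks=100)])
# → ['remove-stale-feature-flags', 'upgrade-test-runner']
```

A greedy ranking like this is easy to explain in a planning meeting, but it is not guaranteed to find the best combination of projects, so its output is better treated as a starting point for discussion than as the decision itself.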

One such fan-out project GitHub worked on recently was cleaning up feature flags in the monolith. Feature flags are an excellent deployment tool, but if too many hang around for too long, they hurt code readability. Over 15 years, GitHub had amassed feature flags that were always on, feature flags that were never turned on, and, worse, feature flags that were enabled for some specific enterprise customers and not for others. No customer will appreciate being on a bespoke feature flag that GitHub is not properly aware of!

As with each of its fan-out projects, GitHub staffed this one centrally with a group of motivated people who were like, “Yeah, we really wanna do this. We wanna get this done. It's gonna be better for everyone.” This team investigated each feature flag to figure out who owned it and whether it could be safely removed. Working this way, GitHub made great progress on its product.
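A first pass over the flags in a cleanup like this can be partly automated. The sketch below assumes a hypothetical snapshot of each flag's owner and of the customers it is enabled for; `FeatureFlag`, `classify`, and `triage` are illustrative names, not GitHub's tooling. A snapshot cannot tell "never turned on" apart from "turned off later", so `OFF_EVERYWHERE` only approximates the second category described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class FlagState(Enum):
    ALWAYS_ON = "on for everyone"
    OFF_EVERYWHERE = "off for everyone"
    PARTIAL = "on for some customers only"


@dataclass
class FeatureFlag:
    name: str
    owner: Optional[str] = None                         # None: nobody has claimed the flag yet
    enabled_globally: bool = False
    enabled_for: Set[str] = field(default_factory=set)  # customer IDs with the flag on


def classify(flag: FeatureFlag, all_customers: Set[str]) -> FlagState:
    if flag.enabled_globally:
        return FlagState.ALWAYS_ON
    if not flag.enabled_for:
        return FlagState.OFF_EVERYWHERE
    if flag.enabled_for >= all_customers:  # enabled for every known customer
        return FlagState.ALWAYS_ON
    return FlagState.PARTIAL


def triage(flags: List[FeatureFlag], all_customers: Set[str]) -> Dict[str, List[str]]:
    """Suggest a next step per flag; actual removal still needs the owner's sign-off."""
    plan: Dict[str, List[str]] = {"find_owner": [], "owner_decision": [], "removal_candidate": []}
    for flag in flags:
        state = classify(flag, all_customers)
        if flag.owner is None:
            plan["find_owner"].append(flag.name)
        elif state is FlagState.PARTIAL:
            # Customer-specific behaviour: someone must decide which code path survives.
            plan["owner_decision"].append(flag.name)
        else:
            # Always on: inline the enabled path. Off everywhere: delete the dead path.
            plan["removal_candidate"].append(flag.name)
    return plan


customers = {"acme", "globex", "initech"}
flags = [
    FeatureFlag("new_diff_view", owner="code-review", enabled_globally=True),
    FeatureFlag("beta_search", owner="search"),
    FeatureFlag("custom_export", owner="enterprise", enabled_for={"acme"}),
    FeatureFlag("legacy_banner", enabled_for={"acme", "globex", "initech"}),
]
print(triage(flags, customers))
```

Unowned flags are routed to "find an owner" before anything else, mirroring the order of the work described above: first establish who owns a flag, then decide whether it can go.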

Use Design Guidance to Simplify Design Reviews

Paved paths and design guidance are another need to address as an engineering org scales and matures. How will a team building a new service know which building blocks or languages to use, or how to architect its code so that it works smoothly with what another team is doing in the monolith?

Not only does GitHub have cascading levels of design reviews, but it is also increasingly investing in paved paths, so engineers can focus on their novel work and rely on infrastructure they don’t have to think about too much.

With design reviews, you want to scale the effort to the importance of the change: not everything requires a documented design, but substantial changes that will have a lasting impact across engineering should be carefully reviewed and communicated.
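The article doesn't spell out GitHub's review tiers. As a hypothetical sketch, "effort relative to importance" could be written down as an explicit rule, so teams can tell in advance how much review a change needs. The `ReviewLevel` tiers, the `Change` criteria, and the thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class ReviewLevel(IntEnum):
    NONE = 0      # routine change: normal code review is enough
    TEAM = 1      # short written design, reviewed within the owning team
    ORG = 2       # written design, reviewed by senior engineers across teams
    ENG_WIDE = 3  # lasting engineering-wide impact: the most senior review body


@dataclass(frozen=True)
class Change:
    teams_affected: int
    hard_to_reverse: bool      # e.g. a data model or public API change
    introduces_new_tech: bool  # e.g. a new language, datastore or framework


def required_review(change: Change) -> ReviewLevel:
    """Scale review effort with the change's reach and permanence (hypothetical thresholds)."""
    if change.introduces_new_tech:
        return ReviewLevel.ENG_WIDE
    if change.hard_to_reverse and change.teams_affected > 3:
        return ReviewLevel.ENG_WIDE
    if change.teams_affected > 1:
        return ReviewLevel.ORG
    if change.hard_to_reverse:
        return ReviewLevel.TEAM
    return ReviewLevel.NONE


print(required_review(Change(teams_affected=1, hard_to_reverse=False, introduces_new_tech=False)).name)
# → NONE
print(required_review(Change(teams_affected=5, hard_to_reverse=True, introduces_new_tech=False)).name)
# → ENG_WIDE
```

The value of a rule like this is predictability: most changes can skip a formal design, and it is clear up front which ones cannot.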

GitHub also has a Principal Council, composed of the company’s most senior technical ICs and VPs, which reviews the most important design decisions. The Principal Council sets the overall architectural roadmap in order to simplify design guidance for teams. For example, the Council might say that if a project meets Condition X, it should be built in Go, but if it meets Condition Y, it should be built in the monolith. Of course, choosing a programming language only scratches the surface, so the Principal Council is now working on a broader future architecture plan.
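Guidance of the form "if X, build it here; if Y, build it there" maps naturally onto an ordered rule list in which the first matching rule wins. The article leaves Condition X and Condition Y abstract, so this sketch keeps them as placeholder booleans rather than guessing what they are; `Project`, `GUIDANCE`, and `recommend` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Project:
    name: str
    meets_condition_x: bool  # placeholder for the Council's actual "Condition X"
    meets_condition_y: bool  # placeholder for the Council's actual "Condition Y"


# (rule name, predicate, recommendation); earlier rules take precedence.
Rule = Tuple[str, Callable[[Project], bool], str]

GUIDANCE: List[Rule] = [
    ("condition-x", lambda p: p.meets_condition_x, "build it as a Go service"),
    ("condition-y", lambda p: p.meets_condition_y, "build it in the monolith"),
]

FALLBACK = "no paved path covers this: ask for a design review"


def recommend(project: Project, rules: List[Rule] = GUIDANCE) -> str:
    """Return the recommendation of the first rule that applies."""
    for _name, applies, recommendation in rules:
        if applies(project):
            return recommendation
    return FALLBACK


print(recommend(Project("billing-webhooks", meets_condition_x=True, meets_condition_y=False)))
# → build it as a Go service
```

Ordering the rules makes precedence explicit when a project meets both conditions, and the fallback sends uncovered cases to a human review instead of silently picking a default.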

Use Council Meetings to Make Aligned Technical Decisions

GitHub really wants teams to have agency over the decisions that apply to them, such as those about the novel work they're doing. But there also needs to be a way for anyone in engineering to bring big questions that they don't feel can be decided within their own team (bigger infrastructure questions, investment questions, or engineering-wide questions) to a senior group that will discuss them thoughtfully and communicate the outcome back.

This is what happens at GitHub's council meetings. Anyone in engineering can submit a GitHub issue, which is then discussed by people from various parts of the platform who can make decisions on questions like: “I want to start using this database technology. I'm not sure how that's going to work on GitHub Enterprise Server, which is our on-prem server deployment, or on our future cloud-based SaaS offering. Is this something I can do, and what are the constraints?”
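Because council topics arrive as ordinary GitHub issues, they can also be filed from a script through the GitHub REST API's create-issue endpoint (`POST /repos/{owner}/{repo}/issues`). The repository name, the `council-agenda` label, the title prefix, and the body template below are hypothetical; the article does not describe GitHub's actual template. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_council_issue_request(
    owner: str,
    repo: str,
    token: str,
    question: str,
    context: str,
    label: str = "council-agenda",  # hypothetical label name
) -> urllib.request.Request:
    """Build (but do not send) a create-issue request for the GitHub REST API."""
    payload = {
        "title": f"[Council] {question}",
        "body": f"## Question\n{question}\n\n## Context\n{context}\n",
        "labels": [label],
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_council_issue_request(
    "example-org", "eng-council", "TOKEN",  # hypothetical repository and token
    question="Can we adopt database X?",
    context="Unsure how it would run on GitHub Enterprise Server.",
)
print(req.get_method(), req.full_url)
```

To send it, pass the request to `urllib.request.urlopen` with a real token; the JSON response includes the new issue's `html_url`. Labels are only applied when the token's user has push access to the repository.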

Of course, GitHub is built on GitHub and uses GitHub. (Rachel loves that when GitHub improves its systems for internal developer productivity, it not only helps its own developers but also tests its products and makes them better for users.) So GitHub uses GitHub Discussions and even repos for communication. (Occasionally they’ll write Google Docs, since Google Docs are really good for iterative commenting and working. But once something is locked, they bring it into a repo.)

Assign DRIs for Effective Decision Making

Another thing that comes up in a large organisation is that decisions sometimes take too long. Not only should there be processes for healthy escalation, but each team also needs a DRI (Directly Responsible Individual). The DRI can make decisions and iterate, and makes sure the team keeps a healthy cadence of moving forward without getting in its own way.

Create a DevSat Survey

A DevSat survey is a developer productivity and happiness survey. GitHub's DevSat survey focuses on the internal GitHub developer experience and covers a whole set of topics, for example:

  • What is causing developer friction?
  • What is the satisfaction with our tools and systems?
  • What jobs to be done have the most friction?
  • Psychological safety on teams
  • Decision-making on teams
  • The On-Call experience
  • How much unplanned work versus planned work is being done

GitHub uses these responses to produce anonymized reports for managers and leadership, which helps inform their investment decisions and really elevates their decision-making.
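The article doesn't say how the anonymized reports are produced. One common safeguard, assumed here, is to publish only per-group aggregates and to leave out any group with too few responses, since a small team's average can point back to individuals. The field names, the 1-to-5 score scale, and the minimum group size of 5 are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def anonymized_report(
    responses: List[dict],
    group_key: str,
    question: str,
    min_group_size: int = 5,  # hypothetical anonymity threshold
) -> Dict[str, dict]:
    """Average one question's scores per group. Respondent identifiers are
    never copied, and groups smaller than min_group_size are left out."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for response in responses:
        # Keep only the group label and the score; any "respondent" field is ignored.
        scores[response[group_key]].append(response[question])
    return {
        group: {"n": len(values), "mean": round(mean(values), 2)}
        for group, values in scores.items()
        if len(values) >= min_group_size
    }


responses = (
    [{"respondent": f"p{i}", "team": "platform", "tooling_satisfaction": s}
     for i, s in enumerate([4, 3, 5, 4, 4])]
    + [{"respondent": f"s{i}", "team": "search", "tooling_satisfaction": s}
       for i, s in enumerate([2, 1])]
)
print(anonymized_report(responses, "team", "tooling_satisfaction"))
```

Here the two-person "search" group is omitted entirely rather than shown with a warning, so the report does not reveal how few people it contained.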

For example, GitHub's DevSat asked about any friction involved in using Codespaces internally. Because people tend to share more when they have an anonymous survey at their disposal, GitHub gathered a wealth of insight from its internal developers that the Codespaces team can use, and it can spot real trends in what appears to matter most. This ultimately helps make the external product better as well!

In Conclusion

Scaling isn’t easy. You will need to set up processes for how things are going to work now that you are a big company with multiple teams. No one person can understand the details of all the work that is happening, so you need effective development practices and communication strategies. In this article, we discussed how to make design decisions, handle tech debt, and keep in touch with developers now that the company is so much bigger.

This article is based on episode 55 of the alphalist CTO podcast. The alphalist podcast features interviews with CTOs and other technical leaders, on topics ranging from technology to management.

Guests from leading tech companies share their best practices and knowledge.

The goal is to support other CTOs on their journey through tech and engineering, to inspire them, and to offer a sneak peek into other successful companies to understand how they think and act. Get awesome insights into the world’s top tech companies, personalities, and trends by listening today on Apple, Spotify, Google, Deezer, and more.

If you are the CTO of a tech-product company, you might be interested in joining alphalist, an exclusive CTO network. Reach out to us for more information.