How To Learn With Open Source API Fundamentals

Observations on how popular Open Source projects document their public APIs and how it applies to Closed Source projects

A design artwork showing interconnected plugs with an engine in the center

Besides big achievements such as Linux, the Open Source ecosystem is full of reusable projects, such as frameworks, libraries, servers and databases, that deal with a diverse range of technical concerns. As Uncle Bob puts it in "Clean Architecture", they are just "details". They represent problems that developers don't need to think about if a decent solution built by somebody else is already available.

Most for-profit organizations apply the same principle of reusable projects, but in a Closed Source manner: they build internal projects to deal with internal problems when Open Source doesn't fit. The reasons vary. Either no Open Source project solves their problem, or the problem is too specific to their domain to make sense as an Open Source project.

Whatever the reason, one thing is surely valuable for both Open Source and Closed Source organizations: public API documentation.

We know from some popular Open Source projects that the internals of a library should not be exposed to those consuming it. The reason is the same one behind the abstractions we build inside our systems: we contain complexity in small, highly cohesive modules and components, and follow established software engineering principles to keep them maintainable.

The same applies to both Open Source and Closed Source APIs. Here, the abstraction takes the form of human-readable documentation, instead of forcing consumers to rely on the raw definitions of the classes or functions that make up the API. Judging by the quality of their documentation, Open Source projects such as bluebird, httpie and pageres seem to follow this philosophy.

The purpose of public API documentation is similar to that of a programming language's encapsulation constructs: to abstract complexity by exposing a human-readable interface that developers can consume

There are many ways to learn how to consume a reusable API. However, the less human-readable it is, the harder it is to consume effectively. Below are the options we have when consuming an API, ordered from the most to the least effective:

Read the human-readable documentation, for example a README or a browsable API reference generated from the source code.
Read the tests that document what the project supports. This is useful if they clearly describe the expected behavior in a human-readable form (as when tests are written using BDD, Behavior Driven Development).
Inspect the public members. This way we can infer what is supported based on the high-level purpose of the API and the way the public members are defined and named.
Look for online examples or ask other consumers. For an Open Source API, we can look for examples online. For a Closed Source API, though, we need to ask other members of the project who have worked with that API, because such information is unlikely to exist anywhere online.
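The second option works best when test names read like a specification. Here is a minimal sketch in the spirit of BDD, using only the standard library's `unittest` rather than a dedicated BDD framework. The `parse_duration` function is hypothetical and exists only to have something to describe; each test name states one supported or rejected behavior instead of an implementation detail.

```python
import re
import unittest

# Hypothetical function under test: parses strings such as "1h30m" or "45s"
# into a number of seconds.
_UNITS = {"h": 3600, "m": 60, "s": 1}
_FULL = re.compile(r"(?:\d+[hms])+")   # the whole input must be number+unit pairs
_TOKEN = re.compile(r"(\d+)([hms])")   # extracts each (number, unit) pair


def parse_duration(text: str) -> int:
    """Return the number of seconds described by `text`, e.g. "1h30m" -> 5400."""
    if not _FULL.fullmatch(text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in _TOKEN.findall(text))


class ParseDurationBehavior(unittest.TestCase):
    """Each test name reads as a sentence describing what the API supports."""

    def test_it_converts_hours_and_minutes_to_seconds(self):
        self.assertEqual(parse_duration("1h30m"), 5400)

    def test_it_accepts_a_single_unit(self):
        self.assertEqual(parse_duration("45s"), 45)

    def test_it_rejects_unknown_units(self):
        with self.assertRaises(ValueError):
            parse_duration("3d")

    def test_it_rejects_an_empty_string(self):
        with self.assertRaises(ValueError):
            parse_duration("")
```

Running the file with `python -m unittest -v` prints every test name, so the test report itself reads like a list of what the API does and does not support.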

Each of the approaches above has its pros and cons. In this post, I will focus on the one communities usually consider the most effective way to communicate an API: human-readable documentation.

The more a public API lacks human-readable documentation, the harder it is to consume effectively

In Open Source, authors tend not to care much about human-readable API documentation. Unless the project is already popular or backed by a for-profit organization, the coding happens in the developer's unpaid free time, and developers working in their free time usually care more about building interesting and useful stuff than about documenting it for somebody else.

However, that mindset can work against the project's own primary goal.

One of the main reasons an Open Source project is created is to spare others the pain its author has already gone through. The project is publicized in the hope that other developers find it interesting and useful, so that they start using it and contributing to make it more robust. Without human-readable public documentation, and unless the project is extremely popular, consumers won't know which APIs are supported. That reduces the chances that someone will take the time to use or contribute to it, which in turn makes the project look less interesting or useful.

The lack of documentation also makes it impossible for the author to know which features are supported and which are not. That makes refactoring and improvements very hard: without a contract between a feature and those consuming it, every change has to be treated as a potentially breaking change.

Not documenting a public API in a human-readable form can reduce the chances of attracting contributors or users to an Open Source project

In Closed Source, authors tend not to care much about human-readable API documentation either.

A project usually starts with a small codebase and a small team with low communication friction. Even at that scale, the need to split a concern into a separate reusable project still exists. However, given the size of the project and of the team, developers tend not to prioritize human-readable API documentation and prefer to focus on writing code.

That mindset can also work against the primary goal of a for-profit organization.

When the project scales, it's very hard to find room to document what was left undocumented. As time passes, undocumented API usage makes the project increasingly rigid. Even if the team starts documenting later, the developers may have changed, and the knowledge of what is supported and what is not may live only in the heads of those who already left. This risk is related to what is called the Truck Factor, which reflects how much of a project's important knowledge is concentrated in a few people instead of being shared among everyone involved, so that losing those people would prevent others from continuing the work efficiently.

The more time passes, the harder it is to change an undocumented project, because it's hard to trace how it's being consumed

Another problem caused by the lack of human-readable API documentation affects the onboarding of new developers. Without clear documentation, someone who has never worked on the project will take longer to start changing or consuming it without a lot of assistance. They will need to understand the internals, look at how the code is being consumed, and work out which APIs are implicitly supported and which are not.

When a project is undocumented, the cost to onboard new members is considerably higher

In Open Source, if the goal is to attract contributors and users, investing significant effort in documenting the public API in a clear, human-readable form is a must. Even if the project solves a big problem and is very useful, poor documentation can limit its potential, because consumers will struggle to become aware of that usefulness. Unless it's already extremely popular, the project is unlikely to become relevant enough to attract the audience it wants.

In the Closed Source environment of a for-profit organization, the goal is to generate profit, so it's reasonable to imagine cases where a reusable internal API exists for some time without documentation. That is usually the case when the project is still a cheap prototype and fewer than a handful of developers care about it, so nobody feels the friction of communication. Once the project starts to grow, though, documentation becomes necessary if we want to stay focused on the "profit" goal. Otherwise, communication friction will keep increasing over time and will start to hurt the team's ability to deliver features and bug fixes.

As a rule of thumb, prototypes tend to become the permanent solution, so we can avoid ending up with an undocumented permanent solution by not postponing documentation. However, we need to acknowledge that, in for-profit organizations, skipping proper documentation has no immediate cost while a project is in its infancy. In Open Source, though, lacking documentation from the beginning can directly hurt the goal of attracting contributions.

For Closed Source communities, documentation is very important, especially at scale. For Open Source, it's essential.

One difficult problem with documentation is that a project's authors usually hold the whole mental model of how the system works and what is supported, so they are more likely to miss what the public API documentation is lacking. The best way to work around this is through the feedback and issues raised by consumers. The authors of a project, whether Open Source or Closed Source, should encourage community members to help improve the documentation instead of just giving away the answer to a question. This issue is a good example: a consumer's problem led to an action that improved the project's documentation.

The best people to provide feedback on public documentation are not those who have used it for a long time but those who have just started using it. They don't know the project's internals, so they are more likely to spot problems that the authors miss and that other newcomers will hit when they start using the public API. This means documentation contributions from newcomers should be valued as much as code contributions, if not more.

Documentation contributions from newcomers should be as valuable to a project as code contributions, if not more

Now let's wrap it all up.

There are cases where it makes sense not to document public APIs in the Closed Source projects of for-profit organizations. In Open Source, however, failing to document the public API from the beginning can really hurt the project's primary goal.

When contributing to an Open Source project, or to a Closed Source project of a for-profit organization, be aware that public API documentation can be as important as the code itself. Helping to improve it is extremely valuable, whether to attract contributors or to reduce the knowledge-loss risk captured by the team's Truck Factor.

Many Open Source projects have documented their APIs for a long time because their goals depend on it. What we can do is find ways to apply those ideas to similar contexts. Maybe, if we bring that mindset into a company, we will also be able to raise awareness and transform a Cathedral into a collaborative and self-sufficient Bazaar.

For-profit organizations should learn from Open Source and understand that documentation can matter more than it seems for collaboration and knowledge sharing.

You can also use a similar technique to hide leaky abstractions in plain sight.

Thanks for reading. If you have any feedback, reach out to me on Twitter, Facebook or GitHub.