Open source has evolved from a few pioneering transparent projects into the backbone of modern development across the industry. As a result, many projects now use the term "open source" to convey a positive impression. However, with a wide range of development practices and open-source licenses, the meaning of "open source" can vary significantly.
In this article, I aim to explore the true value of openness and identify what is and isn't genuinely open. Additionally, I will discuss the different levels of openness that projects may adopt, helping you navigate the diverse landscape of open-source projects more effectively.
The value of open source manifests in various ways. One significant advantage is transparency, which lets you understand the code you are running, especially when it processes sensitive data. Open-source code also lets you repair or enhance the software you rely on in your business or project.
However, for projects aspiring to become foundational standards for others to build upon, users seek more than just transparency: they seek certainty. This includes assurance that the project will not undergo sudden changes that could disrupt everything built on top of it, and that it will continue to be actively developed and maintained for the foreseeable future.
In this context, the approach to open source becomes crucial.
It's this kind of certainty that underscores the vital role of the
The Apache Software Foundation (ASF) enforces strict standards for diverse contributions, independence, and activity in its projects, ensuring they can withstand the test of time as standards in software development.
Many open-source projects strive to become Apache projects to gain the community credibility necessary for adoption as standard software building blocks, such as
Other organizations, like the
In reality, independence isn't always crucial. Many open-source standards in web development, like
Instead, long-standing standards like REST and HTTP serve as the glue that connects web applications across various backend languages, frontend frameworks, and more.
In the realm of data, standards are still emerging. Some notable ones are Apache Arrow for representing data in memory, Apache Arrow Flight for transferring that data between systems, and Apache Parquet for persisting datasets on the file system for analytics. As datasets grow larger, there is a need for standards on how datasets spanning multiple files are represented (table formats) and how these datasets are tracked, governed, and discovered by different tools (metadata catalogs).
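To make the idea of a table format a little more concrete, here is a minimal sketch in Python. It is a toy illustration, not how any real table format works: the `manifest.json` file, the `toy-table-v0` label, and the helper names are all hypothetical, and CSV stands in for the Parquet files a real table would typically use. The point is only that readers learn which files belong to a table from shared metadata rather than from whatever happens to sit in a directory.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_data_file(path, rows):
    """Write one data file (CSV here; a stand-in for Parquet)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "amount"])
        writer.writerows(rows)


def commit_snapshot(table_dir, data_files):
    """Record which files make up the table in a hypothetical JSON manifest."""
    manifest = {"format": "toy-table-v0", "data_files": sorted(data_files)}
    (Path(table_dir) / "manifest.json").write_text(json.dumps(manifest, indent=2))


def read_table(table_dir):
    """Read the table by following the manifest, not by listing the directory."""
    table_dir = Path(table_dir)
    manifest = json.loads((table_dir / "manifest.json").read_text())
    rows = []
    for name in manifest["data_files"]:
        with open(table_dir / name, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        write_data_file(Path(table_dir) / "part-0.csv", [(1, 10), (2, 20)])
        write_data_file(Path(table_dir) / "part-1.csv", [(3, 30)])
        # This file is written but never committed, so readers ignore it.
        write_data_file(Path(table_dir) / "uncommitted.csv", [(99, 999)])
        commit_snapshot(table_dir, ["part-0.csv", "part-1.csv"])
        return read_table(table_dir)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Because every tool that reads or writes the table has to agree on that metadata layout, the layout itself becomes the standard, which is why the question of who controls it matters so much.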
In the world of table formats, there are three competing standards:
When a particular standard significantly impacts how businesses must build their enterprises to interoperate with the broader ecosystem, there is greater pressure for independence, because without assured independence the ecosystem's partners carry the risk.
Many popular open-source projects are beloved and closely tied to particular vendors. For example, web frameworks like React and
However, there are clear risks when the underlying project is intended to be a standard that many commercial enterprises need to build and stake their business on:
Independence isn't the be-all and end-all for open-source projects, but the more a project represents a standard format whose value lies in its ecosystem, the more independence should matter.
Beyond unexpected changes, licensing shifts, and an uneven playing field for the ecosystem, there are other practices to be cautious of under the guise of being open. One strategy used to avoid some traditional licensing conflicts is to offer two versions of a project: an open-source version and a proprietary version controlled by a commercial entity. The proprietary version often receives new or exclusive features first.
This practice, in itself, isn't inherently bad. Many businesses maintain commercial proprietary forks of open-source projects, but usually, the commercial version has a different name than the open-source project. For example, in the world of data catalogs,
Both aim to become community-driven projects over time but will also drive integrated features in their respective commercial products under different names. For instance, if you set up your own Nessie catalog, it has a distinct name compared to the Dremio Enterprise Catalog (formerly Arctic) integrated into Dremio Cloud.
The Dremio Enterprise Catalog is powered by Nessie but has additional features, so the different names prevent confusion about available features or which documentation to reference.
In contrast,
This creates a "muddying of the waters" between what is open and what is proprietary. This isn't an issue if you are a Databricks user, but it can be quite confusing for those who want to use these tools outside of the Databricks ecosystem.
To clarify, the fact that a project does not adhere to the highest standards of openness or is even proprietary does not diminish the quality of the project's code, the skills of its developers, or the value it can provide to its users. However, openness can serve as a signal of certainty, fostering ecosystems for standards that benefit from a growing network effect.
Independent actors within these ecosystems feel more comfortable building upon such projects, which is particularly important for standards that affect how systems communicate with each other.