The Megashift Towards Decentralized Edge Computing

Written by obx | Published 2021/08/27

TLDR: Cloud applications often needlessly transfer data that is used primarily or solely on the edge to the cloud, store it there, and send it back when it is used. Storing data in the cloud that is not needed or used there is wasteful. By 2025, 30+ billion IoT devices will be creating ~4.6 trillion GB of data per day on the edge. Data volume, variety, and velocity, as well as bandwidth infrastructure limitations, make it infeasible to store and process all of that data in a centralized cloud.

The hidden costs of the Cloud

For the last decade, we got used to living lavishly: cloud resources seemed endless and efficiency was forgotten... just boot up another machine, easy as that. While the growing monetary costs are drawing more and more attention, the inefficient use of cloud computing carries a price tag that goes beyond money. In economics, these costs are called external effect costs or externalities: costs that are borne by uninvolved third parties, oftentimes society as a whole. A classic example is the air pollution caused by motor vehicles.

External effect costs/externalities of the cloud

External effects and their costs are real, and someone needs to bear them. Of course, the cloud has its uses, and for many use cases it is the right technology. For many others, however, it is an inefficient tool, and that is where the external effects weigh heavy: cloud applications often needlessly transfer data that is used primarily or solely on the edge to the cloud, store it there, and send it back when it is used.

Storing data in the cloud that is not needed or used there is wasteful.

To put that into perspective: saving 1 GB of data to the cloud instead of keeping it on the edge causes roughly 1,000,000 times the CO2 emissions.* However, if you truly need to store data centrally, the cloud might be the most sustainable option for that part of the data. And with a few companies (the so-called hyperscalers) dominating the cloud market, there are further, less easily measurable external costs, e.g. with regard to data sovereignty and privacy.

While this is a strong reason for us to question the use of the cloud for everything, there are many more reasons driving the megashift to the edge.

Trends driving the megashift to decentralized Edge Computing

The trend towards Edge Computing is pretty apparent: by 2025, 30+ billion IoT devices will be creating ~4.6 trillion GB of data per day (on the edge). The growing number of devices and the volume, variety, and velocity of data, as well as bandwidth infrastructure limitations, make it infeasible to store and process all data in a centralized cloud. On top of that, new use cases come with new requirements that a centralized cloud infrastructure cannot meet, for example soft and hard response time requirements, offline functionality, and security and data protection regulations. Not to mention the side effects of storing so much data, all of it, in ONE central place…

These trends accelerate the shift away from centralized cloud computing towards a decentralized edge computing topology. Edge computing refers to decentralized data processing at the “edge” of the network, for example in a car, on a machine, on a smartphone, or in a building. Hardware specifications are, unfortunately, not well suited to define the edge (via “edge devices”). The crucial point is rather the decentralized use of data at, or as close as possible to, the data source.

Edge computing itself is not a technology but a topology, and according to McKinsey, one of the top growing trends in tech in 2021. However, there is a gap in basic “core” edge technologies, the so-called “software infrastructure”. This gap is one of the main reasons edge projects fail and Edge Computing keeps being delayed (it has been the year of the edge for at least three years ;)).

Needed: Software infrastructure for Edge Computing

With computing shifting to the edge of the network, the needs of this decentralized topology spanning a variety of hardware/software stacks, oftentimes including many restricted and embedded devices, become clear:

Need for fast local data storage

→ e.g. a machine on the factory floor collects data on stiffness, friction, and pressure points. Resources on the device (CPU, memory, power, …) can be limited, and there is typically no connection to the Internet. Even with an Internet connection, high data rates quickly push the available bandwidth, as well as the associated networking/cloud costs, to the limit. To be able to use this data, it must be persisted in a structured manner at the edge, e.g. stored locally in a database.
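
To make this concrete, here is a minimal sketch of structured local persistence, with SQLite (via the sqlite-jdbc driver) standing in for a purpose-built edge database; the table, columns, and values are illustrative only:

```kotlin
// Sketch: persist machine sensor readings locally, no network round trip.
// SQLite is a stand-in here; an edge database would play the same role.
import java.sql.DriverManager

data class Reading(val sensor: String, val value: Double, val timestampMs: Long)

fun main() {
    // "machine.db" lives on the device itself.
    DriverManager.getConnection("jdbc:sqlite:machine.db").use { conn ->
        conn.createStatement().use { st ->
            st.execute("CREATE TABLE IF NOT EXISTS readings(sensor TEXT, value REAL, ts INTEGER)")
        }
        val r = Reading("pressure", 4.2, System.currentTimeMillis())
        conn.prepareStatement("INSERT INTO readings(sensor, value, ts) VALUES (?, ?, ?)").use { ps ->
            ps.setString(1, r.sensor)
            ps.setDouble(2, r.value)
            ps.setLong(3, r.timestampMs)
            ps.executeUpdate()
        }
    }
}
```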

Need for reliable on-device data flows

→ e.g. a car is an edge device consisting of many control units, so data must be stored on multiple control units. In order to access and use the data across several of the car’s control units, it must be selectively synchronized between them. A centralized structure, and thus a single point of failure, is unthinkable.
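
A minimal sketch of what selective, decentralized synchronization can look like: each record carries a version, and a control unit only ships records its peer has not yet seen. Real systems would use vector clocks or similar; all names here are illustrative:

```kotlin
// Sketch: peer-to-peer delta sync between control units, no central broker.
data class Record(val id: String, val version: Long, val payload: String)

class ControlUnit {
    private val records = mutableMapOf<String, Record>()
    private var clock = 0L

    fun put(id: String, payload: String) {
        records[id] = Record(id, ++clock, payload)
    }

    // Selective sync: only records newer than what the peer reports having seen.
    fun changesSince(peerClock: Long): List<Record> =
        records.values.filter { it.version > peerClock }

    // Merge keeps the newer version of each record; any unit can merge from any other.
    fun merge(incoming: List<Record>) {
        for (r in incoming) {
            val local = records[r.id]
            if (local == null || r.version > local.version) records[r.id] = r
            if (r.version > clock) clock = r.version
        }
    }
}
```

A brake control unit could then pull only the deltas from the engine unit via engine.changesSince(lastSeen) and merge them locally; no single node has to stay up for the others to keep working.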

Need for edge-to-edge-to-cloud data flows

→ e.g. in a manufacturing hall you will typically find any number of diverse devices, from sensors to brownfield and greenfield machines, and no Internet connectivity. At the same time, there are diverse employee devices such as tablets or smartphones, as well as central PCs and a cloud. To extract value from the data, it must be available in raw, aggregated, or summarized form in different places. This means it needs to be synchronized efficiently and selectively, with possible conflicts resolved.
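
One recurring pattern in such a setup is to keep raw readings local and move only aggregates up the chain. A minimal sketch, with all names illustrative:

```kotlin
// Sketch: summarize raw edge data so only a compact aggregate leaves the hall.
data class Reading(val sensor: String, val value: Double)
data class Aggregate(val sensor: String, val count: Int, val min: Double, val max: Double, val mean: Double)

fun summarize(readings: List<Reading>): List<Aggregate> =
    readings.groupBy { it.sensor }.map { (sensor, rs) ->
        Aggregate(sensor, rs.size, rs.minOf { it.value }, rs.maxOf { it.value },
            rs.sumOf { it.value } / rs.size)
    }
```

The raw readings stay on the machine or a nearby PC; only the output of summarize() needs to be synchronized to central PCs or the cloud.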

Need for flexible edge data management

→ e.g. with the rise of IoT, time-series data has become common. However, time-series data alone is usually not sufficient and needs to be combined with other data structures (like objects) to add value. At the same time, the push to standardize data formats across industries (e.g. VSS in automotive or Umati in Industrial IoT) requires that the database support flexible data structures.
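
A minimal sketch of such a combined model, with hypothetical names: time-series readings reference a richer machine object, so queries can join the two worlds:

```kotlin
// Sketch: objects plus time series in one flexible edge data model.
data class Machine(val id: Long, val model: String, val location: String)

data class SensorReading(
    val machineId: Long,  // link into the object world
    val signal: String,   // e.g. "temperature"; names could follow a standard like VSS
    val value: Double,
    val timestampMs: Long // the time-series dimension
)

// Neither structure alone answers "latest temperature per machine".
fun latestReadings(readings: List<SensorReading>, signal: String): Map<Long, SensorReading?> =
    readings.filter { it.signal == signal }
        .groupBy { it.machineId }
        .mapValues { (_, rs) -> rs.maxByOrNull { it.timestampMs } }
```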

Lack of Core Software Infrastructure for Edge Computing

Developing solutions without core software infrastructure is possible on an individual level (though fopen() is rarely a good choice), but it has many drawbacks:

Custom in-house implementations are often cumbersome, slow, costly, and typically scale poorly. Oftentimes, applications or certain feature sets become unfeasible to deliver because the core software infrastructure is missing. Legacy code and individual workarounds create problems over the lifetime of a product. As a result, instead of a thriving ecosystem, only a few big players are able to implement edge solutions, and innovation and creativity are limited.

Edge databases to empower the ecosystem

There is no shortage of “IoT / Edge platforms”, with new ones still springing up. Typically provided by big players with market access, they expect others to fill the core tech gap and make those technologies available via their app stores (preferably for free). This stalls the IoT and edge ecosystem, with projects being delayed or failing, and for the moment a huge opportunity to create value (and save resources) from the billions of deployed devices is lost. For a thriving Edge and IoT ecosystem, with players of all sizes participating, the edge needs core edge technologies that provide developers with the tools to implement value-creating apps faster.

One such core technology is an edge database tailored to the unique requirements of the Edge Computing topology. Specifically, an edge database needs a footprint smaller than a couple of MB and must run efficiently on a wide range of restricted devices. Because efficiency is not only beautiful but also pays off with regard to monetary costs as well as externalities, such a database needs to support decentralized data flows. There is no more efficient place than the database level to make sure data is synchronized quickly and reliably across devices on the edge and to and from the cloud. Again, simply replicating all data is wasteful in the long run. To empower flexible decentralized data flows, such a database would ideally offer a selection of conflict-resolution strategies alongside the data synchronization (see the sketch below). And last but not least, data should be protected at rest, in transit, and in use. Data security typically trades off against speed and efficiency, and the database level is the most efficient place to decide which data to protect at which level.
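
To illustrate what “a selection of conflict-resolution strategies” can mean in practice, here is a minimal sketch with a purely hypothetical interface; a real edge database would ship such strategies built in:

```kotlin
// Sketch: pluggable conflict resolution for synchronized records.
data class Versioned<T>(val value: T, val timestampMs: Long, val origin: String)

fun interface ConflictResolver<T> {
    fun resolve(local: Versioned<T>, remote: Versioned<T>): Versioned<T>
}

// Strategy 1: last write wins, a cheap default.
fun <T> lastWriteWins() = ConflictResolver<T> { local, remote ->
    if (remote.timestampMs > local.timestampMs) remote else local
}

// Strategy 2: a domain-specific merge, e.g. keep the higher counter value.
val maxCounter = ConflictResolver<Int> { local, remote ->
    if (remote.value > local.value) remote else local
}
```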

The edge ecosystem – an outlook

Edge computing brings many advantages and enables many applications and functionalities that can only be realized by computing on the edge. Up to now, however, only a few (usually large) players have been able to create value in edge computing projects and thus gain competitive advantages. One reason is the lack of basic software for the edge. A thriving edge ecosystem requires edge software infrastructure that solves the basic recurring requirements of edge projects. One such core piece of the edge tech stack is the database. Edge databases are an important building block on the way to unlocking the value of IoT and a thriving ecosystem.


* Disclaimer: This is a best-effort guesstimate based on the following sources. In any case, with data volumes exploding, it is clear that even with a smaller delta, the potential CO2 savings are huge:

→ According to the American Council for an Energy-Efficient Economy, it takes 5.12 kWh of electricity per gigabyte of transferred data. A Carnegie Mellon University study concluded that the energy cost of data transfer and storage is about 7 kWh per gigabyte. An assessment of the American Council for an Energy-Efficient Economy concluded 3.1 kWh per gigabyte. So, sources vary, and we needed to make assumptions. (https://www.iea.org/reports/global-energy-co2-status-report-2019/emissions, https://www.idc.com/getdoc.jsp?containerId=prUS45213219)
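As a rough back-of-envelope of how such figures lead to the factor above (our own assumptions, not taken from the sources): at roughly 5 kWh per GB, the cloud path costs about 1.8 × 10⁷ joules, while writing 1 GB to local flash takes on the order of 10 joules, i.e. a ratio on the order of one million.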

→ “Using renewable energy sounds good but no one else benefits from what will be generated, and it skews national attempts to reduce emissions. Data centres… have eaten into any progress we made to achieving Ireland’s 40% carbon emissions reduction target. They are just adding to demand and reducing our percentage.” (from https://www.climatechangenews.com/2017/12/11/tsunami-data-consume-one-fifth-global-electricity-2025/) If someone has more input on this, we’d be happy to discuss and update.


Written by obx | Co-Founder & CEO @ObjectBox_io >> objectbox.io