As the dust of the cloud hype wave settles, more tech teams are discovering the side effects of cloud infrastructures that typically stay under the radar. As promising as on-demand scalability, less time spent managing on-premises services, and other benefits sound, they are often offset by a significant drawback: spikes in infrastructure costs in high-load systems.

The focus on high-load systems matters here. For smaller companies, there's hardly a more flexible and cheaper alternative to the cloud. However, once QPS reaches hundreds of thousands, vendor fees that seemed small are no longer sustainable.

As a software development company specializing in building and optimizing high-load systems for AdTech, we have explored multiple practices teams use to prevent infrastructure costs from spiking. Over more than 15 years, Xenoss has helped companies like Activision Blizzard, Verve Group, Smartly, Voodoo, Inmar Intelligence, and others build robust yet nimble infrastructures.

In this post, we'd like to share our experience and know-how on infrastructure challenges relevant to high-load platforms and explore ways to streamline costs. To illustrate the tactics, we'll use an industry where speed and scale are non-negotiable: AdTech. We also have a blog post that covers infrastructure cost optimization in more detail, featuring expert tips and comments from our software architects and a case study of a twenty-fold infrastructure cost reduction.

AdTech platforms: a poster case for high-load systems

High-load platforms power multiple sectors, such as banking and healthcare. Programmatic advertising, though often not seen as a technical feat, can rival other complex systems, as its operational requirements often push the boundaries of infrastructure design. Let's quickly recap why AdTech platforms (SSPs, DSPs, and so on) are an excellent lens for exploring infrastructure cost optimization.

Pressure for high volume and low latency

AdTech platforms are caught in a constant tug-of-war between high traffic volume and low latency. On the one hand, they need to handle the vast amount of traffic generated by online advertising, which, according to Wayne Bloodwell, CEO of TPA Digital, amounts to 950 billion impressions per day.

On top of the load, the real-time nature of the ecosystem adds another layer of complexity. High latency, i.e., a long delay between a bid request and a response, makes advertisers miss out on high-quality inventory because their bids aren't processed in time. For publishers, high latency makes ad slots harder to fill, which lowers revenue in the long run. The industry typically operates within a bid-processing window of around 80-120 ms.

Real-time decision-making

Real-time data processing is another recurring challenge for AdTech projects, for the following reasons:
- Data has to be retrieved rapidly (in under 100 ms) to make real-time decisions, such as bid price modeling.
- Collecting audience data from multiple sources increases pipeline complexity and expands the toolset needed to process various data types.
- Data quality: wrong data can lead advertisers to make poor bidding decisions, so data quality checks at every pipeline stage (ingestion, processing, consumption) are essential.

The clip below illustrates the complexity and critical operations of real-time data analytics.
https://www.youtube.com/watch?v=uaRzovqK3t0&embedable=true

Scalability demands

The AdTech industry is cyclical: economic ups and downs cause demand for advertising services to fluctuate. Demand spikes push AdTech platforms to implement dynamic scalability.
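To make dynamic scalability concrete, here is a minimal sizing sketch. It assumes a hypothetical per-instance throughput (which in practice would come from load tests) and a target utilization ceiling that leaves headroom for bursts; the function name `instances_needed` and all numbers are illustrative, not taken from a real platform.

```python
import math


def instances_needed(peak_qps: float, per_instance_qps: float,
                     target_utilization: float = 0.7,
                     min_instances: int = 2) -> int:
    """Return how many instances keep each one at or below target utilization.

    All inputs are hypothetical; real values come from load testing.
    min_instances keeps some redundancy even when traffic is very low.
    """
    if per_instance_qps <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("per_instance_qps must be > 0 and "
                         "target_utilization must be in (0, 1]")
    # Effective capacity per instance once headroom is reserved for bursts.
    effective_qps = per_instance_qps * target_utilization
    return max(min_instances, math.ceil(peak_qps / effective_qps))


# Hypothetical demand at three points of a market cycle.
for label, qps in [("quiet", 50_000), ("normal", 300_000), ("peak", 900_000)]:
    print(label, instances_needed(qps, per_instance_qps=5_000))
# quiet 15
# normal 86
# peak 258
```

The same calculation works in both directions: when demand drops, it shows how many instances can be released, which is where most of the savings from elastic capacity come from.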
With supply-path optimization (SPO) on the rise, AdTech vendors feel the pressure to reliably adjust their capacity up or down as demand changes. They need the ability and resources to handle peak traffic without sacrificing performance or reliability, and to scale back down when the market cools.

Collecting raw and aggregated data

Raw data is pivotal to the success of AdTech platforms, and these systems also collect a lot of aggregated data: demographic information, browsing history, user behavior, etc. These insights are integrated from various sources and help fuel targeting and personalization. Before raw data is ready for use, it has to go through ETL: extraction, transformation, and loading. However, maintaining multiple pipelines becomes an engineering challenge as systems scale and data volumes grow exponentially.

Best practices Xenoss uses to optimize infrastructure costs in high-load systems

Unless tech teams pay close attention to infrastructure costs, those costs quickly get out of control. Inefficient data modeling and storage, relying on services indiscriminately, and failing to plan ahead for threats make infrastructures unpredictable, slow, expensive, and hard to maintain. Cutting infrastructure costs is not a day's work, but armed with knowledge of the ecosystem and your platform, you can achieve significant reductions with a few tweaks. Here are several cost reduction practices Xenoss tech teams use to help our clients build leaner infrastructures.

Exploring the benefits of hybrid cloud infrastructure

In early-stage projects, not much thought is given to designing an optimal cloud infrastructure. Tech teams typically choose one of two options:
- Public cloud service providers, like AWS, Google Cloud, or Microsoft Azure. While relying on a cloud vendor early in the development process is understandable, we'd like to caution tech leaders against using managed services unless they are strictly needed. Over time, these tools can significantly increase a project's infrastructure bills: one of our clients reached out when their infrastructure bills had reached $2.5 million.
- On-premises infrastructure maintained by the in-house team. These days, maintaining on-premises data centers is less common for early-stage projects due to the upfront investment and manpower it requires. Still, on-prem infrastructures have their benefits, such as more control and tighter security.

In AdTech, flexibility and the ability to scale dynamically are vital, but full control over infrastructure costs and the ability to tighten security are equally important. The former is typically associated with the cloud, while the latter is commonly cited as a benefit of on-prem. At Xenoss, we recognize the benefits of both, which is why we use both in client projects.

The combination of cloud and on-prem is often referred to as "hybrid cloud", though other combinations fit the term too: a public and a private cloud, or two public clouds (aka multi-cloud). According to the Data Pipelines report published by DZone, 33% of surveyed organizations use a combination of cloud and on-premises infrastructure. The figure rises to 42% among enterprise organizations (over 1,000 employees).

The hybrid model gives AdTech teams more financial flexibility, merging the control of on-premises setups with the dynamic scalability of cloud platforms. Security is another significant advantage: projects can maintain stringent data protection standards by keeping sensitive data on-premises and using the cloud for less critical tasks.

Another reason we prefer and advocate for a hybrid approach is that it prevents vendor lock-in. Keeping critical infrastructure on-premises gives businesses the leeway to diversify their tech stack without depending on one cloud provider.
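The hybrid split (sensitive or compliance-bound workloads on-premises, less critical tasks in the cloud) can be written down as an explicit placement rule, which makes the decision reviewable rather than ad hoc. The sketch below is hypothetical: the `Workload` fields, the `place` function, and the example workloads are illustrative assumptions, not a Xenoss tool, and real placement decisions weigh many more factors.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ON_PREM = "on-premises"
    CLOUD = "cloud"


@dataclass(frozen=True)
class Workload:
    # Hypothetical attributes used only for this illustration.
    name: str
    handles_sensitive_data: bool = False
    regional_compliance: bool = False
    latency_critical: bool = False


def place(workload: Workload) -> Placement:
    """Illustrative rule: keep sensitive, compliance-bound,
    or latency-critical work on-premises; send the rest to the cloud."""
    if workload.handles_sensitive_data or workload.regional_compliance:
        return Placement.ON_PREM  # security and data-residency requirements
    if workload.latency_critical:
        return Placement.ON_PREM  # e.g., real-time bidding on dedicated hardware
    return Placement.CLOUD        # less critical tasks benefit from elastic capacity


workloads = [
    Workload("real-time bidding", latency_critical=True),
    Workload("user profile store", handles_sensitive_data=True),
    Workload("campaign analytics"),
]
for w in workloads:
    print(f"{w.name}: {place(w).value}")
# real-time bidding: on-premises
# user profile store: on-premises
# campaign analytics: cloud
```

Encoding the rule as data plus a small function means adding a new workload forces someone to answer the same questions (sensitivity, compliance, latency) every time.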
A hybrid approach also allows product teams to be more intentional about building workload-specific infrastructures. Some AdTech tasks, like real-time ad bidding or data operations bound by strict regional compliance, are better suited for on-premises execution. Other workflows (campaign analytics, distributed ad content hosting, or collaborative ad design) can seamlessly migrate to the cloud.

Optimizing data storage

In our experience, optimizing storage alone can significantly trim infrastructure costs. In AdTech, both SQL and NoSQL databases are used to manage structured and unstructured data. Let's recap the key differences between the two types of databases, as well as their use cases in AdTech.

Relational database benefits | NoSQL database benefits
High reliability | High performance
High data consistency | High scalability
Standardized schema | Storage optimized for high data volumes
ACID compliance | High agility and customization

Now, let's look at the databases top AdTech platforms choose and their approaches to data storage.

How AdTech vendors use SQL databases to handle high data volumes

PubMatic

PubMatic SSP helps publishers capture wide audiences and maximize advertising revenue with unique demand partnerships, advanced analytics, and creative optimization tools.

Challenge: The company needed a robust database to handle large datasets and solve complex problems. It wanted a battle-tested tool that would, above all, be reliable and effective.
Solution: MySQL
Impact: PubMatic's Ad Quality team uses MySQL as its primary data source. The platform's database stores up to a hundred million records. Known for reliability and robustness, MySQL allows PubMatic to process millions of creatives a day and sustain 2x-10x data loads.

AdGreetz

AdGreetz is a personalization platform that distributes tailored advertising creatives across multiple channels: social media, CTV/OTT, in-app, and more.

Challenge: The organization's workflows are data-intensive, requiring database management solutions that support millions of user records.
Chosen database: ClickHouse
Impact: For the engineering team at AdGreetz, ClickHouse turned out to be a cost-efficient, high-performance solution. The company was able to cut query time from seconds to sub-seconds on small compute.

How AdTech projects use NoSQL databases

Beeswax

Beeswax is a managed RTB platform that allows advertisers to streamline programmatic operations. The company offers a Bidder-as-a-Service solution that processes millions of queries per second and consumes 125 GB of data each minute.

Challenge: Rapid scaling to ensure efficient ad delivery and the need to distribute load evenly across the organization's machines.
Chosen NoSQL database: Aerospike running on Amazon EC2
Impact: Beeswax can process millions of queries per second with a 2-millisecond tail read latency.

GumGum

GumGum offers a platform for contextual targeting enabled by a proprietary machine-learning platform, Verity.

Challenge: The company wanted to process high volumes of advertising-related data (impressions, views, clicks, conversions) with minimal latency. Although the data was not processed in real time, the goal was to keep the gap to a minimum.
Chosen NoSQL database: ScyllaDB
Impact: Reduced strain on engineering resources, a 75% volume increase, and easier scalability thanks to on-demand resource provisioning.

Moloco

Moloco is a mobile audience platform that helps advertisers acquire, engage, and retain mobile audiences. The platform relies heavily on machine learning models for campaign optimization and predictive analytics.

Challenge: The pressure to process millions of bid requests per second within a strict latency limit (under 100 ms).
Chosen NoSQL database: Google Cloud Bigtable
Impact: Scaled the number of processed requests from 500k to 5 million per second while keeping latency low. The managed infrastructure also enabled the company to reallocate its software engineering resources to other tasks.

Our years of experience in AdTech platform development have shown us that there's no cookie-cutter approach to choosing the right database for an AdTech data storage infrastructure. There's a lot of variety under the database umbrella, and it takes experience, product knowledge, and thorough research to find the right fit. Sometimes, even switching between two NoSQL databases makes a big difference: GumGum, featured above, relied on Cassandra before switching to ScyllaDB, and we've seen significant operational cost reductions for a client (a mobile DSP) after it migrated from MongoDB to Aerospike.

Other ways to optimize data storage

Implementing data compression and deduplication techniques is another way to reduce the required storage space, leading to cost savings.
Compression reduces data size, which leads to faster transmission and lower storage costs. Data teams can employ techniques like GZIP.
Deduplication, as the name suggests, eliminates redundant copies of data. It is instrumental in AdTech, where repeated user profiles or similar datasets are commonplace.
Cold storage is a cost-effective way to store rarely accessed data (such as old campaign metrics) with no performance repercussions for active workloads.

Selecting premium services offered by an infrastructure vendor

Navigating cloud services requires intelligent choices. If you don't pay attention, it's easy to end up with service bundles that add infrastructure costs but no value to the platform. In the clip below, Xenoss CTO Vova Kyrychenko explains how the "free money trap" can result in high infrastructure costs as AdTech platforms scale.
https://www.youtube.com/watch?v=q_57WdKDJI0&embedable=true

Our key recommendation to AdTech vendors is to dissect the pricing of premium services to spot hidden costs or savings. Also, since new tools can slow the platform down, it's reasonable to test them at a small scale before taking a new service to production.

Keeping an eye on third-party or open-source projects is another alternative to expensive managed offerings: free or low-cost platforms can offer better performance than the services of mainstream cloud vendors. By adopting this approach on a client project, Xenoss engineers helped drive infrastructure costs down 20 times. In the infographic below, we illustrate the client's old infrastructure and the modernized version designed by our architects.

Balancing traffic and load

As we mentioned earlier, AdTech platforms don't operate under stable loads: one moment, a platform might hit a sudden spike, and the next, it has more computing resources than it knows what to do with. Since Xenoss engineers believe efficient traffic and load balancing is a must-have for AdTech systems, let's dive deeper into these concepts.

Load balancing means evenly distributing incoming requests across multiple servers so that no single server is overwhelmed. Within this framework, Xenoss architects prioritize mission-critical processes: essential tasks that, if interrupted, would disrupt the system's core functionality (such as real-time ad bidding or user data processing). By giving these processes precedence, we safeguard vital operations from potential slowdowns or failures.

Designing a mechanism for early threat detection

A famous adage goes: "Failure is a part of every plan," a concise warning for AdTech product teams to guard against threats and downtime. To that end, we urge vendors and in-house tech teams to leverage monitoring tools that keep an eye on system health and ensure uninterrupted operations.
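As a cheap first layer of such monitoring, a health check can flag metric values that deviate sharply from recent history. The sketch below applies a rolling z-score rule (mean and population standard deviation over a sliding window) to a hypothetical stream of bid-response latencies; the `LatencyAlert` class, the window of 20 samples, the 3-standard-deviation threshold, and the sample data are all assumptions to tune per platform.

```python
from collections import deque
from statistics import mean, pstdev


class LatencyAlert:
    """Flags a sample that sits more than `threshold` standard deviations
    above the mean of the previous `window` samples (rolling z-score rule)."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.history = deque(maxlen=window)  # oldest samples drop off automatically
        self.threshold = threshold

    def observe(self, value_ms: float) -> bool:
        alert = False
        # Only judge once the window is full, so the baseline is meaningful.
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # A perfectly flat baseline (sigma == 0) never alerts in this sketch.
            if sigma > 0 and (value_ms - mu) / sigma > self.threshold:
                alert = True
        # Every sample, including anomalies, becomes part of the baseline.
        self.history.append(value_ms)
        return alert


monitor = LatencyAlert(window=20, threshold=3.0)
baseline = [40, 42, 41, 39, 43, 40, 41, 42, 38, 41] * 2  # hypothetical latencies, ms
for sample in baseline + [44, 95]:
    if monitor.observe(sample):
        print(f"ALERT: {sample} ms deviates sharply from the recent baseline")
# ALERT: 95 ms deviates sharply from the recent baseline
```

A rule this simple is no substitute for learned anomaly detectors, but it is cheap enough to run on every metric and catches sudden spikes like the 95 ms sample while ignoring ordinary jitter like the 44 ms one.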
If you set up alerts for anomalies, teams are notified promptly, can act swiftly, and can ensure minor setbacks don't turn into major meltdowns. AI-powered insights add even more granularity. Anomaly detection algorithms, such as Isolation Forest or One-Class SVM, are a good fit for identifying unusual data patterns that can indicate threats or system vulnerabilities. We also suggest deploying Long Short-Term Memory (LSTM) recurrent neural networks to analyze time-series data. Large language models can contribute to threat detection, too, by analyzing logs and system messages and making sense of textual data that might otherwise be overlooked.

The bottom line

Infrastructure cost optimization is a linchpin for companies in every sector that aim for efficiency and profitability. AdTech is an excellent playground for exploring the challenges and workarounds of working with high data volumes and traffic loads, as the need to handle thousands of queries within milliseconds pushes infrastructure development to its limits.

The good news is that experienced tech teams have, often through trial and error, developed a handbook for keeping infrastructure costs low, even for high-load systems. Balancing cloud and on-premises solutions, leveraging AI for threat detection, and continuously refining data storage strategies help product teams ensure robust operations without compromising on budget. Staying agile and informed in this domain is both a cost-saving measure and a competitive advantage in the dynamic AdTech landscape.