Building a Secure, Cost-Efficient Authentication System for Millions of Users

In this article, I’ll talk about the challenges we faced while building a user authentication system: what was hardest, which bumps we hit, and how we ultimately launched in 40 countries, saved millions of dollars, and kept the order funnels intact while sustaining 23 RPS on user login.

Problem

At one of my previous jobs, I worked on a ride-hailing app. After the app went international and daily active users grew from a notional 1 million to 12 million, we saw a proportional surge in logins, amplified by heavy advertising and a lot of “cold” users who were just checking things out and never reached the end of the funnel.

One-line problem statement: the cost of user authentication was rising along with user growth.

Authentication via SMS

After analyzing the existing authentication system, we found the main bottleneck driving costs — phone number verification via SMS.

Below are rough prices:

Region	Cost per SMS (USD)
🇮🇳 India, 🇮🇩 Indonesia, 🇵🇭 Philippines	$0.01 – $0.015
🇧🇷 Brazil, 🇲🇽 Mexico, 🇨🇴 Colombia	$0.02 – $0.03
🇺🇸 USA, 🇨🇦 Canada	$0.007 – $0.015
🇪🇺 Europe (France, Germany, Spain)	$0.03 – $0.05

It’s not hard to estimate the monthly spend:

($0.03–$0.05 per SMS) × (2 million authentications per day) × (30 days) ≈ $1.8–3.0 million per month.
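The same back-of-the-envelope estimate is easy to script and rerun per region. A minimal Python sketch, assuming exactly one SMS per authentication, 2 million authentications per day, a 30-day month, and the rough price ranges from the table above; `REGION_PRICES` and `monthly_spend` are illustrative names, not part of our system.

```python
# Rough per-SMS price ranges in USD, taken from the regional table.
REGION_PRICES = {
    "India/Indonesia/Philippines": (0.01, 0.015),
    "Brazil/Mexico/Colombia": (0.02, 0.03),
    "USA/Canada": (0.007, 0.015),
    "Europe": (0.03, 0.05),
}

def monthly_spend(price_per_sms: float, auths_per_day: int = 2_000_000, days: int = 30) -> float:
    """Monthly SMS cost, assuming one SMS per authentication (no resends)."""
    return price_per_sms * auths_per_day * days

for region, (low, high) in REGION_PRICES.items():
    print(f"{region}: ${monthly_spend(low) / 1e6:.2f}M - ${monthly_spend(high) / 1e6:.2f}M per month")
```

Resends and retries add more SMS sends on top of this, so under these assumptions the figures are a floor rather than a ceiling.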

It’s also important to note that SMS pricing varies by region, which creates region-dependence — we’ll come back to this later.

Clearly, this is a large amount of money that we needed to reduce drastically, because the more advertising we run and the more users we attract, the higher the spend climbs.

You also have to account for failures. For example, if a large batch of session tokens expires at once, many users are forced to log in again at the same time, causing an unplanned spike in authentications and SMS sends.

Requirements

Before starting implementation, we gathered the core requirements and concepts that would shape our solution. It’s important to note that the app had a high load — over 10k RPS — so the solution had to be reliable and couldn’t risk revenue loss or user drop-off.

The main principles we followed were:

Don’t break the funnels. The number of events — from opening the start screen → login → ordering a taxi — had to remain unchanged.
Significantly reduce authentication costs, keeping in mind the continued growth of active users.
Account for regional specifics (pricing and popularity of third-party apps like WhatsApp).
Maintain high-quality analytics and metrics.
Keep the codebase flexible and implementation simple. We paid special attention to clean architecture and maintainability from the start.

Funnels

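A cornerstone metric for us, and one of the most critical, was the user funnel. What are funnels, and why are they so important? A funnel is the sequence of steps a user passes through on the way to a target action (here, from opening the app to ordering a ride), showing how many users drop off at each step.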

It was crucial not to trade a lower authentication cost for a drop in orders, since each order generates a commission — the company’s main source of revenue.

In other words, we had to balance two directly linked metrics: the harder it is for a user to log in, the fewer orders they place. So while cutting authentication costs, we risked losing commission revenue from completed rides.

We first tried to derive formulas that would express the maximum acceptable drop in login conversion.

However, the calculations turned out to be overly complex, involving too many indirect dependencies — things like parallel feature rollouts or external factors unrelated to the app itself (for example, region-specific holidays).

Below is an example of what a typical funnel looks like:

That is, the app sends a metric event for each screen (each user action), and analytical dashboards are built with breakdowns by user characteristics, countries, and dates.
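To illustrate how such per-screen events turn into funnel numbers, here is a sketch that computes step-to-step and end-to-end conversion. The step names and counts are hypothetical; a real dashboard would compute the same ratios per country, user segment, and date.

```python
from typing import List, Tuple

def step_conversions(funnel: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Conversion rate between each pair of consecutive funnel steps."""
    result = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / prev_count if prev_count else 0.0
        result.append((f"{prev_name} -> {name}", rate))
    return result

def overall_conversion(funnel: List[Tuple[str, int]]) -> float:
    """Share of users who made it from the first step to the last."""
    first, last = funnel[0][1], funnel[-1][1]
    return last / first if first else 0.0

# Hypothetical counts of users who reached each screen (one country, one day).
funnel = [
    ("start_screen", 10_000),
    ("phone_input", 8_000),
    ("code_entry", 7_200),
    ("logged_in", 6_840),
    ("order_placed", 3_420),
]

for transition, rate in step_conversions(funnel):
    print(f"{transition}: {rate:.1%}")
print(f"overall: {overall_conversion(funnel):.1%}")
```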

The key success criterion was simple: the cost of authenticating 1,000 users goes down, but the losses among those 1,000 remain minimal — that’s our main benchmark.
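The benchmark boils down to two numbers per configuration: authentication cost per 1,000 users who start logging in, and how many of those 1,000 never get through. A minimal sketch; `LoginStats`, `is_improvement`, the loss tolerance, and all figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LoginStats:
    started: int           # users who opened the login flow
    succeeded: int         # users who completed login
    total_cost_usd: float  # spend on codes, calls, and messages for these users

    def cost_per_1000(self) -> float:
        return self.total_cost_usd * 1000 / self.started

    def lost_per_1000(self) -> float:
        return (self.started - self.succeeded) * 1000 / self.started

def is_improvement(baseline: LoginStats, candidate: LoginStats, max_extra_loss: float) -> bool:
    """Candidate wins if it is cheaper and loses at most `max_extra_loss` more users per 1,000."""
    cheaper = candidate.cost_per_1000() < baseline.cost_per_1000()
    extra_loss = candidate.lost_per_1000() - baseline.lost_per_1000()
    return cheaper and extra_loss <= max_extra_loss

# Illustrative numbers only.
sms_only = LoginStats(started=20_000, succeeded=18_400, total_cost_usd=800.0)
mixed = LoginStats(started=20_000, succeeded=18_350, total_cost_usd=300.0)
print(sms_only.cost_per_1000(), mixed.cost_per_1000())      # 40.0 15.0
print(sms_only.lost_per_1000(), mixed.lost_per_1000())      # 80.0 82.5
print(is_improvement(sms_only, mixed, max_extra_loss=5.0))  # True
```

The `max_extra_loss` tolerance is where the business trade-off lives: how many extra lost users per 1,000 a given saving justifies.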

Solution

So, the most important part was done — we defined how to measure success and how to know things were working correctly. Now it was time to figure out what exactly to build and how.

The key idea we came up with was to add new authentication methods.

Previously, users had no real choice — an SMS was sent to them automatically. Now, we decided to give users a choice.

But the crucial part is that we didn’t just give users a choice — we fine-tuned the order in which options are shown.

For example, the sequence SMS → Call → WhatsApp is not the same as Call → SMS → WhatsApp.

Even small variations in the order or availability of authentication methods directly affect the login cost structure and the conversion rate from login to order.
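One way to see why order matters is a blended cost per login: each method’s cost weighted by the share of users who pick it, with that share depending on the method’s position. The sketch below uses rough mid-range per-method costs and purely hypothetical position shares; real shares have to be measured per region.

```python
from typing import List

# Rough per-authentication costs in USD (mid-range values, illustrative).
COST = {"sms": 0.04, "call": 0.015, "whatsapp": 0.0075}

# Hypothetical share of users choosing the option shown in each position.
POSITION_SHARE = [0.7, 0.2, 0.1]

def blended_cost(order: List[str]) -> float:
    """Expected cost per login for a given display order of methods."""
    return sum(share * COST[method] for share, method in zip(POSITION_SHARE, order))

print(round(blended_cost(["sms", "call", "whatsapp"]), 5))  # 0.03175
print(round(blended_cost(["call", "sms", "whatsapp"]), 5))  # 0.01925
```

The cheaper order only wins if conversion holds, since a voice call converts worse than SMS; that is why every change has to be checked against the funnel.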

Authentication Methods

We added four more authentication methods in addition to the existing SMS option.

The table below provides a brief explanation of each method:

Type	Short description	Approx. cost per authentication
SMS	Classic authentication method: the user receives a one-time code via SMS. Universal, but the most expensive and sometimes unreliable in certain regions.	$0.03 – $0.05
Voice Call	The verification code is delivered through an automated voice call. Cheaper than SMS, but some users miss the call or get confused — conversion rate is lower.	$0.01 – $0.02
WhatsApp	The code or link is sent via WhatsApp message. Convenient for countries where WhatsApp is the main communication channel and significantly cheaper than SMS.	$0.005 – $0.01
Facebook	Authentication through a social account using OAuth token. No SMS required, fast and familiar, but depends on user’s active account and consent.	≈ $0
Google	Login via Google ID. Works best for Android users and corporate accounts, provides strong security and seamless user experience.	≈ $0

It’s worth mentioning the WhatsApp-based method separately. This turned out to be the most effective authentication option in countries where the messenger is widely used.

However, implementing it isn’t straightforward: only Facebook-accredited providers can handle WhatsApp authentication. Twilio is one such provider.

Tip: here’s a small but useful trick that many companies use — you can check at the app level whether the WhatsApp client is installed on the user’s device. This way, you’ll show this authentication option only to users who already have the app. Otherwise, those who don’t will simply ignore it (no one is going to install WhatsApp just to log in), and you’ll definitely hurt your conversion rate.
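The check itself happens in the mobile app; the backend only needs the result. A sketch, assuming the client reports a boolean that we call `whatsapp_installed` here (a hypothetical field name), and the server hides WhatsApp while keeping the configured order for everything else.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DeviceInfo:
    # Reported by the mobile client; hypothetical field name.
    whatsapp_installed: bool

def available_methods(configured_order: List[str], device: DeviceInfo) -> List[str]:
    """Keep the configured order, but drop WhatsApp when the app is not installed."""
    return [
        method for method in configured_order
        if method != "whatsapp" or device.whatsapp_installed
    ]

order = ["whatsapp", "sms", "call"]
print(available_methods(order, DeviceInfo(whatsapp_installed=True)))   # ['whatsapp', 'sms', 'call']
print(available_methods(order, DeviceInfo(whatsapp_installed=False)))  # ['sms', 'call']
```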

Architecture

Our main goal was to keep things simple. In all my projects, I follow one guiding principle: you can always make a system better, faster, and more robust — but improvements should come only after you’ve built a working foundation that’s in production and generating feedback.

We applied the same logic here. We inherited a large and complex monolithic service that already contained the authentication module. Instead of rewriting everything, we decided to modify the existing handlers and add new ones directly within the same monolith.

This approach gave us several advantages:

Nothing was radically changed.
Minimal new code — no new microservice, and almost no changes on the mobile side (clients hit the same handlers).
No increase in RPS — no interservice communication.
We could involve the same monolith developers (easy knowledge sharing and interchangeability).
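To make “same handlers, new methods” concrete, here is a minimal sketch of how an existing send-code handler could dispatch to per-method senders instead of always sending SMS. `handle_send_code`, `make_stub_sender`, and the fallback to SMS for clients that send no method are assumptions for illustration; the stub senders only record what a real provider integration would send.

```python
from typing import Callable, Dict, List, Optional, Tuple

Sender = Callable[[str, str], None]  # (phone, code) -> None

def make_stub_sender(channel: str, log: List[Tuple[str, str, str]]) -> Sender:
    """Hypothetical stand-in for a provider integration: records what it would send."""
    def send(phone: str, code: str) -> None:
        log.append((channel, phone, code))
    return send

def handle_send_code(phone: str, code: str, method: Optional[str],
                     senders: Dict[str, Sender]) -> str:
    """Existing handler, extended with an optional method parameter.

    Clients that don't send a method keep getting SMS, so the endpoint
    stays backward compatible.
    """
    chosen = method or "sms"
    if chosen not in senders:
        raise ValueError(f"unsupported method: {chosen}")
    senders[chosen](phone, code)
    return chosen

log: List[Tuple[str, str, str]] = []
senders = {name: make_stub_sender(name, log) for name in ("sms", "call", "whatsapp")}
handle_send_code("+10000000000", "123456", None, senders)        # older client -> SMS
handle_send_code("+10000000000", "654321", "whatsapp", senders)  # updated client
print(log)
# [('sms', '+10000000000', '123456'), ('whatsapp', '+10000000000', '654321')]
```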

From the start, we planned to migrate the authentication module and these new handlers into a dedicated service later. We scheduled that work for right after the new authentication went live and we had gathered initial feedback and a list of improvements.

However, this approach also brought challenges: we had to work in an old codebase, on an old framework, and the code contained a number of bugs that we had to fix on the fly.

Challenges:

Complex codebase (old legacy code)
Bugs in the code
Hard to write unit and integration tests
Difficult to emit analytics
Complicated monolith deployment (shipping under load)
Risk of taking everything down, since authentication lived inside the monolith

Tip: don’t rush to rewrite everything or jump straight to a complex architecture. Move in small steps, especially on hard projects. You’ll always have time to complicate, rewrite, replace, and polish later. It’s better to ship a rough but working product that delivers feedback and exposes metrics than a “perfect” system that never makes it to production.

Management

As I mentioned earlier, the app operates in 40 countries and regions, each with its own specifics. For example, in some countries, people are so accustomed to receiving everything via SMS that enabling voice calls there would clearly lead to a drop in conversion.

Therefore, we needed an admin panel where each regional manager could configure the order and priority of authentication methods for their region.

The screenshot above shows how the admin panel is structured: for Thailand, a specific order of authentication types is defined, and that’s exactly the order users see on their devices. As I mentioned earlier, we found a direct correlation between a method’s position in the list and the method users ultimately choose.
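Behind a panel like this, the data model can stay small: an ordered list of enabled methods per region, plus a default for regions nobody has configured yet. A sketch with hypothetical region codes, a made-up order for Thailand (not the configuration from the screenshot), and an assumed SMS-only default; in production the mapping would live in whatever store the panel edits.

```python
from typing import Dict, List

DEFAULT_ORDER: List[str] = ["sms"]  # hypothetical safe fallback

# Edited by regional managers through the admin panel (illustrative values).
REGION_CONFIG: Dict[str, List[str]] = {
    "TH": ["whatsapp", "sms", "call"],
    "BR": ["whatsapp", "sms"],
    "DE": ["sms", "google", "facebook"],
}

def methods_for_region(region: str) -> List[str]:
    """Display order for a region; returns a copy so callers can't mutate the config."""
    return list(REGION_CONFIG.get(region, DEFAULT_ORDER))

print(methods_for_region("TH"))  # ['whatsapp', 'sms', 'call']
print(methods_for_region("XX"))  # ['sms']
```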

Tip: don’t make the admin panel complicated or fancy — the key thing is clarity. End users will never see it, and once everything is configured and optimized based on metrics, people will rarely touch it again — except when adding a new country or disabling a specific method.

Challenges

Even at the analysis stage, it became clear that solving this problem would be far from trivial.

I would divide the challenges into three categories:

Business challenges:

Users had been receiving SMS codes for 10 years — habits are hard to break.
Cold users (from ads) who, if they fail to log in the first time, may never return.
40 countries — high variability and strong dependence on regional specifics.

Technical challenges:

High RPS (11k) — even a tiny error triggers an incident immediately.
Outdated legacy code.
Bugs and inconsistencies in the core authentication logic.
Complex third-party APIs.
Difficult funnel calculations; no out-of-the-box system for metric management or delivery.

Organizational challenges:

Small team.
Two mobile platforms to support (Android and iOS).
Hard to negotiate with solution providers (requires a dedicated person).

Despite all of this, we managed to overcome every challenge — though it took a lot of effort.

Security

We paid special attention to security. The worst-case scenario — something we absolutely couldn’t allow — was fraudsters gaining access to other users’ accounts, which contain ride history and wallet balances. That would be a severe reputational risk. Moreover, by law, any such data breach would require immediate notification of regulators.

I was in constant consultation with the security engineering team, and they independently stress-tested the system — attempting brute-force attacks and other unauthorized login methods.

One major improvement we made was increasing the verification code length from 4 digits to 6. That raises the number of possible codes from 10,000 to 1,000,000, making brute-force attacks 100 times harder.
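A longer code only helps if codes are unpredictable. A minimal sketch of issuing and checking a 6-digit code, using Python’s `secrets` module (a cryptographically secure generator, unlike `random`) and a constant-time comparison; `generate_code` and `check_code` are illustrative names.

```python
import hmac
import secrets

CODE_LENGTH = 6

def generate_code(length: int = CODE_LENGTH) -> str:
    """Uniformly random numeric code, zero-padded (e.g. '004217')."""
    return str(secrets.randbelow(10 ** length)).zfill(length)

def check_code(expected: str, submitted: str) -> bool:
    """Constant-time comparison, so response timing doesn't leak matching digits."""
    return hmac.compare_digest(expected.encode(), submitted.encode())

code = generate_code()
print(len(code), code.isdigit())  # 6 True
print(check_code(code, code))     # True
```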

Here’s how the brute-force protection mechanism works:

With each incorrect code entry, the delay before the next attempt increases progressively.
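A minimal sketch of such a progressive delay, assuming the delay doubles after each failed attempt and is capped. The base delay, the cap, and the `AttemptTracker` class are illustrative, not the values or code we actually used; a real service would keep this state in a shared store rather than in process memory.

```python
import time
from typing import Dict, Optional

BASE_DELAY_S = 1.0   # hypothetical delay after the first failure
MAX_DELAY_S = 600.0  # hypothetical cap

def delay_after(failures: int) -> float:
    """Seconds to wait before the next attempt after `failures` wrong codes."""
    if failures <= 0:
        return 0.0
    # Bound the exponent so huge failure counts can't overflow the float conversion.
    return min(BASE_DELAY_S * 2 ** min(failures - 1, 60), MAX_DELAY_S)

class AttemptTracker:
    """Per-phone failure counter with progressive lockout (hypothetical)."""

    def __init__(self) -> None:
        self._failures: Dict[str, int] = {}
        self._last_failure: Dict[str, float] = {}

    def wait_time(self, phone: str, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        failures = self._failures.get(phone, 0)
        if failures == 0:
            return 0.0
        ready_at = self._last_failure[phone] + delay_after(failures)
        return max(0.0, ready_at - now)

    def record_failure(self, phone: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._failures[phone] = self._failures.get(phone, 0) + 1
        self._last_failure[phone] = now

    def record_success(self, phone: str) -> None:
        self._failures.pop(phone, None)
        self._last_failure.pop(phone, None)

tracker = AttemptTracker()
for t in (0.0, 1.0, 3.0):  # three wrong codes
    tracker.record_failure("+10000000000", now=t)
print(tracker.wait_time("+10000000000", now=3.0))  # 4.0 (delays of 1, 2, then 4 seconds)
```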

Without this mechanism, a brute-force attack would take only a few minutes — even with a 6-digit code.

Under our scheme, however, a 4-digit code would take about 55 days to brute-force, while a 6-digit code would take around 30 years.
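How long a brute-force attack takes under a given schedule is just the sum of the forced waits between guesses. The sketch below estimates the worst case (trying every code against one account) for a hypothetical doubling delay capped at 10 minutes. The output will not match the 55-day and 30-year figures above, which depend on the exact schedule we used; the point is how code length and the cap drive the result.

```python
def delay_after(failures: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Doubling delay after each failure, capped (hypothetical schedule)."""
    if failures <= 0:
        return 0.0
    return min(base * 2 ** min(failures - 1, 60), cap)

def worst_case_seconds(code_space: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Total forced waiting if the attacker must try every possible code."""
    # After the k-th wrong guess the attacker waits delay_after(k); the final guess needs no wait.
    return sum(delay_after(k, base, cap) for k in range(1, code_space))

DAY = 86_400
for digits in (4, 6):
    total = worst_case_seconds(10 ** digits)
    print(f"{digits} digits: {total / DAY:,.0f} days")
```

Once the cap is reached, the total grows roughly linearly with the number of possible codes, so going from 4 to 6 digits multiplies it by about 100.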

It’s safe to say that, with this approach and properly functioning logic, a 6-digit code is effectively protected against brute-force attacks.

Launch

Once the code was written and the functionality thoroughly tested, it was time to go live.

We chose one country (region) and, in coordination with the regional operations manager, enabled a preselected configuration — SMS, Voice Call, WhatsApp. We ran it for one full day.

The next day, we disabled the new flow and started analyzing the results. Conversion had dropped. We began investigating bottlenecks — running field surveys, checking all metrics, reviewing the code, and re-testing. We couldn’t find any technical issues. It was suggested that the drop happened simply because users weren’t yet familiar with the new methods. So, we launched again.

On the second attempt, conversion also fell slightly — but much less — and then returned to its previous level.

Scaling

After obtaining stable and positive results, we started scaling to other countries and regions, continuously collecting metrics and staying in close contact with regional managers.

As mentioned earlier, we adjusted the configuration for each region individually in the admin panel.

In the end, every country had its own custom authentication setup. In some places, we even kept SMS-only authentication — where it was cheaper or where user habits made changes risky.

Logout problem

During rollout, we noticed another issue: users were logging out too often.

Why is that bad? Because every logout requires another login — which means more authentications and, consequently, higher costs.

It was therefore in our best interest to keep users logged in for as long as possible.

To understand why users were logging out so often, we built a native in-app survey. Unfortunately, the effort didn’t pay off: the responses were fragmented, vague, and ultimately provided no clear insights or actionable takeaways. In hindsight, making it a native feature was a mistake; it would’ve been simpler to use a web-view form or a third-party survey tool.

Tip: don’t expect to gain meaningful insights from surveys built directly into your app — in most cases, they won’t lead to anything useful.

Fraud

Fighting fraud was particularly challenging. Attackers used third-party apps, rotated phone numbers, and called the SMS-issuance handler from different APIs—which drove up our costs.

Combatting this went beyond our team’s remit; another team handled the fight against fraudulent traffic. I’ll describe the specifics only in broad strokes.

The most important thing is to determine whether a request comes from a bot. Simple checks (IP checks, request header validation, location verification, etc.) work against casual attackers, but they won’t stop real bot farms that run attacks from real devices.

Experienced fraudsters constantly adapt to our countermeasures — it’s a cat-and-mouse game: we build defenses, they find workarounds. The more sophisticated our defenses, the more sophisticated their attacks become.

How to fight fraud:

The most reliable method is to use third-party solutions that build a unique fingerprint from device characteristics combined with specific user behavior.

As shown in the diagram, a third-party library on the device collects a fingerprint and sends it to the server for evaluation, where an algorithm uses indirect signals to return a verdict — “bot” or “human.” The approach is quite reliable, although it does produce false positives.
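A hedged sketch of where such a verdict might sit: the SMS-issuance handler asks the scoring service before paying for a code. `get_bot_score`, the 0 to 1 score, and the threshold are hypothetical stand-ins for whatever the third-party service actually returns. Failing open when the service is unavailable is an assumed design choice: it protects the funnel at the cost of letting some fraud through during outages.

```python
from typing import Callable

# Hypothetical: the service returns a 0..1 "likelihood of bot" score.
# A high threshold refuses only when the service is very confident,
# which limits false positives on legitimate users (VPNs, unusual devices).
BOT_THRESHOLD = 0.95

def should_send_code(fingerprint: str, get_bot_score: Callable[[str], float]) -> bool:
    """Gate in front of the SMS-issuance handler: decide whether to pay for a code."""
    try:
        score = get_bot_score(fingerprint)
    except Exception:
        # Fail open: if the scoring service is down, don't block every login.
        return True
    return score < BOT_THRESHOLD

scores = {"device-a": 0.2, "device-b": 0.99}  # stand-in for the real service
print(should_send_code("device-a", scores.__getitem__))  # True
print(should_send_code("device-b", scores.__getitem__))  # False

def unavailable(fingerprint: str) -> float:
    raise TimeoutError("scoring service unavailable")

print(should_send_code("device-a", unavailable))  # True (fails open)
```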

Tip: don’t, in the pursuit of “let’s kill all fraud,” make life harder for legitimate users — many use VPNs or various apps and devices that the system can mistakenly mark as malicious.

Additional Work

Beyond adding new authentication methods and reducing company expenses, we also delivered several important improvements:

Redesign. Completely redesigned all authentication screens (phone input, code sending, code entry, etc.).
Bug fixes. Resolved numerous issues (e.g., codes not being sent, inconsistent delays between resend attempts).
Analytics and metrics. Introduced new metrics to monitor and manage both funnels and costs.
Security enhancements. Increased the code length from 4 to 6 digits and implemented device fingerprinting.

What we planned

We had grand plans for further development and improvements that, for various reasons, we didn’t manage to finish and roll out. Largely this was due to prioritisation decisions and technical impracticality or complexity.

What we didn’t have time for (or couldn’t do):

Add more authentication methods (via Telegram, Viber, Facebook).
Key-based login. This would have solved the logout problem: the user sets a code once, and after logging out we ask them to enter that code instead of sending a new one.
Remember which authentication methods failed for a given user and stop showing them.
Roll out an A/B testing platform.
Smarter funnels, more metrics, and massive, all-encompassing dashboards that would let us fine-tune methods and delays between issuing new codes more precisely.
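The key-based login idea could look roughly like this: after a verified login the user sets a passcode, the server stores only a salted hash, and later logins check the passcode instead of triggering a paid SMS. A minimal sketch using `hashlib.pbkdf2_hmac`; the iteration count and function names are illustrative, not a design we shipped.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # illustrative work factor; tune to your hardware

def hash_passcode(passcode: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived key) for storage; the passcode itself is never stored."""
    salt = os.urandom(16) if salt is None else salt
    key = hashlib.pbkdf2_hmac("sha256", passcode.encode(), salt, ITERATIONS)
    return salt, key

def verify_passcode(passcode: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    _, key = hash_passcode(passcode, salt)
    return hmac.compare_digest(key, stored_key)

salt, key = hash_passcode("482913")
print(verify_passcode("482913", salt, key))  # True
print(verify_passcode("000000", salt, key))  # False
```

A short numeric passcode has few possible values, so hashing alone won’t stop an offline guess if the database leaks; the same progressive delay on failed attempts would still be needed online.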

I deeply regret not being able to implement some of this, especially the A/B platform — it would’ve allowed us to experiment with different settings across user segments and reduce authentication costs even more aggressively.

Tip: while working on the main task you’ll get lots of bright ideas — don’t rush to implement them immediately. Create a “big task” — a backlog bucket — where you attach all ideas so you can return to them later. Focus.

Results

Below are the results we achieved.

They clearly show a reduction in the company’s authentication costs, a decrease in fraud incidents, and an improvement in conversion metrics.

I hope my experience was useful and that you’ve found something valuable in this article. It was an amazing journey — launching a large-scale, high-risk overhaul of the authentication system in a big company with high RPS.

Our work paved the way for the app’s further growth by removing the dependency on expensive authentication. I’d be glad to hear your thoughts and experiences if you’ve ever tackled something similar.

Wishing you smooth logins and successful authentications! 🚀