I have been building microservices in distributed systems for over a decade now.
Everything was predictable and controllable; I never had any major issues with how these systems behaved.
This changed when I introduced AI:
- The system started being unpredictable
- Debugging became harder
- The system became unreliable
That's when I realized:
Adding AI to microservices is not straightforward and can lead to serious problems if not handled in the right way.
What Caused These Changes?
In every system I had built before, I relied on one assumption: a microservice produces the same output for the same input.
And this has always been true and is the foundation of everything:
- Debugging
- Testing
- Caching
- And most importantly, system reliability.
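That assumption is what makes even a one-line unit test meaningful. A trivial illustration (the fee function here is hypothetical, standing in for any microservice handler):

```python
# A deterministic service: the same input always yields the same output.
def fee(amount: float) -> float:
    """Hypothetical fee calculator standing in for any microservice handler."""
    return round(amount * 0.025, 2)

# The foundation of testing, caching, and safe retries in one line:
assert fee(1000) == fee(1000)
```

Every retry policy, cache key, and regression test quietly depends on that assertion holding.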
When I added AI, this assumption turned false: the same input produced different outputs.
The system shifted from being deterministic to probabilistic.
At first, I thought it was a normal bug. It wasn't; it was the model.
The Problematic System
I was working on a fraud detection system that analyzed users' transactions and flagged suspicious activities.
It was a backend setup with:
- API gateway
- Database
- Transaction service
- And the AI service for fraud detection
The functions of the AI service were:
- To look at a user's transactions
- Assign a risk score
- And to decide whether the transaction is fraudulent or not, explaining the reason behind that decision
At first, it worked as expected. The API request looked like this:
{
  "amount": 1000,
  "location": "California",
  "merchant": "xyz online store",
  "user_history": [...]
}
The response was predictable and clean:
{
  "risk_score": 0.68,
  "is_fraud": true,
  "reason": "Transaction is unusual based on the user's past spending pattern"
}
Then I ran the same request multiple times, and the responses were different.
Sometimes it flagged the transaction as fraudulent, other times it didn't.
Keep in mind: this was the same system, the same input, different outputs.
That's a very serious problem as far as financial systems are concerned.
Breakdown
The system had the following trade-offs:
It lost determinism
Microservices are deterministic in nature, whereas AI models are non-deterministic and work on the concept of probability.
When I introduced AI to the system, I lost that determinism, and the idempotency I had relied on became useless.
Retries became unsafe and caching didn’t make sense anymore.
This broke:
- My retry logic and assumptions
- My caching strategies
- And most importantly, the confidence I had in the system—it became unreliable
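One way to contain this is to pin the first AI verdict per unique request, so retries replay a stored decision instead of re-rolling the dice. A minimal in-memory sketch (a real deployment would use Redis or a database, plus an expiry policy; the function names are my own):

```python
import hashlib
import json

_verdict_cache: dict[str, dict] = {}

def request_key(payload: dict) -> str:
    """Stable hash of the request body, used as an idempotency key."""
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def assess_once(payload: dict, call_model) -> dict:
    """Call the model at most once per unique payload; retries replay the cached verdict."""
    key = request_key(payload)
    if key not in _verdict_cache:
        _verdict_cache[key] = call_model(payload)
    return _verdict_cache[key]
```

This does not make the model deterministic; it makes your system's record of its decision deterministic, which is what retries and audits actually need.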
Debugging became harder
Before I added AI, debugging was relatively easy and straightforward.
I could trace bugs by simply checking requests and logs, then reproduce issues.
With AI, debugging became a nightmare. I couldn’t reproduce any meaningful issues. Logs didn’t help because there was no clear execution path to inspect:
- Same request
- Same input
- Different response
I found debugging draining and almost impossible.
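What eventually made these incidents tractable was logging the full AI exchange, raw input, raw output, and prompt version, under a correlation ID, so even an unreproducible response leaves a record. A minimal sketch (the field names are my own convention):

```python
import json
import logging
import uuid

logger = logging.getLogger("ai_audit")

def log_ai_exchange(request_payload: dict, raw_response: dict, prompt_version: str) -> str:
    """Record everything needed to reconstruct a nondeterministic AI call after the fact."""
    correlation_id = str(uuid.uuid4())
    logger.info(json.dumps({
        "correlation_id": correlation_id,
        "prompt_version": prompt_version,
        "request": request_payload,
        "response": raw_response,
    }))
    return correlation_id
```

You still cannot replay the bug, but you can at least see exactly what the model said, which version of the prompt produced it, and tie it back to the user-facing incident.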
No explicit errors
Microservices are designed to produce and handle explicit errors. This is contrary to AI models because they don’t produce explicit errors, but instead deliver silent failures in the form of false statements—hallucination.
No error, no warning—just confident incorrect statements
{
  "is_fraud": false,
  "reason": "User frequently shops at this merchant"
}
This was produced even though the user had never shopped at that merchant before.
This is extremely dangerous.
My retry logic, circuit breakers, and idempotency keys didn’t help—the system thought everything was perfect but in reality, catastrophic failures were awaiting.
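Since the model raises no error when it hallucinates, the only defense is to cross-check its claims against data you already hold. A minimal sketch of one such check, aimed at the exact failure above (field names assumed from the earlier payloads; the keyword list is illustrative):

```python
def reason_contradicts_history(ai_response: dict, transaction: dict) -> bool:
    """Flag responses whose 'reason' claims merchant familiarity the
    user's actual transaction history does not support."""
    reason = ai_response.get("reason", "").lower()
    # Words implying the user has an established relationship with the merchant
    claims_familiarity = any(w in reason for w in ("frequently", "often", "regularly"))
    history = [m.lower() for m in transaction.get("user_history", [])]
    merchant = transaction.get("merchant", "").lower()
    return claims_familiarity and merchant not in history
```

A contradiction like this should route the transaction to the rule-based path or a human review queue, never straight to a final verdict.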
API interfaces became inconsistent
Before AI, types were strict and interfaces were consistent.
With AI, interfaces lost strictness and became dangerously flexible:
- Fields changed or went missing at times
- Format drifted
- And oddly enough, the reasons varied wildly
I was no longer dealing with strict interfaces.
Mistakes Most Developers Make
Blindly trusting AI output without manual review
AI responses can be incorrect. Trusting them and taking them as final judgments can lead to:
- Miscalculated fraud
- Legal exposure, mainly from misjudgment
- And bad financial decisions that can be costly
Treating an AI model as an ordinary microservice
Most developers wrap LLMs in APIs thinking they will work just like any other service. They won't, since AI is probabilistic, not deterministic like a normal microservice.
Putting AI in critical paths
Putting AI directly in critical paths such as core transaction flow or payment authorization, or just directly influencing transaction handling can lead to:
- Latency spikes
- Hard to explain inconsistencies
- And even cascading failures, which can render a system unavailable
Not validating AI output
AI outputs are unpredictable and can drift from expected formats, so validation is not optional; it is a must.
Accepting these outputs without validation is the same as accepting user inputs without validation.
Solutions To The Above Mistakes
Most of the problems we face when adding AI models to microservices are caused by our own design choices: we tend to use AI as if it were deterministic, which runs against its nature.
AI is probabilistic. You can’t fix this—but you can contain it.
The following are the solutions to most of these problems:
Enforcing strict validation
Always treat AI output as untrusted information and validate it first.
from pydantic import BaseModel, Field, ValidationError
from typing import Optional, Any
import re
import logging

logger = logging.getLogger(__name__)

class RiskAssessment(BaseModel):
    risk_score: float = Field(ge=0, le=1, description="Must be between 0 and 1")
    is_fraud: bool = Field(description="Must be boolean")
    reason: str = Field(max_length=500, description="Explanation, max 500 chars")

def normalize_fields(raw: dict) -> dict:
    """Map drifting AI field names onto the schema we actually expect."""
    normalized = {}
    for key, value in raw.items():
        # Convert camelCase keys to snake_case before matching
        snake_key = re.sub(r'(?<!^)(?=[A-Z])', '_', key).lower()
        if snake_key in ["risk_score", "riskscore"]:
            normalized["risk_score"] = value
        elif snake_key in ["is_fraud", "fraud"]:
            normalized["is_fraud"] = value
        elif snake_key in ["reason", "explanation", "message"]:
            normalized["reason"] = value
        else:
            normalized[snake_key] = value
    return normalized

def coerce_types(raw: dict) -> dict:
    """Best-effort type coercion before strict validation."""
    coerced = raw.copy()
    if "risk_score" in coerced:
        try:
            coerced["risk_score"] = float(coerced["risk_score"])
        except (TypeError, ValueError):
            coerced["risk_score"] = 0.5
    if "is_fraud" in coerced:
        if isinstance(coerced["is_fraud"], str):
            coerced["is_fraud"] = coerced["is_fraud"].lower() in ["true", "yes", "1", "fraud"]
        else:
            coerced["is_fraud"] = bool(coerced["is_fraud"])
    if "reason" in coerced and not isinstance(coerced["reason"], str):
        coerced["reason"] = str(coerced["reason"])
    return coerced

def validate_ai_output(raw_output: Any) -> Optional[dict]:
    if not isinstance(raw_output, dict):
        logger.error(f"AI output is not a dict: {type(raw_output)}")
        return None
    normalized = normalize_fields(raw_output)
    coerced = coerce_types(normalized)
    try:
        validated = RiskAssessment(**coerced)
        return validated.dict()
    except ValidationError as e:
        logger.error(f"Validation failed: {e}")
        logger.error(f"Coerced data was: {coerced}")
        return None

async def get_ai_with_validation(transaction):
    # call_ai_service and rule_based_fallback are defined elsewhere in the system
    raw = await call_ai_service(transaction)
    if not raw:
        # AI unreachable or returned nothing: fall back rather than fail silently
        return rule_based_fallback(transaction)
    validated = validate_ai_output(raw)
    if not validated:
        return rule_based_fallback(transaction)
    return validated
Making AI advisory
Never allow AI to decide for your system.
The workflow should be as follows:
AI suggests → the system verifies → and finally, the system decides
By combining AI with rule-based fallback, AI becomes input—not authority.
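That workflow can be sketched as a small decision function: the AI's verdict is one signal among several, and a rule check gets the final say (the threshold and field names are illustrative):

```python
from typing import Optional

def decide(ai_suggestion: Optional[dict], transaction: dict) -> dict:
    """AI suggests, rules verify, the system decides."""
    rules_flag = transaction.get("amount", 0) > 50000  # illustrative hard rule
    if ai_suggestion is None:
        # AI unavailable or invalid: rules decide alone.
        return {"is_fraud": rules_flag, "source": "rules"}
    if rules_flag and not ai_suggestion["is_fraud"]:
        # AI disagrees with a hard rule: escalate rather than trust either side.
        return {"is_fraud": True, "source": "rules_override", "needs_review": True}
    return {"is_fraud": ai_suggestion["is_fraud"], "source": "ai_advisory"}
```

The key property: no code path lets the AI's raw verdict reach the caller without passing through the rules first.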
Versioning prompts
AI prompts are like code: they change. Versioning prompts helps you trace changes in the model's behaviour over time.
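A minimal way to do this is a versioned prompt registry, with the active version attached to every AI request so responses can be traced back to the prompt that produced them (the structure and names here are my own sketch):

```python
# Prompts live alongside code and change only through reviewed commits.
PROMPTS = {
    "fraud-v1": "Assess the transaction and return risk_score, is_fraud, reason as JSON.",
    "fraud-v2": ("Assess the transaction. Return ONLY JSON with keys "
                 "risk_score (0-1 float), is_fraud (bool), reason (string, <500 chars)."),
}
ACTIVE_PROMPT = "fraud-v2"

def build_request(transaction: dict) -> dict:
    """Attach the prompt version to every AI call so responses are traceable."""
    return {
        "prompt": PROMPTS[ACTIVE_PROMPT],
        "prompt_version": ACTIVE_PROMPT,
        "input": transaction,
    }
```

When behaviour shifts in production, the `prompt_version` field in your logs tells you immediately whether a prompt change or the model itself is to blame.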
Adding a control layer
Instead of calling AI service directly from core logic, wrap it with a layer responsible for:
- Validation
- Normalization
- Fallback coordination
- Prompt versioning
- And safety checks
This layer sits between the core logic and AI service.
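Put together, the control layer is a thin wrapper that owns every concern in that list; core logic only ever talks to the wrapper. A condensed sketch reusing the validation and fallback ideas from this article (the class shape is my own):

```python
class AIControlLayer:
    """Single entry point between core logic and the AI service."""

    def __init__(self, call_model, validate, fallback, prompt_version: str):
        self.call_model = call_model        # function: transaction -> raw dict (or None)
        self.validate = validate            # function: raw dict -> validated dict or None
        self.fallback = fallback            # function: transaction -> rule-based dict
        self.prompt_version = prompt_version

    def assess(self, transaction: dict) -> dict:
        raw = self.call_model(transaction)
        validated = self.validate(raw) if raw is not None else None
        # Fallback coordination: any failure upstream lands on the rules.
        result = validated if validated is not None else self.fallback(transaction)
        result["prompt_version"] = self.prompt_version  # tracing metadata
        return result
```

Because the dependencies are injected, the layer is trivially testable: swap in a fake model that returns garbage and assert the fallback fires.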
Adding rule-based fallbacks
Despite having AI, the system should still be able to perform its operations without fully relying on it, as illustrated by the following FastAPI snippet:
from fastapi import FastAPI
from pydantic import BaseModel, Field, ValidationError
import httpx
import logging

app = FastAPI()
logger = logging.getLogger(__name__)

class TransactionRequest(BaseModel):
    amount: float
    location: str
    merchant: str
    user_history: list = []

class RiskAssessment(BaseModel):
    risk_score: float = Field(ge=0, le=1)
    is_fraud: bool
    reason: str = Field(max_length=500)

def rule_based_fallback(transaction: TransactionRequest):
    risk_score = 0.0
    if transaction.amount > 50000:
        risk_score += 0.5
    if transaction.location in ["California", "Dallas", "Columbus"]:
        risk_score += 0.3
    if transaction.user_history and transaction.merchant not in transaction.user_history:
        risk_score += 0.2
    risk_score = min(risk_score, 1.0)
    return {
        "risk_score": risk_score,
        "is_fraud": risk_score > 0.6,
        "reason": "Fallback: AI unavailable, using rule-based decision"
    }

async def get_risk_assessment(transaction: TransactionRequest):
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.post(
                "http://ai-service/predict",
                json=transaction.dict()
            )
            if response.status_code != 200:
                logger.warning(f"AI returned {response.status_code}, using fallback")
                return rule_based_fallback(transaction)
            raw = response.json()
            try:
                validated = RiskAssessment(**raw)
                return validated.dict()
            except ValidationError as e:
                logger.error(f"AI validation failed: {e}")
                return rule_based_fallback(transaction)
    except httpx.TimeoutException:
        logger.warning("AI timeout, using fallback")
        return rule_based_fallback(transaction)
    except Exception as e:
        logger.error(f"AI error: {e}, using fallback")
        return rule_based_fallback(transaction)

@app.post("/analyze")
async def analyze(transaction: TransactionRequest):
    return await get_risk_assessment(transaction)
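The fallback path can be sanity-checked on its own, without the web framework: a worked check of the scoring rules above, restated as a plain function, shows a high-amount transaction from a flagged location at an unfamiliar merchant crossing the 0.6 threshold:

```python
# Worked check of the fallback scoring (rules restated inline for illustration).
def fallback_score(amount: float, location: str, merchant: str, history: list) -> float:
    score = 0.0
    score += 0.5 if amount > 50000 else 0.0
    score += 0.3 if location in ["California", "Dallas", "Columbus"] else 0.0
    score += 0.2 if history and merchant not in history else 0.0
    return min(score, 1.0)

score = fallback_score(60000, "California", "xyz online store", ["abc store"])
# 0.5 + 0.3 + 0.2 = 1.0, above the 0.6 fraud threshold
```

Deterministic rules like these are crude compared to the model, but they are testable, explainable, and always available, which is exactly what the critical path needs.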
Important Checklist
To summarize: before you deploy any system that includes an AI model, ask yourself the following critical questions:
- Did I validate every AI output?
- Does the system have a control layer for AI?
- Does it have a fallback?
- Can the system run if AI fails?
- Is the AI an authority or an advisor?
If you can’t answer all those questions clearly, then the system is not ready for production.
Conclusion
Integrating AI with microservices introduces inconsistencies, makes debugging hard, and makes the system unreliable.
By applying the right measures, you can contain these problems and design reliable microservices that work seamlessly with AI models.
AI won’t crash your system; it will quietly break it.
Validate everything. Never let it decide alone. And always have a fallback. That’s how your system will survive.
