I need to tell you about a code review that changed how I think about AI.
It was sometime last year. I was looking at the backend of a Series B startup. They had an AI chat assistant.
Production.
Real users.
Real data.
The kind of product where people ask personal questions because the interface feels safe.
Their AI pipeline was three lines of code.
User message goes in.
OpenAI SDK call happens.
Response comes back.
That was the whole thing.
While I was poking around their analytics (trying to understand usage patterns, not looking for trouble), I noticed a user had pasted their full social security number into the chat. Just typed it right in. "Can you help me with my tax return? My SSN is..." and the whole number, right there.
It went to OpenAI. The model processed it. The response came back with the SSN echoed in it. Their system logged the whole exchange. Three months of this before anyone noticed.
I sat with that for a while.
The thing that bothered me
The startup wasn't doing anything wrong, technically. They were using the OpenAI SDK exactly as documented. The API works as advertised. You send text, you get text back. Clean, fast, simple.
But there is nothing in that pipeline that asks: should this text actually be sent?
Is there sensitive information in here?
Should the response be scanned before we hand it back to the user?
The answer at most companies is that nobody checks. I know because I started keeping a list. Every AI integration I reviewed, I wrote down what security existed between the user and the model. Over about four months, across maybe 38 different codebases (mix of client work, open source projects, and things friends showed me), the breakdown was:
- 14 had literally nothing. Raw input to API to response.
- 21 had a system prompt that said something like "do not respond to harmful queries." That was it. The entire security model was a natural language instruction to the model.
- 3 had basic regex on the input. Usually checking for SQL injection patterns, which is the wrong threat model entirely for LLM traffic but at least showed someone was thinking about it.
- 0 had output scanning. Zero. Nobody was checking what the model said back.
So I started building.
What Sentinel actually is
I should probably back up and explain what I built before getting into the details.
Sentinel Protocol is a security proxy. It runs locally on your laptop. It sits between your application and whatever LLM you're using. OpenAI, Anthropic, Google Gemini, Ollama, anything that speaks the OpenAI API format.
When your app makes an API call to the model, Sentinel intercepts it. It runs the request through a pipeline of security checks. If something is wrong (PII in the input, injection attempt, rate limit exceeded), it blocks or modifies the request before it ever leaves your machine. When the model responds, Sentinel scans the output too. Toxic content, hallucinated URLs, code execution suggestions, leaked system prompts. All caught on the way back.
It is a reverse proxy with opinions.
The way you use it is stupid simple. You change one line in your SDK configuration:
```javascript
// this is what you have now
const openai = new OpenAI({
  baseURL: 'https://api.openai.com/v1'
});

// this is what you change it to
const openai = new OpenAI({
  baseURL: 'http://127.0.0.1:8787/v1',
  defaultHeaders: { 'x-sentinel-target': 'openai' }
});
```
Your app code stays the same.
Your prompts stay the same.
Your response handling stays the same.
Sentinel does its work transparently.
One command to start it:
```shell
npx --yes --package sentinel-protocol sentinel bootstrap \
  --profile paranoid --mode enforce --dashboard
```
Proxy on port 8787, dashboard on port 8788. Done.
I probably overbuilt it, and I do not care
As of today, Sentinel has 81 security engines. Eighty-one. That is a lot. I know it is a lot. People will look at that number and think it is marketing inflation. It is not. I can walk through every single one.
There are 18 engines on the ingress side (the stuff that scans your request before it goes to the model).
Things like PII detection with 40+ pattern types.
A neural injection classifier I built using a custom rule language (I call it LFRL, it stands for Latent Feature Rule Language).
Semantic similarity scanning against a known attack corpus using local ONNX embeddings.
MCP poisoning detection for when you are running AI agents with tool calls and one of the tool servers gets compromised.
There are 14 engines for agentic security and MCP. This is the part I am most proud of, honestly, because almost nobody else is building for this threat model. If you are running multi-agent systems with tool use, the attack surface is not just the prompt. It is the tool results. A malicious MCP server can return poisoned data in a tool call response and hijack the agent's next action. Sentinel has a dedicated detector for that. It also has a shadow MCP detector (catches fake servers impersonating real ones), certificate pinning per MCP server, HMAC-signed inter-agent messaging (so agents cannot impersonate each other), and a loop breaker that catches infinite agent recursion before your budget evaporates.
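The HMAC-signed messaging idea, at least, is simple to illustrate. Here is a generic sketch of the pattern (my own illustration of the technique, not Sentinel's wire format; key distribution and message framing are out of scope):

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Generic sketch of HMAC-signed inter-agent messages.
// Key handling and message framing here are illustrative only.
function sign(message: string, key: string): string {
  return createHmac("sha256", key).update(message).digest("hex");
}

function verify(message: string, signature: string, key: string): boolean {
  const expected = Buffer.from(sign(message, key), "hex");
  const given = Buffer.from(signature, "hex");
  // constant-time comparison; lengths must match or timingSafeEqual throws
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

An agent that cannot produce a valid signature for its claimed sender simply gets ignored, which is what stops impersonation.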
There are 9 engines on the egress side. Output classifier. Hallucination tripwire (catches fabricated URLs, invented package names, numeric contradictions within the same response). Real-time SSE stream redaction. Stego exfil detection (zero-width Unicode characters used to embed hidden data in model output, which is a real thing and terrifying).
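The zero-width trick is worth seeing in code, because the detection side is genuinely simple. A minimal sketch (illustrative only; the character list and threshold are my assumptions, not Sentinel's):

```typescript
// Illustrative zero-width steganography check.
// Character set and threshold are assumptions, not Sentinel's actual values.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

function detectZeroWidth(text: string): { count: number; suspicious: boolean } {
  const matches = text.match(ZERO_WIDTH) ?? [];
  // A stray zero-width joiner can appear in legitimate emoji sequences,
  // so a single occurrence should not trip the alarm.
  return { count: matches.length, suspicious: matches.length > 3 };
}
```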
Six engines for resilience and cost. Circuit breakers, rate limiters, budget autopilot.
And 17 engines for governance. Threat attribution on every blocked event. Forensic debugger with replay. Evidence vault. AIBOM generator. TLA+ and Alloy formal verification specs.
I built the whole thing with 9 runtime dependencies. Nine. Because every dependency in a security tool is a liability and I wanted the thing to be auditable by one person in a weekend.
The PII engine, since that is what started all this
Let me get specific about how the PII protection works because it was the original motivation.
The scanner recognizes 40+ types of personally identifiable information. But it does not treat them all the same. I learned early on that blocking everything makes the product unusable. If someone types "email me at test@example" that is probably fine. If someone types their social security number, that absolutely is not fine.
So I built a severity tier:
Critical stuff (SSNs, credit card numbers, passport numbers, bank accounts, AWS credentials, private keys) gets blocked. Hard 403 response. The data never leaves your machine. Period.
Medium stuff (email addresses, phone numbers, physical addresses) gets silently redacted. The model receives `[EMAIL_REDACTED]` instead of the real address. The user's request still goes through. They just do not realize the sensitive bit got swapped out.
Low stuff (IP addresses, generic identifiers) gets logged but passes through. You might want to know it happened, but blocking it would create too much friction for most use cases.
A blocked request gets a 403 with a structured body like this:

```json
{
  "error": "PII_DETECTED",
  "reason": "pii_detected",
  "pii_types": ["ssn_us", "credit_card_generic"],
  "correlation_id": "52360b2d-4b92-4b30-9ace-32fae427c323",
  "response_status": 403
}
```
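The tiered routing described above can be sketched roughly like this. This is a simplified illustration under my own names and patterns, not Sentinel's internal tables:

```typescript
// Simplified sketch of severity-tiered PII handling.
// Pattern names, regexes, and tokens are illustrative assumptions.
type Action = "block" | "redact" | "log";

const TIERS: Record<string, { action: Action; pattern: RegExp; token: string }> = {
  ssn_us:        { action: "block",  pattern: /\b\d{3}-\d{2}-\d{4}\b/, token: "[SSN_REDACTED]" },
  email_generic: { action: "redact", pattern: /\b[\w.+-]+@[\w-]+\.[a-z]{2,}\b/i, token: "[EMAIL_REDACTED]" },
  ip_v4:         { action: "log",    pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/, token: "[IP_REDACTED]" },
};

function scan(input: string): { action: Action; output: string; types: string[] } {
  let action: Action = "log";
  let output = input;
  const types: string[] = [];
  for (const [name, tier] of Object.entries(TIERS)) {
    if (!tier.pattern.test(input)) continue;
    types.push(name);
    if (tier.action === "block") {
      action = "block";                       // critical: refuse outright
    } else if (tier.action === "redact" && action !== "block") {
      action = "redact";
      output = output.replace(tier.pattern, tier.token); // medium: swap in a placeholder
    }
  }
  return { action, output, types };
}
```

The key property is that the worst severity wins: one SSN match turns the whole request into a block, no matter what else is in there.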
There is also a two-way vault mode where PII gets tokenized. The model receives a reference token. The mapping stays on your machine. When the model's response contains that token, Sentinel detokenizes it so the user sees the real value. End to end, the model never touches the actual data. I think this is the right approach for medical and legal applications but I have not tested it at high concurrency yet, so fair warning.
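A minimal version of that round trip might look like this. This is a toy sketch of the pattern, not Sentinel's vault; real code would need persistence, expiry, and careful token formats:

```typescript
import { randomUUID } from "node:crypto";

// Toy sketch of two-way PII tokenization. The token format is an assumption.
class Vault {
  private map = new Map<string, string>();

  tokenize(value: string): string {
    const token = `[PII:${randomUUID()}]`;
    this.map.set(token, value);   // the mapping never leaves this process
    return token;
  }

  detokenize(text: string): string {
    let out = text;
    for (const [token, value] of this.map) out = out.split(token).join(value);
    return out;
  }
}
```

The request carries the token out, the response carries the token back, and only the local detokenize step ever sees the real value.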
The output problem
I want to spend some time on the egress side because I think it is dramatically underserved and I wasted weeks getting it right.
Everyone in the AI security space focuses on input. And input matters. But the output is where the subtle damage happens.
Last month I saw a model respond to a coding question with a recommendation to install a specific npm package. The package did not exist. The name sounded plausible. If someone had registered that name and filled it with malware, every developer who followed the model's advice would have been compromised. This is not theoretical. It is called dependency confusion through hallucination and it has been documented.
The hallucination tripwire in Sentinel catches this. It looks at URLs in the response and checks whether they are structurally suspicious. It cross-references package names against known registries. It finds numeric contradictions (the model says "99.7% accuracy" in one paragraph and "97.2% accuracy" three paragraphs later). It catches citation hallucinations (the model invents a research paper title and a plausible-looking DOI).
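The numeric-contradiction idea is easy to prototype, which is part of why I find it compelling. Here is a deliberately toy version, nothing like production quality, that only looks at percentages; a real engine has to compare the surrounding claims, not just the numbers:

```typescript
// Toy heuristic: flag a response that states two or more distinct
// percentage values. Candidate contradictions only; context matters.
function findPercentContradictions(text: string): string[] {
  const hits = [...text.matchAll(/(\d+(?:\.\d+)?)\s*%/g)].map(m => parseFloat(m[1]));
  const distinct = [...new Set(hits)];
  return distinct.length > 1 ? distinct.map(v => `${v}%`) : [];
}
```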
Is it perfect?
No.
Hallucination detection without ground truth is fundamentally hard. But the false positive rate at the default thresholds is low enough that I run it in enforce mode on my own projects and it has not blocked a legitimate response yet.
The streaming scanner was the hardest part. SSE responses come chunk by chunk. Each chunk might be three words. You cannot buffer the whole thing and scan at the end because the user is watching the stream in real time. The redaction transform holds a sliding window across chunk boundaries, scans completed segments, and forwards or redacts before the chunk reaches the client. Memory bounded. No unbounded accumulation. I rewrote it three times.
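To make the sliding-window idea concrete, here is a stripped-down reconstruction of the technique (my own sketch, not Sentinel's code): hold back a tail of the stream, only flush up to a whitespace cut so a token can never straddle the boundary, and scan each flushed segment before forwarding it.

```typescript
// Stripped-down sliding-window redactor for streamed text.
// HOLD is the assumed max carry-over; the SSN pattern stands in for the
// full scanner. Cutting at whitespace guarantees no token spans the cut.
const HOLD = 32;
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

function makeRedactor() {
  let tail = "";                        // unscanned carry-over between chunks
  return {
    push(chunk: string): string {       // returns text safe to forward now
      tail += chunk;
      if (tail.length <= HOLD) return "";
      const cut = tail.lastIndexOf(" ", tail.length - HOLD - 1);
      if (cut < 0) return "";
      const ready = tail.slice(0, cut + 1).replace(SSN, "[SSN_REDACTED]");
      tail = tail.slice(cut + 1);
      return ready;
    },
    flush(): string {                   // end of stream: scan what is left
      const out = tail.replace(SSN, "[SSN_REDACTED]");
      tail = "";
      return out;
    },
  };
}
```

Memory stays bounded because `tail` never grows past one chunk plus the hold window before it gets flushed.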
The things I got wrong
I should be honest about what did not work the first time.
The output classifier was too aggressive initially. It flagged any response that mentioned the word "execute" or "delete" or "password" even when the model was explaining why you should NOT do something. I had to build a context dampener. If the surrounding text signals educational or cautionary intent, the classifier score gets reduced. It is not machine learning. It is weighted n-gram scoring with a polarity modifier. Simple but effective enough.
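The mechanic is easy to show in miniature. Word lists and the 0.3 multiplier below are made-up stand-ins, not Sentinel's actual tables:

```typescript
// Sketch of a context dampener: reduce a risk score when surrounding
// text signals cautionary intent. Terms and weights are invented here.
const RISK_TERMS = ["execute", "delete", "password"];
const CAUTION_TERMS = ["never", "don't", "avoid", "warning", "should not"];

function riskScore(text: string): number {
  const lower = text.toLowerCase();
  const score = RISK_TERMS.filter(t => lower.includes(t)).length;
  const cautionary = CAUTION_TERMS.some(t => lower.includes(t));
  return cautionary ? score * 0.3 : score;  // polarity modifier dampens score
}
```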
I underinvested in MCP security early on. The 14 agentic engines were added in the last quarter of development. Should have been there from the beginning. Tool-using agents are the fastest growing use case and the security gap is enormous.
The "paranoid" profile (all 81 engines in enforce mode) was too aggressive for development. Kept blocking my own test requests. So I added three profiles: minimal (8 engines, monitor mode, good for dev laptops), standard (about 20 engines, monitor mode, for staging), and paranoid (all 81, enforce mode, for production security audits). Should have done that from the start.
Where it is now
As of v1.2.7 today:
- 52,069 lines of code
- 81 security engines
- 139 test suites, 567 individual tests, zero failures
- 306 files linted with zero warnings
- 9 runtime dependencies
- p95 proxy overhead under 5ms
- 0 npm audit vulnerabilities
- OWASP LLM Top 10: all 10 categories covered
- Formal verification: TLA+ spec for the security pipeline, Alloy spec for policy consistency
- Runs entirely locally. No telemetry, no cloud calls for security decisions
It works with OpenAI, Anthropic, Google Gemini, Ollama, and anything that speaks the OpenAI chat completions format. You switch between providers with a header. No code changes.
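For instance, pointing the same OpenAI-format client at a different upstream would just change the header value. The value `anthropic` below is my guess at the routing key; check the docs for the exact provider names:

```typescript
import OpenAI from "openai";

// Same client, different upstream: only the routing header changes.
// 'anthropic' is an assumed provider key; verify against the docs.
const client = new OpenAI({
  baseURL: "http://127.0.0.1:8787/v1",
  defaultHeaders: { "x-sentinel-target": "anthropic" },
});
```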
The audit log goes to `~/.sentinel/audit.jsonl`. Plain text. Grep friendly. Every decision, every blocked request, every PII type detected, with timestamps and correlation IDs. You own the data. No database. No external service.
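Because it is JSONL, post-processing is trivial. For instance, counting blocked PII by type; the `pii_types` field name is taken from the 403 body shown earlier, and the full audit record schema is an assumption on my part:

```typescript
// Count PII types across a JSONL audit log. Field names assumed to
// match the 403 body shown above; adjust to the real record schema.
function countPiiTypes(jsonl: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;           // skip blank lines
    const entry = JSON.parse(line);
    for (const t of entry.pii_types ?? []) counts[t] = (counts[t] ?? 0) + 1;
  }
  return counts;
}
```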
MIT license.
No paid tiers.
No enterprise edition.
No account required.
Why open source, why now?
I thought about monetizing this. I built it over almost a year. It is a lot of code. I could have done a hosted version with a free tier and upsold enterprise features.
But the whole point is that your data should not leave your machine. A hosted security proxy is an oxymoron. The moment I route your LLM traffic through my servers to "protect" it, I have become the threat I am trying to defend against.
So it is MIT. Clone it, fork it, embed it, sell it in your own product if you want. I do not care. I care that the tooling exists and that anyone shipping AI in production can use it without paying $30K a month for some enterprise AI governance platform that does half of what Sentinel does.
If it is useful to you, star the repo.
If it saves you time or protects your users, there is a "Sponsor" button on the GitHub page.
But neither of those things is necessary to use it.
```shell
npx --yes --package sentinel-protocol sentinel bootstrap \
  --profile paranoid --mode enforce --dashboard
```
That is the whole thing. One command.
The repo is at github.com/sentinel-protocol
I am available to answer questions about the architecture, the engines, or why I made specific design decisions. I built this because I kept seeing the same vulnerability repeated across every AI integration I touched, and I got tired of pretending someone else was going to fix it.
Disclosure: I am the author and sole developer of Sentinel Protocol. It is not affiliated with any AI company. There is no commercial interest. The project is MIT licensed and entirely free.
