It was 2:47 AM when my phone buzzed. Not the usual alert. This was the priority channel. Our geospatial threat detection system had flagged something strange: a device registered to a senior finance executive was authenticating to our client's trading platform. Normal enough, except the device was simultaneously reporting GPS coordinates in Manhattan and cellular tower associations in Lagos. Physics doesn't work that way.

Within three minutes, we confirmed the executive's actual phone was on his nightstand in Connecticut. Someone had cloned his credentials and was attempting to access the trading system from Nigeria. But they had made a mistake: they'd spoofed the GPS coordinates without realizing our system also triangulated cellular signals.

The attack was blocked. No data exfiltrated. No trades executed. Forty-seven minutes later, the SOC team's traditional SIEM finally flagged the login as suspicious based solely on IP geolocation, which had been VPN'd through a legitimate New York exit node.

That 47-minute gap is why I've spent the last 12 years building geospatial intelligence systems for threat detection.

## The Fundamental Problem: Your Security Stack Is Spatially Blind

Here's an uncomfortable truth: nearly 80% of enterprise data contains a location component. Your security infrastructure uses almost none of it.

Think about what your current stack actually knows about geography:

- **IP geolocation:** Accurate to the city level, maybe. Trivially spoofed with any VPN.
- **Time zone from browser:** Self-reported. Meaningless.
- **"Impossible travel" detection:** Flags if someone logs in from London and then Tokyo within an hour. Catches the laziest attackers.
Now think about what modern devices *actually* know about their location:

- GPS coordinates (3-10 meter accuracy)
- Cellular tower associations (50-300 meter triangulation)
- Wi-Fi access point identifiers (15-40 meter positioning)
- Bluetooth beacon proximity (1-5 meter range)
- Barometric pressure (floor-level detection in buildings)

This data exists. It flows through your MDM, your asset tracking systems, your endpoint agents. It almost never reaches your security tools.

Attackers know this. They've built entire attack categories around it.

## The Attacks You're Not Detecting

### GPS Spoofing

For about $300 in hardware from Amazon, anyone can broadcast fake GPS signals that override legitimate satellite positioning within a 50-meter radius. Originally a concern for military and aviation, GPS spoofing has gone mainstream:

- **Geofenced malware activation:** Malware that only executes when the device reports specific coordinates, evading sandbox analysis that runs in known data center locations.
- **Location-based access control bypass:** Systems that grant privileged access based on "being in the office" can be fooled by spoofed coordinates.
- **Fleet and logistics manipulation:** Attackers redirecting delivery vehicles or manipulating location-verified transactions.
A GPS-only detection system would never catch this. The coordinates look legitimate.

### Cellular Network Attacks

IMSI catchers (fake cell towers) aren't just for nation-states anymore. For under $2,000, you can intercept cellular traffic, track device movements, and inject malicious payloads. More sophisticated attacks:

- **Silent SMS triangulation:** Locating a target device without any user-visible indication.
- **Tower spoofing for location falsification:** Making a device appear to be in a location it isn't.
- **Man-in-the-middle on cellular data:** Intercepting authentication tokens, API keys, or session data.

### The Credential Cloning Problem

The Lagos attack I mentioned earlier is increasingly common. Attackers steal credentials, clone device identifiers, and attempt to authenticate from locations that should trigger alarms but don't, because enterprise security relies on easily spoofed signals.

The common thread: location-based attacks exploit the assumption that security systems can't see the physical world.

## How We Built a System That Actually Works

After watching organizations get burned by location-blind security, I set out to build something different. What followed was 12 years of iteration, failure, and gradual improvement. Here's what actually works.

### Architecture Overview

The system has four layers.
```
┌─────────────────────────────────────────────────────────┐
│                   CORRELATION ENGINE                    │
│  (Joins location intelligence with security telemetry)  │
├─────────────────────────────────────────────────────────┤
│                   BEHAVIORAL MODELING                   │
│       (GMMs for spatial patterns, anomaly scoring)      │
├─────────────────────────────────────────────────────────┤
│                      SENSOR FUSION                      │
│   (Extended Kalman Filter, multi-signal triangulation)  │
├─────────────────────────────────────────────────────────┤
│                     SIGNAL INGESTION                    │
│    (GPS, Cellular, Wi-Fi, Bluetooth, IP → normalized)   │
└─────────────────────────────────────────────────────────┘
```

Each layer solves a specific problem. Skip one, and the system fails.

### Layer 1: Signal Ingestion

**The problem:** Location data arrives from different sources with different schemas, accuracies, and failure modes. GPS is precise but spoofable. Cellular is harder to spoof but less precise. Wi-Fi depends on access point databases that may be stale.

**The solution:** Normalize everything to a common schema while preserving source-specific metadata.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal

@dataclass
class LocationSignal:
    device_id: str
    timestamp: datetime
    latitude: float
    longitude: float
    accuracy_meters: float
    source: Literal["gps", "cellular", "wifi", "bluetooth", "ip"]
    confidence: float   # 0.0 to 1.0
    raw_metadata: dict  # Source-specific: cell tower ID, BSSID, etc.
```

The `raw_metadata` field is critical. When the fusion layer detects something suspicious, you need the original data for investigation. Knowing that GPS said "Manhattan" while cellular said "Lagos" is more actionable than just knowing "location anomaly detected."

**Implementation note:** We use Kafka for stream ingestion with strict schema enforcement.
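On the consumer side, normalization amounts to mapping each source's raw payload onto the common schema. Here is a minimal sketch of what that can look like; the `normalize_cellular` adapter and its payload field names are hypothetical, not the production code:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal

# Common schema (as above), repeated so this sketch runs standalone.
@dataclass
class LocationSignal:
    device_id: str
    timestamp: datetime
    latitude: float
    longitude: float
    accuracy_meters: float
    source: Literal["gps", "cellular", "wifi", "bluetooth", "ip"]
    confidence: float
    raw_metadata: dict = field(default_factory=dict)

def normalize_cellular(payload: dict) -> LocationSignal:
    """Hypothetical adapter: map a raw cellular triangulation event onto
    the common schema, preserving tower details in raw_metadata."""
    return LocationSignal(
        device_id=payload["device_id"],
        timestamp=datetime.fromtimestamp(payload["epoch_ms"] / 1000, tz=timezone.utc),
        latitude=payload["lat"],
        longitude=payload["lon"],
        accuracy_meters=payload.get("accuracy_m", 300.0),  # worst-case cellular
        confidence=0.6,  # cellular: hard to spoof, but imprecise
        source="cellular",
        raw_metadata={"cell_id": payload.get("cell_id")},
    )

signal = normalize_cellular({
    "device_id": "exec-phone-01",
    "epoch_ms": 1_700_000_000_000,
    "lat": 6.5244, "lon": 3.3792,  # Lagos, for the sake of the example
    "cell_id": "62001-4521",
})
```

The point of the adapter is that everything downstream sees one shape, while the original tower identifier stays available for investigation.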
Location data is high-volume and time-sensitive.

### Layer 2: Sensor Fusion (Where the Magic Happens)

**The problem:** You have five location signals that disagree with each other. Which one is right? Or is one of them being attacked?

**The solution:** An Extended Kalman Filter with adaptive process noise and attack detection on the innovation sequence.

This is the core of the system, so let me explain it in detail.

A Kalman Filter maintains a probabilistic estimate of a system's state (in our case: position and velocity) and updates that estimate as new observations arrive. The "extended" variant handles nonlinear measurement models, which we need for converting raw signals to position estimates.

The state vector:

```
x = [latitude, longitude, velocity_north, velocity_east]
```

Each signal source has a measurement model that maps observations to position estimates, along with a measurement noise covariance matrix that encodes how much we trust that source:

| Signal | Typical Noise (meters) | Notes |
|---|---|---|
| GPS (open sky) | 5-10 | High confidence |
| GPS (urban canyon) | 20-50 | Multipath effects |
| Cellular | 100-300 | Tower density dependent |
| Wi-Fi | 15-40 | AP database quality dependent |

**The key insight:** the Kalman Filter's innovation sequence reveals attacks.

The "innovation" is the difference between what the filter predicted and what it observed. Under normal conditions, innovations follow a known distribution. When an attacker spoofs one signal but not others, the innovations for that signal become statistical outliers.

```python
import numpy as np

def detect_signal_manipulation(innovation, expected_covariance,
                               observation, predicted_state):
    """
    Mahalanobis distance on the innovation sequence.

    High values indicate the observation doesn't match the model:
    either the model is wrong, or the signal is being manipulated.
    """
    mahal_distance = np.sqrt(
        innovation.T @ np.linalg.inv(expected_covariance) @ innovation
    )

    # Threshold tuned empirically; 3.5 works well in practice
    if mahal_distance > 3.5:
        return SignalAnomaly(
            severity="high",
            signal_source=observation.source,
            mahalanobis_distance=mahal_distance,
            expected_position=predicted_state[:2],
            observed_position=observation.position,
        )
    return None
```

This is how we caught the Lagos attack. The GPS signal said Manhattan. The cellular signal (which the attacker hadn't spoofed) said Lagos. The innovation on the GPS measurement was off the charts: a Mahalanobis distance of 847 against a threshold of 3.5.

**Lesson learned the hard way:** adaptive process noise matters enormously.
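One way to make process noise adaptive is to scale it with the filter's own speed estimate. The sketch below uses an invented three-regime heuristic; the cutoffs and noise magnitudes are illustrative, not our production model:

```python
import numpy as np

def adaptive_process_noise(velocity_estimate, dt):
    """Scale the process noise covariance Q with estimated speed (m/s).

    A device at rest gets a tight motion model; one moving at vehicle
    speeds gets a much looser one, so ordinary travel does not show up
    as an innovation spike. (Regime cutoffs here are illustrative.)
    """
    speed = float(np.linalg.norm(velocity_estimate))
    if speed < 0.5:        # effectively stationary
        sigma_accel = 0.1  # m/s^2
    elif speed < 3.0:      # walking
        sigma_accel = 1.0
    else:                  # vehicle / subway
        sigma_accel = 5.0

    # Standard constant-acceleration noise terms for the state
    # [pos_north, pos_east, vel_north, vel_east]
    q_pos = (sigma_accel * dt**2 / 2) ** 2
    q_vel = (sigma_accel * dt) ** 2
    return np.diag([q_pos, q_pos, q_vel, q_vel])

Q_desk = adaptive_process_noise(np.array([0.0, 0.0]), dt=1.0)
Q_subway = adaptive_process_noise(np.array([12.0, 3.0]), dt=1.0)
```

The desk-bound device ends up with a covariance orders of magnitude tighter than the subway rider's, which is exactly the behavior that kept commuters from tripping the detector.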
A device sitting on a desk has different motion characteristics than one in a moving vehicle. We spent six months tuning the process noise model before it stopped generating false positives for people taking the subway.

### Layer 3: Behavioral Modeling

**The problem:** A device in an unusual location isn't automatically suspicious. People travel. They work from coffee shops. They visit client sites. You need to distinguish "unusual but legitimate" from "unusual and concerning."

**The solution:** Gaussian Mixture Models for spatial behavior, with temporal and transition patterns.

For each device/user, we build a behavioral baseline:

- **Location clusters:** K-means on historical positions. "Places this device normally goes."
- **Temporal patterns:** When does the device appear at each cluster? An executive's laptop being in the office at 2 PM is normal; at 2 AM is notable.
- **Transition patterns:** How does the device move between clusters? Typical commute routes, travel velocities, common sequences.

The model trains on 4-6 weeks of data per device. Less than that, and you don't capture enough variation. More, and you're modeling outdated patterns (someone who changed jobs, moved apartments, etc.).

Anomaly scoring combines all three factors:

```python
import numpy as np

def compute_anomaly_score(observation, baseline):
    # Spatial: how far from known locations?
    spatial_score = min_mahalanobis_to_clusters(
        observation.position, baseline.clusters
    )

    # Temporal: how likely is this location at this time?
    temporal_score = -np.log(
        baseline.time_probability(
            observation.hour,
            observation.day_of_week,
            nearest_cluster(observation.position, baseline.clusters)
        ) + 1e-10  # Avoid log(0)
    )

    # Transition: is the movement physically plausible?
    if baseline.last_observation:
        velocity = compute_velocity(baseline.last_observation, observation)
        transition_score = velocity_plausibility(velocity, baseline.typical_velocities)
    else:
        transition_score = 0

    # Weighted combination; weights tuned per deployment
    return 0.4 * spatial_score + 0.35 * temporal_score + 0.25 * transition_score
```

**Critical implementation detail:** Thresholds must be per-device, not global. A salesperson who travels constantly has different "normal" variance than an engineer who works from home. We set thresholds at the 99th percentile of each device's historical anomaly score distribution.

### Layer 4: Correlation Engine

**The problem:** A location anomaly alone isn't actionable. You need context. What was the device doing when the anomaly occurred? What other signals support or contradict the alert?

**The solution:** Real-time joins between location intelligence and security telemetry.
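Conceptually, the join behaves like the toy version below. This is an in-memory sketch with invented records and field names; in production the join runs in the analytics store, not in Python:

```python
from datetime import datetime, timedelta

# Invented records for illustration.
auth_events = [
    {"device_id": "d1", "user_id": "exec", "ts": datetime(2024, 1, 5, 2, 47),
     "resource": "trading-api", "outcome": "success"},
]
location_intel = [
    {"device_id": "d1", "ts": datetime(2024, 1, 5, 2, 46, 30),
     "anomaly_score": 847.0},
]
thresholds = {"exec": 3.5}  # per-user alert threshold

def correlate(auths, locations, thresholds, window=timedelta(seconds=60)):
    """Join auth events to location intelligence on device_id within a
    time window, keeping rows whose anomaly score exceeds the user's
    per-user threshold."""
    hits = []
    for a in auths:
        for l in locations:
            if (a["device_id"] == l["device_id"]
                    and abs(a["ts"] - l["ts"]) < window
                    and l["anomaly_score"] > thresholds[a["user_id"]]):
                hits.append({**a, "anomaly_score": l["anomaly_score"]})
    return hits
```

The time-window join is the important part: an anomaly score only means something when it can be tied to what the device was authenticating to at that moment.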
We maintain materialized views in ClickHouse (a columnar OLAP store optimized for real-time analytics) that join:

- Fused position data with confidence scores
- Behavioral anomaly scores
- Network flows (source/destination, bytes, protocol)
- Authentication events
- Endpoint telemetry (process execution, file access)
- Threat intelligence feeds

This enables investigation queries like:

```sql
-- Find authentication events where location contradicts user baseline
SELECT
    auth.timestamp,
    auth.user_id,
    auth.resource,
    auth.outcome,
    loc.fused_position,
    loc.anomaly_score,
    baseline.nearest_cluster
FROM authentication_events auth
JOIN location_intelligence loc
    ON auth.device_id = loc.device_id
    AND abs(auth.timestamp - loc.timestamp) < interval '60 seconds'
JOIN user_baselines baseline
    ON auth.user_id = baseline.user_id
WHERE loc.anomaly_score > baseline.alert_threshold
    AND auth.timestamp > now() - interval '1 hour'
ORDER BY loc.anomaly_score DESC
```

Alert rules fire on compound conditions:

- Location anomaly AND connection to suspicious destination
- Location anomaly AND authentication failure AND outside business hours
- Signal manipulation detected AND privileged resource access

Single-factor alerts generate too much noise. Compound conditions are where the signal-to-noise ratio becomes manageable.

## Results: From 61% to 94.3%

After deploying this architecture across financial services, healthcare, and critical infrastructure clients, we measured the improvement:

| Metric | IP-Only Baseline | Geospatial System |
|---|---|---|
| Attack attribution accuracy | 61% | 94.3% |
| Median detection latency (location attacks) | 47 minutes | 2.8 minutes |
| False positive rate | 12% | 3.1% |

The 33-percentage-point improvement in attribution accuracy comes primarily from catching attacks that IP-based systems miss entirely. GPS spoofing, credential cloning from different continents, geofenced malware: these attacks are invisible to traditional tools.

The latency improvement (47 minutes → 2.8 minutes) matters because location-based attacks are often reconnaissance for larger operations. Catching them early disrupts the kill chain.

## Where This Goes Next

The attacks are evolving. Three trends I'm watching:

- **Indoor positioning attacks:** As enterprises deploy indoor positioning (Bluetooth beacons, Wi-Fi RTT), attackers will target these systems.
Most indoor positioning has no authentication; beacon spoofing is trivial.

- **ML-powered evasion:** Attackers will use machine learning to model "normal" location patterns and generate spoofed trajectories that evade behavioral detection. We're already seeing primitive versions of this.
- **Edge cases in legitimate behavior:** Remote work has made behavioral baselines harder to maintain. "Works from home" now includes "works from Airbnbs in different countries." Distinguishing legitimate digital nomads from attackers using VPNs is an unsolved problem.

The fundamental insight remains: security tools that ignore physical-world signals are fighting with one hand tied behind their back. Location data is abundant, attackers are exploiting its absence, and the gap between location-aware and location-blind security is only growing.

The physical world and the digital world are converging. Your security architecture should too.