Manually surveying and mapping physical infrastructure is a slow, expensive, and often dangerous job. Whether it's utility poles, cell towers, or city assets, someone has to physically go out and record the location of each item. But what if they didn't?

I had a thought: Google has already driven down nearly every street on the planet. What if I could build an AI agent that could "drive" a pre-defined route, visually identify specific objects, and then use some clever math to calculate their precise GPS coordinates?

So, I built one. This project is a complete pipeline that takes a route from a Google Map, uses the Street View API as its "eyes," runs a custom-trained object detection model as its "brain," and then uses geospatial calculations to create a final map of the identified assets. It's a virtual surveyor, and here's how I built it.

## The Blueprint: A 5-Step Geospatial AI Pipeline

The entire system is a chain of Python scripts, where the output of one step becomes the input for the next.

1. **The Route (Data Prep):** Clean and format a route exported from Google Maps.
2. **The Drive (Waypoint Generation):** Programmatically generate high-resolution "scan points" every 10 meters along the route.
3. **The Eyes (Image Acquisition):** At each scan point, use the Google Street View API to capture a 360-degree view.
4. **The Brain (Object Detection):** Feed each image into a custom-trained DETR (DEtection TRansformer) model to find the target object (in this case, utility poles).
5. **The Calculation (Geolocation):** This is the magic. Use the object's position in the image and the camera's metadata to triangulate the object's real-world latitude and longitude.

## Step 1 & 2: From a Rough Map to a High-Res Scan Path

The process starts with a simple CSV exported from a hand-drawn Google Map. It's messy. My `regex_sep_waypoint.py` script uses regex to clean it up and extract a clean list of latitude/longitude waypoints.
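I won't reproduce `regex_sep_waypoint.py` here, but the core idea is simply to pull every decimal latitude/longitude pair out of whatever text surrounds it. A minimal sketch, assuming the coordinates appear somewhere in each row as comma-separated decimals (the file name and pattern are illustrative, not the script's actual ones):

```python
import re

# Matches decimal "lat, lon" pairs like "40.712776, -74.005974"
# (an illustrative pattern, not necessarily the one in regex_sep_waypoint.py)
COORD_PATTERN = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")

waypoints = []
with open("exported_route.csv") as f:  # hypothetical export file name
    for line in f:
        for lat_str, lon_str in COORD_PATTERN.findall(line):
            waypoints.append((float(lat_str), float(lon_str)))

print(f"Extracted {len(waypoints)} waypoints")
```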
But a few points aren't enough. I need to scan the entire road. The `way_point_create.py` script takes these waypoints and fills in the gaps. Using the `geographiclib` library, it calculates the geodesic path (the shortest path on the Earth's surface) between each point and then interpolates new GPS coordinates every 10 meters.

**way_point_create.py - Generating the Scan Path**

```python
from geographiclib.geodesic import Geodesic
from math import radians, sin, cos, asin, atan2, degrees

# ... inside the loop between two waypoints ...
result = Geodesic.WGS84.Inverse(latstart, longstart, latend, longend)
distance_meters = result["s12"]
bearing_degrees = result["azi1"]

# Create a new point roughly every 10 meters (0.01 km)
num_segments = round(distance_meters / 10)
for i in range(num_segments):
    move_distance_km = i * 0.01

    # Haversine formula to calculate the new lat/lon
    R = 6371  # Earth's radius in km
    lat1_rad = radians(latstart)
    lon1_rad = radians(longstart)
    bearing_rad = radians(bearing_degrees)

    lat2_rad = asin(sin(lat1_rad) * cos(move_distance_km / R) +
                    cos(lat1_rad) * sin(move_distance_km / R) * cos(bearing_rad))
    lon2_rad = lon1_rad + atan2(sin(bearing_rad) * sin(move_distance_km / R) * cos(lat1_rad),
                                cos(move_distance_km / R) - sin(lat1_rad) * sin(lat2_rad))

    new_lat = degrees(lat2_rad)
    new_lon = degrees(lon2_rad)
    drive_predicted_location.append((new_lat, new_lon))  # list of scan points, defined earlier in the script
```

This gives me a high-resolution list of coordinates that simulates a car driving down the road and stopping every 10 meters (about 33 feet) to look around.
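A quick aside on the math: the snippet above solves the inverse geodesic problem with `geographiclib` and then does the forward projection with a hand-rolled spherical Haversine step. The same library can also solve the forward (direct) problem on the WGS84 ellipsoid, which avoids mixing the two Earth models. This is just an alternative sketch, not what `way_point_create.py` actually does:

```python
from geographiclib.geodesic import Geodesic

# Placeholder inputs standing in for the loop state above
latstart, longstart = 40.712776, -74.005974   # hypothetical example start point
bearing_degrees = 45.0                        # bearing from the Inverse solution
i = 3                                         # segment index

# Direct problem: start point + bearing + distance (in meters) -> new point
step = Geodesic.WGS84.Direct(latstart, longstart, bearing_degrees, i * 10.0)
new_lat, new_lon = step["lat2"], step["lon2"]
print(new_lat, new_lon)
```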
## Step 3 & 4: The Eyes and Brain - API Calls and AI Inference

This is where the heavy lifting happens in `deploy_model.py`. The script loops through every single generated scan point. At each point, it:

1. **Calls the Google Street View API:** It requests a 640x640 image at the current location, using the calculated bearing as the camera's heading. This ensures the camera is always looking down the road. (A sketch of this request follows the list.)
2. **Loads the Custom AI Model:** I used the Hugging Face `transformers` library to train my own object detection model. The `model_build_first.py` script shows the training setup using a DETR-ResNet-50 base model. This custom model is now an expert at one thing: finding utility poles in roadside images. (A sketch of that setup also follows.)
3. **Performs Inference:** The downloaded Street View image is passed to the model, which returns a list of detected objects, their bounding boxes, and a confidence score.
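The exact request parameters in `deploy_model.py` aren't shown above, but the image fetch is a single call to the Street View Static API. A minimal sketch using the `requests` library, with placeholder values for the scan point, bearing, and API key (field of view, pitch, and error handling may differ from the real script):

```python
import requests

STREET_VIEW_URL = "https://maps.googleapis.com/maps/api/streetview"  # Street View Static API

scan_lat, scan_lon = 40.712776, -74.005974  # hypothetical scan point
bearing_degrees = 45.0                      # route bearing at this scan point
GOOGLE_API_KEY = "YOUR_API_KEY"

params = {
    "size": "640x640",                       # matches the frame size fed to the model
    "location": f"{scan_lat},{scan_lon}",
    "heading": round(bearing_degrees) % 360, # point the camera down the road
    "key": GOOGLE_API_KEY,
}
response = requests.get(STREET_VIEW_URL, params=params)
response.raise_for_status()

with open("downloaded_street_view_image.jpg", "wb") as f:
    f.write(response.content)
```

On the training side, `model_build_first.py` starts from a DETR-ResNet-50 base. A minimal sketch of that kind of initialization with a single pole class (the label name and the choice to replace the COCO detection head are illustrative assumptions, not the script's exact code):

```python
from transformers import AutoImageProcessor, AutoModelForObjectDetection

id2label = {0: "utility_pole"}  # illustrative label name
label2id = {name: idx for idx, name in id2label.items()}

image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")
model = AutoModelForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # swap the 91-class COCO head for a 1-class head
)
# ... fine-tuning then proceeds on the labeled pole dataset ...
```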
**deploy_model.py - AI Inference Snippet**

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import torch

# Load the fine-tuned model from a local directory
image_processor = AutoImageProcessor.from_pretrained("./detr-5k_200e_1e-5_1e-4/78")
model = AutoModelForObjectDetection.from_pretrained("./detr-5k_200e_1e-5_1e-4/78")

image = Image.open('downloaded_street_view_image.jpg')

with torch.no_grad():
    inputs = image_processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

# Post-process to get scores and bounding boxes
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs,
                                                        threshold=0.9,  # High confidence
                                                        target_sizes=target_sizes)[0]
```
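The `results` dict returned by `post_process_object_detection` holds parallel tensors of scores, label ids, and boxes in `(xmin, ymin, xmax, ymax)` pixel coordinates. Here's a short sketch of consuming them and producing the `cropped_image` whose pixel height feeds the distance estimate in Step 5 (the exact bookkeeping in `deploy_model.py` may differ):

```python
# Each detection is a score, a label id, and a box in (xmin, ymin, xmax, ymax) pixels
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    xmin, ymin, xmax, ymax = [round(v) for v in box.tolist()]
    print(f"{model.config.id2label[label.item()]}: {score.item():.2f} at ({xmin}, {ymin}, {xmax}, {ymax})")

    # Crop the detected pole; its pixel height is what Step 5 measures
    cropped_image = image.crop((xmin, ymin, xmax, ymax))
```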
## Step 5: The "Aha!" Moment - Triangulating the Object's Location

This is the hardest but most rewarding part. I have the camera's GPS location and a bounding box in a 2D image. How do I get the object's real-world GPS location? With trigonometry.

1. **Estimate Distance:** We can approximate the distance to the object using the pinhole camera model formula: `Distance = (Focal Length * Real Object Height) / Object Height in Image`. I had to estimate the "real" height of a utility pole and dig up the approximate focal length of Google's cameras. It's not perfect, but it's a solid starting point.
2. **Calculate Angle:** The camera is looking straight ahead (the heading). The object is at some x coordinate in the image, so I can calculate its angle offset from the center of the image. The final angle is `camera_heading + angle_offset`.
3. **Project the New Coordinate:** Now I have a starting GPS point, a distance, and a bearing. I can use the Haversine formula again, but this time to project a new GPS coordinate for the detected utility pole.

**deploy_model.py - The Geolocation Math**

```python
# ... inside the loop after a pole is detected ...

# 1. Estimate Distance
google_camera_focal_length = 5.1  # An approximation
pic_height = 640
object_height_in_pixels = cropped_image.size[1]
# A 'd_multiply' constant is used here to calibrate the distance calculation
distance_estimate = ((google_camera_focal_length * pic_height) / object_height_in_pixels) * d_multiply

# 2. Calculate Angle
box_center_x = box[0]
angle_from_camera = ((box_center_x - 320) * 0.28125) + int(camera_heading)  # 0.28125 is a calculated deg/pixel ratio

# 3. Project New Coordinate (using the same Haversine formula as before)
lat1_rad = radians(camera_lat)
lon1_rad = radians(camera_lon)
# ... use distance_estimate and angle_from_camera to calculate lat2_rad, lon2_rad ...
final_lat = degrees(lat2_rad)
final_lon = degrees(lon2_rad)
print(f"Predicted Pole Location: {final_lat}, {final_lon}")
```

The script saves every predicted location to a final CSV file, effectively building a new map from scratch.

## What I Learned

- **Geospatial Math is Fun (and Hard):** Libraries like `geographiclib` are amazing, but you still need a solid grasp of trigonometry and concepts like bearings and geodesics to make this work.
- **Calibration is Key:** The distance estimation is the weakest link. I spent a lot of time in `long_test.py` tuning the `d_multiply` variable, a "fudge factor" that calibrates the calculated distance to reality. This is a common part of turning a theoretical model into a practical tool.
- **The Power of a Pipeline:** No single script here is magic. The power comes from chaining them together: data cleaning -> path generation -> image acquisition -> AI inference -> geospatial calculation. Each step builds on the last.

This project was a deep dive into the practical application of computer vision. It shows that with the right tools and a bit of ingenuity, we can use the vast digital world to understand and map our physical world in entirely new ways.