Manually surveying and mapping physical infrastructure is a slow, expensive, and often dangerous job. Whether it's utility poles, cell towers, or city assets, someone has to physically go out and record the location of each item. But what if they didn't?
I had a thought: Google has already driven down nearly every street on the planet. What if I could build an AI agent that could "drive" a pre-defined route, visually identify specific objects, and then use some clever math to calculate their precise GPS coordinates?
So, I built one. This project is a complete pipeline that takes a route from a Google Map, uses the Street View API as its "eyes," runs a custom-trained object detection model as its "brain," and then uses geospatial calculations to create a final map of the identified assets.
It's a virtual surveyor, and here's how I built it.
The Blueprint: A 5-Step Geospatial AI Pipeline
The entire system is a chain of Python scripts, where the output of one step becomes the input for the next.
- The Route (Data Prep): Clean and format a route exported from Google Maps.
- The Drive (Waypoint Generation): Programmatically generate high-resolution "scan points" every 10 meters along the route.
- The Eyes (Image Acquisition): At each scan point, use the Google Street View API to capture a 360-degree view.
- The Brain (Object Detection): Feed each image into a custom-trained DETR (DEtection TRansformer) model to find the target object (in this case, utility poles).
- The Calculation (Geolocation): This is the magic. Use the object's position in the image and the camera's metadata to triangulate the object's real-world latitude and longitude.
Step 1 & 2: From a Rough Map to a High-Res Scan Path
The process starts with a simple CSV exported from a hand-drawn Google Map. It's messy. My regex_sep_waypoint.py script uses regular expressions to strip out the clutter and extract a clean list of latitude/longitude waypoints.
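To make that concrete, here's a minimal sketch of the extraction idea, assuming the coordinates show up as decimal "lat,lon" pairs somewhere in each exported row. The pattern and the example row are illustrative, not the script's actual logic:

```python
import re

# Match decimal "lat,lon" pairs embedded in otherwise messy text.
COORD_PATTERN = re.compile(r"(-?\d{1,3}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)")

def extract_waypoints(raw_text: str) -> list[tuple[float, float]]:
    """Return every (lat, lon) pair found in the raw export text."""
    return [(float(lat), float(lon)) for lat, lon in COORD_PATTERN.findall(raw_text)]

# Illustrative messy row from a map export:
messy_row = 'name="pole route", geometry="33.7490, -84.3880 33.7495, -84.3885"'
print(extract_waypoints(messy_row))
# [(33.749, -84.388), (33.7495, -84.3885)]
```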
But a few points aren't enough. I need to scan the entire road. The way_point_create.py script takes these waypoints and fills in the gaps. Using the geographiclib library, it calculates the geodesic path (the shortest path on the Earth's surface) between each point and then interpolates new GPS coordinates every 10 meters.
way_point_create.py - Generating the Scan Path

```python
from geographiclib.geodesic import Geodesic
from math import radians, sin, cos, asin, atan2, degrees

# ... inside the loop between two waypoints ...
# Solve the inverse geodesic problem on the WGS84 ellipsoid:
# distance and initial bearing between the two waypoints.
result = Geodesic.WGS84.Inverse(latstart, longstart, latend, longend)
distance_meters = result["s12"]
bearing_degrees = result["azi1"]

# Create a new point roughly every 10 meters (0.01 km)
num_segments = round(distance_meters / 10)
for i in range(num_segments):
    move_distance_km = i * 0.01

    # Spherical destination-point formula to calculate the new lat/lon
    R = 6371  # Earth's radius in km
    lat1_rad = radians(latstart)
    lon1_rad = radians(longstart)
    bearing_rad = radians(bearing_degrees)
    lat2_rad = asin(sin(lat1_rad) * cos(move_distance_km / R) +
                    cos(lat1_rad) * sin(move_distance_km / R) * cos(bearing_rad))
    lon2_rad = lon1_rad + atan2(sin(bearing_rad) * sin(move_distance_km / R) * cos(lat1_rad),
                                cos(move_distance_km / R) - sin(lat1_rad) * sin(lat2_rad))
    new_lat = degrees(lat2_rad)
    new_lon = degrees(lon2_rad)
    # drive_predicted_location.append((new_lat, new_lon))
```
This gives me a high-resolution list of coordinates that simulates a car driving down the road and stopping every 10 meters (about 33 feet) to look around.
Step 3 & 4: The Eyes and Brain - API Calls and AI Inference
This is where the heavy lifting happens in deploy_model.py. The script loops through every single generated scan point. At each point, it:
- Calls the Google Street View API: It requests a 640x640 image at the current location, using the calculated bearing as the camera's heading. This ensures the camera is always looking down the road (see the request sketch after this list).
- Loads the Custom AI Model: I used the Hugging Face transformers library to train my own object detection model. The model_build_first.py script shows the training setup using a DETR-ResNet-50 base model. This custom model is now an expert at one thing: finding utility poles in roadside images.
- Performs Inference: The downloaded Street View image is passed to the model, which returns a list of detected objects, their bounding boxes, and a confidence score.
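For the Street View call in the first item, here's a minimal sketch of what that request can look like. The endpoint and parameter names come from the Street View Static API; the function, key handling, and filename are illustrative rather than the exact code in deploy_model.py:

```python
import requests

STREET_VIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def fetch_street_view(lat: float, lon: float, heading: float, api_key: str) -> str:
    """Download one Street View frame at a scan point and return the saved filename."""
    params = {
        "size": "640x640",            # maximum size for the static API
        "location": f"{lat},{lon}",   # current scan point
        "heading": heading,           # calculated bearing, so the camera looks down the road
        "fov": 90,                    # horizontal field of view in degrees
        "key": api_key,
    }
    response = requests.get(STREET_VIEW_URL, params=params, timeout=30)
    response.raise_for_status()
    filename = "downloaded_street_view_image.jpg"
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename
```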
deploy_model.py - AI Inference Snippet

```python
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection
import torch

# Load the fine-tuned model from a local directory
image_processor = AutoImageProcessor.from_pretrained("./detr-5k_200e_1e-5_1e-4/78")
model = AutoModelForObjectDetection.from_pretrained("./detr-5k_200e_1e-5_1e-4/78")

image = Image.open('downloaded_street_view_image.jpg')

with torch.no_grad():
    inputs = image_processor(images=image, return_tensors="pt")
    outputs = model(**inputs)

# Post-process to get scores and bounding boxes
target_sizes = torch.tensor([image.size[::-1]])
results = image_processor.post_process_object_detection(outputs,
                                                        threshold=0.9,  # High confidence
                                                        target_sizes=target_sizes)[0]
```
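The results dict that comes back holds parallel tensors of scores, labels, and (xmin, ymin, xmax, ymax) boxes. Here's one plausible way to walk those detections and crop each pole out of the frame, which is where the cropped_image and box used in the next step can come from. It's a sketch, not the exact deploy_model.py code:

```python
# Iterate over the detections returned by post_process_object_detection.
for score, label, box_tensor in zip(results["scores"], results["labels"], results["boxes"]):
    box = [round(v, 2) for v in box_tensor.tolist()]  # [xmin, ymin, xmax, ymax]
    print(f"Detected {model.config.id2label[label.item()]} "
          f"with confidence {score.item():.3f} at {box}")

    # Crop the detection so its pixel height can drive the distance estimate
    cropped_image = image.crop(box)
```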
Step 5: The "Aha!" Moment - Triangulating the Object's Location
This is the hardest but most rewarding part. I have the camera's GPS location and a bounding box in a 2D image. How do I get the object's real-world GPS location? With trigonometry.
- Estimate Distance: We can approximate the distance to the object using the pinhole camera model formula: Distance = (Focal Length * Real Object Height) / Object Height in Image. I had to estimate the "real" height of a utility pole and find the approximate focal length of Google's cameras. It's not perfect, but it's a solid starting point.
- Calculate Angle: The camera is looking straight ahead (the heading). The object sits at some x coordinate in the image, so I can calculate its angular offset from the center of the frame. The final angle is camera_heading + angle_offset.
- Project the New Coordinate: Now I have a starting GPS point, a distance, and a bearing. I can reuse the destination-point formula from Step 2, this time to project a new GPS coordinate for the detected utility pole.
deploy_model.py - The Geolocation Math

```python
# ... inside the loop after a pole is detected ...

# 1. Estimate Distance
google_camera_focal_length = 5.1  # An approximation
pic_height = 640
object_height_in_pixels = cropped_image.size[1]
# A 'd_multiply' constant is used here to calibrate the distance calculation
distance_estimate = ((google_camera_focal_length * pic_height) / object_height_in_pixels) * d_multiply

# 2. Calculate Angle
box_center_x = box[0]
# 0.28125 is a calculated degrees-per-pixel ratio for the 640px-wide image
angle_from_camera = ((box_center_x - 320) * 0.28125) + int(camera_heading)

# 3. Project New Coordinate (same destination-point formula as before)
lat1_rad = radians(camera_lat)
lon1_rad = radians(camera_lon)
# ... use distance_estimate and angle_from_camera to calculate lat2_rad, lon2_rad ...
final_lat = degrees(lat2_rad)
final_lon = degrees(lon2_rad)

print(f"Predicted Pole Location: {final_lat}, {final_lon}")
```
The script saves every predicted location to a final CSV file, effectively building a new map from scratch.
What I Learned
- Geospatial Math is Fun (and Hard): Libraries like geographiclib are amazing, but you still need a solid grasp of trigonometry and concepts like bearings and geodesics to make this work.
- Calibration is Key: The distance estimation is the weakest link. I spent a lot of time in long_test.py tuning the d_multiply variable, a "fudge factor" that calibrates the calculated distance to reality. This is a common part of turning a theoretical model into a practical tool (a rough sketch of the calibration idea follows this list).
- The Power of a Pipeline: No single script here is magic. The power comes from chaining them together. Data cleaning -> path generation -> image acquisition -> AI inference -> geospatial calculation. Each step builds on the last.
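On the calibration point, here's a rough sketch of one way that kind of tuning can be framed, assuming a handful of poles whose real locations were surveyed by hand. The numbers and helper are illustrative, not what long_test.py actually does:

```python
from geographiclib.geodesic import Geodesic

# Each sample pairs a raw (uncalibrated) pinhole distance estimate with the camera
# position and a manually surveyed pole position. All values are illustrative.
samples = [
    # (raw_estimate_m, camera_lat, camera_lon, surveyed_pole_lat, surveyed_pole_lon)
    (18.2, 33.7490, -84.3880, 33.74905, -84.38778),
    (31.0, 33.7512, -84.3901, 33.75131, -84.38985),
]

def true_distance_m(lat1, lon1, lat2, lon2):
    """Ground-truth camera-to-pole distance in meters, via the inverse geodesic."""
    return Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)["s12"]

pairs = [(est, true_distance_m(clat, clon, plat, plon))
         for est, clat, clon, plat, plon in samples]

# Least-squares fit of a single scale factor: true_distance ~= d_multiply * raw_estimate
d_multiply = sum(e * t for e, t in pairs) / sum(e * e for e, _ in pairs)
print(f"Calibrated d_multiply: {d_multiply:.3f}")
```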
This project was a deep dive into the practical application of computer vision. It shows that with the right tools and a bit of ingenuity, we can use the vast digital world to understand and map our physical world in entirely new ways.
