1,695 reads

How to Perform GIS Computations on Non-GIS Databases

by Joel LopesJune 15th, 2024

Too Long; Didn't Read

If your backend storage doesn't natively support spatial queries, this article is tailored for you. You can always build another microservice to handle spatial data, but this option often involves the overhead of maintaining an additional application. Another approach is to use geo-indexing libraries like S2 and H3. S2 divides the sphere into cells, each with a unique 64-bit identifier. Higher levels correspond to finer resolutions and smaller cell areas.

featured image - How to Perform GIS Computations on Non-GIS Databases

‘a detailed map of seattle’ Image created by HackerNoon AI Image Generator

Introduction:

Occasionally, you may encounter the need to perform geospatial functions within your application, such as mapping user locations or analyzing geographic data. There are numerous language-specific libraries available for these tasks, such as GDAL, Shapely, and Geopandas for Python.

Alternatively, geospatial functionality can be implemented through databases; for example, you can use the PostGIS extension with a relational database like PostgreSQL, or leverage the native support for spatial data types in a distributed database like Azure CosmosDB.

However, if your backend storage, such as Redis or Google Spanner, doesn't natively support spatial queries and you need to handle large-scale geospatial queries, this article is tailored for you.

What Are My Options?

You can always build another microservice to handle spatial data, but this option often involves the overhead of maintaining an additional application. Another approach is to use geo-indexing libraries like S2 and H3. S2, developed by Google, is based on the Hilbert curve, while H3, developed by Uber, is based on a geodesic discrete global grid system. S2 and H3 share many similarities: both divide a given region into cells and use 64-bit integers to index these cells.

However, the main difference lies in the shape of the cells; S2 uses square-shaped cells, whereas H3 uses hexagon-shaped cells. For some applications, H3 might offer better performance. However, overall, either library should suffice. In this article, we will use S2, but you can perform similar functions using H3.

Basic Concepts of Google S2 Library

Cells: S2 divides the sphere into cells, each with a unique 64-bit identifier.

Cell Levels: The hierarchy allows different levels of detail, from large regions to small precise areas. Each level represents a different resolution:
- Level 0: The largest cells, covering a significant portion of the Earth's surface.
- Higher Levels: Cells are progressively subdivided into smaller quadrants. For instance, Level 1 cells are each divided into four Level 2 cells, and so on.
- Resolution and Area: Higher levels correspond to finer resolutions and smaller cell areas. This hierarchy allows for precise indexing and querying at varying levels of detail.

The table below shows various cell levels along with their corresponding areas.

level	min area	max area	average area	units	Number of cells
00	85011012.19	85011012.19	85011012.19	km2	6
01	21252753.05	21252753.05	21252753.05	km2	24
02	4919708.23	6026521.16	5313188.26	km2	96
03	1055377.48	1646455.50	1328297.07	km2	384
04	231564.06	413918.15	332074.27	km2	1536
05	53798.67	104297.91	83018.57	km2	6K
06	12948.81	26113.30	20754.64	km2	24K
07	3175.44	6529.09	5188.66	km2	98K
08	786.20	1632.45	1297.17	km2	393K
09	195.59	408.12	324.29	km2	1573K
10	48.78	102.03	81.07	km2	6M
11	12.18	25.51	20.27	km2	25M
12	3.04	6.38	5.07	km2	100M
13	0.76	1.59	1.27	km2	402M
14	0.19	0.40	0.32	km2	1610M
15	47520.30	99638.93	79172.67	m2	6B
16	11880.08	24909.73	19793.17	m2	25B
17	2970.02	6227.43	4948.29	m2	103B
18	742.50	1556.86	1237.07	m2	412B
19	185.63	389.21	309.27	m2	1649B
20	46.41	97.30	77.32	m2	7T
21	11.60	24.33	19.33	m2	26T
22	2.90	6.08	4.83	m2	105T
23	0.73	1.52	1.21	m2	422T
24	0.18	0.38	0.30	m2	1689T
25	453.19	950.23	755.05	cm2	7e15
26	113.30	237.56	188.76	cm2	27e15
27	28.32	59.39	47.19	cm2	108e15
28	7.08	14.85	11.80	cm2	432e15
29	1.77	3.71	2.95	cm2	1729e15
30	0.44	0.93	0.74	cm2	7e18

From the table provided, it's evident that you can achieve mapping precision down to 0.44 cm^2 using S2. Within each square of an S2 cell, there exists a child cell that shares the same parent, indicating a hierarchical structure. The level of the cell can be a static value (same level applied to all the cells) or can be dynamic where S2 decides what resolution works best.

Calculating Nearest Neighbors

Let's start with an example. Consider we are writing an application that provides proximity service-like features for the Seattle area. We want to return a list of coffee shops within the given vicinity. To perform these operations, we will divide this task into 4 sub-tasks:

Loading Seattle map
Visualize S2 cells on the Seattle map
Store a few coffee shop locations in the database
Query for nearest coffee shops

Loading Seattle Map

To load a Google map, we would use the gmplot library. This library requires a Google Maps API key to load. To generate the API key, follow the instructions here.

import gmplot
import const
# plot seattle with zoom level 13
gmap = gmplot.GoogleMapPlotter(47.6061, -122.3328, 13, apikey=const.API_KEY)

# Draw the map to an HTML file:
gmap.draw('map.html')

The above code generates a map.html file as shown below:

Visualize S2 Cells on the Seattle Map

Now that we have the map, let's draw some S2 cells for maps :

from s2 import *
import gmplot

# plot seattle with zoom level 13
gmap = gmplot.GoogleMapPlotter(47.6061, -122.3328, 13, apikey=const.API_KEY)

areatobeplotted = [
   (47.64395531736767,-122.43597221319135),
   (47.51369277846956,-122.43597221319135),
   (47.51369277846956,-122.24156866779164),
   (47.64395531736767,-122.24156866779164),
   (47.64395531736767,-122.43597221319135)
]

region_rect = S2LatLngRect(
   S2LatLng.FromDegrees(47.51369277846956,-122.43597221319135),
   S2LatLng.FromDegrees(47.64395531736767, -122.24156866779164))

coverer = S2RegionCoverer()
coverer.set_min_level(8)
coverer.set_max_level(15)

covering = coverer.GetCovering(region_rect)
geoms = 0
for cellid in covering:
   new_cell = S2Cell(cellid)
   vertices = []
   for i in range(0, 4):
       vertex = new_cell.GetVertex(i)
       latlng = S2LatLng(vertex)
       vertices.append((latlng.lat().degrees(),
                        latlng.lng().degrees()))
   gmap.polygon(*zip(*vertices), face_color='pink', edge_color='cornflowerblue', edge_width=5)
   geoms+=1
gmap.polygon(*zip(*areatobeplotted), face_color='red', edge_color='green', edge_width=5)
print(f"Total Geometries: {geoms}")
gmap.draw('/tmp/map.html')

Output:
Total Geometries: 273

In the code above, we first center the Google Map plotter around the Seattle area. In S2RegionCoverer, we initialize the region coverer to have dynamic levels between a minimum level of 8 and a maximum level of 15. This allows S2 to dynamically fit all cells into specific cell sizes for the best fit. The GetCovering method returns the covering for a rectangle around the Seattle area.

Then, we iterate over each cell, calculating the vertices for the cells and plotting them on the map. We keep the count of cells generated to around 273. Finally, we plot the input rectangle in red. This code will plot the S2 cells on the Seattle map at /tmp/map.html, as shown below:

Store a Few Coffee Shop Locations In the Database

Let’s generate a database of coffee shops along with their S2 cell identifiers. You can store these cells in any database of your choice. For this tutorial, we will be using a SQLite data database. In the code sample below, we connect to the SQLite database to create a table CoffeeShops with 3 fields Id, name, and cell_id.

Similar to the previous example, we use S2RegionCoverer to calculate the cells but this time, we use a fixed level for plotting points. Finally, the calculated ID is converted to a string and stored in the database.

import sqlite3
from s2 import S2CellId,S2LatLng,S2RegionCoverer

# Connect to SQLite database
conn = sqlite3.connect('/tmp/sqlite_cells.db')
cursor = conn.cursor()

# Create a table to store cell IDs
cursor.execute('''CREATE TABLE IF NOT EXISTS CoffeeShops (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   name TEXT,
                   cell_id TEXT
               )''')
coverer = S2RegionCoverer()
# Function to generate S2 cell ID for a given latitude and longitude
def generate_cell_id(latitude, longitude, level=16):
   cell=S2CellId(S2LatLng.FromDegrees(latitude, longitude))
   return str(cell.parent(level))

# Function to insert cell IDs into the database
def insert_cell_ids(name,lat,lng):
   cell_id = generate_cell_id(lat, lng)
   cursor.execute("INSERT INTO CoffeeShops (name, cell_id) VALUES (?, ?)", (name, cell_id))
   conn.commit()

# Insert cell IDs into the database
insert_cell_ids("Overcast Coffee", 47.616656277302155, -122.31156460382837)
insert_cell_ids("Seattle Sunshine", 47.67366852914391, -122.29051997415843)
insert_cell_ids("Sip House", 47.6682364706238, -122.31328618043693)
insert_cell_ids("Victoria Coffee",47.624408595334536, -122.3117362652041)

# Close connection
conn.close()

At this point, we have a database that stores coffee shops along with their cell IDs, determined by the selected resolution for the cell level.

Query for the Nearest Coffee Shops

Finally, let’s query for coffee shops in the University District area.

import sqlite3
from s2 import S2RegionCoverer,S2LatLngRect, S2LatLng
# Connect to SQLite database
conn = sqlite3.connect('/tmp/sqlite_cells.db')
cursor = conn.cursor()

# Function to query database for cells intersecting with the given polygon
def query_intersecting_cells(start_x,start_y,end_x,end_y):
   # Create S2RegionCoverer
   region_rect = S2LatLngRect(
       S2LatLng.FromDegrees(start_x,start_y),
       S2LatLng.FromDegrees(end_x,end_y))
   coverer = S2RegionCoverer()
   coverer.set_min_level(8)
   coverer.set_max_level(15)

   covering = coverer.GetCovering(region_rect)

   # Query for intersecting cells
   intersecting_cells = set()
   for cell_id in covering:
       cursor.execute("SELECT name FROM CoffeeShops WHERE cell_id >= ? and cell_id<=?", (str(cell_id.range_min()),str(cell_id.range_max()),))
       intersecting_cells.update(cursor.fetchall())

   return intersecting_cells

# Query for intersecting cells
intersecting_cells = query_intersecting_cells(47.6527847,-122.3286438,47.6782181, -122.2797203)

# Print intersecting cells
print("Intersecting cells:")
for cell_id in intersecting_cells:
   print(cell_id[0])
# Close connection
conn.close()

Output:
Intersecting cells:
Sip House
Seattle Sunshine

Below is a visual representation of the cells. To maintain brevity, the visualization code for below is not added.

Since all child and parent cells share a prefix, we can query for cell ranges between min and max to get all the cells between those two values. In our example, we use the same principle to query coffee shop

Conclusion:

In this article, we have demonstrated how to use geo-indexing to store and query geospatial data in databases that don't support geospatial querying. This can be further extended to multiple use cases such as calculating routing between 2 points or getting nearest neighbors.

Typically, for geo-indexed database querying, you would have to perform some additional post-processing on the data. Careful consideration for querying and post-processing logic is needed to ensure we don't overwhelm the node.