Occasionally, you may encounter the need to perform geospatial functions within your application, such as mapping user locations or analyzing geographic data. There are numerous language-specific libraries available for these tasks, such as GDAL, Shapely, and Geopandas for Python.
Alternatively, geospatial functionality can be implemented through databases; for example, you can use the PostGIS extension with a relational database like PostgreSQL, or leverage the native support for spatial data types in a distributed database like Azure CosmosDB.
However, if your backend storage, such as Redis or Google Spanner, doesn't natively support spatial queries and you need to handle large-scale geospatial queries, this article is tailored for you.
You can always build another microservice to handle spatial data, but this option often involves the overhead of maintaining an additional application. Another approach is to use geo-indexing libraries like S2 and H3. S2, developed by Google, is based on the Hilbert curve, while H3, developed by Uber, is based on a geodesic discrete global grid system. S2 and H3 share many similarities: both divide a given region into cells and use 64-bit integers to index these cells.
However, the main difference lies in the shape of the cells; S2 uses square-shaped cells, whereas H3 uses hexagon-shaped cells. For some applications, H3 might offer better performance. However, overall, either library should suffice. In this article, we will use S2, but you can perform similar functions using H3.
Cell Levels: The hierarchy allows different levels of detail, from large regions to small precise areas. Each level represents a different resolution:
Level 0: The largest cells, covering a significant portion of the Earth's surface.
Higher Levels: Cells are progressively subdivided into smaller quadrants. For instance, Level 1 cells are each divided into four Level 2 cells, and so on.
Resolution and Area: Higher levels correspond to finer resolutions and smaller cell areas. This hierarchy allows for precise indexing and querying at varying levels of detail.
The table below shows various cell levels along with their corresponding areas.
level |
min area |
max area |
average area |
units |
Number of cells |
---|---|---|---|---|---|
00 |
85011012.19 |
85011012.19 |
85011012.19 |
km2 |
6 |
01 |
21252753.05 |
21252753.05 |
21252753.05 |
km2 |
24 |
02 |
4919708.23 |
6026521.16 |
5313188.26 |
km2 |
96 |
03 |
1055377.48 |
1646455.50 |
1328297.07 |
km2 |
384 |
04 |
231564.06 |
413918.15 |
332074.27 |
km2 |
1536 |
05 |
53798.67 |
104297.91 |
83018.57 |
km2 |
6K |
06 |
12948.81 |
26113.30 |
20754.64 |
km2 |
24K |
07 |
3175.44 |
6529.09 |
5188.66 |
km2 |
98K |
08 |
786.20 |
1632.45 |
1297.17 |
km2 |
393K |
09 |
195.59 |
408.12 |
324.29 |
km2 |
1573K |
10 |
48.78 |
102.03 |
81.07 |
km2 |
6M |
11 |
12.18 |
25.51 |
20.27 |
km2 |
25M |
12 |
3.04 |
6.38 |
5.07 |
km2 |
100M |
13 |
0.76 |
1.59 |
1.27 |
km2 |
402M |
14 |
0.19 |
0.40 |
0.32 |
km2 |
1610M |
15 |
47520.30 |
99638.93 |
79172.67 |
m2 |
6B |
16 |
11880.08 |
24909.73 |
19793.17 |
m2 |
25B |
17 |
2970.02 |
6227.43 |
4948.29 |
m2 |
103B |
18 |
742.50 |
1556.86 |
1237.07 |
m2 |
412B |
19 |
185.63 |
389.21 |
309.27 |
m2 |
1649B |
20 |
46.41 |
97.30 |
77.32 |
m2 |
7T |
21 |
11.60 |
24.33 |
19.33 |
m2 |
26T |
22 |
2.90 |
6.08 |
4.83 |
m2 |
105T |
23 |
0.73 |
1.52 |
1.21 |
m2 |
422T |
24 |
0.18 |
0.38 |
0.30 |
m2 |
1689T |
25 |
453.19 |
950.23 |
755.05 |
cm2 |
7e15 |
26 |
113.30 |
237.56 |
188.76 |
cm2 |
27e15 |
27 |
28.32 |
59.39 |
47.19 |
cm2 |
108e15 |
28 |
7.08 |
14.85 |
11.80 |
cm2 |
432e15 |
29 |
1.77 |
3.71 |
2.95 |
cm2 |
1729e15 |
30 |
0.44 |
0.93 |
0.74 |
cm2 |
7e18 |
From the table provided, it's evident that you can achieve mapping precision down to 0.44 cm^2 using S2. Within each square of an S2 cell, there exists a child cell that shares the same parent, indicating a hierarchical structure. The level of the cell can be a static value (same level applied to all the cells) or can be dynamic where S2 decides what resolution works best.
Let's start with an example. Consider we are writing an application that provides proximity service-like features for the Seattle area. We want to return a list of coffee shops within the given vicinity. To perform these operations, we will divide this task into 4 sub-tasks:
To load a Google map, we would use the gmplot library. This library requires a Google Maps API key to load. To generate the API key, follow the instructions here.
import gmplot
import const
# plot seattle with zoom level 13
gmap = gmplot.GoogleMapPlotter(47.6061, -122.3328, 13, apikey=const.API_KEY)
# Draw the map to an HTML file:
gmap.draw('map.html')
The above code generates a map.html file as shown below:
Now that we have the map, let's draw some S2 cells for maps :
from s2 import *
import gmplot
# plot seattle with zoom level 13
gmap = gmplot.GoogleMapPlotter(47.6061, -122.3328, 13, apikey=const.API_KEY)
areatobeplotted = [
(47.64395531736767,-122.43597221319135),
(47.51369277846956,-122.43597221319135),
(47.51369277846956,-122.24156866779164),
(47.64395531736767,-122.24156866779164),
(47.64395531736767,-122.43597221319135)
]
region_rect = S2LatLngRect(
S2LatLng.FromDegrees(47.51369277846956,-122.43597221319135),
S2LatLng.FromDegrees(47.64395531736767, -122.24156866779164))
coverer = S2RegionCoverer()
coverer.set_min_level(8)
coverer.set_max_level(15)
covering = coverer.GetCovering(region_rect)
geoms = 0
for cellid in covering:
new_cell = S2Cell(cellid)
vertices = []
for i in range(0, 4):
vertex = new_cell.GetVertex(i)
latlng = S2LatLng(vertex)
vertices.append((latlng.lat().degrees(),
latlng.lng().degrees()))
gmap.polygon(*zip(*vertices), face_color='pink', edge_color='cornflowerblue', edge_width=5)
geoms+=1
gmap.polygon(*zip(*areatobeplotted), face_color='red', edge_color='green', edge_width=5)
print(f"Total Geometries: {geoms}")
gmap.draw('/tmp/map.html')
Output:
Total Geometries: 273
In the code above, we first center the Google Map plotter around the Seattle area. In S2RegionCoverer
, we initialize the region coverer to have dynamic levels between a minimum level of 8 and a maximum level of 15. This allows S2 to dynamically fit all cells into specific cell sizes for the best fit. The GetCovering
method returns the covering for a rectangle around the Seattle area.
Then, we iterate over each cell, calculating the vertices for the cells and plotting them on the map. We keep the count of cells generated to around 273. Finally, we plot the input rectangle in red. This code will plot the S2 cells on the Seattle map at /tmp/map.html
, as shown below:
Let’s generate a database of coffee shops along with their S2 cell identifiers. You can store these cells in any database of your choice. For this tutorial, we will be using a SQLite data database. In the code sample below, we connect to the SQLite database to create a table CoffeeShops
with 3 fields Id
, name
, and cell_id
.
Similar to the previous example, we use S2RegionCoverer
to calculate the cells but this time, we use a fixed level for plotting points. Finally, the calculated ID is converted to a string and stored in the database.
import sqlite3
from s2 import S2CellId,S2LatLng,S2RegionCoverer
# Connect to SQLite database
conn = sqlite3.connect('/tmp/sqlite_cells.db')
cursor = conn.cursor()
# Create a table to store cell IDs
cursor.execute('''CREATE TABLE IF NOT EXISTS CoffeeShops (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT,
cell_id TEXT
)''')
coverer = S2RegionCoverer()
# Function to generate S2 cell ID for a given latitude and longitude
def generate_cell_id(latitude, longitude, level=16):
cell=S2CellId(S2LatLng.FromDegrees(latitude, longitude))
return str(cell.parent(level))
# Function to insert cell IDs into the database
def insert_cell_ids(name,lat,lng):
cell_id = generate_cell_id(lat, lng)
cursor.execute("INSERT INTO CoffeeShops (name, cell_id) VALUES (?, ?)", (name, cell_id))
conn.commit()
# Insert cell IDs into the database
insert_cell_ids("Overcast Coffee", 47.616656277302155, -122.31156460382837)
insert_cell_ids("Seattle Sunshine", 47.67366852914391, -122.29051997415843)
insert_cell_ids("Sip House", 47.6682364706238, -122.31328618043693)
insert_cell_ids("Victoria Coffee",47.624408595334536, -122.3117362652041)
# Close connection
conn.close()
At this point, we have a database that stores coffee shops along with their cell IDs, determined by the selected resolution for the cell level.
Finally, let’s query for coffee shops in the University District area.
import sqlite3
from s2 import S2RegionCoverer,S2LatLngRect, S2LatLng
# Connect to SQLite database
conn = sqlite3.connect('/tmp/sqlite_cells.db')
cursor = conn.cursor()
# Function to query database for cells intersecting with the given polygon
def query_intersecting_cells(start_x,start_y,end_x,end_y):
# Create S2RegionCoverer
region_rect = S2LatLngRect(
S2LatLng.FromDegrees(start_x,start_y),
S2LatLng.FromDegrees(end_x,end_y))
coverer = S2RegionCoverer()
coverer.set_min_level(8)
coverer.set_max_level(15)
covering = coverer.GetCovering(region_rect)
# Query for intersecting cells
intersecting_cells = set()
for cell_id in covering:
cursor.execute("SELECT name FROM CoffeeShops WHERE cell_id >= ? and cell_id<=?", (str(cell_id.range_min()),str(cell_id.range_max()),))
intersecting_cells.update(cursor.fetchall())
return intersecting_cells
# Query for intersecting cells
intersecting_cells = query_intersecting_cells(47.6527847,-122.3286438,47.6782181, -122.2797203)
# Print intersecting cells
print("Intersecting cells:")
for cell_id in intersecting_cells:
print(cell_id[0])
# Close connection
conn.close()
Output:
Intersecting cells:
Sip House
Seattle Sunshine
Below is a visual representation of the cells. To maintain brevity, the visualization code for below is not added.
Since all child and parent cells share a prefix, we can query for cell ranges between min and max to get all the cells between those two values. In our example, we use the same principle to query coffee shop
In this article, we have demonstrated how to use geo-indexing to store and query geospatial data in databases that don't support geospatial querying. This can be further extended to multiple use cases such as calculating routing between 2 points or getting nearest neighbors.
Typically, for geo-indexed database querying, you would have to perform some additional post-processing on the data. Careful consideration for querying and post-processing logic is needed to ensure we don't overwhelm the node.