Extending Postgres to Support Geospatial Data Queries I’ve lost too much time fiddling around with configurations and services just to spin up a compute instance in AWS. Sometimes, I just need a production-ready environment to play around with to test out applications and ideas. With Heroku, I can get that with just a few simple commands at the CLI. Recently, I learned that Heroku also includes support for PostGIS. I’ve personally never used PostGIS before. I know of several proprietary competitors to the product, so I figured this would be a great time to try out the open-source option. In this article, I’ll show you how to get a PostGIS-enabled Postgres instance running on Heroku. Then, I’ll run some sample queries on the database, just to give you a feel for how it works. And the best part? You can follow along and do your own exploring as we go! What Is PostGIS? Even if you’ve used Postgres for a while, you might not be familiar with PostGIS. The GIS stands for Geographic Information System. There are many solutions in the space. But the thing that makes PostGIS nice is that it’s based on the well-loved PostgreSQL database. In addition to all of the performance you’d expect from Postgres, we get a full-featured tool for storing geospatial data. Not only does PostGIS provide a good storage solution for this type of data, but it can be seamlessly integrated with several applications that can consume this data (such as ArcGIS and Tableau). TL; DR—If you need to process, store, or query location data, PostGIS is a great option for doing that. Fortunately, it’s as simple as adding an addon to a Heroku app to get a new Postgres instance up and running. So, let’s do that now. How Can We Use PostGIS? To get started, you’ll need an app of any size. Then, you add an instance of Heroku Postgres to your app. Create a Heroku App For my demo, I’m going to create an empty app. Attach a Heroku Postgres add-on Once the app is created, I can create the Heroku Postgres add-on. Because my sample data set is too big for a Mini plan instance, I need to use the Basic plan instead. I can do this from the command line: $ heroku login $ heroku addons:create heroku-postgresql:basic -a postgis-demo Creating heroku-postgresql:basic on ⬢ postgis-demo... ~$0.013/hour (max $9/month) Database has been created and is available ! This database is empty. If upgrading, you can transfer ! data from another database with pg:copy Created postgresql-fitted-78461 as DATABASE_URL Once I’ve created my Postgres database, I only have a few more steps to set up PostGIS. Create the PostGIS Extension Heroku Postgres has many possible extensions we could install. To list them, we can ask our instance: $ heroku pg:psql -a postgis-demo --> Connecting to postgresql-fitted-78461 … postgis-demo::DATABASE=> \x on; Expanded display is on. postgis-demo::DATABASE=> show extwlist.extensions; … address_standardizer,address_standardizer_data_us,amcheck,autoinc,bloom,btree_gin,btree_gist,citext,cube,dict_int,earthdistance,fuzzystrmatch,hstore,insert_username,intarray,isn,lo,ltree,moddatetime,pg_partman,pg_stat_statements,pg_trgm,pgcrypto,pgrowlocks,postgis,postgis_raster,postgis_topology,refint,seg,sslinfo,tablefunc,tcn,tsm_system_rows,tsm_system_time,unaccent,uuid-ossp We see postgis in the list of available extensions. From there, we can create the extension. postgis-demo::DATABASE=> CREATE EXTENSION postgis; CREATE EXTENSION We can confirm the extension is installed and check the version: postgis-demo::DATABASE=> SELECT postgis_version(); -[ RECORD 1 ]---+-------------------------------------- postgis_version | 3.4 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 Alright! It looks like we’re up and running with PostGIS 3.4. Load Initial Dataset Now that I’ve got the PostGIS extension enabled, I need to load in a dataset to play around with. I’m using the dataset provided by the Introduction to PostGIS book. The downloaded data bundle is a 21.5 MB zip file. In the data subfolder of the extracted archive, there’s a 9.5 MB file called nyc_data.backup. This is a file with all of the census data from the 2000 census of New York City, along with all of the streets, neighborhoods, and subway stations in the city. We can restore the data backup directly to our Heroku Postgres instance by using the heroku pg:backups:restore command. This is incredibly convenient. However, keep in mind the following caveats: The backup file that you can restore from cannot be uploaded from your local machine. It must be available online. Fortunately, I found a GitHub repo that makes nyc_data.backup available. Performing database restore starts by completely resetting your Heroku Postgres instance, including your installation of the postgis extension. So, even though we showed above how to install the extension manually, we’ll need to add a flag when we restore our database to pre-install the extension before loading the data. Here’s the command we would use to restore the database backup: $ heroku pg:backups:restore \ https://github.com/Giorgi/PostgresSamples/raw/main/nyc_data.backup \ -e postgis \ -a postgis-demo Our backup file is specified through a publicly accessible URL. You can always download the dataset from the PostGIS tutorial, extract the nyc_data.backup file, and post it online to a location of your own choosing. The -e postgis flag specifies that we want to install the postgis extension prior to loading the backup’s schema and data. That was it! Not bad for a few simple commands. We have our database and data. Why Heroku? If you already know how to set up Postgres on a local machine, you might be wondering why I went with Heroku. For me, the biggest reason is simplicity. Besides choosing a large enough Heroku Postgres plan for the analysis I plan to do and installing the PostGIS extension, there’s nothing else I need to do to get up and running. Also, collaborating on any analysis I do is easy. I can grant other people access to my database as collaborators, or I can quickly build an application on top of the database and share access through a normal web interface, rather than the Postgres client. Finally, when I’m done working on a project and I don’t need it any longer, I can just delete the app on Heroku and it’s all gone. No data files on my computer to worry about. No extra software installed locally. I’m able to enjoy a quick excursion into a new technology and then move on when I’m done. Working With PostGIS Now, let’s take a look at how PostGIS works. Work Just As You Would With Postgres The first thing to remember is that PostGIS is an extension within Postgres. That means that you can also perform any standard Postgres query. Let’s say I wanted to find out how many streets in New York start with B. A simple SQL query will tell me: postgis-demo::DATABASE=> SELECT count(*) postgis-demo::DATABASE-> FROM nyc_streets postgis-demo::DATABASE-> WHERE name LIKE 'B%'; count 1282 (1 row) How about the number of neighborhoods in each borough? Again, a simple SQL query: postgis-demo::DATABASE=> SELECT boroname, count(*) postgis-demo::DATABASE-> FROM nyc_neighborhoods postgis-demo::DATABASE-> GROUP BY boroname; boroname | count ---------------+------- Queens | 30 Brooklyn | 23 Staten Island | 24 The Bronx | 24 Manhattan | 28 (5 rows) So far, we’ve just done standard PostgreSQL. Now, let’s take a look at how to use PostGIS features. Examples of Working With Geospatial Geometries Because our dataset includes all New York streets, we can ask how many kilometers of streets there are in the city with this query: postgis-demo::DATABASE=> SELECT Sum(ST_Length(geom)) / 1000 as street_length FROM nyc_streets; street_length -------------------- 10418.904717199996 (1 row) We can also calculate areas, such as the acreage of the entirety of Manhattan: postgis-demo::DATABASE=> SELECT Sum(ST_Area(geom)) / 4047 as acreage FROM nyc_neighborhoods WHERE boroname = 'Manhattan'; acreage ------------------- 13965.32012239119 (1 row) Note that these calculations come from the geospatial data, not from columns related to aggregated data of this sort. Not only that, but these queries execute extremely quickly. One final query that I’m really amazed by involves the use of spatial joins. Much like standard database joins, spatial joins can unite multiple tables, but on the basis of spatial relationships. For example, we can query for which neighborhood a specific subway station is in, using the spatial data. To do this, we can use ST_Contains from PostGIS to determine if the geometry of the neighborhood completely contains the geometry of the subway station. Based on the name of the subway (in nyc_subway_stations), we query for the neighborhood (in nyc_neighborhoods) for which ST_Contains is true. Our query looks like this: postgis-demo::DATABASE=> SELECT subways.name AS subway_name, neighborhoods.name AS neighborhood_name, neighborhoods.boroname AS borough FROM nyc_neighborhoods AS neighborhoods JOIN nyc_subway_stations AS subways ON ST_Contains(neighborhoods.geom, subways.geom) WHERE subways.name = 'Broad St'; subway_name | neighborhood_name | borough -------------+--------------------+----------- Broad St | Financial District | Manhattan (1 row) PostGIS provides even more advanced location querying functionality with geometries, but that’s outside the scope of our simple demo here. Conclusion Having never used PostGIS before, I’m really impressed by what it can do. There’s a lot more I could do with this database too, since I’ve only made it about halfway through the official Introduction to PostGIS book. Not only that, I can build and deploy applications on top of PostGIS by using any number of languages supported by Heroku. In particular, I’m thinking I might want to find a use case for building a Rails app on top of PostGIS. I already found some documentation on how I can get started. But for now, I don’t need this instance anymore, so I’m going to clean it up and delete my app. From the CLI, this is what I need to do: $ heroku apps:destroy postgis-demo ▸ WARNING: This will delete ⬢ postgis-demo including all add-ons. ▸ To proceed, type postgis-demo or re-run this command with --confirm postgis-demo > postgis-demo Destroying ⬢ postgis-demo (including all add-ons)... done Wait, that’s all? Yeah, that’s all. With a single command and confirmation, everything is torn down, and I don’t need to worry about it anymore. $ heroku apps You have no apps. $ heroku addons No add-ons. Now that I’ve deleted my app, you have an incredible opportunity: The unique app name postgis-demo is available for the first reader who wants to grab it on Heroku! Are you ready to build your next great PostGIS app? Today is the day! Extending Postgres to Support Geospatial Data Queries Extending Postgres to Support Geospatial Data Queries I’ve lost too much time fiddling around with configurations and services just to spin up a compute instance in AWS. Sometimes, I just need a production-ready environment to play around with to test out applications and ideas. With Heroku, I can get that with just a few simple commands at the CLI. Recently, I learned that Heroku also includes support for PostGIS . I’ve personally never used PostGIS before. I know of several proprietary competitors to the product, so I figured this would be a great time to try out the open-source option. support for PostGIS In this article, I’ll show you how to get a PostGIS-enabled Postgres instance running on Heroku. Then, I’ll run some sample queries on the database, just to give you a feel for how it works. And the best part? You can follow along and do your own exploring as we go! What Is PostGIS? Even if you’ve used Postgres for a while, you might not be familiar with PostGIS. The GIS stands for Geographic Information System . There are many solutions in the space. But the thing that makes PostGIS nice is that it’s based on the well-loved PostgreSQL database. In addition to all of the performance you’d expect from Postgres, we get a full-featured tool for storing geospatial data. Geographic Information System Not only does PostGIS provide a good storage solution for this type of data, but it can be seamlessly integrated with several applications that can consume this data (such as ArcGIS and Tableau). TL; DR—If you need to process, store, or query location data, PostGIS is a great option for doing that. Fortunately, it’s as simple as adding an addon to a Heroku app to get a new Postgres instance up and running. So, let’s do that now. How Can We Use PostGIS? To get started, you’ll need an app of any size. Then, you add an instance of Heroku Postgres to your app. Heroku Postgres Create a Heroku App For my demo, I’m going to create an empty app. Attach a Heroku Postgres add-on Once the app is created, I can create the Heroku Postgres add-on. Because my sample data set is too big for a Mini plan instance, I need to use the Basic plan instead. I can do this from the command line: Mini plan $ heroku login $ heroku addons:create heroku-postgresql:basic -a postgis-demo Creating heroku-postgresql:basic on ⬢ postgis-demo... ~$0.013/hour (max $9/month) Database has been created and is available ! This database is empty. If upgrading, you can transfer ! data from another database with pg:copy Created postgresql-fitted-78461 as DATABASE_URL $ heroku login $ heroku addons:create heroku-postgresql:basic -a postgis-demo Creating heroku-postgresql:basic on ⬢ postgis-demo... ~$0.013/hour (max $9/month) Database has been created and is available ! This database is empty. If upgrading, you can transfer ! data from another database with pg:copy Created postgresql-fitted-78461 as DATABASE_URL Once I’ve created my Postgres database, I only have a few more steps to set up PostGIS . to set up PostGIS Create the PostGIS Extension Heroku Postgres has many possible extensions we could install. To list them, we can ask our instance: $ heroku pg:psql -a postgis-demo --> Connecting to postgresql-fitted-78461 … postgis-demo::DATABASE=> \x on; Expanded display is on. postgis-demo::DATABASE=> show extwlist.extensions; … address_standardizer,address_standardizer_data_us,amcheck,autoinc,bloom,btree_gin,btree_gist,citext,cube,dict_int,earthdistance,fuzzystrmatch,hstore,insert_username,intarray,isn,lo,ltree,moddatetime,pg_partman,pg_stat_statements,pg_trgm,pgcrypto,pgrowlocks,postgis,postgis_raster,postgis_topology,refint,seg,sslinfo,tablefunc,tcn,tsm_system_rows,tsm_system_time,unaccent,uuid-ossp $ heroku pg:psql -a postgis-demo --> Connecting to postgresql-fitted-78461 … postgis-demo::DATABASE=> \x on; Expanded display is on. postgis-demo::DATABASE=> show extwlist.extensions; … address_standardizer,address_standardizer_data_us,amcheck,autoinc,bloom,btree_gin,btree_gist,citext,cube,dict_int,earthdistance,fuzzystrmatch,hstore,insert_username,intarray,isn,lo,ltree,moddatetime,pg_partman,pg_stat_statements,pg_trgm,pgcrypto,pgrowlocks,postgis,postgis_raster,postgis_topology,refint,seg,sslinfo,tablefunc,tcn,tsm_system_rows,tsm_system_time,unaccent,uuid-ossp We see postgis in the list of available extensions. From there, we can create the extension. postgis postgis-demo::DATABASE=> CREATE EXTENSION postgis; CREATE EXTENSION postgis-demo::DATABASE=> CREATE EXTENSION postgis; CREATE EXTENSION We can confirm the extension is installed and check the version: postgis-demo::DATABASE=> SELECT postgis_version(); -[ RECORD 1 ]---+-------------------------------------- postgis_version | 3.4 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 postgis-demo::DATABASE=> SELECT postgis_version(); -[ RECORD 1 ]---+-------------------------------------- postgis_version | 3.4 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 Alright! It looks like we’re up and running with PostGIS 3.4. Load Initial Dataset Now that I’ve got the PostGIS extension enabled, I need to load in a dataset to play around with. I’m using the dataset provided by the Introduction to PostGIS book . The downloaded data bundle is a 21.5 MB zip file. In the data subfolder of the extracted archive, there’s a 9.5 MB file called nyc_data.backup . Introduction to PostGIS book data nyc_data.backup This is a file with all of the census data from the 2000 census of New York City, along with all of the streets, neighborhoods, and subway stations in the city. We can restore the data backup directly to our Heroku Postgres instance by using the heroku pg:backups:restore command . This is incredibly convenient. However, keep in mind the following caveats: the heroku pg:backups:restore command The backup file that you can restore from cannot be uploaded from your local machine. It must be available online. Fortunately, I found a GitHub repo that makes nyc_data.backup available. The backup file that you can restore from cannot be uploaded from your local machine. It must be available online. Fortunately, I found a GitHub repo that makes nyc_data.backup available. GitHub repo nyc_data.backup Performing database restore starts by completely resetting your Heroku Postgres instance, including your installation of the postgis extension. So, even though we showed above how to install the extension manually, we’ll need to add a flag when we restore our database to pre-install the extension before loading the data. Performing database restore starts by completely resetting your Heroku Postgres instance, including your installation of the postgis extension. So, even though we showed above how to install the extension manually, we’ll need to add a flag when we restore our database to pre-install the extension before loading the data. postgis Here’s the command we would use to restore the database backup: $ heroku pg:backups:restore \ https://github.com/Giorgi/PostgresSamples/raw/main/nyc_data.backup \ -e postgis \ -a postgis-demo $ heroku pg:backups:restore \ https://github.com/Giorgi/PostgresSamples/raw/main/nyc_data.backup \ -e postgis \ -a postgis-demo Our backup file is specified through a publicly accessible URL. You can always download the dataset from the PostGIS tutorial, extract the nyc_data.backup file, and post it online to a location of your own choosing. nyc_data.backup The -e postgis flag specifies that we want to install the postgis extension prior to loading the backup’s schema and data. -e postgis postgis That was it! Not bad for a few simple commands. We have our database and data. Why Heroku? If you already know how to set up Postgres on a local machine, you might be wondering why I went with Heroku. For me, the biggest reason is simplicity. Besides choosing a large enough Heroku Postgres plan for the analysis I plan to do and installing the PostGIS extension, there’s nothing else I need to do to get up and running. Also, collaborating on any analysis I do is easy. I can grant other people access to my database as collaborators, or I can quickly build an application on top of the database and share access through a normal web interface, rather than the Postgres client. collaborating Finally, when I’m done working on a project and I don’t need it any longer, I can just delete the app on Heroku and it’s all gone. No data files on my computer to worry about. No extra software installed locally. I’m able to enjoy a quick excursion into a new technology and then move on when I’m done. Working With PostGIS Now, let’s take a look at how PostGIS works. Work Just As You Would With Postgres The first thing to remember is that PostGIS is an extension within Postgres. That means that you can also perform any standard Postgres query. extension Let’s say I wanted to find out how many streets in New York start with B. A simple SQL query will tell me: postgis-demo::DATABASE=> SELECT count(*) postgis-demo::DATABASE-> FROM nyc_streets postgis-demo::DATABASE-> WHERE name LIKE 'B%'; count 1282 (1 row) postgis-demo::DATABASE=> SELECT count(*) postgis-demo::DATABASE-> FROM nyc_streets postgis-demo::DATABASE-> WHERE name LIKE 'B%'; count 1282 (1 row) How about the number of neighborhoods in each borough? Again, a simple SQL query: postgis-demo::DATABASE=> SELECT boroname, count(*) postgis-demo::DATABASE-> FROM nyc_neighborhoods postgis-demo::DATABASE-> GROUP BY boroname; boroname | count ---------------+------- Queens | 30 Brooklyn | 23 Staten Island | 24 The Bronx | 24 Manhattan | 28 (5 rows) postgis-demo::DATABASE=> SELECT boroname, count(*) postgis-demo::DATABASE-> FROM nyc_neighborhoods postgis-demo::DATABASE-> GROUP BY boroname; boroname | count ---------------+------- Queens | 30 Brooklyn | 23 Staten Island | 24 The Bronx | 24 Manhattan | 28 (5 rows) So far, we’ve just done standard PostgreSQL. Now, let’s take a look at how to use PostGIS features. Examples of Working With Geospatial Geometries Because our dataset includes all New York streets, we can ask how many kilometers of streets there are in the city with this query: postgis-demo::DATABASE=> SELECT Sum(ST_Length(geom)) / 1000 as street_length FROM nyc_streets; street_length -------------------- 10418.904717199996 (1 row) postgis-demo::DATABASE=> SELECT Sum(ST_Length(geom)) / 1000 as street_length FROM nyc_streets; street_length -------------------- 10418.904717199996 (1 row) We can also calculate areas, such as the acreage of the entirety of Manhattan: postgis-demo::DATABASE=> SELECT Sum(ST_Area(geom)) / 4047 as acreage FROM nyc_neighborhoods WHERE boroname = 'Manhattan'; acreage ------------------- 13965.32012239119 (1 row) postgis-demo::DATABASE=> SELECT Sum(ST_Area(geom)) / 4047 as acreage FROM nyc_neighborhoods WHERE boroname = 'Manhattan'; acreage ------------------- 13965.32012239119 (1 row) Note that these calculations come from the geospatial data, not from columns related to aggregated data of this sort. Not only that, but these queries execute extremely quickly. One final query that I’m really amazed by involves the use of spatial joins . Much like standard database joins, spatial joins can unite multiple tables, but on the basis of spatial relationships. For example, we can query for which neighborhood a specific subway station is in, using the spatial data. To do this, we can use ST_Contains from PostGIS to determine if the geometry of the neighborhood completely contains the geometry of the subway station. spatial joins ST_Contains Based on the name of the subway (in nyc_subway_stations ), we query for the neighborhood (in nyc_neighborhoods ) for which ST_Contains is true. Our query looks like this: nyc_subway_stations nyc_neighborhoods ST_Contains postgis-demo::DATABASE=> SELECT subways.name AS subway_name, neighborhoods.name AS neighborhood_name, neighborhoods.boroname AS borough FROM nyc_neighborhoods AS neighborhoods JOIN nyc_subway_stations AS subways ON ST_Contains(neighborhoods.geom, subways.geom) WHERE subways.name = 'Broad St'; subway_name | neighborhood_name | borough -------------+--------------------+----------- Broad St | Financial District | Manhattan (1 row) postgis-demo::DATABASE=> SELECT subways.name AS subway_name, neighborhoods.name AS neighborhood_name, neighborhoods.boroname AS borough FROM nyc_neighborhoods AS neighborhoods JOIN nyc_subway_stations AS subways ON ST_Contains(neighborhoods.geom, subways.geom) WHERE subways.name = 'Broad St'; subway_name | neighborhood_name | borough -------------+--------------------+----------- Broad St | Financial District | Manhattan (1 row) PostGIS provides even more advanced location querying functionality with geometries , but that’s outside the scope of our simple demo here. geometries Conclusion Having never used PostGIS before, I’m really impressed by what it can do. There’s a lot more I could do with this database too, since I’ve only made it about halfway through the official Introduction to PostGIS book. Not only that, I can build and deploy applications on top of PostGIS by using any number of languages supported by Heroku. Introduction to PostGIS languages supported In particular, I’m thinking I might want to find a use case for building a Rails app on top of PostGIS. I already found some documentation on how I can get started. some documentation But for now, I don’t need this instance anymore, so I’m going to clean it up and delete my app. From the CLI, this is what I need to do: $ heroku apps:destroy postgis-demo ▸ WARNING: This will delete ⬢ postgis-demo including all add-ons. ▸ To proceed, type postgis-demo or re-run this command with --confirm postgis-demo > postgis-demo Destroying ⬢ postgis-demo (including all add-ons)... done $ heroku apps:destroy postgis-demo ▸ WARNING: This will delete ⬢ postgis-demo including all add-ons. ▸ To proceed, type postgis-demo or re-run this command with --confirm postgis-demo > postgis-demo Destroying ⬢ postgis-demo (including all add-ons)... done Wait, that’s all? Yeah, that’s all. With a single command and confirmation, everything is torn down, and I don’t need to worry about it anymore. $ heroku apps You have no apps. $ heroku addons No add-ons. $ heroku apps You have no apps. $ heroku addons No add-ons. Now that I’ve deleted my app, you have an incredible opportunity: The unique app name postgis-demo is available for the first reader who wants to grab it on Heroku! Are you ready to build your next great PostGIS app? Today is the day! postgis-demo