How Useful is PostgreSQL Full Text Search
Father, web developer, and pizza maker; Software Entomologist
PostgreSQL is the jack of all trades when it comes to databases. It brings you all the features you've come to love about SQL plus a bunch of features from Non-SQL databases. Some of these Non-SQL features like the JSONB data type are wonderful and you don't even have to dare flirt with another database. Others are good but not as robust and featureful as other databases.
One of these newer Non-SQL features built into PostgreSQL is Full-Text Search. Is the full-text search in PostgreSQL fully baked or will you need a separate search index? It is an alluring idea if you could build out a full text search without another layer of technology.
After trying to use PostgreSQL for Full-Text Search here are my thoughts.
Why Do You Need Full Text Search?
Lets say you have a table with the following fields:
If you want to implement a full-text search with plain SQL you have to do something like:
SELECT * WHERE
name ILIKE '%search%' OR
address ILIKE '%search%' OR
city ILIKE '%search%' OR
state ILIKE '%search%'
As your table becomes large, this search will take longer and longer because Postgres will have to do a full table scan.
You can of course add an index for all the fields to make this go faster but this will lead to a very large index that also is not very efficient.
Additionally, adding common features like misspelling detection and near matching will be difficult to add.
If you have a large set of data or want a more robust feature set you will have to use a search index database like Elastic or Apache Solr. Or you can use some of Postgres full-text search capabilities.
lets you create data that is optimized for searching. This can be done on the fly or you can add a column to your table to save and index the data. By doing the latter you can create faster searches but you have to remember to keep the data up to date.
For a smaller data set, creating the tsvector data on the fly is fairly fast.
Once you have tsvector data, you can perform searches in a "Google" style on multiple fields and rank searches based on field weights. However, tsvector has some downsides.
While it can match words and phrases, it can not do a full wildcard search. To compare it to
, you can do
So while tsvector will give a performance boost and some more search oriented features, It's not going to give you all the search index features you may want out of the box.
Trigram (or Trigraph)
is a PostgreSQL extension that implements trigrams. This feature can be used to implement a fuzzier and faster more efficient
. Just like tsvector you can create the trigrams on the fly but for the fastest operations you'll want to have a text field in your table that contains all the text you want to be searchable.
You can then simply index this field with
. Because trigrams can give you similar words, you could also potentially use this for a spelling corrections feature.
While trigrams give you a better way to do partial matches and similarity matching, you lose the ability to rank searches efficiently.
Is It Worth?
Both of these features give you a good beginning for full-text search but I wouldn't say they give you a full feature set. You can definitely work around some of the short comings with some creativity and extra work.
So the real question is how important is it to you to have all your data in one location?
If you can live with some of the shortcomings then this can simplify your development a lot. However, if you want a Google Search feature set out of the box, look elsewhere.
Subscribe to get your daily round-up of top tech stories!