is a change data capture tool for moving data from to . It allows you to keep Postgres as your source-of-truth and expose structured denormalized documents in . PGSync Postgres Elasticsearch Elasticsearch This can be useful for building Backend services for driving text search applications or building real-time dashboard applications. Changes to nested entities are propagated to . PGSync’s advanced query builder then generates SQL queries based on your schema. Elasticsearch Simply describe your schema in JSON and PGSync will continuously capture changes in your data and load it into . PGSync provides a self-managed solution for change data capture. Elasticsearch Why? At a high level, you have data in a Postgres database and you want to expose it in . This means every change to your data needs to be replicated to Elasticsearch. Elasticsearch At first, this seems easy and then it’s not. Simply add some code to copy the data to after updating the database or perform the so-called dual writes at your application level. Writing SQL queries spanning multiple tables and involving multiple relationships can be nontrivial. Elasticsearch Detecting changes within nested documents can also be quite hard. Of course, if your data never changed, then you could just take a snapshot in time and load it into . Keep in mind, you shouldn’t really store your primary data in . Elasticsearch is more suited as a secondary denormalized search engine to be used alongside a traditional normalized datastore. Elasticsearch Elasticsearch One of the challenges is getting the data out of the source of truth and into the secondary store in a reasonable timeframe. Existing tools such as Apaches’ Kafka, Amazons’ Kinesis or Elastics’ Logstash require a fair amount of engineering and expertise. How it works leverages the feature of Postgres introduced in PostgreSQL 9.4 to capture a continuous stream of change events. PGSync logical decoding PGSync’s query builder is capable of building advanced relational queries dynamically from your schema. Simply, define a schema (JSON) describing the structure of your data in Elasticsearch, bootstrap the databases and start the PGSync daemon. It operates both a polling and an event-driven model to capture changes made to date and notification for changes that occur at a point in time. The initial sync polls the database for changes since the last iteration and thereafter reverts to event notifications (based on triggers and handled by ) for changes to the database. pg-notify There is no need to pollute your database with fields such as ` `, ` ` or ` ` flags to detect and track row-level changes. updated_at timestamp status When to use PGSync reduces the complexity of most application stacks. PGSync Postgres is your read/write source of truth whilst Elasticsearch is your read-only search layer. You have data that is constantly changing. You have data in an existing relational database such as Postgres and you need a secondary NoSQL database like Elasticsearch for text-based or auto-complete queries. You want to avoid the development overhead and complexity mandated by other tools. Installing PGSync Prerequisites: , , , Python 3.6+ Redis 3.1.0+ Elasticsearch 5.0+ PostgreSQL 9.4+. Install from . PGSync PyPi $ pip install pgsync Create a schema configuration file in JSON format This should mirror the structure of the resulting document in . Elasticsearch Here is an example schema: { : [
        { : , : , : [ , , ]
        }
    ]
} "nodes" "table" "book" "schema" "public" "columns" "isbn" "title" "description" Usage $ pgsync --config <absolute path to schema config> --daemon JSON More details about can be found on its . PGSync Github repository Market Alternatives : This is an Apache project which is an open-source stream-processing software platform. Kafka : Amazon managed service similar to Kafka. Kinesis : This is a product of that collects data from various sources, then parses, transforms and ships to various destinations. Logstash Elastic : Postgres extension that provides full-text search via the use of Elasticsearch indices. ZomboDB Also published at https://medium.com/@toluaina/real-time-integration-of-postgresql-with-elasticsearch-with-pgsync-9425ffa9b4e9

Stacks

Alongside

PGSync Introduction: Real-time Integration Tool For PostgreSQL And Elasticsearch

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

Untitled Story

10 Cool CI/CD Tools For Your Project

27 Stories To Learn About Ssl

20 Essential Backend Tools For Developers

3 Most Common Ways to Connect your Node and React Applications

369 Stories To Learn About Database

10 Cool CI/CD Tools For Your Project

27 Stories To Learn About Ssl

20 Essential Backend Tools For Developers

3 Most Common Ways to Connect your Node and React Applications

369 Stories To Learn About Database

Light-Mode

Classic

Newspaper

Dark-Mode

Neon Noir

Minty

HN StartUps