paint-brush
How to Query Deeply Nested JSON Data in PSQLby@javidasgarov
8,412 reads
8,412 reads

How to Query Deeply Nested JSON Data in PSQL

by Javid AsgarovMarch 27th, 2023
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Recently I had to write a script, which should’ve changed some JSON data structure in PSQL database. Since, I didn’t have a lot of experience with [SQL](https://hackernoon.com/how-to-build-a-decentralized-dsp-for-interactive-ad-campaigns) or PSQL for that matter, reading and understanding understanding for my specific use cases took a little too long. In this article, I want to document some of the things I've learned.

People Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - How to Query Deeply Nested JSON Data in PSQL
Javid Asgarov HackerNoon profile picture

Recently I had to write a script, which should’ve changed some JSON data structure in PSQL database. Basically, I had to parse one value and put that in another field. Since, I didn’t have a lot of experience with SQL or PSQL for that matter, reading and understanding documentation for my specific use cases took a little too long.


I had a complicated and deeply nested JSON, which wasn’t super clear on how to access some of the information.


In this article, I want to document some of the things I’ve learned.


For starters, I need a place where I can fiddle with databases, and serendipitously enough I found dbfiddle. You can check out the queries from this article there.


First let’s start by creating a table, and specifying that we need a PSQL database. And right away let’s add some data we can work with later.


CREATE TABLE jsonb_exp (
  id serial primary key,
  data jsonb
);

INSERT INTO jsonb_exp (data) VALUES 
  ('{"data":{"1":{"items":[{"name":"John"},{"name":"Jack"}]},"2":{"items":[{"name":"John Dorian"},{"name":"Jack Sparrow"}]},"3":{"items":[{"name":"Tom Cruise"},{"name":"somerandomtext"}]},"5":{"items":[{"name":"Tom Cruise"}]}},"property":"Some string"}'),
  ('{"data":{"4":{"items":[{"name":"Maria"},{"name":"Jack"}]},"6":{"items":[{"name":"Jack Jackson"},{"name":"Thomas ---"}]},"7":{"items":[{"name":"-"},{"name":"somerandomtext"}]},"15":{"items":[{"name":"hello"}]}},"property":"Some string"}'),
  ('{"a": "b", "c": [1, 2, 3]}'


Usually, you wouldn’t have a different structure inside your JSON across records, but it’s just for the tests. We can now select and see what we’ve created in a table format.


SELECT * from jsonb_exp;


First, let’s get the basics out of the way.


-> returns jsonb
->> returns text


Not sure, what type you’re working with, you can use pg_typeof.


SELECT data->>'data' as data, pg_typeof(data->'data'), pg_typeof(data->>'data') from jsonb_exp where data->>'data' IS NOT NULL;


Querying from deeply nested JSONB

In this scenario, as you can see in the example above: We have an object, it has a defined structure, but some of the keys are like array indices. Not a common thing, maybe. But it happens.


To the rescue comes the following method:


jsonb_path_query()


It’s in the documentation, but it wasn’t clear to me right away how it works. For the first argument, you pass the object, next you can pass the query. Let’s say we need to query particular items from the object, from ‘items’ property, and only those that contain ‘Jack’.


SELECT id, data, item FROM jsonb_exp t, jsonb_path_query(data->'data', '$.*.items[*]') as item
 WHERE data->'data' IS NOT NULL AND item->>'name'='Jack';


Unpacking. Let’s first look at the jsonb_path_query query, everything else is just the support around it to demonstrate what we did.


Reading ‘$..items[]’, here is how you can read it. $ - the object itself, it can contain a key we don’t know - mark it with an asterisk. Inside of that, we want to look for property items, which is an array, and using [*] we say, that we want every item inside that array.

The comma in the query is just an implicit cross-join, not super relevant to JSONB methods.


Remove the ‘WHERE’ Clause and filter right away

Here is the code, you can add conditions inside the query.


SELECT id, data, jsonb_path_query(data->'data', '$.*.items[*] ? (@.name == $name)', '{"name": "John"}') FROM jsonb_exp
 WHERE data->'data' IS NOT NULL;


As you can see here, you can even add the third parameter where you can keep the variables to be used in the query. It’s just for demo purposes here, you can absolutely remove it and compare it with the string without using additional variables.


Notice here, you use double equal signs and not one as you would outside of this query.

You also can’t just write LIKE as your comparison operator. But you can use like_regex. Let’s say we want to find every item, where the name starts with ‘John’.


SELECT id, data, jsonb_path_query(data->'data', '$.*.items[*] ? (@.name like_regex "^John")') FROM jsonb_exp
 WHERE data->'data' IS NOT NULL;



Hope that was helpful.



Lead image by Luke Chesser on Unsplash