How to Search JSON Data from PostgreSQL

Damindu Lakmal
5 min readDec 11, 2021
Full Text Search

Postgres is open source relational database management system. Most of us use MySQL database to keep the relations between data. When we are dealing with different type of data like avro, JSON then we have to take NoSQL database to tackle our business expectation. That’s where the PostgreSQL come into the party.

Data Format

First of all setup the postgres in your environment. Further we will discuss important extensions that need to install. Let’s create the customer table as our example.

CREATE TABLE ”customer” (
“key” int NOT NULL,
“data” jsonb NOT NULL,
deleted bool NULL DEFAULT false,
CONSTRAINT customer_pk PRIMARY KEY (key)
);

customer table has three columns,

  • key : Index of the customer (primary key)
  • data : customer information in JSON format
  • deleted : identify deletion

key

Key is unique identification of the customer so this column is define as primary key of customer table.

Data

All the customer information is saved as JSON object. JSON object is saved as jsonb type to binary search through the object. As a example choose our JSON object as follow,

{
"first_name" : "dam",
"last_name" : "lak",
"email" : "damlak@gmail.com",
"phone_numbers" : [
{
"type" :"personal",
"number" : "1234552342"
},
{
"type" :"home",
"number" : "098779897"
}
],
"associate" : "none",
"type" : "primary"
}

Name fields and email are flat JSON values. Phone number filed has array of JSON objects to obtain more customer data.

Deleted

This flag is contained the state of customer. Permanent data deletion is not proper way to handle in business.

JSON Operators

Mainly we are focused on search in JSON object. Unstructured data notation is bit different. Go through this for more information. In here we are only talking about two notation,

  1. short arrow ( ->)

return pointed value type as a JSON

select data->'phone_numbers' from customer;
--- result ---
[
{
"type" :"personal",
"number" : "1234552342"
},
{
"type" :"home",
"number" : "098779897"
}
]

2. Long arrow (->>)

return pointed value type as a text

select data->>'first_name' from customer;
--- result ---
dam

Functions

Query can be simplify by using functions. That will be useful in future scenarios.

JSON Object Function

create or replace function jsonb_values_of_key(jsonb,field text)   
returns text
language sql
immutable
as $$
select $1->>field from jsonb($1)
$$;

example use case,

select * from jsonb_values_of_key($josnb${   
"name":"medium",
"index":1
}$jsonb$,'name' );
--- result ---
medium

JSON Array Function

create or replace function jsonb_array_of_value(jsonb, field text)
returns text
language sql
immutable
as $$
select array_to_string(array(select attr.value from jsonb_array_elements($1) arrayElem, jsonb_each(arrayElem) attr WHERE attr.key=field),' ')
$$;

example use case,

select * from jsonb_array_of_value($josnb$[{   
"name":"json",
"index":1
},{
"name":"array",
"index":1
}
]$jsonb$,'name' );
--- result ---
json array
Photo by Mark Duffel on Unsplash

Simple Search

In our example, customer has multiple filed which need to be searched as full text. Postgres provide multiple operators and unique notation for search terms. Operators can be identified as ilike, like, = . Notations are bit differ from normal regex such as % matches any sequence of zero or more characters, _ matches any single character.

Customer first name, last name, email and phone numbers should be search through full text search.

Search without Indexes

Full text using ilike,

select  data from customer as c left join (select key, jsonb_array_elements(s.data->'phone_numbers')->>'number' ilike '%post%' as E from customer as s) as EEE using (key) where ((c.data->>'first_name' ilike '%post%')  OR (c.data->>'last_name' ilike '%post%') OR (c.data->>'email' ilike '%post%')  OR (EEE.E) )  and not c.deleted group by c.data  ORDER BY c.data->>'first_name' ASC 

search with functions,

select  data from customer as c where ((jsonb_values_of_key(c.data,'first_name') ilike '%post%')  OR (jsonb_values_of_key(c.data,'last_name') ilike '%post%') OR (jsonb_values_of_key(c.data,'email') ilike '%post%')  OR (jsonb_array_of_value(data->'phone_numbers','number') ilike '%post%') )  and not c.deleted group by c.data  ORDER BY jsonb_values_of_key(c.data,'first_name') ASC

Search with Indexes

PosgreSQL execute searching through indexes or sequence. Most of the time indexing can be reduced the execution while using Bit map indexing or indexing. GIN indexing is suitable for our requirement. Although memory consumption will be higher with data. Let’s create major search filed indexes,

Extension

CREATE EXTENSION pg_trgm;

First Name

CREATE INDEX customer_gin_first ON customer
USING GIN(jsonb_values_of_key(data::jsonb,'first_name') gin_trgm_ops);

Last Name

CREATE INDEX customer_gin_last ON customer
USING GIN(jsonb_values_of_key(data::jsonb,'last_name') gin_trgm_ops);

Email

CREATE INDEX customer_gin_email ON customer
USING GIN(jsonb_values_of_key(data::jsonb,'email') gin_trgm_ops);

Phone Number

CREATE INDEX customer_gin_number ON customer
USING GIN(jsonb_array_of_value((data->'phone_numbers')::jsonb,'number') gin_trgm_ops);

Then execute below query to experience the time difference between with indexing.

select  data from customer as c where ((jsonb_values_of_key(c.data,'first_name') ilike '%post%')  OR (jsonb_values_of_key(c.data,'last_name') ilike '%post%') OR (jsonb_values_of_key(c.data,'email') ilike '%post%')  OR (jsonb_array_of_value(data->'phone_numbers','number') ilike '%post%') )  and not c.deleted group by c.data  ORDER BY jsonb_values_of_key(c.data,'first_name') ASC

Comparison

Let’s compare the statistics between those two queries. 100000 records has been created for our analyses. Below analyzes execute in i7 10th generation 8 cpu machine. Let’s run query without indexing,

explain analyze select  data from customer as c left join (select key, jsonb_array_elements(s.data->'phone_numbers')->>'number' ilike '%dam%' as E from customer as s) as EEE using (key) where ((c.data->>'first_name' ilike '%dam%')  OR (c.data->>'last_name' ilike '%dam%') OR (c.data->>'email' ilike '%dam%')  OR (EEE.E) )  and not c.deleted group by c.data  ORDER BY c.data->>'first_name' ASC
analyze without indexes

Then run the query with indexes,

explain analyze select  data from customer as c where ((jsonb_values_of_key(c.data,'first_name') ilike '%dam%')  OR (jsonb_values_of_key(c.data,'last_name') ilike '%dam%') OR (jsonb_values_of_key(c.data,'email') ilike '%dam%')  OR (jsonb_array_of_value(data->'phone_numbers','number') ilike '%dam%') )  and not c.deleted group by c.data  ORDER BY jsonb_values_of_key(c.data,'first_name') ASC
analyze with indexes

After indexing the query, that’s over 6x speed.

Summary

This was only an overview of search fields in PosgreSQL. I’d recommended to use documentation of Postgres. Understand notations, operators and underline architecture.

Thank you for reading this article. I hope you enjoyed it!

--

--