Most modern DBMSs (Database Management System) are relational, i.e., they represent data in the form of a two—dimensional table in which there are rows (records) and columns (fields of records). But in practice, we often come across a different data organization, namely hierarchical.
Hierarchical queries are an important tool in the field of data management, allowing you to work effectively with hierarchical data structures such as organizational trees, category trees, genealogical structures, etc. These queries play a key role in identifying relationships between data and extracting information in the form of tree structures.
The problem is that data with a hierarchical structure is very poorly represented in the relational model. The SQL-92 standard does not have the means to process them.
But such tools appeared in the SQL-1999 standard. However, by that time, Oracle already had its own CONNECT BY
operator. Despite this, in SQL 1999, the syntax of recursive queries is completely different from the CONNECT BY
syntax in Oracle and uses the WITH
keyword. The implementation of recursive queries in other DBMS is somewhat late, so it appeared in MS SQL Server only in version 2005.
As well as in syntax, there are differences in terminology. In Oracle, commonly discussed queries are called “hierarchical,” while all others are “recursive.” The essence of this does not change; I will use both.
In this article, we will focus on the consideration of hierarchical queries in two popular database management systems - Oracle and PostgreSQL. Both platforms have unique features and approaches to processing hierarchical data, and our introduction to them will allow the reader to better understand the context of their application.
A hierarchical structure can be represented as a set of nodes, where each node is connected to one or more nodes at a higher level. Important terms include parents, children, leaves, and roots.
Examples of hierarchical structures include company organizational trees, family family trees, category trees in online stores, class hierarchies in educational institutions, and more.
For demonstration, we will use a test employee table consisting of 3 fields:
id
– ID,
name
- is the name of the employee,
manager_id
- is the ID of the manager (refers to the ID of another record in the same table)
create table test_employees (
id number primary key,
name varchar2(32),
manager_id number
);
Let's fill the table with test data according to the following tree:
insert into test_employees(id, name, manager_id)
values (1, 'Henry Anderson', null);
insert into test_employees(id, name, manager_id)
values (2, 'Julia Miller', 1);
insert into test_employees(id, name, manager_id)
values (3, 'Mark White', 1);
insert into test_employees(id, name, manager_id)
values (4, 'Paul Scott', 2);
insert into test_employees(id, name, manager_id)
values (5, 'Daniel Hill', 2);
insert into test_employees(id, name, manager_id)
values (6, 'Maria Lopez', 3);
insert into test_employees(id, name, manager_id)
values (7, 'Monika Moore', 3);
insert into test_employees(id, name, manager_id)
values (8, 'Alex Smith', 4);
insert into test_employees(id, name, manager_id)
values (9, 'Brandon Brown', 4);
insert into test_employees(id, name, manager_id)
values (10, 'Elizabeth Jackson', 5);
insert into test_employees(id, name, manager_id)
values (11, 'Kevin Green', 6);
insert into test_employees(id, name, manager_id)
values (12, 'Jessica Johnson', 7);
insert into test_employees(id, name, manager_id)
values (13, 'Lisa Williams', 7);
insert into test_employees(id, name, manager_id)
values (14, 'Angela Martinez', 7);
In Oracle, hierarchical queries appeared in version 8, long before the standard appeared. Therefore, a completely different syntax is still used.
The optional START WITH
operator tells the Oracle where to start the loop, i.e. which row (or strings) will be the root. The condition can be almost anything, you can even use functions or internal queries: manager_id is null
, or manager_id = 1
, or even name like 'Henry%'
.
The CONNECT BY
operator is required. It establishes a relationship between the parent and child elements of the hierarchy. In the condition of the CONNECT BY
operator, it is absolutely necessary to use the unary PRIOR
operator, which refers to the previous record.
How does it work? Oracle finds the first record that satisfies the condition in START WITH
, and starts looking for the next one. At the same time, that first entry can be accessed via PRIOR
. If we did everything correctly, then Oracle will search for records in which the field for storing information about the parent (manager_id
) will contain a value equal to the id of our first record.
This way, all descendants of the root record will be found. And since the process is recursive, a similar search will continue with each row found until all the descendants are found. It is also possible to use the rownum pseudo column, in which the rows are numbered starting from 1 in the order in which they are issued. And the pseudo-column LEVEL
, which shows the level in the hierarchy, will also be very useful to us. So, the 1st record will have level 1, its descendants level 2, descendants of descendants - 3, etc.
select level, te.id, te.manager_id, te.name
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id
Using the siblings
keyword, we say that you need to sort only within one level of the hierarchy. This will become more clear if you remove all unnecessary fields in the request and add margins:
select lpad(' ', 3 * level) || te.name as Tree
from test_employees te
start with te.manager_id is null
connect by prior te.id = te.manager_id
order siblings by te.id
TREE
Henry Anderson
Julia Miller
Paul Scott
Alex Smith
Brandon Brown
Daniel Hill
Elizabeth Jackson
Mark White
Maria Lopez
Kevin Green
Monika Moore
Jessica Johnson
Lisa Williams
Angela Martinez
NOCYCLE
is a parameter for excluding cyclic references. Use this parameter along with the CONNECT_BY_ISCYCLE
pseudocolumn to see which rows contain the loop.
We use the same test table.
create table test_employees (
id integer primary key,
name text,
manager_id integer
);
Queries to fill the table with data will look exactly the same as in Oracle.
with recursive employees(id, manager_id, name) as (
select te.id, te.manager_id, te.name
from test_employees te
where te.manager_id is null
union all
select te.id, te.manager_id, te.name
from employees e, test_employees te
where te.manager_id = e.id
)
select id, manager_id, name from employees e;
Calculating a recursive query:
The non-recursive part is calculated. For UNION
(but not UNION ALL
), duplicate rows are discarded. All remaining rows are included in the result of the recursive query and are also placed in a temporary worktable. The non-recursive part is an analog of START WITH
for an oracle query.
While the worksheet is not empty, the following steps are repeated:
What are the other differences from the Oracle? There are no amenities like level, but they can be done on their own:
with recursive employees(id, level, manager_id, name) as (
select te.id, 1, te.manager_id, te.name
from test_employees te
where te.manager_id is null
union all
select te.id, e.level + 1, te.manager_id, te.name
from employees e, test_employees te
where te.manager_id = e.id
)
select id, level, manager_id, name from employees e;
The decision between Oracle and PostgreSQL to process hierarchical data depends on a number of factors.
In general, the capabilities of PostgreSQL are about the same as in Oracle, although, in my opinion, it is a little less convenient and transparent.