A hierarchical query is one that works on data with a "tree" relationship.
An Example of Hierarchical Data
The employee-manager relationship in SCOTT’s famous EMP table. Each employee record has a manager’s id associated with it. In effect, there is a "tree" of data relationships
At each level, I can get the employee-manager relationship as below:
SQL> select empno, ename, mgr mgrno, 2 (select e2.ename from emp e2 3 where e1.mgr = e2.empno) mgrname 4 from emp e1 5 / EMPNO ENAME MGRNO MGRNAME ---------- -------- ---------- ---------- 7369 SMITH 7902 FORD 7499 ALLEN 7698 BLAKE 7521 WARD 7698 BLAKE 7566 JONES 7839 KING 7654 MARTIN 7698 BLAKE 7698 BLAKE 7839 KING 7782 CLARK 7839 KING 7788 SCOTT 7566 JONES 7839 KING 7844 TURNER 7698 BLAKE 7876 ADAMS 7788 SCOTT 7900 JAMES 7698 BLAKE 7902 FORD 7566 JONES 7934 MILLER 7782 CLARK 14 rows selected.
But what if I want to select –
- the entire hierarchy of employees under JONES?
- the line of reporting above MILLER, till the topmost manager?
These queries require a special way of traversing through the "tree". The next sections introduces you to the SQL keywords that you will need to write such a query, and finally the query.
Processing Hierarchical Data: START WITH.CONNECT BY
The START WITH.CONNECT BY clause can select hierarchical data, to answer the above two requirements.
The structure of a hierarchical query is:
SELECT <data> FROM <table> START WITH <root> CONNECT BY PRIOR <condition>
START WITH specifies the topmost node of the tree or in other words, where to start parsing from.
CONNECT BY specifies the relationship between the parent and the child.
PRIOR achieves the recursive condition to traverse the tree.
Let’s see the query in action.
SQL to get the entire hierarchy below a node
In the EMP table example, the query to get the entire hierarchy of employees under JONES:
SQL> select empno, ename 2 , sys_connect_by_path(ename, ' -> ') path 3 from emp 4 start with ename = 'JONES' -- the root node 5 connect by prior empno = mgr -- the recursive condition 6 / EMPNO ENAME PATH ---------- -------- ----------------------------------------- 7566 JONES -> JONES 7788 SCOTT -> JONES -> SCOTT 7876 ADAMS -> JONES -> SCOTT -> ADAMS 7902 FORD -> JONES -> FORD 7369 SMITH -> JONES -> FORD -> SMITH
SYS_CONNECT_BY_PATH(column, char) returns the path of a column value from root to node. The column values in the path are separated by the "char" specified in the SQL, such as -> in this case.
SQL to get the entire hierarchy above a node
In the EMP table example, the query to get the line of reporting above MILLER, till the topmost manager:
SQL> select sys_connect_by_path(ename, ' -> ') line_of_reporting 2 from emp 3 start with ename = 'MILLER' -- the root node 4* connect by prior mgr = empno -- the recursive condition SQL> / LINE_OF_REPORTING ------------------------------------------------------------ -> MILLER -> MILLER -> CLARK -> MILLER -> CLARK -> KING
Almost like the previous one with a crucial difference: the connect by prior condition is reversed.
A Memory Aid
To get the reporting hierarchy below an employee, should the condition be written as
connect by prior mgr = empno;
or
connect by prior empno = mgr?
This will not be confusing once you commit this pneumonic to memory:
When the tree to be queried has a heavy bottom (i.e. reporting hierarchy below an employee, starts with one node and branches to many), then the heavy side must be put later.
By this logic, the condition for querying the hierarchy below a node is: connect by prior empno = mgr
Summary
This article covers:
- The meaning of hierarchical data
- The type of query to process hierarchical data
- Keywords START WITH.CONNECT BY and others that feature in hierarchical queries
- SQL to get all nodes below a specific node in a hierarchy
- SQL to get all nodes above a specific node in a hierarchy
In the next article, we’ll talk about the LEVEL pseudocolumn in hierarchical queries.
Photo by shapeshift