Overview and Introduction
How Analytic Functions
Work
The Syntax
Calculate a
running Total
Top-N Queries
Example 1
Example 2
Windows
Range Windows
Compute
average salary for defined range
Row Windows
Accessing Rows Around Your Current
Row
LAG
LEAD
Determine
the First Value / Last Value of a Group
Crosstab or Pivot
Queries
Conclusion
Links and Documents
Analytic Functions, which have been available since
Oracle 8.1.6, are designed to address such problems as "Calculate a running total", "Find
percentages within a group", "Top-N queries", "Compute a moving average" and many more.
Most of these problems can be solved using standard PL/SQL, however the performance is
often not what it should be. Analytic Functions add extensions to the SQL language that
not only make these operations easier to code; they make them faster than could be
achieved with pure SQL or PL/SQL. These extensions are currently under review by the ANSI
SQL committee for inclusion in the SQL specification.
Analytic functions compute an aggregate value based
on a group of rows. They differ from aggregate functions in that they return multiple
rows for each group. The group of rows is called a window and is defined by the
analytic clause. For each row, a "sliding" window of rows is defined. The window
determines the range of rows used to perform the calculations for the "current row".
Window sizes can be based on either a physical number of rows or a logical interval such
as time.
Analytic functions are the last set of operations
performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP
BY, and HAVING clauses are completed before the analytic functions are processed.
Therefore, analytic functions can appear only in the select list or ORDER BY
clause.
The Syntax of analytic functions is rather
straightforward in appearance
Analytic-Function(<Argument>,<Argument>,...)
OVER (
<Query-Partition-Clause>
<Order-By-Clause>
<Windowing-Clause>
)
Specify the name of an analytic function, Oracle
actually provides many analytic functions such as AVG, CORR,
COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK,
FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE,
LEAD, MAX, MIN, NTILE, PERCENT_RANK,
PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT,
STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP,
VAR_SAMP, VARIANCE.
Analytic functions take 0 to 3
arguments.
The PARTITION BY clause logically breaks a single
result set into N groups, according to the criteria set by the partition expressions.
The words "partition" and "group" are used synonymously here. The analytic functions
are applied to each group independently, they are reset for
each group.
The ORDER BY clause specifies how the data is
sorted within each group (partition). This will definitely affect the outcome of any
analytic function.
The windowing clause gives us a way to define a
sliding or anchored window of data, on which the analytic
function will operate, within a group. This clause can be used to have the analytic
function compute its value based on any arbitrary sliding or anchored window within a
group. More information on windows can be found here.
This example shows the cumulative salary within a
departement row by row, with each row including a summation of the prior rows
salary.
set autotrace traceonly explain
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
column seq format 999
SELECT ename "Ename", deptno "Deptno", sal
"Sal",
SUM(sal)
OVER (ORDER BY deptno, ename) "Running Total",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename) "Dept Total",
ROW_NUMBER()
OVER (PARTITION BY deptno
ORDER BY ENAME) "Seq"
FROM emp
ORDER BY deptno, ename
/
Ename
Deptno Sal Running Total Dept Total Seq
------ ------ ------ ------------- ---------- ----
CLARK 10
2450
2450 2450 1
KING
5000
7450 7450 2
MILLER
1300
8750 8750 3
ADAMS 20
1100
9850 1100 1
FORD
3000
12850 4100 2
JONES
2975
15825 7075 3
SCOTT
3000 18825
10075 4
SMITH
800 19625
10875 5
ALLEN 30
1600
21225 1600 1
BLAKE
2850
24075 4450 2
JAMES
950
25025 5400 3
MARTIN
1250
26275 6650 4
TURNER
1500
27775 8150 5
WARD
1250
29025 9400 6
Execution Plan
---------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 TABLE ACCESS (FULL) OF 'EMP'
Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
1658 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed
The example shows how to calculate a "Running Total"
for the entire query. This is done using the entire ordered result set, via
SUM(sal) OVER (ORDER BY deptno, ename).
Further, we were able to compute a running total
within each department, a total that would be reset at the beginning of the next
department. The PARTITION BY deptno in that SUM(sal) caused this to happen,
a partitioning clause was specified in the query in order to break the data up into
groups.
The ROW_NUMBER() function is used to
sequentially number the
rows returned in each
group, according to our ordering criteria (a "Seq" column was added to
in order to display this position).
The execution plan shows, that the whole query is
very well performed with only 3 consistent gets, this can never be accomplished with
standard SQL or even PL/SQL.
How can we get the Top-N records by some set of
fields ?
Prior to having access to these analytic functions, questions of this nature were
extremely difficult to answer.
There are some problems with Top-N queries however;
mostly in the way people phrase them. It is something to be careful about when designing
reports. Consider this seemingly sensible request:
I would like the top three paid sales reps by
department
The problem with this question is that it is
ambiguous. It is ambiguous because of repeated values, there might be four people who all
make the same salary, what should we do then ?
Let's look at three examples, all use the well known
table EMP.
Sort the sales people by salary from greatest to
least. Give the first three rows. If there are less then three people in a department,
this will return less than three records.
set autotrace on explain
break on deptno skip 1
SELECT * FROM (
SELECT deptno, ename, sal, ROW_NUMBER()
OVER (
PARTITION BY deptno ORDER BY sal DESC
) Top3 FROM emp
)
WHERE Top3 <= 3
/
DEPTNO
ENAME
SAL TOP3
---------- ---------- ---------- ----------
10
KING
5000 1
CLARK
2450 2
MILLER
1300 3
20
SCOTT
3000 1
FORD
3000 2
JONES
2975 3
30
BLAKE
2850 1
ALLEN
1600 2
TURNER
1500 3
9 rows selected.
Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT)
3 2 TABLE ACCESS
(FULL) OF 'EMP'
This query works by sorting each partition (or
group, which is the deptno), in a descending order, based on the salary column and then
assigning a sequential row number to each row in the group as it is processed. The use of
a WHERE clause after doing this to get just the first three rows in each
partition.
Give me the set of sales people who make the top 3
salaries - that is, find the set of distinct salary amounts, sort them, take the largest
three, and give me everyone who makes one of those values.
SELECT * FROM (
SELECT deptno, ename, sal,
DENSE_RANK()
OVER (
PARTITION BY deptno ORDER BY sal desc
) TopN FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC
/
DEPTNO
ENAME
SAL TOPN
---------- ---------- ---------- ----------
10
KING
5000 1
CLARK
2450 2
MILLER
1300 3
20
SCOTT
3000 1 <--- !
FORD
3000 1 <---
!
JONES
2975 2
ADAMS
1100 3
30
BLAKE
2850 1
ALLEN
1600 2
30
TURNER
1500 3
10 rows selected.
Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT PUSHED RANK)
3 2 TABLE ACCESS
(FULL) OF 'EMP'
Here the DENSE_RANK function was used to get the top three
salaries. We assigned the dense rank to the salary column and sorted it in a descending
order.
The DENSE_RANK function computes the rank of a row
in an ordered group of rows. The ranks are consecutive integers beginning with 1. The
largest rank value is the number of unique values returned by the query. Rank values are
not skipped in the event of ties. Rows with equal values for the ranking criteria receive
the same rank.
The DENSE_RANK function does not skip numbers and
will assign the same number to those rows with the same value. Hence, after the
result set is built in the inline view, we can simply select
all of the rows with a dense rank of three or less, this gives us everyone who makes the
top three salaries by department number.
The windowing clause gives us a way to define a
sliding or anchored window of data, on which the analytic
function will operate, within a group. The default window is an anchored
window that simply starts at the first row of a group an continues to the current
row.
We can set up windows based on two criteria: RANGES of data
values or ROWS offset from the current row. It can be said, that the existance
of an ORDER BY in an analytic function will add a default window clause of RANGE
UNBOUNDED PRECEDING. That says to get all rows in our partition that came before us as
specified by the ORDER BY clause.
Let's look at an example with a sliding window within a group and
compute the sum of the current row's SAL column plus the previous 2 rows in that group.
If we need a report that shows the sum of the current employee's salary with the
preceding two salaries within a departement, it would look like this.
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
SELECT deptno "Deptno", ename "Ename", sal "Sal",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename
ROWS 2 PRECEDING) "Sliding
Total"
FROM emp
ORDER BY deptno, ename
/
Deptno Ename Sal Sliding
Total
------ ------ ------ -------------
10 CLARK
2450 2450
KING
5000 7450
MILLER
1300 8750
20 ADAMS
1100 1100
FORD
3000 4100
JONES
2975 7075 ^
SCOTT
3000 8975 |
SMITH
800 6775 \-- Sliding
Window
30 ALLEN
1600 1600
BLAKE
2850 4450
JAMES
950 5400
MARTIN
1250 5050
TURNER
1500 3700
WARD
1250 4000
The partition clause makes the SUM (sal) be computed within each
department, independent of the other groups. Tthe SUM (sal) is ' reset ' as the
department changes. The ORDER BY ENAME clause sorts the data within each department by
ENAME; this allows the window clause: ROWS 2 PRECEDING, to access the 2 rows prior to
the current row in a group in order to sum the salaries.
For example, if you note the SLIDING TOTAL value for SMITH is 6 7 7 5, which is the sum of 800, 3000, and 2975. That was
simply SMITH's row plus the salary from the preceding two rows in the window.
Range windows collect rows together based on a WHERE clause. If I say '
range 5 preceding ' for example, this will generate a sliding window that has the set of
all preceding rows in the group such that they are within 5 units of the current row.
These units may either be numeric comparisons or date comparisons and it is not valid to
use RANGE with datatypes other than numbers and dates.
Count the employees which where hired within the last 100 days
preceding the own hiredate. The range window goes back 100 days from the current row's
hiredate and then counts the rows within this range. The solution ist to use the
following window specification:
COUNT(*) OVER (ORDER BY hiredate ASC RANGE 100 PRECEDING)
column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column cnt heading "Cnt" format 99
SELECT ename, hiredate, hiredate-100 hiredate_pre,
COUNT(*)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) cnt
FROM emp
ORDER BY hiredate ASC
/
Name Hired
Hired-100 Cnt
-------- ---------- ---------- ---
SMITH 17-DEC-80 08-SEP-80 1
ALLEN 20-FEB-81 12-NOV-80 2
WARD 22-FEB-81 14-NOV-80 3
JONES 02-APR-81
23-DEC-80 3
BLAKE 01-MAY-81
21-JAN-81 4
CLARK 09-JUN-81
01-MAR-81 3
TURNER 08-SEP-81 31-MAY-81 2
MARTIN 28-SEP-81 20-JUN-81 2
KING 17-NOV-81 09-AUG-81 3
JAMES 03-DEC-81 25-AUG-81 5
FORD 03-DEC-81 25-AUG-81 5
MILLER 23-JAN-82 15-OCT-81 4
SCOTT 09-DEC-82 31-AUG-82 1
ADAMS 12-JAN-83 04-OCT-82 2
We ordered the single partition by hiredate ASC. If we look for example
at the row for CLARK we can see that his hiredate was 09-JUN-81, and 100 days prior to
that is the date 01-MAR-81. If we look who was hired between 01-MAR-81 and 09-JUN-81, we
find JONES (hired: 02-APR-81) and BLAKE (hired: 01-MAY-81). This are 3 rows including the
current row, this is what we see in the column "Cnt" of CLARK's row.
As an example, compute the average salary of people hired within 100
days before for each employee. The query looks like this:
column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column avg_sal heading "Avg-100" format 999999
SELECT ename, hiredate, sal,
AVG(sal)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) avg_sal
FROM emp
ORDER BY hiredate ASC
/
Name
Hired SAL
Avg-100
-------- ---------- ---------- -------
SMITH 17-DEC-80
800 800
ALLEN 20-FEB-81
1600 1200
WARD 22-FEB-81
1250 1217 JONES
02-APR-81 2975 1942
BLAKE 01-MAY-81
2850 2169
CLARK 09-JUN-81
2450 2758
TURNER 08-SEP-81
1500 1975
MARTIN 28-SEP-81
1250 1375
KING 17-NOV-81
5000 2583
JAMES 03-DEC-81
950 2340
FORD 03-DEC-81
3000 2340
MILLER 23-JAN-82
1300 2563
SCOTT 09-DEC-82
3000 3000
ADAMS 12-JAN-83
1100 2050
Look at CLARK again, since we understand his range
window within the group. We can see that the average salary of 2758 is equal to
(2975+2850+2450)/3. This is the average of the salaries for CLARK and the rows preceding
CLARK, those of JONES and BLAKE. The data must be sorted in ascending order.
Row Windows are physical units; physical number of
rows, to include in the window. For example you can calculate the average salary of a
given record with the (up to 5) employees hired before them or after them as
follows:
set numformat 9999
SELECT ename, hiredate, sal,
AVG(sal)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) AvgAsc,
COUNT(*)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
AVG(sal)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AvgDes,
COUNT(*)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
FROM emp
ORDER BY hiredate
/
ENAME HIREDATE SAL AVGASC
CNTASC AVGDES CNTDES
---------- --------- ----- ------ ------ ------ ------ SMITH 17-DEC-80 800
800 1 1988 6
ALLEN 20-FEB-81 1600
1200 2 2104 6
WARD 22-FEB-81 1250
1217 3 2046 6
JONES 02-APR-81 2975
1656 4 2671 6
BLAKE 01-MAY-81 2850
1895 5 2675 6
CLARK 09-JUN-81 2450
1988 6 2358 6
TURNER 08-SEP-81 1500
2104 6 2167 6
MARTIN 28-SEP-81 1250
2046 6 2417 6
KING 17-NOV-81 5000
2671 6 2392 6
JAMES 03-DEC-81 950
2333 6 1588 4
FORD 03-DEC-81 3000
2358 6 1870 5
MILLER 23-JAN-82 1300
2167 6 1800 3
SCOTT 09-DEC-82 3000
2417 6 2050 2
ADAMS 12-JAN-83 1100
2392 6 1100
1
The window consist of up to 6 rows, the current row
and five rows " in front of " this row, where " in front of " is defined by the ORDER BY
clause. With ROW partitions, we do not have the limitation of RANGE partition - the data
may be of any type and the order by may include many columns. Notice, that we selected
out a COUNT(*) as well. This is useful just to demonstrate how many rows went into making
up a given average. We can see clearly that for ALLEN's record, the average salary
computation for people hired before him used only 2 records whereas
the computation for salaries of people hired after him used 6.
Frequently you want to
access data not only from the current row but the current row " in front of " or " behind " them. For example, let's say you need a report
that shows, by department all of the employees; their hire
date; how many days before was the last hire; how many
days after was the next hire.
Using straight SQL this query would be
difficult to write. Not only that but its
performance would once again definitely be questionable. The approach I typically took in the past was either to " select a select " or write a PL/SQL function
that would take some data from the current row and "
find " the previous and next rows data. This
worked, but introduce large overhead into both the development
of the query and the run-time execution of the
query.
Using analytic functions, this is easy and efficient to do.
set echo on
column deptno format 99 heading Dep
column ename format a6 heading Ename
column hiredate heading Hired
column last_hire heading LastHired
column days_last heading DaysLast
column next_hire heading NextHire
column days_next heading NextDays
break on deptno skip 1
SELECT deptno, ename, hiredate,
LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) last_hire,
hiredate - LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) days_last,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) next_hire,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) - hiredate
days_next
FROM emp
ORDER BY deptno, hiredate
/
Dep Ename Hired
LastHired DaysLast NextHire NextDays
--- ------ --------- --------- -------- --------- --------
10 CLARK 09-JUN-81
17-NOV-81 161
KING 17-NOV-81 09-JUN-81 161
23-JAN-82 67
MILLER 23-JAN-82
17-NOV-81 67
20 SMITH
17-DEC-80
02-APR-81 106
JONES 02-APR-81 17-DEC-80 106
03-DEC-81 245
FORD 03-DEC-81 02-APR-81 245
09-DEC-82 371
SCOTT 09-DEC-82 03-DEC-81 371
12-JAN-83 34
ADAMS 12-JAN-83 09-DEC-82
34
30 ALLEN
20-FEB-81
22-FEB-81 2
WARD 22-FEB-81
20-FEB-81 2
01-MAY-81 68
BLAKE 01-MAY-81 22-FEB-81 68
08-SEP-81 130
TURNER 08-SEP-81 01-MAY-81 130
28-SEP-81 20
MARTIN 28-SEP-81 08-SEP-81 20
03-DEC-81 66
JAMES 03-DEC-81 28-SEP-81
66
The LEAD and LAG routines could be considered a
way to " index into your partitioned group ". Using these functions you can access any individual
row. Notice for example in the above printout, it shows that
the record for KING includes the data (in bold
red font) from the prior row (LAST
HIRE) and the next row (NEXT-HIRE). We can access the fields in records preceding or following the current record in
an ordered partition easily.
LAG ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )
LAG provides access to more than one row of a table at the same time
without a self join. Given a series of rows returned from a query and a position
of the cursor, LAG provides access to a row at a given physical offset prior to
that position.
If you do not specify offset, then its default is 1. The
optional default value is returned if the offset goes beyond the scope of the
window. If you do not specify default, then its default value is null.
The following example provides, for each person in the EMP table, the
salary of the employee hired just before:
SELECT ename,hiredate,sal,
LAG(sal, 1, 0)
OVER (ORDER BY hiredate) AS PrevSal
FROM emp
WHERE job = 'CLERK';
Ename Hired SAL PREVSAL
------ --------- ----- -------
SMITH 17-DEC-80 800 0
JAMES 03-DEC-81 950 800
MILLER 23-JAN-82 1300 950
ADAMS 12-JAN-83 1100 1300
LEAD ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )
LEAD provides access to more than one row of a table at the same time without a
self join. Given a series of rows returned from a query and a position of the cursor,
LEAD provides access to a row at a given physical offset beyond that position.
If you do not specify offset, then its default is 1. The optional default value is
returned if the offset goes beyond the scope of the table. If you do not specify default,
then its default value is null.
The following example provides, for each employee in the EMP table, the hire date of
the employee hired just after:
SELECT ename, hiredate,
LEAD(hiredate, 1)
OVER (ORDER BY hiredate) AS NextHired
FROM emp WHERE deptno = 30;
Ename Hired
NEXTHIRED
------ --------- ---------
ALLEN 20-FEB-81 22-FEB-81
WARD 22-FEB-81 01-MAY-81
BLAKE 01-MAY-81 08-SEP-81
TURNER 08-SEP-81 28-SEP-81
MARTIN 28-SEP-81 03-DEC-81
JAMES 03-DEC-81
The FIRST_VALUE and LAST_VALUE functions allow you
to select the first and last rows from a group. These rows are especially valuable
because they are often used as the baselines in calculations.
The following example selects, for each employee in
each department, the name of the employee with the lowest salary.
break on deptno skip 1
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal ASC) AS MIN_SAL_HAS
FROM emp
ORDER BY deptno, ename;
DEPTNO
ENAME SAL
MIN_SAL_HAS
---------- ---------- ---------- -----------
10
CLARK 2450 MILLER
KING 5000
MILLER
MILLER
1300 MILLER
20
ADAMS 1100 SMITH
FORD 3000
SMITH
JONES 2975 SMITH
SCOTT 3000 SMITH
SMITH
800 SMITH
30
ALLEN 1600 JAMES
BLAKE 2850 JAMES
JAMES
950 JAMES
MARTIN 1250 JAMES
TURNER 1500 JAMES
WARD 1250
JAMES
The following example selects, for each employee in
each department, the name of the employee with the highest salary.
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal DESC) AS MAX_SAL_HAS
FROM emp
ORDER BY deptno, ename;
DEPTNO
ENAME SAL
MAX_SAL_HAS
---------- ---------- ---------- -----_-----
10
CLARK 2450 KING
KING
5000 KING
MILLER 1300 KING
20
ADAMS 1100 FORD
FORD
3000 FORD
JONES 2975 FORD
SCOTT 3000 FORD
SMITH 800
FORD
30
ALLEN 1600 BLAKE
BLAKE
2850 BLAKE
JAMES 950
BLAKE
MARTIN 1250 BLAKE
TURNER 1500 BLAKE
WARD 1250
BLAKE
The following example selects, for each employee in
department 30 the name of the employee with the lowest salary using an inline
view
SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (ORDER BY sal ASC) AS MIN_SAL_HAS
FROM (SELECT * FROM emp WHERE deptno = 30)
DEPTNO
ENAME SAL
MIN_SAL_HAS
---------- ---------- ---------- -----------
30
JAMES 950
JAMES
MARTIN 1250 JAMES
WARD 1250
JAMES
TURNER 1500 JAMES
ALLEN 1600 JAMES
BLAKE 2850
JAMES
A crosstab query, sometimes known as a
pivot query, groups your data in a slightly different way from those we have seen
hitherto. A crosstab query can be used to get a result with three rows (one for each
project), with each row having three columns (the first listing the projects and then one
column for each year) -- like this:
Project
2001 2002
ID
CHF CHF
-------------------------------
100 123.00
234.50
200 543.00
230.00
300 238.00
120.50
Example
Let's say you want to show the top 3 salary earners
in each department as columns. The query needs to return exactly 1 row per
department and the row would have 4 columns. The DEPTNO, the name of the highest paid
employee in the department, the name of the next highest paid, and so on. Using analytic
functions this almost easy, without analytic functions this was virtually
impossible.
SELECT deptno,
MAX(DECODE(seq,1,ename,null)) first,
MAX(DECODE(seq,2,ename,null)) second,
MAX(DECODE(seq,3,ename,null)) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal
desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
GROUP BY deptno
/
DEPTNO
FIRST SECOND THIRD
---------- ---------- ---------- ----------
10 KING
CLARK MILLER
20 SCOTT
FORD JONES
30 BLAKE
ALLEN TURNER
Note the inner query, that
assigned a sequence (RowNr) to each employee by department number in order of
salary.
SELECT deptno, ename, sal,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) RowNr
FROM emp;
DEPTNO
ENAME
SAL ROWNR
---------- ---------- ---------- ----------
10
KING
5000 1
10
CLARK
2450 2
10
MILLER
1300 3
20
SCOTT
3000 1
20
FORD
3000 2
20
JONES
2975 3
20
ADAMS
1100 4
20
SMITH
800 5
30
BLAKE
2850 1
30
ALLEN
1600 2
30
TURNER
1500 3
30
WARD
1250 4
30
MARTIN
1250 5
30
JAMES
950 6
The DECODE in the outer query keeps only rows with
sequences 1, 2 or 3 and assigns them
to the correct "column". The GROUP BY gets rid of the redundant rows and we are left with
our collapsed result. It may be easier to understand if you see the resultset without the
aggregate function MAX grouped by deptno.
SELECT deptno,
DECODE(seq,1,ename,null) first,
DECODE(seq,2,ename,null) second,
DECODE(seq,3,ename,null) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal
desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
/
DEPTNO
FIRST SECOND THIRD
---------- ---------- ---------- ----------
10 KING
10 CLARK
10
MILLER
20 SCOTT
20 FORD
20
JONES
30 BLAKE
30 ALLEN
30
TURNER
The MAX aggregate function will be applied by the
GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value
for FIRST, the remaining rows in that group will always be NULL. The MAX function will
pick out the non-null row and keep that for us. Hence, the group by and MAX will collapse
our resultset, removing the NULL values from it and giving us what we want.
This new set of functionality holds some exiting possibilities. It opens up a whole new
way of looking at the data. It will remove a lot of procedural code and complex or
inefficient queries that would have taken a long tome to develop, to
achieve the same result.
Further articles about Analytic Functions can be
found in:
|