Window Functions
Table of Contents
Definition
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities and they all are displayed in the query’s output.
Example: Sales Orders Table
Let’s analyze (artificial) sales orders data (Table 1). The table has the following attributes:
order_id
: the id of the sales ordersales_value_thsd
: the total amount of the sales order, in thousands of dollarspoint_of_sale
: represents the store where the sale took placedate
: date of the transaction,YYYY-MM-DD
Table0. Sales Orders Table
order_id | sales_value_thsd | point_of_sale | date |
---|---|---|---|
1 | 6080.25 | 1 | 2019-11-29 |
2 | 8175.90 | 2 | 2019-11-29 |
3 | 8175.00 | 2 | 2020-05-25 |
4 | 2199.00 | 5 | 2020-06-29 |
5 | 3970.100 | 5 | 2020-07-29 |
6 | 3299.33 | 3 | 2020-10-29 |
7 | 2088.75 | 1 | 2020-11-29 |
8 | 5299.10 | 1 | 2021-01-29 |
9 | 1199.00 | 5 | 2022-12-22 |
Query1: Running Sum/Total Across All Orders by Date (No Partitions)
--running sum across all orders by date
SELECT date, sales_value_thsd, SUM(sales_value_thsd) OVER
(ORDER BY date ASC ROWS UNBOUNDED PRECEDING) as running_sum
-- UNBOUNDED PRECEDING is the default
FROM sales_orders;
Table1. Running Sum/Total without Partitions
date | sales_value_thsd | running_sum |
---|---|---|
2019-11-29 | 6080.25 | 6080.25 |
2019-11-29 | 8175.90 | 14256.15 |
2020-05-25 | 8175.00 | 22431.15 |
2020-06-29 | 2199.00 | 24630.15 |
2020-07-29 | 3970.100 | 28600.250 |
2020-10-29 | 3299.33 | 31899.580 |
2020-11-29 | 2088.75 | 33988.330 |
2021-01-29 | 5299.10 | 39287.430 |
2022-12-22 | 1199.00 | 40486.430 |
Query2: Ranking Largest Orders Partitioning by point_of_sale
--ranking largest orders by value_thsd partitioning by point_of_sale
SELECT point_of_sale, order_id, sales_value_thsd, RANK() OVER
(PARTITION BY point_of_sale ORDER BY sales_value_thsd DESC)
FROM sales_orders;
Table2. Value Rank by Partition (descending)
point_of_sale | order_id | sales_value_thsd | rank |
---|---|---|---|
1 | 1 | 6080.25 | 1 |
1 | 8 | 5299.10 | 2 |
1 | 7 | 2088.75 | 3 |
2 | 2 | 8175.90 | 1 |
2 | 3 | 8175.00 | 2 |
3 | 6 | 3299.33 | 1 |
5 | 5 | 3970.100 | 1 |
5 | 4 | 2199.00 | 2 |
5 | 9 | 1199.00 | 3 |
As shown here, the RANK
function produces a numerical rank for each distinct ORDER BY
value in the current row’s partition, using the order defined by the ORDER BY
clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER
clause.
The rows considered by a window function are those of the “virtual table” produced by the query’s FROM
clause as filtered by its WHERE
, GROUP BY
, and HAVING
clauses if any. For example, a row removed because it does not meet the WHERE
condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER
clauses, but they all act on the same collection of rows defined by this virtual table.
Query3: Average Sales Partitioning by point_of_sale
--average sales_value_thsd by point of sale
SELECT point_of_sale, sales_value_thsd, round(avg_sales_thsd, 2) AS pos_avg
FROM (
SELECT point_of_sale, sales_value_thsd, avg(sales_value_thsd) OVER
(partition by point_of_sale) as avg_sales_thsd
FROM sales_orders
) as subq;
Table3. Average Sales by Partition
point_of_sale | sales_value_thsd | pos_avg |
---|---|---|
1 | 2088.75 | 4489.37 |
1 | 5299.10 | 4489.37 |
1 | 6080.25 | 4489.37 |
2 | 8175.90 | 8175.45 |
2 | 8175.00 | 8175.45 |
3 | 3299.33 | 3299.33 |
5 | 1199.00 | 2456.03 |
5 | 2199.00 | 2456.03 |
5 | 3970.100 | 2456.03 |
Query4: Average Sales and RunningSum Partitioning by point_of_sale
--average and running sum of sales by point_of_sale
SELECT
point_of_sale,
date,
order_id,
sales_value_thsd,
round(running_sum, 2) AS running_sum,
round(running_avg, 2) AS running_avg
FROM (
SELECT
point_of_sale,
date,
order_id,
sales_value_thsd,
sum(sales_value_thsd) OVER w AS running_sum,
avg(sales_value_thsd) OVER w AS running_avg
FROM sales_orders
WINDOW w AS (PARTITION BY point_of_sale ORDER BY date ASC)
) AS subq;
Table4. Two Window Functions (sum
and avg
) Over the Same Window
point_of_sale | date | order_id | sales_value_thsd | running_sum | running_avg |
---|---|---|---|---|---|
1 | 2019-11-29 | 1 | 6080.25 | 6080.25 | 6080.25 |
1 | 2020-11-29 | 7 | 2088.75 | 8169.00 | 4084.50 |
1 | 2021-01-29 | 8 | 5299.10 | 13468.10 | 4489.37 |
2 | 2019-11-29 | 2 | 8175.90 | 8175.90 | 8175.90 |
2 | 2020-05-25 | 3 | 8175.00 | 16350.90 | 8175.45 |
3 | 2020-10-29 | 6 | 3299.33 | 3299.33 | 3299.33 |
5 | 2020-06-29 | 4 | 2199.00 | 2199.00 | 2199.00 |
5 | 2020-07-29 | 5 | 3970.100 | 6169.10 | 3084.55 |
5 | 2022-12-22 | 9 | 1199.00 | 7368.10 | 2456.03 |
Window Function Terminology
A window function call always contains an OVER
clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER
clause determines exactly how the rows of the query are split up for processing by the window function.
The PARTITION BY
clause within OVER
divides the rows into groups, or partitions, that share the same values of the PARTITION BY
expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
You can also control the order in which rows are processed by window functions using ORDER BY
within OVER
(the window ORDER BY
does not even have to match the order in which the rows are output).
Window functions are permitted only in the SELECT
list and the ORDER BY
clause of the query. They are forbidden elsewhere, such as in GROUP BY
, HAVING
and WHERE
clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa. If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select (see Query3, which uses a sub-select to execute a window function and an outer select to round it).
The frame_clause
specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row. The frame can be specified in RANGE
, ROWS
or GROUPS
mode; in each case, it runs from the frame_start
to the frame_end
. If frame_end
is omitted, the end defaults to CURRENT ROW
.
A frame_start
of UNBOUNDED PRECEDING
means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING
means that the frame ends with the last row of the partition. The default framing option is RANGE UNBOUNDED PRECEDING
, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Where BETWEEN
is not specified, the frame is implicitly bounded by the current row.
With ORDER BY
, this sets the frame to be all rows from the partition start up through the current row’s last ORDER BY
peer. Without ORDER BY
, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.
Window Function Processing
When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY
and ORDER BY
clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY
does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY
or ORDER BY
specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY
sees as equivalent.)
Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY
/ORDER BY
clauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY
clause if you want to be sure the results are sorted in a particular way.
Table of General-Purpose Window Functions
function | description |
---|---|
row_number () → bigint | Returns the number of the current row within its partition, counting from 1. |
rank () → bigint | Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group. |
dense_rank () → bigint | Returns the rank of the current row, without gaps; this function effectively counts peer groups. |
percent_rank () → double precision | Returns the relative rank of the current row, that is (rank - 1) / (total partition rows - 1). The value thus ranges from 0 to 1 inclusive. |
cume_dist () → double precision | Returns the cumulative distribution, that is (number of partition rows preceding or peers with current row) / (total partition rows). The value thus ranges from 1/N to 1. |
ntile ( num_buckets integer ) → integer | Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible. |
lag ( value anycompatible [, offset integer [, default anycompatible ]] ) → anycompatible | Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default (which must be of a type compatible with value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. |
lead ( value anycompatible [, offset integer [, default anycompatible ]] ) → anycompatible | Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default (which must be of a type compatible with value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL. |
first_value ( value anyelement ) → anyelement | Returns value evaluated at the row that is the first row of the window frame. |
last_value ( value anyelement ) → anyelement | Returns value evaluated at the row that is the last row of the window frame. |
nth_value ( value anyelement, n integer ) → anyelement | Returns value evaluated at the row that is the n’th row of the window frame (counting from 1); returns NULL if there is no such row. |
Bibliography
- PostgreSQL 14 Docs - Chapter 3.5 – Window Functions
- PostgreSQL 14 Docs - Chapter 4.2.8 – Window Function Calls
- PostgreSQL 14 Docs - Chapter 7.2 – Window Function Processing
- PostgreSQL 14 Docs - Chapter 9.22 – [List] of Window Functions