Window Functions

August 2021

Table of Contents

Definition

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities and they all are displayed in the query’s output.

Example: Sales Orders Table

Let’s analyze (artificial) sales orders data (Table 1). The table has the following attributes:

order_id: the id of the sales order
sales_value_thsd: the total amount of the sales order, in thousands of dollars
point_of_sale: represents the store where the sale took place
date: date of the transaction, YYYY-MM-DD

Table0. Sales Orders Table

order_id	sales_value_thsd	point_of_sale	date
1	6080.25	1	2019-11-29
2	8175.90	2	2019-11-29
3	8175.00	2	2020-05-25
4	2199.00	5	2020-06-29
5	3970.100	5	2020-07-29
6	3299.33	3	2020-10-29
7	2088.75	1	2020-11-29
8	5299.10	1	2021-01-29
9	1199.00	5	2022-12-22

Query1: Running Sum/Total Across All Orders by Date (No Partitions)

--running sum across all orders by date
SELECT date, sales_value_thsd, SUM(sales_value_thsd) OVER
	(ORDER BY date ASC ROWS UNBOUNDED PRECEDING) as running_sum
    -- UNBOUNDED PRECEDING is the default
FROM sales_orders;

Table1. Running Sum/Total without Partitions

date	sales_value_thsd	running_sum
2019-11-29	6080.25	6080.25
2019-11-29	8175.90	14256.15
2020-05-25	8175.00	22431.15
2020-06-29	2199.00	24630.15
2020-07-29	3970.100	28600.250
2020-10-29	3299.33	31899.580
2020-11-29	2088.75	33988.330
2021-01-29	5299.10	39287.430
2022-12-22	1199.00	40486.430

Query2: Ranking Largest Orders Partitioning by point_of_sale

--ranking largest orders by value_thsd partitioning by point_of_sale
SELECT point_of_sale, order_id, sales_value_thsd, RANK() OVER
	(PARTITION BY point_of_sale ORDER BY sales_value_thsd DESC)
FROM sales_orders;

Table2. Value Rank by Partition (descending)

point_of_sale	order_id	sales_value_thsd	rank
1	1	6080.25	1
1	8	5299.10	2
1	7	2088.75	3
2	2	8175.90	1
2	3	8175.00	2
3	6	3299.33	1
5	5	3970.100	1
5	4	2199.00	2
5	9	1199.00	3

As shown here, the RANK function produces a numerical rank for each distinct ORDER BY value in the current row’s partition, using the order defined by the ORDER BY clause. rank needs no explicit parameter, because its behavior is entirely determined by the OVER clause.

The rows considered by a window function are those of the “virtual table” produced by the query’s FROM clause as filtered by its WHERE, GROUP BY, and HAVING clauses if any. For example, a row removed because it does not meet the WHERE condition is not seen by any window function. A query can contain multiple window functions that slice up the data in different ways using different OVER clauses, but they all act on the same collection of rows defined by this virtual table.

Query3: Average Sales Partitioning by point_of_sale

--average sales_value_thsd by point of sale
SELECT point_of_sale, sales_value_thsd, round(avg_sales_thsd, 2) AS pos_avg
FROM (
    SELECT point_of_sale, sales_value_thsd, avg(sales_value_thsd)  OVER
        (partition by point_of_sale) as avg_sales_thsd
    FROM sales_orders
) as subq;

Table3. Average Sales by Partition

point_of_sale	sales_value_thsd	pos_avg
1	2088.75	4489.37
1	5299.10	4489.37
1	6080.25	4489.37
2	8175.90	8175.45
2	8175.00	8175.45
3	3299.33	3299.33
5	1199.00	2456.03
5	2199.00	2456.03
5	3970.100	2456.03

Query4: Average Sales and RunningSum Partitioning by point_of_sale

--average and running sum of sales by point_of_sale
SELECT
	point_of_sale,
	date,
	order_id,
	sales_value_thsd,
	round(running_sum, 2) AS running_sum,
	round(running_avg, 2) AS running_avg
FROM (
	SELECT
		point_of_sale,
		date,
		order_id,
		sales_value_thsd,
		sum(sales_value_thsd) OVER w AS running_sum,
		avg(sales_value_thsd) OVER w AS running_avg
	FROM sales_orders
	WINDOW w AS (PARTITION BY point_of_sale	ORDER BY date ASC)
) AS subq;

Table4. Two Window Functions (sum and avg) Over the Same Window

point_of_sale	date	order_id	sales_value_thsd	running_sum	running_avg
1	2019-11-29	1	6080.25	6080.25	6080.25
1	2020-11-29	7	2088.75	8169.00	4084.50
1	2021-01-29	8	5299.10	13468.10	4489.37
2	2019-11-29	2	8175.90	8175.90	8175.90
2	2020-05-25	3	8175.00	16350.90	8175.45
3	2020-10-29	6	3299.33	3299.33	3299.33
5	2020-06-29	4	2199.00	2199.00	2199.00
5	2020-07-29	5	3970.100	6169.10	3084.55
5	2022-12-22	9	1199.00	7368.10	2456.03

Window Function Terminology

A window function call always contains an OVER clause directly following the window function’s name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function.

The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.

You can also control the order in which rows are processed by window functions using ORDER BY within OVER (the window ORDER BY does not even have to match the order in which the rows are output).

Window functions are permitted only in the SELECT list and the ORDER BY clause of the query. They are forbidden elsewhere, such as in GROUP BY, HAVING and WHERE clauses. This is because they logically execute after the processing of those clauses. Also, window functions execute after non-window aggregate functions. This means it is valid to include an aggregate function call in the arguments of a window function, but not vice versa. If there is a need to filter or group rows after the window calculations are performed, you can use a sub-select (see Query3, which uses a sub-select to execute a window function and an outer select to round it).

The frame_clause specifies the set of rows constituting the window frame, which is a subset of the current partition, for those window functions that act on the frame instead of the whole partition. The set of rows in the frame can vary depending on which row is the current row. The frame can be specified in RANGE, ROWS or GROUPS mode; in each case, it runs from the frame_start to the frame_end. If frame_end is omitted, the end defaults to CURRENT ROW.

A frame_start of UNBOUNDED PRECEDING means that the frame starts with the first row of the partition, and similarly a frame_end of UNBOUNDED FOLLOWING means that the frame ends with the last row of the partition. The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Where BETWEEN is not specified, the frame is implicitly bounded by the current row.

With ORDER BY, this sets the frame to be all rows from the partition start up through the current row’s last ORDER BY peer. Without ORDER BY, this means all rows of the partition are included in the window frame, since all rows become peers of the current row.

Window Function Processing

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data. Therefore they will see the same sort ordering, even if the ORDER BY does not uniquely determine an ordering. However, no guarantees are made about the evaluation of functions having different PARTITION BY or ORDER BY specifications. (In such cases a sort step is typically required between the passes of window function evaluations, and the sort is not guaranteed to preserve ordering of rows that its ORDER BY sees as equivalent.)

Currently, window functions always require presorted data, and so the query output will be ordered according to one or another of the window functions' PARTITION BY/ORDER BY clauses. It is not recommended to rely on this, however. Use an explicit top-level ORDER BY clause if you want to be sure the results are sorted in a particular way.

Table of General-Purpose Window Functions

function	description
row_number () → bigint	Returns the number of the current row within its partition, counting from 1.
rank () → bigint	Returns the rank of the current row, with gaps; that is, the row_number of the first row in its peer group.
dense_rank () → bigint	Returns the rank of the current row, without gaps; this function effectively counts peer groups.
percent_rank () → double precision	Returns the relative rank of the current row, that is (rank - 1) / (total partition rows - 1). The value thus ranges from 0 to 1 inclusive.
cume_dist () → double precision	Returns the cumulative distribution, that is (number of partition rows preceding or peers with current row) / (total partition rows). The value thus ranges from 1/N to 1.
ntile ( num_buckets integer ) → integer	Returns an integer ranging from 1 to the argument value, dividing the partition as equally as possible.
lag ( value anycompatible [, offset integer [, default anycompatible ]] ) → anycompatible	Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default (which must be of a type compatible with value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.
lead ( value anycompatible [, offset integer [, default anycompatible ]] ) → anycompatible	Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default (which must be of a type compatible with value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to NULL.
first_value ( value anyelement ) → anyelement	Returns value evaluated at the row that is the first row of the window frame.
last_value ( value anyelement ) → anyelement	Returns value evaluated at the row that is the last row of the window frame.
nth_value ( value anyelement, n integer ) → anyelement	Returns value evaluated at the row that is the n’th row of the window frame (counting from 1); returns NULL if there is no such row.

Bibliography

PostgreSQL 14 Docs - Chapter 3.5 – Window Functions
PostgreSQL 14 Docs - Chapter 4.2.8 – Window Function Calls
PostgreSQL 14 Docs - Chapter 7.2 – Window Function Processing
PostgreSQL 14 Docs - Chapter 9.22 – [List] of Window Functions

Rafael Marino