Window function syntax
<window_function>() OVER (
[PARTITION BY <column>]
[ORDER BY <column> [ASC|DESC]]
[ROWS|RANGE BETWEEN <frame_specification>]
)
Interviewers often ask candidates to solve problems that involve:
- Calculating running totals or moving averages
- Ranking or assigning row numbers within groups
- Comparing values to previous/next rows
- Finding top N per group
- Calculating percentiles or distributions
SQL window functions perform calculations across a set of rows related to the current row, which are often used with the OVER
clause. Here’s an exhaustive list of SQL window functions, categorized by their functionality:
1. Ranking Functions
ROW_NUMBER()
: Assigns a unique number to each row within the partition of a result set.RANK()
: Assigns a rank to each row within the partition of a result set, with gaps in ranking when there are ties.DENSE_RANK()
: Similar toRANK()
, but without gaps in the ranking.NTILE(n)
: Divides the result set inton
roughly equal parts and assigns a bucket number to each row.
Example 1
Example 2
2. Aggregate Functions
-
COUNT()
: Counts the number of rows in a partition. -
SUM()
: Computes the sum of values in a partition. -
AVG()
: Calculates the average of values in a partition. -
MIN()
: Finds the minimum value in a partition. -
MAX()
: Finds the maximum value in a partition.
3. Value Functions
FIRST_VALUE()
: Returns the first value in an ordered partition.LAST_VALUE()
: Returns the last value in an ordered partition.NTH_VALUE()
: Returns then
th value in an ordered partition.
4. Analytic Functions
LEAD()
: Accesses the next row’s value in the partition, often used to compare current and subsequent rows.LAG()
: Accesses the previous row’s value in the partition, often used to compare current and prior rows.CUME_DIST()
: Computes the cumulative distribution of a value in a partition.PERCENT_RANK()
: Calculates the relative rank of a row within a partition, expressed as a percentage.PERCENTILE_CONT()
: Calculates the continuous percentile of a value in a partition.PERCENTILE_DISC()
: Calculates the discrete percentile of a value in a partition.FLOOR()
: Returns the largest integer less than or equal to a specified value.CEIL()
: Returns the smallest integer greater than or equal to a specified value.
5. Offset Functions
LEAD()
: Retrieves the value of a subsequent row.LAG()
: Retrieves the value of a preceding row.
6. Windowing Functions
OVER()
: Defines the window for the window function. It can be combined with partitioning (PARTITION BY
) and ordering (ORDER BY
).ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
: Specifies a window range, starting from the first row in the partition to the current row.ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
: Specifies a window range, starting from the current row to the last row in the partition.ROWS BETWEEN <start> AND <end>
: Specifies the start and end rows within a window.
7. Miscellaneous Functions
NTH_VALUE()
: Returns then
th value of a specific column from a window.GROUPS
: Used in conjunction withGROUPING SETS
orCUBE
to compute additional rows.
These functions help to perform various types of analysis and calculations over sets of rows related to each row in the dataset. They can be used for advanced queries like running totals, moving averages, and complex rankings.
In SQL, window functions allow you to perform calculations across a specific range of rows relative to the current row. The RANGE
clause in a window function specifies a range of rows based on a logical value (like the value of a column), not the physical position of rows. This makes it different from the ROWS
clause, which considers the physical row position.
Syntax
<window_function>() OVER (
PARTITION BY <partition_column>
ORDER BY <order_column>
RANGE BETWEEN <range_start> AND <range_end>
)
Key Points:
- Logical Values:
RANGE
works with the values of theORDER BY
column. - Supported Keywords:
UNBOUNDED PRECEDING
: Includes all rows from the start of the partition.UNBOUNDED FOLLOWING
: Includes all rows to the end of the partition.CURRENT ROW
: Includes the current row only.<n> PRECEDING
: Includes rows withinn
units before the current row.<n> FOLLOWING
: Includes rows withinn
units after the current row.
Example Scenarios
1. Cumulative Sum Up to Current Row
SELECT
customer_id,
order_date,
total_amount,
SUM(total_amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sum
FROM orders;
- Explanation: This calculates a running total of
total_amount
for eachcustomer_id
based on theorder_date
.
2. Sums Over a Range of Values
SELECT
product_id,
sales_date,
daily_sales,
SUM(daily_sales) OVER (
ORDER BY sales_date
RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING
) AS seven_day_window_sum
FROM sales;
- Explanation: The
SUM
is calculated for a range of 7 days centered around the current row (3 days before
to3 days after
the current row).
Difference Between RANGE
and ROWS
Clause | Behavior |
---|---|
RANGE | Considers all rows with the same value in the ORDER BY column. |
ROWS | Considers a specific number of physical rows, regardless of column value. |
Example of RANGE
vs ROWS
:
For a table where ORDER BY
column values are not unique:
-- RANGE example
SUM(salary) OVER (ORDER BY department RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING)
-- ROWS example
SUM(salary) OVER (ORDER BY department ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
RANGE
: Includes all rows where thedepartment
value is within ±1000 of the current value.ROWS
: Includes the two rows physically before and after the current row.
Important Notes
RANGE
requires theORDER BY
clause.RANGE
is typically used with numerical or date columns.- If rows have the same value in the
ORDER BY
column, all such rows are included in the calculation.
Would you like further clarification or examples?