Window function syntax

<window_function>() OVER (
    [PARTITION BY <column>]
    [ORDER BY <column> [ASC|DESC]]
    [ROWS|RANGE BETWEEN <frame_specification>]
)

Interviewers often ask candidates to solve problems that involve:

  • Calculating running totals or moving averages
  • Ranking or assigning row numbers within groups
  • Comparing values to previous/next rows
  • Finding top N per group
  • Calculating percentiles or distributions

SQL window functions perform calculations across a set of rows related to the current row, which are often used with the OVER clause. Here’s an exhaustive list of SQL window functions, categorized by their functionality:

1. Ranking Functions

  • ROW_NUMBER(): Assigns a unique number to each row within the partition of a result set.
  • RANK(): Assigns a rank to each row within the partition of a result set, with gaps in ranking when there are ties.
  • DENSE_RANK(): Similar to RANK(), but without gaps in the ranking.
  • NTILE(n): Divides the result set into n roughly equal parts and assigns a bucket number to each row.

Example 1

Example 2

2. Aggregate Functions

  • Where_we_can_use_aggregate_functions

  • COUNT(): Counts the number of rows in a partition.

  • SUM(): Computes the sum of values in a partition.

  • AVG(): Calculates the average of values in a partition.

  • MIN(): Finds the minimum value in a partition.

  • MAX(): Finds the maximum value in a partition.

3. Value Functions

  • FIRST_VALUE(): Returns the first value in an ordered partition.
  • LAST_VALUE(): Returns the last value in an ordered partition.
  • NTH_VALUE(): Returns the nth value in an ordered partition.

4. Analytic Functions

  • LEAD(): Accesses the next row’s value in the partition, often used to compare current and subsequent rows.
  • LAG(): Accesses the previous row’s value in the partition, often used to compare current and prior rows.
  • CUME_DIST(): Computes the cumulative distribution of a value in a partition.
  • PERCENT_RANK(): Calculates the relative rank of a row within a partition, expressed as a percentage.
  • PERCENTILE_CONT(): Calculates the continuous percentile of a value in a partition.
  • PERCENTILE_DISC(): Calculates the discrete percentile of a value in a partition.
  • FLOOR(): Returns the largest integer less than or equal to a specified value.
  • CEIL(): Returns the smallest integer greater than or equal to a specified value.

5. Offset Functions

  • LEAD(): Retrieves the value of a subsequent row.
  • LAG(): Retrieves the value of a preceding row.

6. Windowing Functions

  • OVER(): Defines the window for the window function. It can be combined with partitioning (PARTITION BY) and ordering (ORDER BY).
  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Specifies a window range, starting from the first row in the partition to the current row.
  • ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: Specifies a window range, starting from the current row to the last row in the partition.
  • ROWS BETWEEN <start> AND <end>: Specifies the start and end rows within a window.

7. Miscellaneous Functions

  • NTH_VALUE(): Returns the nth value of a specific column from a window.
  • GROUPS: Used in conjunction with GROUPING SETS or CUBE to compute additional rows.

These functions help to perform various types of analysis and calculations over sets of rows related to each row in the dataset. They can be used for advanced queries like running totals, moving averages, and complex rankings.

In SQL, window functions allow you to perform calculations across a specific range of rows relative to the current row. The RANGE clause in a window function specifies a range of rows based on a logical value (like the value of a column), not the physical position of rows. This makes it different from the ROWS clause, which considers the physical row position.

Syntax

<window_function>() OVER (
    PARTITION BY <partition_column>
    ORDER BY <order_column>
    RANGE BETWEEN <range_start> AND <range_end>
)

Key Points:

  1. Logical Values: RANGE works with the values of the ORDER BY column.
  2. Supported Keywords:
    • UNBOUNDED PRECEDING: Includes all rows from the start of the partition.
    • UNBOUNDED FOLLOWING: Includes all rows to the end of the partition.
    • CURRENT ROW: Includes the current row only.
    • <n> PRECEDING: Includes rows within n units before the current row.
    • <n> FOLLOWING: Includes rows within n units after the current row.

Example Scenarios

1. Cumulative Sum Up to Current Row

SELECT 
    customer_id,
    order_date,
    total_amount,
    SUM(total_amount) OVER (
        PARTITION BY customer_id 
        ORDER BY order_date 
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sum
FROM orders;
  • Explanation: This calculates a running total of total_amount for each customer_id based on the order_date.

2. Sums Over a Range of Values

SELECT 
    product_id,
    sales_date,
    daily_sales,
    SUM(daily_sales) OVER (
        ORDER BY sales_date 
        RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ) AS seven_day_window_sum
FROM sales;
  • Explanation: The SUM is calculated for a range of 7 days centered around the current row (3 days before to 3 days after the current row).

Difference Between RANGE and ROWS

ClauseBehavior
RANGEConsiders all rows with the same value in the ORDER BY column.
ROWSConsiders a specific number of physical rows, regardless of column value.

Example of RANGE vs ROWS:

For a table where ORDER BY column values are not unique:

-- RANGE example
SUM(salary) OVER (ORDER BY department RANGE BETWEEN 1000 PRECEDING AND 1000 FOLLOWING)
 
-- ROWS example
SUM(salary) OVER (ORDER BY department ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
  • RANGE: Includes all rows where the department value is within ±1000 of the current value.
  • ROWS: Includes the two rows physically before and after the current row.

Important Notes

  • RANGE requires the ORDER BY clause.
  • RANGE is typically used with numerical or date columns.
  • If rows have the same value in the ORDER BY column, all such rows are included in the calculation.

Would you like further clarification or examples?