Advanced SQL Queries: Leveraging Window Functions and Common Table Expressions (CTEs)

Introduction

Advanced SQL features such as Window Functions and Common Table Expressions (CTEs) provide powerful tools for performing complex data analysis and manipulation. These features simplify query logic, improve performance, and enable advanced analytics, making them indispensable for database professionals. This guide explores the use of window functions and CTEs to handle tasks such as ranking, aggregation, and processing hierarchical data.

Window Functions

Window Functions allow calculations across a set of table rows that are related to the current row. Unlike traditional aggregate functions, which collapse the result set into a single value per group, window functions return a value for every row in the result set, while still considering a specified "window" of rows.

Key Features:

  • PARTITION BY: Divides the data into subsets, or partitions, over which the window function is applied.
  • ORDER BY: Determines the order in which rows are processed within each partition.
  • Window Frame: Defines the range of rows used for the calculation relative to the current row (optional).

Common Window Functions:

  • ROW_NUMBER(): Assigns a unique row number to each row within a partition.
  • RANK(): Assigns a rank to each row, with gaps in ranks for tied values.
  • DENSE_RANK(): Similar to RANK() but with no gaps in ranking for tied values.
  • LEAD() and LAG(): Access data from the following (LEAD) or preceding (LAG) row relative to the current row.
  • SUM(), AVG(), COUNT(): Windowed versions of these aggregate functions.

Example 1: ROW_NUMBER() with PARTITION BY and ORDER BY

This example assigns row numbers to employees based on their department, ordered by salary within each department.


SELECT employee_id, department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num FROM employees;
  • PARTITION BY: Divides the employees by department.
  • ORDER BY: Orders employees by salary within each department.
  • ROW_NUMBER(): Assigns a sequential row number to each employee within their department.

Example 2: RANK() to Find Top Salaries

This example ranks employees by salary within their departments. If two employees have the same salary, they receive the same rank.


SELECT employee_id, department_id, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank FROM employees;
  • RANK(): Assigns ranks based on salary. Ties receive the same rank, with gaps in the numbering.

Example 3: LEAD() and LAG()

This example compares an employee’s salary with the next and previous employee's salary in the order.

SELECT employee_id, salary, LAG(salary, 1, 0) OVER (ORDER BY salary) AS prev_salary, LEAD(salary, 1, 0) OVER (ORDER BY salary) AS next_salary FROM employees;
  • LAG(): Retrieves the previous row’s salary.
  • LEAD(): Retrieves the next row’s salary.
  • The third argument (0) is the default value if there is no preceding or following row.

Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE, or DELETE statement. CTEs make complex queries easier to read, write, and maintain by breaking them down into logical, manageable parts.

Key Features:

  • WITH Clause: Defines the CTE.
  • Recursive CTEs: Allow recursive operations, which are useful for querying hierarchical data like organizational trees.

Example 1: Simple CTE

This example finds employees with salaries higher than their department’s average.


WITH DepartmentAvg AS ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id ) SELECT e.employee_id, e.salary, da.avg_salary FROM employees e JOIN DepartmentAvg da ON e.department_id = da.department_id WHERE e.salary > da.avg_salary;
  • DepartmentAvg: CTE that calculates the average salary per department.
  • The main query selects employees with a salary higher than their department's average.

Example 2: Recursive CTE (Hierarchical Data)

This example finds all employees and their managers, organized hierarchically.


WITH RECURSIVE EmployeeHierarchy AS ( SELECT employee_id, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL -- Start with top-level managers UNION ALL SELECT e.employee_id, e.manager_id, eh.level + 1 FROM employees e JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id ) SELECT * FROM EmployeeHierarchy;
  • Base Case: Selects top-level managers (those without a manager).
  • Recursive Case: Joins the employees table with the CTE to find employees reporting to managers found in the previous step.
  • The level column tracks the depth of the hierarchy.

Example 3: Multiple CTEs

This example defines multiple CTEs and uses them in the final query.


WITH SalesCTE AS ( SELECT salesperson_id, SUM(sales_amount) AS total_sales FROM sales GROUP BY salesperson_id ), TopSales AS ( SELECT salesperson_id, total_sales FROM SalesCTE WHERE total_sales > 50000 ) SELECT * FROM TopSales;
  • SalesCTE: Calculates total sales per salesperson.
  • TopSales: Filters out salespeople with total sales greater than 50,000.
  • The final query retrieves the top-performing salespeople.

Key Differences Between Window Functions and CTEs

Aspect 

Window Functions 

CTEs (Common Table Expressions) 

Purpose 

Perform calculations over a set of rows 

Define temporary result sets for complex queries 

Use Case 

Ranking, aggregation, running totals, comparisons 

Simplifying complex queries, hierarchical queries 

Result Set 

Returns all rows, with additional calculated columns 

Provides a temporary, reusable result set 

Scope 

Works within the context of a single query result set 

Can be referenced by multiple parts of a query 

Recursive Capabilities 

No recursion 

Supports recursive queries 


Conclusion

  • Window Functions enable sophisticated analytics over a set of rows without collapsing the result set, making them ideal for tasks like ranking, aggregation, and comparative analysis.
  • CTEs simplify complex SQL queries by breaking them into smaller, more manageable parts, and they are especially useful for handling hierarchical data and recursive operations.

Both window functions and CTEs are powerful tools for writing advanced SQL queries, offering flexibility and efficiency in data management and analysis.

Post a Comment

0 Comments