Select Page

Using Multiple Common Table Expressions In One SQL Query In MySQL

Ben Nadel
Published: May 10, 2024

A couple of years ago, I upgraded this blog from MySQL 5.7.10 to MySQL 8.0.28. This gave me access to newer SQL features like Common Table Expressions (CTE) and JSON column types. But, my blog requires little more that CRUD (Create, Read, Update, Delete) operations; so, there’s not much of need for me to write advanced SQL statements.

At work, however, we recently upgraded to MySQL 8. And this finally gave me an opportunity to start playing around with more advanced concepts due to the large number of one-off reports that I have to run. A couple of weeks ago, I looked at using VALUES and ROW constructs to create a common table expression. And, as a follow-up to that, I just learned that you can have multiple CTEs within a single MySQL query. And, that these CTEs can reference each other. So freaking cool!

To demonstrate, I’m going to build-up a SQL query using common table expressions. First, we’ll start with a list of fake email addresses. This is something that I often have to do, copy-pasting them from a CSV (Comma Separated Values) file. The following CTE just gets the email addresses into a consumable format:

WITH
	emails ( `email` ) AS (
		VALUES
			ROW( 'laura@acme.example' ),
			ROW( 'tim@acme.example' ),
			ROW( 'anna@masters.example' ),
			ROW( 'sonya@vids.example' ),
			ROW( 'robert@acme.example' ),
			ROW( 'alice@vids.example' ),
			ROW( 'martha@chomp.example' ),
			ROW( 'douglas@chomp.example' )
	)

This creates a derived table, emails, with a single column, email. Common table expressions can be referenced by the main SQL statement; but, they can also be referenced by other CTEs in the same query. And that’s exactly what we’ll do next—create another CTE that builds upon the prior CTE and extracts the domain from each row:

WITH
	-- ... truncated SQL query ...
	deconstructed ( `email`, `domain` ) AS (
		SELECT
			e.email,
			SUBSTRING_INDEX( e.email, '@', -1 ) AS domain
		FROM
			emails e -- CTE reference!
	)

As you can see, this CTE is querying from the previous CTE, emails. Then, it’s using the SUBSTRING_INDEX() function to parse the domain out of each email, creating yet another derived table / CTE.

Next, we’ll use this new CTE to create another CTE which groups the emails by domain and records the COUNT() statistics:

WITH
	-- ... truncated SQL query ...
	stats ( `domain`, `emailCount` ) AS (
		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain
	)

As you can see, this CTE is querying from the previous CTE, deconstructed.

Next, we’ll create another CTE which again references the previous CTE, deconstructed. But, this time, instead of getting the count, we’ll collect the email addresses—per domain—into a JSON aggregation.

Note: I could have combined the following CTE with the previous CTE and used both the COUNT(*) and the JSON_ARRAYAGG() in the same query; but, splitting them up allowed me to explore the space a bit more.

WITH
	-- ... truncated SQL query ...
	aggregated ( `domain`, `collection` ) AS (
		SELECT
			d.domain,
			JSON_ARRAYAGG( d.email ) AS collection
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain
	)

So far, we’ve only created CTEs that reference other CTEs. But, these CTEs merely setup utility tables, they don’t actually return data to the client. Now it’s time to write the actual SQL that returns actual data. This SQL will combine multiple CTEs from above:

WITH
	-- ... truncated SQL query ...
SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;

As you can see, we’re taking two of the CTEs, stats and aggregated, and we’re JOINing them together.

It’s just CTEs all the way down! We’ve looked at the individual parts in isolation. Now, here’s the entire SQL query in one query:

WITH
	-- First, we'll start with a common table expression (CTE) for the email addresses.
	emails ( `email` ) AS (
		VALUES
			ROW( 'laura@acme.example' ),
			ROW( 'tim@acme.example' ),
			ROW( 'anna@masters.example' ),
			ROW( 'sonya@vids.example' ),
			ROW( 'robert@acme.example' ),
			ROW( 'alice@vids.example' ),
			ROW( 'martha@chomp.example' ),
			ROW( 'douglas@chomp.example' )
	),
	-- Second, we'll create a common table expression (CTE) that extracts the domain from
	-- the email address.
	deconstructed ( `email`, `domain` ) AS (
		SELECT
			e.email,
			SUBSTRING_INDEX( e.email, '@', -1 ) AS domain
		FROM
			emails e -- CTE reference!
	),
	-- Third, we'll create a common table expression (CTE) that provides some stats for
	-- how many emails belong to each domain.
	stats ( `domain`, `emailCount` ) AS (
		SELECT
			d.domain,
			COUNT( * ) AS emailCount
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain
	),
	-- Fourth, we'll create a common table expression (CTE) that groups the emails by
	-- domain and provides the collection of emails as a JSON payload.
	aggregated ( `domain`, `collection` ) AS (
		SELECT
			d.domain,
			JSON_ARRAYAGG( d.email ) AS collection
		FROM
			deconstructed d -- CTE reference!
		GROUP BY
			d.domain
	)
SELECT
	s.domain,
	s.emailCount,
	a.collection
FROM
	stats s -- CTE reference!
INNER JOIN
	aggregated a -- CTE reference!
ON
	a.domain = s.domain
ORDER BY
	s.emailCount DESC,
	s.domain ASC
;

And, when we run this MySQL query, we get the following output:

Navicat SQL client showing a partial SQL query and the results of a JOIN product of multiple common table expressions in MySQL.

When it comes to writing basic business applications, I don’t need these sassy SQL mechanics. But, when it comes to reporting and other types of one-off data introspection tasks, features like VALUES/ROW, iteratively enhanced common table expressions, and JSON aggregations are just amazing! SQL is a truly wonderful language.

Want to use code from this post?
Check out the license.


https://bennadel.com/4652

Source: www.bennadel.com