Image by Author
In the world of data, SQL still stands as the lingua franca for interacting with databases.
Still today it stands as one of the most used languages to deal with data and is still considered a must-have for any good data professional.
However, anyone who has worked with complex SQL queries knows they can quickly turn into unwieldy beasts—difficult to read, maintain, or reuse.
This is why today it is not enough to know SQL, we need to be good at crafting queries. And this is actually a type of art.
This is where Common Table Expressions (CTEs) come into play, transforming the art of query writing into a more structured and approachable craft.
So let’s discover together how to code readable and reusable queries.
If you are wondering what is a CTE, you are in the right article.
A Common Table Expression (CTE) is a temporary result set that is defined within the execution scope of a single SQL statement.
They are temporal tables that can be referenced many times within a single query and typically are used to simplify complex joins and subqueries, with the ultimate goal of increasing the readability and organization of SQL code.
So they are powerful tool for breaking down complex queries into simpler parts.
Here’s why you should consider using CTEs:
- Modularity: You can break down complex logic into readable chunks.
- Readability: It makes understanding the flow of SQL queries easier.
- Reusability: CTEs can be referenced multiple times in a single query, avoiding repetition.
The magic begins with the WITH clause, which precedes your main query and defines different temporal tables (CTEs) with aliases.
Therefore, we always need to start our query with the “WITH” command to start defining out own CTEs. By using CTEs, we can break any complex SQL query into:
– Small temporal tables that compute related variables.
– A final table that only takes those variables that we want as our output.
And this is precisely the MODULAR approach we want in any code!
Image by Author
So using CTEs in our queries allows us to:
– Execute a temporal table ONCE and reference it MULTIPLE times.
– Improve readability and simplify complex logic.
– Promote code reusability and modular design.
In order to understand this better, we can take a practical example of Airbnb listings in Barcelona.
Imagine we want to analyze the performance of listings by neighborhood and compare it with the city’s overall performance. You’d need to pull together information about neighborhoods, individual apartments, hosts, and pricing.
To exemplify this, we will use the InsideAirbnb table of Barcelona, which looks like follows:
A naive approach might lead you to create nested subqueries that quickly become a maintenance nightmare like the one as follows:
Code by Author
Instead, we can utilize CTEs to compartmentalize our query into logical sections—each defining a piece of the puzzle.
- Neighborhood Data: Create a CTE to summarize data by neighborhood.
- Apartment and Host Information: Define CTEs for details about apartments and hosts.
- City-wide Metrics: Another CTE to gather city-level statistics for comparison.
- Final Assembly: Combine the CTEs in a final SELECT statement to present the data cohesively.
Image by Author
And we would end up with the following query:
Code by Author
By using CTEs, we turn a potentially monstrous single query into an organized set of data modules. This modular approach makes the SQL code more intuitive and adaptable to changes.
If new requirements emerge, you can adjust or add CTEs without overhauling the entire query.
Once you’ve established your CTEs, you can reuse them to perform comparative analysis. For instance, if you want to compare neighborhood data against city-wide metrics, you can reference your CTEs in a series of JOIN operations.
This not only saves time but also keeps your code efficient, as you won’t have to repeat the same query twice!
CTEs are a testament to the principle that a little structure goes a long way in programming. By adopting CTEs, you can write clearer, more maintainable, and reusable SQL queries.
It streamlines the query development process and makes it easier to communicate complex data retrieval logic to others.
Remember, the next time you find yourself about to embark on writing a multi-join, nested subquery monster, consider breaking it down with CTEs.
Your future self—and anyone else who might read your code—will thank you.
Josep Ferrer is an analytics engineer from Barcelona. He graduated in physics engineering and is currently working in the Data Science field applied to human mobility. He is a part-time content creator focused on data science and technology. You can contact him on LinkedIn, Twitter or Medium.