If you’re aspiring and currently interviewing for roles such as data scientists, data analysts, and data engineers then you are likely to encounter one or more technical interviews that require live coding, usually involving SQL. While later interviews might require different programming languages like Python, which is common in the data domain, let’s focus on the typical SQL questions that I’ve encountered during these interviews. For the purpose of this discussion, I’ll assume that you’re already familiar with fundamental SQL concepts such as SELECT
, FROM
, WHERE
, as well as aggregate functions like SUM
and COUNT
. Let’s get into the specifics!
1. Mastering Joins and Table Types
Without a doubt, the most common SQL question is around table joins. It might seem too obvious, but every interview I’ve participated in has centered around this topic. You should feel at ease with inner joins and left joins. Additionally, proficiency in handling self-joins and unions is valuable. Equally important is the ability to execute these joins across different table types, particularly fact and dimension tables. Here are my loose definitions for these two terms:
Fact Table: A table containing numerous rows but relatively few attributes or columns. Imagine an example where an online retailer maintains an “orders” table with columns like: date, customer_id, order_id, product_id, units, amount
. This table has few attributes but contains a huge volume of records.
Dimension Table: A dimensional table with fewer rows yet many attributes. For instance, the same online retailer’s “customer” table might hold one row per customer, featuring attributes such as customer_id, first_name, last_name, ship_street_addr, ship_zip_code
and more.
Understanding these two primary table types is important. It’s crucial to grasp why and how to merge fact and dimension tables to ensure accurate results. Let’s consider a real-world example: the interview question presents two tables (“orders” and “customer”) and asks:
How many customers have purchased at least 3 units in their lifetime and have a shipping zip code of 90210?