Strong database and SQL skills are necessary for all data roles. In practice, you'll routinely query very large database tables, with thousands or even millions of rows, on a typical workday. That's why the performance of SQL queries is a significant factor in the overall performance of an application.
Poorly optimized queries can often lead to slower response times, increased server load, and a suboptimal user experience. Therefore, understanding and applying SQL query optimization techniques is essential.
This tutorial goes over practical tips for optimizing SQL queries. Let’s get started.
Before You Start: Get a Sample Database Table
You can use the following tips when writing SQL queries for any database you’re working with. But if you’d like to use a sample database table to run these queries, you can use this Python script.
It connects to an SQLite database, employees.db, creates an employees table, and populates it with 10,000 records. As mentioned, you can always spin up your own example.
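The original script isn't reproduced here. As a stand-in, here's a minimal sketch that does the same job with Python's built-in sqlite3 module. The salary column, the name and department values, and the `create_sample_db` helper are all assumptions for illustration, not the author's script.

```python
import random
import sqlite3

# Hypothetical sample values; the original script's data and columns may differ.
FIRST_NAMES = ["Ava", "Liam", "Noah", "Emma", "Mia", "Ravi", "Priya", "Chen"]
LAST_NAMES = ["Smith", "Patel", "Garcia", "Kim", "Nguyen", "Brown", "Iyer"]
DEPARTMENTS = ["Engineering", "Sales", "Marketing", "HR", "Finance"]


def create_sample_db(path: str = "employees.db", n_rows: int = 10_000) -> sqlite3.Connection:
    """Create (or recreate) an employees table and fill it with random rows."""
    conn = sqlite3.connect(path)
    conn.execute("DROP TABLE IF EXISTS employees")
    conn.execute(
        """
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY,
            first_name  TEXT NOT NULL,
            last_name   TEXT NOT NULL,
            department  TEXT NOT NULL,
            salary      INTEGER NOT NULL
        )
        """
    )
    rng = random.Random(42)  # fixed seed so runs are reproducible
    rows = [
        (
            i,
            rng.choice(FIRST_NAMES),
            rng.choice(LAST_NAMES),
            rng.choice(DEPARTMENTS),
            rng.randint(40_000, 150_000),
        )
        for i in range(1, n_rows + 1)
    ]
    # executemany inserts all rows in one call; commit persists them to disk.
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn
```

Calling `create_sample_db()` writes employees.db to the current directory; passing `":memory:"` as the path gives you a throwaway in-memory database instead.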
1. Don’t Use SELECT *; Select Specific Columns Instead
It’s quite common for beginners to use SELECT * to retrieve all columns from the table. This can be inefficient if you only need a few columns—which is almost always the case.
Using SELECT * can, therefore, lead to excessive data processing, especially if the table has many columns or if you’re working with a large dataset.
Instead of this:
SELECT * FROM employees;
Do this:
SELECT employee_id, first_name, last_name FROM employees;
Reading only the necessary columns can make the queries more readable and maintainable.
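To make the difference concrete, here's a small sketch against an in-memory table. The wide `bio` column and the `payload_size` helper are hypothetical, and counting characters is only a rough proxy for the data a real driver transfers.

```python
import sqlite3

# Small in-memory stand-in for the employees table; the wide "bio" column is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, "
    "last_name TEXT, department TEXT, salary INTEGER, bio TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)",
    [(i, f"First{i}", f"Last{i}", "Sales", 50_000 + i, "x" * 500) for i in range(1, 1001)],
)


def payload_size(rows):
    """Rough proxy for transferred data: total characters across all values."""
    return sum(len(str(value)) for row in rows for value in row)


# SELECT * pulls every column, including ones you never use.
wide_rows = conn.execute("SELECT * FROM employees").fetchall()

# Selecting specific columns returns only what the caller needs.
narrow = conn.execute("SELECT employee_id, first_name, last_name FROM employees")
narrow_cols = [d[0] for d in narrow.description]
narrow_rows = narrow.fetchall()

print(narrow_cols)  # ['employee_id', 'first_name', 'last_name']
print(payload_size(wide_rows), payload_size(narrow_rows))
```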
2. Avoid Using SELECT DISTINCT; Use GROUP BY Instead
SELECT DISTINCT can be costly because the database has to sort or hash the results to remove duplicates. Where possible, ensure the data you query is unique by design, using primary keys or unique constraints.
Instead of this:
SELECT DISTINCT department FROM employees;
You can often get the same result with a GROUP BY clause:
SELECT department FROM employees GROUP BY department;
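Depending on the database engine, GROUP BY can be more efficient, especially with proper indexing (we'll talk about indexes later). Many optimizers often produce the same plan for both forms, so check the execution plan rather than assuming. Either way, make sure you understand your data, including its different fields, at the data model level.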
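One way to check this for your own database is to confirm both queries return the same rows and then compare their plans. Here's a minimal sketch using SQLite's `EXPLAIN QUERY PLAN`; the tiny dataset is made up, and the plan text you see will depend on your SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department TEXT)")
conn.executemany(
    "INSERT INTO employees (department) VALUES (?)",
    [(d,) for d in ["Sales", "HR", "Sales", "Finance", "HR", "Sales"]],
)
conn.execute("CREATE INDEX idx_employee_department ON employees(department)")

distinct_q = "SELECT DISTINCT department FROM employees"
group_q = "SELECT department FROM employees GROUP BY department"

# Both queries return the same set of departments.
distinct_rows = sorted(r[0] for r in conn.execute(distinct_q))
group_rows = sorted(r[0] for r in conn.execute(group_q))
print(distinct_rows == group_rows)  # True

# Compare how SQLite plans each query (column 3 of each row is the detail text).
for q in (distinct_q, group_q):
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + q)]
    print(q, "->", plan)
```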
3. Limit Query Results
Often you'll query large tables with thousands of rows, but you rarely need to (or can) process all of them at once. Using the LIMIT clause (or its equivalent) reduces the number of rows returned, which can speed up query performance.
You can limit the results to 15 records:
SELECT employee_id, first_name, last_name FROM employees LIMIT 15;
Using a LIMIT clause reduces the result set size, decreasing the amount of data to process and transfer. This is also useful for paginating results in applications.
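For pagination, LIMIT is usually paired with OFFSET and an ORDER BY so each page is deterministic. Here's a minimal sketch; the `fetch_page` helper and the 1-based page numbering are illustrative choices, not a standard API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, f"First{i}", f"Last{i}") for i in range(1, 101)],
)


def fetch_page(conn, page, page_size=15):
    """Return one page of employees; hypothetical helper, pages start at 1."""
    offset = (page - 1) * page_size
    # ORDER BY makes page contents deterministic; without it, row order isn't guaranteed.
    return conn.execute(
        "SELECT employee_id, first_name, last_name FROM employees "
        "ORDER BY employee_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


first = fetch_page(conn, 1)
second = fetch_page(conn, 2)
print(len(first), first[0][0], second[0][0])  # 15 1 16
```

Note that OFFSET still makes the database step past the skipped rows, so very deep pages get slower; filtering on the last seen key (for example, `WHERE employee_id > ?`) is a common alternative.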
4. Use Indexes for Faster Retrieval
Indexes can significantly improve query performance by allowing the database to find rows faster than scanning the entire table. They are particularly useful for columns frequently used in WHERE, JOIN, and ORDER BY clauses.
Here’s an example index created on the ‘department’ column:
CREATE INDEX idx_employee_department ON employees(department);
You can now run queries that filter on the ‘department’ column and compare execution times with and without the index. On a sufficiently large table, the indexed query is typically much faster. To learn more about creating indexes and the performance gains they bring, see How To Speed Up SQL Queries Using Indexes [Python Edition].
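Here's a minimal sketch of that comparison in SQLite. It uses 100,000 synthetic rows with about 1,000 hypothetical department values (so the index is selective) rather than the 10,000-row sample; timings will vary by machine, and the exact plan wording varies by SQLite version.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, department TEXT)"
)
rng = random.Random(0)
# Hypothetical data: many distinct departments so the index narrows the search a lot.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, f"First{i}", f"Dept{rng.randint(1, 1000)}") for i in range(1, 100_001)],
)

query = "SELECT employee_id, first_name FROM employees WHERE department = ?"
params = ("Dept42",)


def plan(sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def timed(sql, params, repeat=20):
    """Run the query `repeat` times and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


before_plan = plan(query, params)
before_time = timed(query, params)

conn.execute("CREATE INDEX idx_employee_department ON employees(department)")

after_plan = plan(query, params)
after_time = timed(query, params)

print(before_plan)  # a full table scan, e.g. "SCAN employees"
print(after_plan)   # should mention idx_employee_department
print(f"without index: {before_time:.4f}s, with index: {after_time:.4f}s")
```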
As mentioned, indexing improves the efficiency of queries that filter on indexed columns. But you can have too much of a good thing: creating too many indexes has its own costs, which leads us to the next tip!
5. Use Indexes with Caution
While indexes improve read performance, they can degrade write performance—INSERT, UPDATE, and DELETE queries—because the index must be updated each time the table is modified. It’s important to balance the number and types of indexes based on the type of queries you run often.
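You can observe this write overhead directly. The sketch below times the same bulk insert into tables with zero, one, and three secondary indexes; the `time_inserts` helper and the synthetic data are assumptions, and the absolute numbers will differ across machines.

```python
import sqlite3
import time


def time_inserts(n_indexes, n_rows=50_000):
    """Insert n_rows into a fresh table with n_indexes (0-3) secondary indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, "
        "first_name TEXT, last_name TEXT, department TEXT)"
    )
    # Every extra index must be updated on each INSERT.
    for col in ["first_name", "last_name", "department"][:n_indexes]:
        conn.execute(f"CREATE INDEX idx_{col} ON employees({col})")
    rows = [(i, f"First{i % 997}", f"Last{i % 991}", f"Dept{i % 10}") for i in range(n_rows)]
    start = time.perf_counter()
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
    conn.close()
    return elapsed, count


for k in (0, 1, 3):
    elapsed, count = time_inserts(k)
    print(f"{k} index(es): inserted {count} rows in {elapsed:.3f}s")
```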
As go-to rules:
- Only index columns that are frequently queried.
- Avoid excessive indexing on columns with low cardinality (few unique values).
- Regularly review your indexes, and update or remove them as needed.
In summary, create indexes to speed up retrieval on columns that are frequently queried but rarely updated. This ensures that the benefits of indexes outweigh their maintenance costs.
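As a starting point for reviewing indexes in SQLite, here's a sketch that lists a table's single-column indexes, estimates each one's selectivity (distinct values divided by total rows), and drops those below a cutoff. The `index_selectivity` helper and the 0.01 threshold are hypothetical; cardinality is only one signal, so weigh it against how often each index is actually used by your queries before dropping anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(i, f"user{i}@example.com", i % 2) for i in range(1, 1001)],
)
conn.execute("CREATE INDEX idx_employee_email ON employees(email)")
conn.execute("CREATE INDEX idx_employee_is_active ON employees(is_active)")


def index_selectivity(conn, table):
    """Map each single-column index on `table` to distinct_values / total_rows."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    result = {}
    indexes = conn.execute(f"PRAGMA index_list({table})").fetchall()
    for _, name, *_ in indexes:
        cols = [row[2] for row in conn.execute(f"PRAGMA index_info({name})")]
        if len(cols) != 1:
            continue  # skip multi-column indexes in this simple sketch
        distinct = conn.execute(f"SELECT COUNT(DISTINCT {cols[0]}) FROM {table}").fetchone()[0]
        result[name] = distinct / total if total else 0.0
    return result


LOW_SELECTIVITY = 0.01  # hypothetical cutoff; tune it for your workload
for name, sel in index_selectivity(conn, "employees").items():
    print(f"{name}: selectivity {sel:.3f}")
    if sel < LOW_SELECTIVITY:
        conn.execute(f"DROP INDEX IF EXISTS {name}")
```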
Wrapping Up
Optimizing SQL queries involves understanding the specific needs of your queries and the structure of your data.
By avoiding SELECT *, being careful with using SELECT DISTINCT, limiting query results, creating appropriate indexes, and being mindful of the trade-offs with indexing, you can significantly improve the performance and efficiency of your database operations.
So happy querying!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.