Lesson 10.2 · Reading time: ~10 min
This lesson introduces the fundamentals of writing high-performance SQL queries. You will learn how to avoid unnecessary database load, why SELECT * often hurts performance, and how to filter data effectively. We will review practical techniques that help queries run faster, even on large datasets. By the end of this lesson, you will be able to write efficient SQL that uses server resources responsibly.
In the previous lesson, we focused on readability for people. But SQL must also be readable and efficient for the database engine. Even perfectly formatted code can perform poorly if it forces the server to do unnecessary work.
Query efficiency directly affects application and reporting speed. In large-data or high-load systems, the difference between "works" and "optimized" can be minutes or even hours of runtime.
Modern DBMS engines have powerful optimizers that can rewrite queries under the hood. Still, an optimizer is not all-powerful. It does not know your business intent and cannot always fix fundamentally inefficient design choices. Code quality remains the developer's responsibility.
The most common reason for slow queries is moving too much unnecessary data between the database server and the client.
SELECT *While SELECT * is convenient for quick exploration, avoid it in production SQL.
-- Poor
SELECT * FROM film;
-- Better
SELECT film_id, title, release_year
FROM film;
How you limit rows determines how much data the DBMS must scan and process.
Apply WHERE conditions as early as possible, before heavy aggregation or return to clients. The earlier you reduce rows, the faster downstream steps (JOIN, GROUP BY) become.
WHERE (SARGable queries)To use indexes effectively, WHERE predicates should be SARGable (Search ARGumentable). If you wrap an indexed column in a function, the optimizer often cannot use the index efficiently and may scan the full table.
-- Slow (Non-SARGable: index on rental_date may not be used)
SELECT count(*)
FROM rental
WHERE YEAR(rental_date) = 2005;
-- Fast (SARGable: index can be used)
SELECT count(*)
FROM rental
WHERE rental_date >= '2005-01-01' AND rental_date < '2006-01-01';
JOINJoining tables is one of the most resource-intensive query operations.
CROSS JOIN: Cartesian products can explode quickly.EXISTS for existence checks: if you only need to know whether related rows exist, EXISTS is often cheaper than JOIN.-- Less efficient (JOIN forces matching through all payments)
SELECT DISTINCT c.first_name, c.last_name
FROM customer c
JOIN payment p ON c.customer_id = p.customer_id;
-- More efficient (EXISTS can stop on first match)
SELECT c.first_name, c.last_name
FROM customer c
WHERE EXISTS (
SELECT 1 FROM payment p WHERE p.customer_id = c.customer_id
);
LIMIT while testingWhen debugging a query, always use LIMIT to avoid accidentally returning millions of rows.
SELECT customer_id, first_name, last_name
FROM customer
WHERE active = 1
LIMIT 10;
Assume we need films rented more than 30 times, limited to one category.
Less efficient approach:
SELECT f.title, COUNT(r.rental_id)
FROM film f
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON i.inventory_id = r.inventory_id
JOIN film_category fc ON f.film_id = fc.film_id
JOIN category c ON fc.category_id = c.category_id
WHERE c.name = 'Action'
GROUP BY f.title
HAVING COUNT(r.rental_id) > 30;
More efficient approach: If we know the category ID, we can skip joining the category-name lookup.
SELECT f.title, COUNT(r.rental_id) AS rental_count
FROM film f
JOIN film_category fc ON f.film_id = fc.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN rental r ON i.inventory_id = r.inventory_id
WHERE fc.category_id = 1 -- Use ID instead of string lookup
GROUP BY f.film_id, f.title
HAVING COUNT(r.rental_id) > 30;
Note: filtering by numeric IDs is usually faster than filtering by text names and often allows fewer joins.
Key takeaways from this lesson:
SELECT * in production queries; list only needed columns.WHERE.EXISTS over JOIN for pure existence checks.LIMIT during exploration and debugging.SELECT * harmful in production queries?It returns unnecessary columns, increases traffic, and may block index-friendly plans. Explicit column lists are typically faster and safer.
A SARGable predicate allows index-based search. Wrapping indexed columns in functions often prevents efficient index use.
EXISTS instead of JOIN?Use EXISTS when you only need to know whether related rows exist and you do not need columns from the second table.
Check whether it uses SELECT *, review WHERE selectivity, and identify non-SARGable predicates. Then inspect join strategy and row volume early in the query.
It reduces the number of rows involved in joins, sorting, and aggregation, lowering the overall plan cost.
JOIN and EXISTS differ from a performance perspective?JOIN combines row sets and is required when you need columns from both sides. EXISTS is often faster for boolean existence checks because it can stop at first match.
In the next lesson, we will go deeper into execution analysis and see how indexes accelerate queries at the physical level.