SQL code copied to buffer
Learn how to use SQL SELF JOIN to find flight routes with one connection. Step-by-step guide with examples for efficient data analysis.
RU PT

Analyzing SQL Task: Finding Flights with One Layover

This task is an excellent example of using one of the most powerful tools in SQL for analyzing relationships in data: joining a table to itself (SELF JOIN). Let's break down the solution logic step by step.

Task Description

We need to find all possible flight options from Pulkovo Airport (LED) to Bryansk (BZK) that include exactly one layover.

Key Idea

A route with one layover is not one, but two independent flights connected by an intermediate airport.

  • Segment 1: Departure from LED to a certain layover airport C.
  • Segment 2: Departure from the same airport C and arrival at BZK.

Our goal is to find all such pairs of flights where the arrival airport of the first flight matches the departure airport of the second.

Technical Approach: SELF JOIN

To implement this logic, we refer to the flights table as if it were two different tables. We mentally create two aliases for it, for example, first_leg (first segment) and second_leg (second segment).

Step-by-Step Solution Plan

  1. Define the first segment of the route. We use the alias first_leg to find all flights that depart from our starting point.

    • Condition: first_leg.departure_airport = 'LED'.
  2. Define the second segment of the route. Using the alias second_leg, we look for all flights that arrive at our destination.

    • Condition: second_leg.arrival_airport = 'BZK'.
  3. Find the connection point (key JOIN condition). Now the most important thing: we need to connect these two sets of flights. We combine them on the condition that the arrival airport of the first segment must exactly match the departure airport of the second.

    • Connection condition: first_leg.arrival_airport = second_leg.departure_airport.
    • This common airport is our desired layover airport (connection_airport).
  4. Check the layover time: We also need to make sure that the second flight departs after the first flight arrives. This is ensured by the condition:

    • second_leg.scheduled_departure > first_leg.scheduled_arrival - This condition ensures that we only consider connections where there is enough time to make the transfer between flights.
  5. Form the result. After the tables are joined, we have all the necessary information in one row. We can select:

    • Departure airport from first_leg (departure_airport).
    • Layover airport (this is first_leg.arrival_airport or, equivalently, second_leg.departure_airport).
    • Destination airport from second_leg (arrival_airport).
  6. Sorting. In the end, we only need to sort the resulting list by the layover airport field, as required by the task condition.

Thus, SELF JOIN allows us to "unfold" one table and match rows from it with each other, finding complex, multi-stage routes.

Spoiler: the SQL query with the solution is hidden below. Click to reveal.

        select 
            first_leg.flight_no flight1_no,
            first_leg.departure_airport,
            first_leg.arrival_airport connection_airport,
            first_leg.scheduled_arrival - first_leg.scheduled_departure flight1_time,
            second_leg.scheduled_departure - first_leg.scheduled_arrival connection_time,
            second_leg.flight_no flight2_no,
            second_leg.arrival_airport,
            second_leg.scheduled_arrival - second_leg.scheduled_departure flight2_time,
            (second_leg.scheduled_arrival - first_leg.scheduled_departure) total_trip_time
        from flights first_leg
        join flights second_leg on first_leg.arrival_airport=second_leg.departure_airport 
            and second_leg.arrival_airport = 'BZK'
            and second_leg.scheduled_departure > first_leg.scheduled_arrival
        where first_leg.departure_airport = 'LED'
            and first_leg.scheduled_departure between '2017-08-16' and '2017-08-17'
            and second_leg.scheduled_departure between '2017-08-16' and '2017-08-17'
        order by (second_leg.scheduled_arrival - first_leg.scheduled_departure)
        limit 1;