Question

What was the first item from the menu purchased by each customer?

Solution

WITH orders_ranked_by_date AS (
    SELECT customer_id,
           product_id,
           order_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_idproduct_idproduct_nameorder_date
A1sushi2021-01-01
B2curry2021-01-01
C3ramen2021-01-01

Walkthrough

Let’s start by asking a simpler question.

What was the first item from the menu purchased by customer B?

SELECT customer_id, product_name, order_date
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
WHERE customer_id = 'B'
ORDER BY order_date
LIMIT 1;

Now, if we only had a specific customer, then this would be good enough, but the resulting dataset we want varies over customer_id and lists one such row per customer.

Essentially, we wish to select the first product purchased (by date) per customer in the group of orders made by that customer. This is an instance of the “Top N per group” problem with N=1, and the usual way we approach it is via assigning distinct ranks to all rows of the same group.

In this case, we assign ranks 1, 2, ... to the orders made by customer A with the lowest rank going to the earliest order_date. Then, we can pluck out the lowest ranked row out of all groups.

In practice, this is implemented using window functions (or DENSE_RANK() OVER (PARTITION BY <colA> ...), to be precise), where <colA> is a value shared by all rows within a group (i.e. customer_id in this case, since each customer forms their own group of orders).

SELECT customer_id,
        product_id,
        order_date,
        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM sales
customer_idproduct_idorder_daterank
A12021-01-011
A22021-01-012
A22021-01-073
A32021-01-104
A32021-01-115
A32021-01-116
B22021-01-011
B22021-01-022
B12021-01-043
B12021-01-114
B32021-01-165
B32021-02-016
C32021-01-011
C32021-01-012
C32021-01-073

Notice that the first order for every customer corresponds to the row with the lowest rank (i.e. rank = 1) in that group. So, let’s filter that out, and with a Common Table Expression, join this data with menu to get the product names.

WITH orders_ranked_by_date AS (
    SELECT customer_id,
           product_id,
           order_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_idproduct_idproduct_nameorder_date
A1sushi2021-01-01
B2curry2021-01-01
C3ramen2021-01-01