8 Week SQL Challenge

We’ll try to implement solutions for some of the case studies from 8 Week SQL Challenge.

Source Repo

Setting up

If you wish to follow along these solutions/walkthroughs, you might find the best value in running them as you go, and for that you need a working Postgres database environment.

We’ll use a Docker container for Postgres so we don’t have to install it locally.

First, clone the repository to get the schemas for the challenges into a schema/ directory.

git clone https://github.com/aalekhpatel07/8-week-sql-challenge && \
    cd 8-week-sql-challenge

Then, run the following command to start a Postgres14 docker container that includes the schemas for the challenges.

docker run \
  --name "8-week-sql-challenge" \
  --publish "5432:5432" \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD=postgres \
  --volume `pwd`/postgres_db:/var/lib/postgresql/data \
  --volume `pwd`/schema:/docker-entrypoint-initdb.d/ \
  --rm \
  --detach \
  postgres:14.1-alpine

Once the container has started, use your favourite IDE to connect to the database at 0.0.0.0:5432.

The database has user/pass of postgres/postgres and the URL is jdbc:postgresql://localhost:5432/.

Danny’s Diner

Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.

Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business.

Problem Statement

Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.

He plans on using these insights to help him decide whether he should expand the existing customer loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.

Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!

Danny has shared with you 3 key datasets for this case study:

  • sales
  • menu
  • members

Entity Relationship Diagram

Check the following diagram for a quick preview or access the interactive version at dbdiagram.io:

Schema

Full DDL is available here.

Case Study Questions

Each of the following case study questions can be answered using a single SQL statement:

  1. What is the total amount each customer spent at the restaurant?
  2. How many days has each customer visited the restaurant?
  3. What was the first item from the menu purchased by each customer?
  4. What is the most purchased item on the menu and how many times was it purchased by all customers?
  5. Which item was the most popular for each customer?
  6. Which item was purchased first by the customer after they became a member?
  7. Which item was purchased just before the customer became a member?
  8. What is the total items and amount spent for each member before they became a member?
  9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
  10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

Question

What is the total amount each customer spent at the restaurant?

Solution

SELECT customer_id, '$' || SUM(price) as total_amount_spent
FROM sales
INNER JOIN menu on sales.product_id = menu.product_id
GROUP BY customer_id;
customer_idtotal_amount_spent
B$74
C$36
A$76

Walkthrough

Let’s start by asking who has visited our restaurant so far?

SELECT DISTINCT(customer_id) 
FROM sales;
customer_id
B
C
A

Okay, what kinda purchases were made by A ?

SELECT *
FROM sales 
WHERE customer_id = 'A';
customer_idorder_dateproduct_id
A2021-01-011
A2021-01-012
A2021-01-072
A2021-01-103
A2021-01-113
A2021-01-113

Nice. Can we get the prices for these products next to their sale records?

SELECT *
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE customer_id = 'A';
customer_idorder_dateproduct_idproduct_idproduct_nameprice
A2021-01-0111sushi10
A2021-01-0722curry15
A2021-01-0122curry15
A2021-01-1133ramen12
A2021-01-1133ramen12
A2021-01-1033ramen12

Okay, so the total amount that customer A spent must be the sum of all these prices. There’s a SUM aggregator we could use here.

SELECT SUM(price)
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE customer_id = 'A';
sum
76

Can we perform this summation for every customer and store it as one row per customer. Yup, GROUP BY comes to the rescue.

SELECT customer_id, SUM(price)
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
GROUP BY customer_id;
customer_idsum
B74
C36
A76

Excellent, we got the data we need. Maybe polish it slightly by prefixing the numbers with a $ and naming the column to something better?

SELECT customer_id, '$' || SUM(price) as total_amount_spent
FROM sales
INNER JOIN menu on sales.product_id = menu.product_id
GROUP BY customer_id;
customer_idtotal_amount_spent
B$74
C$36
A$76

Question

How many days has each customer visited the restaurant?

Solution

SELECT customer_id, COUNT(DISTINCT order_date)
FROM sales
GROUP BY customer_id;
customer_idcount
A4
B6
C2

Walkthrough

Let’s start by simplifying the question to an individual customer.

Which days did customer A visit the restaurant?

SELECT order_date
FROM sales
WHERE customer_id = 'A';
order_date
2021-01-01
2021-01-01
2021-01-07
2021-01-10
2021-01-11
2021-01-11

Okay. There’s repeats though. We can get rid of the duplicates if we apply DISTINCT to the order_date.

SELECT DISTINCT(order_date)
FROM sales
WHERE customer_id = 'A';
order_date
2021-01-01
2021-01-07
2021-01-10
2021-01-11

Now, we could get the COUNT of these distinct dates to get the number of different days that customer A visited the restaurant.

SELECT COUNT(DISTINCT(order_date))
FROM sales
WHERE customer_id = 'A';
count
4

We’d be done if we could generalize this over all customers and get one row per customer_id. So, GROUP BY is our friend.

SELECT customer_id, COUNT(DISTINCT order_date)
FROM sales
GROUP BY customer_id;
customer_idcount
A4
B6
C2

Question

What was the first item from the menu purchased by each customer?

Solution

WITH orders_ranked_by_date AS (
    SELECT customer_id,
           product_id,
           order_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_idproduct_idproduct_nameorder_date
A1sushi2021-01-01
B2curry2021-01-01
C3ramen2021-01-01

Walkthrough

Let’s start by asking a simpler question.

What was the first item from the menu purchased by customer B?

SELECT customer_id, product_name, order_date
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
WHERE customer_id = 'B'
ORDER BY order_date
LIMIT 1;

Now, if we only had a specific customer, then this would be good enough, but the resulting dataset we want varies over customer_id and lists one such row per customer.

Essentially, we wish to select the first product purchased (by date) per customer in the group of orders made by that customer. This is an instance of the “Top N per group” problem with N=1, and the usual way we approach it is via assigning distinct ranks to all rows of the same group.

In this case, we assign ranks 1, 2, ... to the orders made by customer A with the lowest rank going to the earliest order_date. Then, we can pluck out the lowest ranked row out of all groups.

In practice, this is implemented using window functions (or DENSE_RANK() OVER (PARTITION BY <colA> ...), to be precise), where <colA> is a value shared by all rows within a group (i.e. customer_id in this case, since each customer forms their own group of orders).

SELECT customer_id,
        product_id,
        order_date,
        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM sales
customer_idproduct_idorder_daterank
A12021-01-011
A22021-01-012
A22021-01-073
A32021-01-104
A32021-01-115
A32021-01-116
B22021-01-011
B22021-01-022
B12021-01-043
B12021-01-114
B32021-01-165
B32021-02-016
C32021-01-011
C32021-01-012
C32021-01-073

Notice that the first order for every customer corresponds to the row with the lowest rank (i.e. rank = 1) in that group. So, let’s filter that out, and with a Common Table Expression, join this data with menu to get the product names.

WITH orders_ranked_by_date AS (
    SELECT customer_id,
           product_id,
           order_date,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_idproduct_idproduct_nameorder_date
A1sushi2021-01-01
B2curry2021-01-01
C3ramen2021-01-01

Question

What is the most purchased item on the menu and how many times was it purchased by all customers?

Solution

SELECT sales.product_id, product_name, COUNT(*) AS total_purchases
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
GROUP BY sales.product_id, product_name
ORDER BY total_purchases DESC
LIMIT 1;
product_idproduct_nametotal_purchases
3ramen8

Walkthrough

We can simply COUNT the rows for every group of sales for a given product_id.

SELECT product_id, COUNT(*) AS total_purchases
FROM sales
GROUP BY product_id
ORDER BY total_purchases DESC
LIMIT 1;
product_idtotal_purchases
38

Now, bring in menu to get the name of the product alongside it.

SELECT sales.product_id, product_name, COUNT(*) AS total_purchases
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
GROUP BY sales.product_id, product_name
ORDER BY total_purchases DESC
LIMIT 1;
product_idproduct_nametotal_purchases
3ramen8

Question

Which item was the most popular for each customer?

Solution

WITH number_of_purchases_per_customer_and_product AS (
    SELECT customer_id, product_id, COUNT(*) num_of_purchases
    FROM sales
    GROUP BY customer_id, product_id
),
purchases_ranked_by_customer AS (
    SELECT 
        customer_id, 
        product_id, 
        num_of_purchases,
        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
    FROM number_of_purchases_per_customer_and_product
)
SELECT customer_id, menu.product_id, menu.product_name, num_of_purchases
FROM purchases_ranked_by_customer
INNER JOIN menu ON menu.product_id = purchases_ranked_by_customer.product_id
WHERE rank = 1;
customer_idproduct_idproduct_namenum_of_purchases
C3ramen3
B3ramen2
A3ramen3

Walkthrough

Similar to Question 3, this looks like another “Top N per group” instance with N=1 where rows in a group share the customer_id, and we wish to rank the rows by the total number of purchases per product.

Let’s start by building an intermediate table that stores the number of purchases for every pair of (customer_id, product_id).

SELECT customer_id, product_id, COUNT(*) num_of_purchases
FROM sales
GROUP BY customer_id, product_id;
customer_idproduct_idnum_of_purchases
B32
A33
A11
C33
B12
B22
A22

Next, we’d like to form groups of these rows based on the shared value of customer_id and rank the rows in the groups by the descending order of num_of_purchases.

WITH number_of_purchases_per_customer_and_product AS (
    SELECT customer_id, product_id, COUNT(*) num_of_purchases
    FROM sales
    GROUP BY customer_id, product_id
)
SELECT 
    customer_id, 
    product_id, 
    num_of_purchases,
    DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
FROM number_of_purchases_per_customer_and_product;
customer_idproduct_idnum_of_purchasesrank
A331
A222
A113
B321
B122
B223
C331

Finally, the rows of interest are those where rank = 1, so let’s filter it out. Seems like we’ll have to chain the CTE.

WITH number_of_purchases_per_customer_and_product AS (
    SELECT customer_id, product_id, COUNT(*) num_of_purchases
    FROM sales
    GROUP BY customer_id, product_id
),
purchases_ranked_by_customer AS (
    SELECT 
        customer_id, 
        product_id, 
        num_of_purchases,
        DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
    FROM number_of_purchases_per_customer_and_product
)
SELECT customer_id, menu.product_id, menu.product_name, num_of_purchases
FROM purchases_ranked_by_customer
INNER JOIN menu ON menu.product_id = purchases_ranked_by_customer.product_id
WHERE rank = 1;
customer_idproduct_idproduct_namenum_of_purchases
C3ramen3
B3ramen2
A3ramen3

Question

Which item was purchased first by the customer after they became a member?

Solution

WITH orders_by_customer_after_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date >= m.join_date
),
orders_by_customer_ranked_by_date_after_membership AS (
    SELECT customer_id,
           order_date,
           product_id,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM orders_by_customer_after_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_after_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_after_membership.product_id
WHERE rank = 1;
customer_idorder_dateproduct_idproduct_name
B2021-01-111sushi
A2021-01-072curry

Walkthrough

A variant of Question 3, this is a “Top N per group” instance with N=1 where rows in a group share the customer_id, and we wish to rank the rows by the ascending order of the order_date but we only include those rows in the groups that have order_date >= join_date for that member.

Let’s start by building an intermediate table representing a set of rows that will participate in the customer_id grouping, filtering out those sales records where the user wasn’t yet a member.

SELECT sales.customer_id, order_date, product_id
FROM sales
    INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date >= m.join_date;
customer_idorder_dateproduct_id
A2021-01-072
A2021-01-103
A2021-01-113
A2021-01-113
B2021-01-111
B2021-01-163
B2021-02-013

Now, we’ll rank the rows in the groups by the ascending order of order_date.

WITH orders_by_customer_after_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date >= m.join_date
)
SELECT customer_id,
       order_date,
       product_id,
       DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM orders_by_customer_after_membership;
customer_idorder_dateproduct_idrank
A2021-01-0721
A2021-01-1032
A2021-01-1133
A2021-01-1134
B2021-01-1111
B2021-01-1632
B2021-02-0133

Finally, the rows of interest are those with rank = 1. So, we’ll retain those, and bring in menu to get the name of the products alongside.

WITH orders_by_customer_after_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date >= m.join_date
),
orders_by_customer_ranked_by_date_after_membership AS (
    SELECT customer_id,
           order_date,
           product_id,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
    FROM orders_by_customer_after_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_after_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_after_membership.product_id
WHERE rank = 1;
customer_idorder_dateproduct_idproduct_name
B2021-01-111sushi
A2021-01-072curry

Question

Which item was purchased just before the customer became a member?

Solution

WITH orders_by_customer_before_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date < m.join_date
),
orders_by_customer_ranked_by_date_before_membership AS (
    SELECT customer_id,
           order_date,
           product_id,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
    FROM orders_by_customer_before_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_before_membership.product_id
WHERE rank = 1;
customer_idorder_dateproduct_idproduct_name
B2021-01-041sushi
A2021-01-011sushi

Walkthrough

In the previous question, we built an intermediate table that filtered out all the sales records before the customer became a member. Then we ranked the grouped rows based on customer_id in the ascending order of order_date. Finally, we retained the highest ranked rows.

To answer this question, we’ll take a similar but mirrored approach (i.e. reflection across the vertical line x=0). Our intermediate table will filter out all the sales records after the customer became a member, inclusive of the start date. Then we’ll rank the grouped rows based on customer_id in the descending order of order_date and retain the highest ranked rows.

SELECT sales.customer_id, order_date, product_id
FROM sales
    INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date;
customer_idorder_dateproduct_id
A2021-01-011
A2021-01-012
B2021-01-012
B2021-01-022
B2021-01-041
WITH orders_by_customer_before_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date < m.join_date
)
SELECT customer_id,
       order_date,
       product_id,
       DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
FROM orders_by_customer_before_membership;
customer_idorder_dateproduct_idrank
A2021-01-0111
A2021-01-0122
B2021-01-0411
B2021-01-0222
B2021-01-0123
WITH orders_by_customer_before_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date < m.join_date
),
orders_by_customer_ranked_by_date_before_membership AS (
    SELECT customer_id,
           order_date,
           product_id,
           DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
    FROM orders_by_customer_before_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_before_membership.product_id
WHERE rank = 1;
customer_idorder_dateproduct_idproduct_name
B2021-01-041sushi
A2021-01-011sushi

Question

What is the total items and amount spent for each member before they became a member?

Solution

WITH orders_by_customer_before_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date < m.join_date
)
SELECT orders_by_customer_before_membership.customer_id,
       SUM(price),
       COUNT(*)
FROM orders_by_customer_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_before_membership.product_id
GROUP BY customer_id;
customer_idsumcount
B403
A252

Note: Since Customer C never became a member, we won’t include them in this calculation. But if we were asked to, we’d have to change the orders_by_customer_before_membership table to the following:

SELECT sales.customer_id, order_date, product_id
FROM sales
-    INNER JOIN members m ON sales.customer_id = m.customer_id
+    LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
- WHERE order_date < m.join_date
+ WHERE order_date < m.join_date OR m.join_date IS NULL

This would produce the following as the final result:

customer_idsumcount
B403
C363
A252

Walkthrough

Let’s start by building an intermediate table of all the sales records for customer before they became a member.

SELECT sales.customer_id, order_date, product_id, m.join_date
FROM sales
    INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date;
customer_idorder_dateproduct_idjoin_date
A2021-01-0112021-01-07
A2021-01-0122021-01-07
B2021-01-0122021-01-09
B2021-01-0222021-01-09
B2021-01-0412021-01-09

Now, the total number of items they purchased is the number of sales (i.e. COUNT of the rows), and the total amount they spent is the SUM of the prices of the items in the sales records.

WITH orders_by_customer_before_membership AS (
    SELECT sales.customer_id, order_date, product_id
    FROM sales
        INNER JOIN members m ON sales.customer_id = m.customer_id
    WHERE order_date < m.join_date
)
SELECT orders_by_customer_before_membership.customer_id,
       SUM(price),
       COUNT(*)
FROM orders_by_customer_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_before_membership.product_id
GROUP BY customer_id;
customer_idsumcount
B403
A252

Question

If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?

Solution

WITH sales_with_price AS (
    SELECT customer_id, order_date, m.product_id, product_name, price
    FROM sales
    INNER JOIN menu m ON sales.product_id = m.product_id
)
SELECT customer_id,
       SUM(
           CASE product_name
            WHEN 'sushi' THEN price * 10 * 2
            ELSE price * 10
            END
        ) AS total_points
FROM sales_with_price
GROUP BY customer_id;
customer_idtotal_points
B940
C360
A860

Walkthrough

Let’s start with pulling in the amounts from the menu table next to our sales records.

SELECT customer_id, order_date, m.product_id, product_name, price
FROM sales
INNER JOIN menu m ON sales.product_id = m.product_id
customer_idorder_dateproduct_idproduct_nameprice
B2021-01-041sushi10
A2021-01-011sushi10
B2021-01-111sushi10
B2021-01-012curry15
B2021-01-022curry15
A2021-01-012curry15
A2021-01-072curry15
A2021-01-113ramen12
A2021-01-113ramen12
A2021-01-103ramen12
B2021-01-163ramen12
B2021-02-013ramen12
C2021-01-013ramen12
C2021-01-013ramen12
C2021-01-073ramen12

Then we’ll SUM the transformed prices based on the values that product_name assumes:

WITH sales_with_price AS (
    SELECT customer_id, order_date, m.product_id, product_name, price
    FROM sales
    INNER JOIN menu m ON sales.product_id = m.product_id
)
SELECT customer_id,
       SUM(
           CASE product_name
            WHEN 'sushi' THEN price * 10 * 2
            ELSE price * 10
            END
        ) AS total_points
FROM sales_with_price
GROUP BY customer_id;
customer_idtotal_points
B940
C360
A860

Question

In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

Solution

WITH sales_annotated_with_first_week_check AS (
    SELECT sales.customer_id,
           order_date,
           menu.product_id,
           product_name,
           price,
           join_date,
           ((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
    FROM sales
         LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
         INNER JOIN menu ON sales.product_id = menu.product_id
    WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
),
sales_with_points_multiplier AS (
    SELECT customer_id,
           order_date,
           product_name,
           price,
           join_date,
           (
               CASE is_within_first_week
                   WHEN TRUE THEN 2
                   ELSE (
                       CASE product_name
                           WHEN 'sushi' THEN 2
                           ELSE 1
                           END
                       )
                   END
               ) points_multiplier
    FROM sales_annotated_with_first_week_check
)
SELECT customer_id, SUM(price * points_multiplier * 10) AS total_points
FROM sales_with_points_multiplier
GROUP BY customer_id;
customer_idtotal_points
A1370
B820

Walkthrough

Let’s break this down into multiple sub-problems.

First, we’ll build an intermediate table that just stores whether the sales orders that the customer made are within the first week of them becoming a member (i.e. whether the new membership 2x point bonus applies).

While we’re there, we’ll also retain sales records only up to January 2021 and for customers A, and B.

SELECT sales.customer_id,
       order_date,
       menu.product_id,
       product_name,
       price,
       members.join_date,
       ((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
FROM sales
    LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
    INNER JOIN menu ON sales.product_id = menu.product_id
WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B');
customer_idorder_dateproduct_idproduct_namepricejoin_dateis_within_first_week
A2021-01-103ramen122021-01-07true
A2021-01-113ramen122021-01-07true
A2021-01-113ramen122021-01-07true
A2021-01-012curry152021-01-07false
A2021-01-072curry152021-01-07true
A2021-01-011sushi102021-01-07false
B2021-01-163ramen122021-01-09false
B2021-01-012curry152021-01-09false
B2021-01-022curry152021-01-09false
B2021-01-041sushi102021-01-09false
B2021-01-111sushi102021-01-09true

Next, we’ll determine a points multiplier for these sales based on whether the customer bought Sushi or the order was made within the first week of gaining membership.

WITH sales_annotated_with_first_week_check AS (
    SELECT sales.customer_id,
           order_date,
           menu.product_id,
           product_name,
           price,
           members.join_date,
           ((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
    FROM sales
        LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
        INNER JOIN menu ON sales.product_id = menu.product_id
    WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
)
SELECT customer_id,
       order_date,
       product_name,
       price,
       join_date,
       (
        CASE is_within_first_week
            WHEN TRUE THEN 2
            ELSE (
                CASE product_name
                    WHEN 'sushi' THEN 2
                    ELSE 1
                END
            )
        END
        ) points_multiplier

FROM sales_annotated_with_first_week_check;
customer_idorder_dateproduct_namepricejoin_datepoints_multiplier
A2021-01-10ramen122021-01-072
A2021-01-11ramen122021-01-072
A2021-01-11ramen122021-01-072
A2021-01-01curry152021-01-071
A2021-01-07curry152021-01-072
A2021-01-01sushi102021-01-072
B2021-01-16ramen122021-01-091
B2021-01-01curry152021-01-091
B2021-01-02curry152021-01-091
B2021-01-04sushi102021-01-092
B2021-01-11sushi102021-01-092

Finally, we’ll SUM up price * points_multiplier * 10 for these sales records and report it as the total points.

WITH sales_annotated_with_first_week_check AS (
    SELECT sales.customer_id,
           order_date,
           menu.product_id,
           product_name,
           price,
           join_date,
           ((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
    FROM sales
         LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
         INNER JOIN menu ON sales.product_id = menu.product_id
    WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
),
sales_with_points_multiplier AS (
    SELECT customer_id,
           order_date,
           product_name,
           price,
           join_date,
           (
               CASE is_within_first_week
                   WHEN TRUE THEN 2
                   ELSE (
                       CASE product_name
                           WHEN 'sushi' THEN 2
                           ELSE 1
                           END
                       )
                   END
               ) points_multiplier
    FROM sales_annotated_with_first_week_check
)
SELECT customer_id, SUM(price * points_multiplier * 10) AS total_points
FROM sales_with_points_multiplier
GROUP BY customer_id;
customer_idtotal_points
A1370
B820

Join All The Things

The following questions are related to creating basic data tables that Danny and his team can use to quickly derive insights without needing to join the underlying tables using SQL.

Recreate the following table output using the available data:

customer_id order_date product_name price member
A 2021-01-01 curry 15 N
A 2021-01-01 sushi 10 N
A 2021-01-07 curry 15 Y
A 2021-01-10 ramen 12 Y
A 2021-01-11 ramen 12 Y
A 2021-01-11 ramen 12 Y
B 2021-01-01 curry 15 N
B 2021-01-02 curry 15 N
B 2021-01-04 sushi 10 N
B 2021-01-11 sushi 10 Y
B 2021-01-16 ramen 12 Y
B 2021-02-01 ramen 12 Y
C 2021-01-01 ramen 12 N
C 2021-01-01 ramen 12 N
C 2021-01-07 ramen 12 N

Solution

SELECT sales.customer_id,
       order_date,
       product_name,
       price,
       (
           CASE join_date IS NULL
           WHEN TRUE THEN 'N'
           ELSE
               (
                   CASE order_date >= join_date
                       WHEN TRUE THEN 'Y'
                       ELSE 'N'
                   END
                )
           END
       ) member
FROM sales
LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
INNER JOIN menu m2 ON sales.product_id = m2.product_id
ORDER BY customer_id, order_date, product_name;
customer_idorder_dateproduct_namepricemember
A2021-01-01curry15N
A2021-01-01sushi10N
A2021-01-07curry15Y
A2021-01-10ramen12Y
A2021-01-11ramen12Y
A2021-01-11ramen12Y
B2021-01-01curry15N
B2021-01-02curry15N
B2021-01-04sushi10N
B2021-01-11sushi10Y
B2021-01-16ramen12Y
B2021-02-01ramen12Y
C2021-01-01ramen12N
C2021-01-01ramen12N
C2021-01-07ramen12N

Walkthrough

Join all the tables, ensuring a LEFT OUTER JOIN from sales to members on customer_id so that even those customers that have never become a member show up in the table.

Additionally, CASE out the order_date and return Y if the customer was a member on the day of the order, and N otherwise. Finally, order the rows by (customer_id, order_date, product_name).

SELECT sales.customer_id,
       order_date,
       product_name,
       price,
       (
           CASE join_date IS NULL
           WHEN TRUE THEN 'N'
           ELSE
               (
                   CASE order_date >= join_date
                       WHEN TRUE THEN 'Y'
                       ELSE 'N'
                   END
                )
           END
       ) member
FROM sales
LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
INNER JOIN menu m2 ON sales.product_id = m2.product_id
ORDER BY customer_id, order_date, product_name;
customer_idorder_dateproduct_namepricemember
A2021-01-01curry15N
A2021-01-01sushi10N
A2021-01-07curry15Y
A2021-01-10ramen12Y
A2021-01-11ramen12Y
A2021-01-11ramen12Y
B2021-01-01curry15N
B2021-01-02curry15N
B2021-01-04sushi10N
B2021-01-11sushi10Y
B2021-01-16ramen12Y
B2021-02-01ramen12Y
C2021-01-01ramen12N
C2021-01-01ramen12N
C2021-01-07ramen12N

Rank All The Things

Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program.

customer_id order_date product_name price member ranking
A 2021-01-01 curry 15 N null
A 2021-01-01 sushi 10 N null
A 2021-01-07 curry 15 Y 1
A 2021-01-10 ramen 12 Y 2
A 2021-01-11 ramen 12 Y 3
A 2021-01-11 ramen 12 Y 3
B 2021-01-01 curry 15 N null
B 2021-01-02 curry 15 N null
B 2021-01-04 sushi 10 N null
B 2021-01-11 sushi 10 Y 1
B 2021-01-16 ramen 12 Y 2
B 2021-02-01 ramen 12 Y 3
C 2021-01-01 ramen 12 N null
C 2021-01-01 ramen 12 N null
C 2021-01-07 ramen 12 N null

Solution

WITH sales_with_membership AS (
    SELECT sales.customer_id,
           order_date,
           product_name,
           price,
           (
               CASE order_date >= join_date
                   WHEN TRUE THEN 'Y'
                   ELSE 'N'
                   END
               ) member
    FROM sales
         LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
         INNER JOIN menu ON sales.product_id = menu.product_id
)
SELECT *,
       (
           CASE member
           WHEN 'N' THEN NULL
           ELSE
               DENSE_RANK() OVER (PARTITION BY customer_id, member ORDER BY order_date)
           END
        ) ranking
FROM sales_with_membership
ORDER BY customer_id, order_date;
customer_idorder_dateproduct_namepricememberranking
A2021-01-01sushi10Nnull
A2021-01-01curry15Nnull
A2021-01-07curry15Y1
A2021-01-10ramen12Y2
A2021-01-11ramen12Y3
A2021-01-11ramen12Y3
B2021-01-01curry15Nnull
B2021-01-02curry15Nnull
B2021-01-04sushi10Nnull
B2021-01-11sushi10Y1
B2021-01-16ramen12Y2
B2021-02-01ramen12Y3
C2021-01-01ramen12Nnull
C2021-01-01ramen12Nnull
C2021-01-07ramen12Nnull

Walkthrough

Most of the data remains the same from the previous bonus question, except for the addition of a new column that stores the ranking of the items per customer.

SELECT sales.customer_id,
        order_date,
        product_name,
        price,
        (
            CASE order_date >= join_date
                WHEN TRUE THEN 'Y'
                ELSE 'N'
            END
        ) member
FROM sales
        LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
        INNER JOIN menu ON sales.product_id = menu.product_id;

Now, we rank the rows by partitioning over (customer_id, member).

WITH sales_with_membership AS (
    SELECT sales.customer_id,
           order_date,
           product_name,
           price,
           (
               CASE order_date >= join_date
                   WHEN TRUE THEN 'Y'
                   ELSE 'N'
                   END
               ) member
    FROM sales
         LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
         INNER JOIN menu ON sales.product_id = menu.product_id
)
SELECT *,
       (
           CASE member
           WHEN 'N' THEN NULL
           ELSE
               DENSE_RANK() OVER (PARTITION BY customer_id, member ORDER BY order_date)
           END
        ) ranking
FROM sales_with_membership
ORDER BY customer_id, order_date;
customer_idorder_dateproduct_namepricememberranking
A2021-01-01sushi10Nnull
A2021-01-01curry15Nnull
A2021-01-07curry15Y1
A2021-01-10ramen12Y2
A2021-01-11ramen12Y3
A2021-01-11ramen12Y3
B2021-01-01curry15Nnull
B2021-01-02curry15Nnull
B2021-01-04sushi10Nnull
B2021-01-11sushi10Y1
B2021-01-16ramen12Y2
B2021-02-01ramen12Y3
C2021-01-01ramen12Nnull
C2021-01-01ramen12Nnull
C2021-01-07ramen12Nnull