8 Week SQL Challenge
We’ll try to implement solutions for some of the case studies from 8 Week SQL Challenge.
Setting up
If you wish to follow along these solutions/walkthroughs, you might find the best value in running them as you go, and for that you need a working Postgres database environment.
We’ll use a Docker container for Postgres so we don’t have to install it locally.
First, clone the repository to get the schemas for the challenges into a schema/
directory.
git clone https://github.com/aalekhpatel07/8-week-sql-challenge && \
cd 8-week-sql-challenge
Then, run the following command to start a Postgres14 docker container that includes the schemas for the challenges.
docker run \
--name "8-week-sql-challenge" \
--publish "5432:5432" \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=postgres \
--volume `pwd`/postgres_db:/var/lib/postgresql/data \
--volume `pwd`/schema:/docker-entrypoint-initdb.d/ \
--rm \
--detach \
postgres:14.1-alpine
Once the container has started, use your favourite IDE to connect to the database at 0.0.0.0:5432
.
The database has user/pass of postgres/postgres
and the URL is jdbc:postgresql://localhost:5432/
.
Danny’s Diner
Danny seriously loves Japanese food so in the beginning of 2021, he decides to embark upon a risky venture and opens up a cute little restaurant that sells his 3 favourite foods: sushi, curry and ramen.
Danny’s Diner is in need of your assistance to help the restaurant stay afloat - the restaurant has captured some very basic data from their few months of operation but have no idea how to use their data to help them run the business.
Problem Statement
Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite. Having this deeper connection with his customers will help him deliver a better and more personalised experience for his loyal customers.
He plans on using these insights to help him decide whether he should expand the existing customer loyalty program - additionally he needs help to generate some basic datasets so his team can easily inspect the data without needing to use SQL.
Danny has provided you with a sample of his overall customer data due to privacy issues - but he hopes that these examples are enough for you to write fully functioning SQL queries to help him answer his questions!
Danny has shared with you 3 key datasets for this case study:
sales
menu
members
Entity Relationship Diagram
Check the following diagram for a quick preview or access the interactive version at dbdiagram.io:
Full DDL is available here.
Case Study Questions
Each of the following case study questions can be answered using a single SQL statement:
- What is the total amount each customer spent at the restaurant?
- How many days has each customer visited the restaurant?
- What was the first item from the menu purchased by each customer?
- What is the most purchased item on the menu and how many times was it purchased by all customers?
- Which item was the most popular for each customer?
- Which item was purchased first by the customer after they became a member?
- Which item was purchased just before the customer became a member?
- What is the total items and amount spent for each member before they became a member?
- If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
- In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
Question
What is the total amount each customer spent at the restaurant?
Solution
SELECT customer_id, '$' || SUM(price) as total_amount_spent
FROM sales
INNER JOIN menu on sales.product_id = menu.product_id
GROUP BY customer_id;
customer_id | total_amount_spent |
---|---|
B | $74 |
C | $36 |
A | $76 |
Walkthrough
Let’s start by asking who has visited our restaurant so far?
SELECT DISTINCT(customer_id)
FROM sales;
customer_id |
---|
B |
C |
A |
Okay, what kinda purchases were made by A
?
SELECT *
FROM sales
WHERE customer_id = 'A';
customer_id | order_date | product_id |
---|---|---|
A | 2021-01-01 | 1 |
A | 2021-01-01 | 2 |
A | 2021-01-07 | 2 |
A | 2021-01-10 | 3 |
A | 2021-01-11 | 3 |
A | 2021-01-11 | 3 |
Nice. Can we get the prices for these products next to their sale records?
SELECT *
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE customer_id = 'A';
customer_id | order_date | product_id | product_id | product_name | price |
---|---|---|---|---|---|
A | 2021-01-01 | 1 | 1 | sushi | 10 |
A | 2021-01-07 | 2 | 2 | curry | 15 |
A | 2021-01-01 | 2 | 2 | curry | 15 |
A | 2021-01-11 | 3 | 3 | ramen | 12 |
A | 2021-01-11 | 3 | 3 | ramen | 12 |
A | 2021-01-10 | 3 | 3 | ramen | 12 |
Okay, so the total amount that customer A
spent must be the sum of all these prices. There’s a SUM
aggregator we could use here.
SELECT SUM(price)
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE customer_id = 'A';
sum |
---|
76 |
Can we perform this summation for every customer and store it as one row per customer. Yup, GROUP BY
comes to the rescue.
SELECT customer_id, SUM(price)
FROM sales
INNER JOIN menu ON sales.product_id = menu.product_id
GROUP BY customer_id;
customer_id | sum |
---|---|
B | 74 |
C | 36 |
A | 76 |
Excellent, we got the data we need.
Maybe polish it slightly by prefixing the numbers with a $
and naming the column to something better?
SELECT customer_id, '$' || SUM(price) as total_amount_spent
FROM sales
INNER JOIN menu on sales.product_id = menu.product_id
GROUP BY customer_id;
customer_id | total_amount_spent |
---|---|
B | $74 |
C | $36 |
A | $76 |
Question
How many days has each customer visited the restaurant?
Solution
SELECT customer_id, COUNT(DISTINCT order_date)
FROM sales
GROUP BY customer_id;
customer_id | count |
---|---|
A | 4 |
B | 6 |
C | 2 |
Walkthrough
Let’s start by simplifying the question to an individual customer.
Which days did customer A
visit the restaurant?
SELECT order_date
FROM sales
WHERE customer_id = 'A';
order_date |
---|
2021-01-01 |
2021-01-01 |
2021-01-07 |
2021-01-10 |
2021-01-11 |
2021-01-11 |
Okay. There’s repeats though. We can get rid of the duplicates if we apply DISTINCT
to the order_date
.
SELECT DISTINCT(order_date)
FROM sales
WHERE customer_id = 'A';
order_date |
---|
2021-01-01 |
2021-01-07 |
2021-01-10 |
2021-01-11 |
Now, we could get the COUNT
of these distinct dates to get the number of different days that customer A
visited the restaurant.
SELECT COUNT(DISTINCT(order_date))
FROM sales
WHERE customer_id = 'A';
count |
---|
4 |
We’d be done if we could generalize this over all customers and get one row per customer_id
. So, GROUP BY
is our friend.
SELECT customer_id, COUNT(DISTINCT order_date)
FROM sales
GROUP BY customer_id;
customer_id | count |
---|---|
A | 4 |
B | 6 |
C | 2 |
Question
What was the first item from the menu purchased by each customer?
Solution
WITH orders_ranked_by_date AS (
SELECT customer_id,
product_id,
order_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_id | product_id | product_name | order_date |
---|---|---|---|
A | 1 | sushi | 2021-01-01 |
B | 2 | curry | 2021-01-01 |
C | 3 | ramen | 2021-01-01 |
Walkthrough
Let’s start by asking a simpler question.
What was the first item from the menu purchased by customer B
?
SELECT customer_id, product_name, order_date
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
WHERE customer_id = 'B'
ORDER BY order_date
LIMIT 1;
Now, if we only had a specific customer, then this would be good enough, but the resulting dataset we want varies over customer_id
and lists one such row per customer.
Essentially, we wish to select the first product purchased (by date) per customer in the group of orders made by that customer. This is an instance of the “Top N per group” problem with N=1
, and the usual way we approach it is via assigning distinct ranks to all rows of the same group.
In this case, we assign ranks 1, 2, ...
to the orders made by customer A
with the lowest rank going to the earliest order_date
. Then, we can pluck out the lowest ranked row out of all groups.
In practice, this is implemented using window functions (or DENSE_RANK() OVER (PARTITION BY <colA> ...)
, to be precise), where <colA>
is a value shared by all rows within a group (i.e. customer_id
in this case, since each customer forms their own group of orders).
SELECT customer_id,
product_id,
order_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM sales
customer_id | product_id | order_date | rank |
---|---|---|---|
A | 1 | 2021-01-01 | 1 |
A | 2 | 2021-01-01 | 2 |
A | 2 | 2021-01-07 | 3 |
A | 3 | 2021-01-10 | 4 |
A | 3 | 2021-01-11 | 5 |
A | 3 | 2021-01-11 | 6 |
B | 2 | 2021-01-01 | 1 |
B | 2 | 2021-01-02 | 2 |
B | 1 | 2021-01-04 | 3 |
B | 1 | 2021-01-11 | 4 |
B | 3 | 2021-01-16 | 5 |
B | 3 | 2021-02-01 | 6 |
C | 3 | 2021-01-01 | 1 |
C | 3 | 2021-01-01 | 2 |
C | 3 | 2021-01-07 | 3 |
Notice that the first order for every customer corresponds to the row with the lowest rank (i.e. rank = 1
) in that group. So, let’s filter that out, and with a Common Table Expression, join this data with menu
to get the product names.
WITH orders_ranked_by_date AS (
SELECT customer_id,
product_id,
order_date,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM sales
)
SELECT orders_ranked_by_date.customer_id, menu.product_id, menu.product_name, orders_ranked_by_date.order_date
FROM orders_ranked_by_date
INNER JOIN menu ON menu.product_id = orders_ranked_by_date.product_id
WHERE rank = 1;
customer_id | product_id | product_name | order_date |
---|---|---|---|
A | 1 | sushi | 2021-01-01 |
B | 2 | curry | 2021-01-01 |
C | 3 | ramen | 2021-01-01 |
Question
What is the most purchased item on the menu and how many times was it purchased by all customers?
Solution
SELECT sales.product_id, product_name, COUNT(*) AS total_purchases
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
GROUP BY sales.product_id, product_name
ORDER BY total_purchases DESC
LIMIT 1;
product_id | product_name | total_purchases |
---|---|---|
3 | ramen | 8 |
Walkthrough
We can simply COUNT
the rows for every group of sales for a given product_id
.
SELECT product_id, COUNT(*) AS total_purchases
FROM sales
GROUP BY product_id
ORDER BY total_purchases DESC
LIMIT 1;
product_id | total_purchases |
---|---|
3 | 8 |
Now, bring in menu
to get the name of the product alongside it.
SELECT sales.product_id, product_name, COUNT(*) AS total_purchases
FROM sales
INNER JOIN menu m on sales.product_id = m.product_id
GROUP BY sales.product_id, product_name
ORDER BY total_purchases DESC
LIMIT 1;
product_id | product_name | total_purchases |
---|---|---|
3 | ramen | 8 |
Question
Which item was the most popular for each customer?
Solution
WITH number_of_purchases_per_customer_and_product AS (
SELECT customer_id, product_id, COUNT(*) num_of_purchases
FROM sales
GROUP BY customer_id, product_id
),
purchases_ranked_by_customer AS (
SELECT
customer_id,
product_id,
num_of_purchases,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
FROM number_of_purchases_per_customer_and_product
)
SELECT customer_id, menu.product_id, menu.product_name, num_of_purchases
FROM purchases_ranked_by_customer
INNER JOIN menu ON menu.product_id = purchases_ranked_by_customer.product_id
WHERE rank = 1;
customer_id | product_id | product_name | num_of_purchases |
---|---|---|---|
C | 3 | ramen | 3 |
B | 3 | ramen | 2 |
A | 3 | ramen | 3 |
Walkthrough
Similar to Question 3, this looks like another “Top N per group” instance with N=1
where rows in a group share the customer_id
, and we wish to rank the rows by the total number of purchases per product.
Let’s start by building an intermediate table that stores the number of purchases for every pair of (customer_id, product_id)
.
SELECT customer_id, product_id, COUNT(*) num_of_purchases
FROM sales
GROUP BY customer_id, product_id;
customer_id | product_id | num_of_purchases |
---|---|---|
B | 3 | 2 |
A | 3 | 3 |
A | 1 | 1 |
C | 3 | 3 |
B | 1 | 2 |
B | 2 | 2 |
A | 2 | 2 |
Next, we’d like to form groups of these rows based on the shared value of customer_id
and rank the rows in the groups by the descending order of num_of_purchases
.
WITH number_of_purchases_per_customer_and_product AS (
SELECT customer_id, product_id, COUNT(*) num_of_purchases
FROM sales
GROUP BY customer_id, product_id
)
SELECT
customer_id,
product_id,
num_of_purchases,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
FROM number_of_purchases_per_customer_and_product;
customer_id | product_id | num_of_purchases | rank |
---|---|---|---|
A | 3 | 3 | 1 |
A | 2 | 2 | 2 |
A | 1 | 1 | 3 |
B | 3 | 2 | 1 |
B | 1 | 2 | 2 |
B | 2 | 2 | 3 |
C | 3 | 3 | 1 |
Finally, the rows of interest are those where rank = 1
, so let’s filter it out. Seems like we’ll have to chain the CTE.
WITH number_of_purchases_per_customer_and_product AS (
SELECT customer_id, product_id, COUNT(*) num_of_purchases
FROM sales
GROUP BY customer_id, product_id
),
purchases_ranked_by_customer AS (
SELECT
customer_id,
product_id,
num_of_purchases,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY num_of_purchases DESC) AS rank
FROM number_of_purchases_per_customer_and_product
)
SELECT customer_id, menu.product_id, menu.product_name, num_of_purchases
FROM purchases_ranked_by_customer
INNER JOIN menu ON menu.product_id = purchases_ranked_by_customer.product_id
WHERE rank = 1;
customer_id | product_id | product_name | num_of_purchases |
---|---|---|---|
C | 3 | ramen | 3 |
B | 3 | ramen | 2 |
A | 3 | ramen | 3 |
Question
Which item was purchased first by the customer after they became a member?
Solution
WITH orders_by_customer_after_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date >= m.join_date
),
orders_by_customer_ranked_by_date_after_membership AS (
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM orders_by_customer_after_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_after_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_after_membership.product_id
WHERE rank = 1;
customer_id | order_date | product_id | product_name |
---|---|---|---|
B | 2021-01-11 | 1 | sushi |
A | 2021-01-07 | 2 | curry |
Walkthrough
A variant of Question 3, this is a “Top N per group” instance with N=1
where rows in a group share the customer_id
, and we wish to rank the rows by the ascending order of the order_date
but we only include those rows in the groups that have order_date >= join_date
for that member.
Let’s start by building an intermediate table representing a set of rows that will participate in the customer_id
grouping, filtering out those sales records where the user wasn’t yet a member.
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date >= m.join_date;
customer_id | order_date | product_id |
---|---|---|
A | 2021-01-07 | 2 |
A | 2021-01-10 | 3 |
A | 2021-01-11 | 3 |
A | 2021-01-11 | 3 |
B | 2021-01-11 | 1 |
B | 2021-01-16 | 3 |
B | 2021-02-01 | 3 |
Now, we’ll rank the rows in the groups by the ascending order of order_date
.
WITH orders_by_customer_after_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date >= m.join_date
)
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM orders_by_customer_after_membership;
customer_id | order_date | product_id | rank |
---|---|---|---|
A | 2021-01-07 | 2 | 1 |
A | 2021-01-10 | 3 | 2 |
A | 2021-01-11 | 3 | 3 |
A | 2021-01-11 | 3 | 4 |
B | 2021-01-11 | 1 | 1 |
B | 2021-01-16 | 3 | 2 |
B | 2021-02-01 | 3 | 3 |
Finally, the rows of interest are those with rank = 1
. So, we’ll retain those, and bring in menu
to get the name of the products alongside.
WITH orders_by_customer_after_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date >= m.join_date
),
orders_by_customer_ranked_by_date_after_membership AS (
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date) AS rank
FROM orders_by_customer_after_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_after_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_after_membership.product_id
WHERE rank = 1;
customer_id | order_date | product_id | product_name |
---|---|---|---|
B | 2021-01-11 | 1 | sushi |
A | 2021-01-07 | 2 | curry |
Question
Which item was purchased just before the customer became a member?
Solution
WITH orders_by_customer_before_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date
),
orders_by_customer_ranked_by_date_before_membership AS (
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
FROM orders_by_customer_before_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_before_membership.product_id
WHERE rank = 1;
customer_id | order_date | product_id | product_name |
---|---|---|---|
B | 2021-01-04 | 1 | sushi |
A | 2021-01-01 | 1 | sushi |
Walkthrough
In the previous question, we built an intermediate table that filtered out all the sales records before the customer became a member. Then we ranked the grouped rows based on customer_id
in the ascending order of order_date
. Finally, we retained the highest ranked rows.
To answer this question, we’ll take a similar but mirrored approach (i.e. reflection across the vertical line x=0
). Our intermediate table will filter out all the sales records after the customer became a member, inclusive of the start date. Then we’ll rank the grouped rows based on customer_id
in the descending order of order_date
and retain the highest ranked rows.
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date;
customer_id | order_date | product_id |
---|---|---|
A | 2021-01-01 | 1 |
A | 2021-01-01 | 2 |
B | 2021-01-01 | 2 |
B | 2021-01-02 | 2 |
B | 2021-01-04 | 1 |
WITH orders_by_customer_before_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date
)
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
FROM orders_by_customer_before_membership;
customer_id | order_date | product_id | rank |
---|---|---|---|
A | 2021-01-01 | 1 | 1 |
A | 2021-01-01 | 2 | 2 |
B | 2021-01-04 | 1 | 1 |
B | 2021-01-02 | 2 | 2 |
B | 2021-01-01 | 2 | 3 |
WITH orders_by_customer_before_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date
),
orders_by_customer_ranked_by_date_before_membership AS (
SELECT customer_id,
order_date,
product_id,
DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rank
FROM orders_by_customer_before_membership
)
SELECT customer_id, order_date, menu.product_id, menu.product_name
FROM orders_by_customer_ranked_by_date_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_ranked_by_date_before_membership.product_id
WHERE rank = 1;
customer_id | order_date | product_id | product_name |
---|---|---|---|
B | 2021-01-04 | 1 | sushi |
A | 2021-01-01 | 1 | sushi |
Question
What is the total items and amount spent for each member before they became a member?
Solution
WITH orders_by_customer_before_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date
)
SELECT orders_by_customer_before_membership.customer_id,
SUM(price),
COUNT(*)
FROM orders_by_customer_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_before_membership.product_id
GROUP BY customer_id;
customer_id | sum | count |
---|---|---|
B | 40 | 3 |
A | 25 | 2 |
Note: Since Customer C
never became a member, we won’t include them in this calculation. But if we were asked to, we’d have to change the orders_by_customer_before_membership
table to the following:
SELECT sales.customer_id, order_date, product_id
FROM sales
- INNER JOIN members m ON sales.customer_id = m.customer_id
+ LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
- WHERE order_date < m.join_date
+ WHERE order_date < m.join_date OR m.join_date IS NULL
This would produce the following as the final result:
customer_id | sum | count |
---|---|---|
B | 40 | 3 |
C | 36 | 3 |
A | 25 | 2 |
Walkthrough
Let’s start by building an intermediate table of all the sales records for customer before they became a member.
SELECT sales.customer_id, order_date, product_id, m.join_date
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date;
customer_id | order_date | product_id | join_date |
---|---|---|---|
A | 2021-01-01 | 1 | 2021-01-07 |
A | 2021-01-01 | 2 | 2021-01-07 |
B | 2021-01-01 | 2 | 2021-01-09 |
B | 2021-01-02 | 2 | 2021-01-09 |
B | 2021-01-04 | 1 | 2021-01-09 |
Now, the total number of items they purchased is the number of sales (i.e. COUNT
of the rows), and the total amount they spent is the SUM
of the price
s of the items in the sales records.
WITH orders_by_customer_before_membership AS (
SELECT sales.customer_id, order_date, product_id
FROM sales
INNER JOIN members m ON sales.customer_id = m.customer_id
WHERE order_date < m.join_date
)
SELECT orders_by_customer_before_membership.customer_id,
SUM(price),
COUNT(*)
FROM orders_by_customer_before_membership
INNER JOIN menu ON menu.product_id = orders_by_customer_before_membership.product_id
GROUP BY customer_id;
customer_id | sum | count |
---|---|---|
B | 40 | 3 |
A | 25 | 2 |
Question
If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
Solution
WITH sales_with_price AS (
SELECT customer_id, order_date, m.product_id, product_name, price
FROM sales
INNER JOIN menu m ON sales.product_id = m.product_id
)
SELECT customer_id,
SUM(
CASE product_name
WHEN 'sushi' THEN price * 10 * 2
ELSE price * 10
END
) AS total_points
FROM sales_with_price
GROUP BY customer_id;
customer_id | total_points |
---|---|
B | 940 |
C | 360 |
A | 860 |
Walkthrough
Let’s start with pulling in the amounts from the menu
table next to our sales records.
SELECT customer_id, order_date, m.product_id, product_name, price
FROM sales
INNER JOIN menu m ON sales.product_id = m.product_id
customer_id | order_date | product_id | product_name | price |
---|---|---|---|---|
B | 2021-01-04 | 1 | sushi | 10 |
A | 2021-01-01 | 1 | sushi | 10 |
B | 2021-01-11 | 1 | sushi | 10 |
B | 2021-01-01 | 2 | curry | 15 |
B | 2021-01-02 | 2 | curry | 15 |
A | 2021-01-01 | 2 | curry | 15 |
A | 2021-01-07 | 2 | curry | 15 |
A | 2021-01-11 | 3 | ramen | 12 |
A | 2021-01-11 | 3 | ramen | 12 |
A | 2021-01-10 | 3 | ramen | 12 |
B | 2021-01-16 | 3 | ramen | 12 |
B | 2021-02-01 | 3 | ramen | 12 |
C | 2021-01-01 | 3 | ramen | 12 |
C | 2021-01-01 | 3 | ramen | 12 |
C | 2021-01-07 | 3 | ramen | 12 |
Then we’ll SUM
the transformed price
s based on the values that product_name
assumes:
WITH sales_with_price AS (
SELECT customer_id, order_date, m.product_id, product_name, price
FROM sales
INNER JOIN menu m ON sales.product_id = m.product_id
)
SELECT customer_id,
SUM(
CASE product_name
WHEN 'sushi' THEN price * 10 * 2
ELSE price * 10
END
) AS total_points
FROM sales_with_price
GROUP BY customer_id;
customer_id | total_points |
---|---|
B | 940 |
C | 360 |
A | 860 |
Question
In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?
Solution
WITH sales_annotated_with_first_week_check AS (
SELECT sales.customer_id,
order_date,
menu.product_id,
product_name,
price,
join_date,
((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
),
sales_with_points_multiplier AS (
SELECT customer_id,
order_date,
product_name,
price,
join_date,
(
CASE is_within_first_week
WHEN TRUE THEN 2
ELSE (
CASE product_name
WHEN 'sushi' THEN 2
ELSE 1
END
)
END
) points_multiplier
FROM sales_annotated_with_first_week_check
)
SELECT customer_id, SUM(price * points_multiplier * 10) AS total_points
FROM sales_with_points_multiplier
GROUP BY customer_id;
customer_id | total_points |
---|---|
A | 1370 |
B | 820 |
Walkthrough
Let’s break this down into multiple sub-problems.
First, we’ll build an intermediate table that just stores whether the sales orders that the customer made are within the first week of them becoming a member (i.e. whether the new membership 2x
point bonus applies).
While we’re there, we’ll also retain sales records only up to January 2021 and for customers A
, and B
.
SELECT sales.customer_id,
order_date,
menu.product_id,
product_name,
price,
members.join_date,
((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B');
customer_id | order_date | product_id | product_name | price | join_date | is_within_first_week |
---|---|---|---|---|---|---|
A | 2021-01-10 | 3 | ramen | 12 | 2021-01-07 | true |
A | 2021-01-11 | 3 | ramen | 12 | 2021-01-07 | true |
A | 2021-01-11 | 3 | ramen | 12 | 2021-01-07 | true |
A | 2021-01-01 | 2 | curry | 15 | 2021-01-07 | false |
A | 2021-01-07 | 2 | curry | 15 | 2021-01-07 | true |
A | 2021-01-01 | 1 | sushi | 10 | 2021-01-07 | false |
B | 2021-01-16 | 3 | ramen | 12 | 2021-01-09 | false |
B | 2021-01-01 | 2 | curry | 15 | 2021-01-09 | false |
B | 2021-01-02 | 2 | curry | 15 | 2021-01-09 | false |
B | 2021-01-04 | 1 | sushi | 10 | 2021-01-09 | false |
B | 2021-01-11 | 1 | sushi | 10 | 2021-01-09 | true |
Next, we’ll determine a points multiplier for these sales based on whether the customer bought Sushi or the order was made within the first week of gaining membership.
WITH sales_annotated_with_first_week_check AS (
SELECT sales.customer_id,
order_date,
menu.product_id,
product_name,
price,
members.join_date,
((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
)
SELECT customer_id,
order_date,
product_name,
price,
join_date,
(
CASE is_within_first_week
WHEN TRUE THEN 2
ELSE (
CASE product_name
WHEN 'sushi' THEN 2
ELSE 1
END
)
END
) points_multiplier
FROM sales_annotated_with_first_week_check;
customer_id | order_date | product_name | price | join_date | points_multiplier |
---|---|---|---|---|---|
A | 2021-01-10 | ramen | 12 | 2021-01-07 | 2 |
A | 2021-01-11 | ramen | 12 | 2021-01-07 | 2 |
A | 2021-01-11 | ramen | 12 | 2021-01-07 | 2 |
A | 2021-01-01 | curry | 15 | 2021-01-07 | 1 |
A | 2021-01-07 | curry | 15 | 2021-01-07 | 2 |
A | 2021-01-01 | sushi | 10 | 2021-01-07 | 2 |
B | 2021-01-16 | ramen | 12 | 2021-01-09 | 1 |
B | 2021-01-01 | curry | 15 | 2021-01-09 | 1 |
B | 2021-01-02 | curry | 15 | 2021-01-09 | 1 |
B | 2021-01-04 | sushi | 10 | 2021-01-09 | 2 |
B | 2021-01-11 | sushi | 10 | 2021-01-09 | 2 |
Finally, we’ll SUM
up price * points_multiplier * 10
for these sales records and report it as the total points.
WITH sales_annotated_with_first_week_check AS (
SELECT sales.customer_id,
order_date,
menu.product_id,
product_name,
price,
join_date,
((sales.order_date < members.join_date + INTERVAL '7 DAYS') AND (sales.order_date >= members.join_date)) is_within_first_week
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
WHERE order_date <= '2021-01-31' AND sales.customer_id IN ('A', 'B')
),
sales_with_points_multiplier AS (
SELECT customer_id,
order_date,
product_name,
price,
join_date,
(
CASE is_within_first_week
WHEN TRUE THEN 2
ELSE (
CASE product_name
WHEN 'sushi' THEN 2
ELSE 1
END
)
END
) points_multiplier
FROM sales_annotated_with_first_week_check
)
SELECT customer_id, SUM(price * points_multiplier * 10) AS total_points
FROM sales_with_points_multiplier
GROUP BY customer_id;
customer_id | total_points |
---|---|
A | 1370 |
B | 820 |
Join All The Things
The following questions are related to creating basic data tables that Danny and his team can use to quickly derive insights without needing to join the underlying tables using SQL.
Recreate the following table output using the available data:
customer_id | order_date | product_name | price | member |
---|---|---|---|---|
A | 2021-01-01 | curry | 15 | N |
A | 2021-01-01 | sushi | 10 | N |
A | 2021-01-07 | curry | 15 | Y |
A | 2021-01-10 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
B | 2021-01-01 | curry | 15 | N |
B | 2021-01-02 | curry | 15 | N |
B | 2021-01-04 | sushi | 10 | N |
B | 2021-01-11 | sushi | 10 | Y |
B | 2021-01-16 | ramen | 12 | Y |
B | 2021-02-01 | ramen | 12 | Y |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-07 | ramen | 12 | N |
Solution
SELECT sales.customer_id,
order_date,
product_name,
price,
(
CASE join_date IS NULL
WHEN TRUE THEN 'N'
ELSE
(
CASE order_date >= join_date
WHEN TRUE THEN 'Y'
ELSE 'N'
END
)
END
) member
FROM sales
LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
INNER JOIN menu m2 ON sales.product_id = m2.product_id
ORDER BY customer_id, order_date, product_name;
customer_id | order_date | product_name | price | member |
---|---|---|---|---|
A | 2021-01-01 | curry | 15 | N |
A | 2021-01-01 | sushi | 10 | N |
A | 2021-01-07 | curry | 15 | Y |
A | 2021-01-10 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
B | 2021-01-01 | curry | 15 | N |
B | 2021-01-02 | curry | 15 | N |
B | 2021-01-04 | sushi | 10 | N |
B | 2021-01-11 | sushi | 10 | Y |
B | 2021-01-16 | ramen | 12 | Y |
B | 2021-02-01 | ramen | 12 | Y |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-07 | ramen | 12 | N |
Walkthrough
Join all the tables, ensuring a LEFT OUTER JOIN
from sales
to members
on customer_id
so that
even those customers that have never become a member show up in the table.
Additionally, CASE
out the order_date
and return Y
if the customer was a member on the day of the order, and N
otherwise. Finally, order the rows by (customer_id, order_date, product_name)
.
SELECT sales.customer_id,
order_date,
product_name,
price,
(
CASE join_date IS NULL
WHEN TRUE THEN 'N'
ELSE
(
CASE order_date >= join_date
WHEN TRUE THEN 'Y'
ELSE 'N'
END
)
END
) member
FROM sales
LEFT OUTER JOIN members m ON sales.customer_id = m.customer_id
INNER JOIN menu m2 ON sales.product_id = m2.product_id
ORDER BY customer_id, order_date, product_name;
customer_id | order_date | product_name | price | member |
---|---|---|---|---|
A | 2021-01-01 | curry | 15 | N |
A | 2021-01-01 | sushi | 10 | N |
A | 2021-01-07 | curry | 15 | Y |
A | 2021-01-10 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
A | 2021-01-11 | ramen | 12 | Y |
B | 2021-01-01 | curry | 15 | N |
B | 2021-01-02 | curry | 15 | N |
B | 2021-01-04 | sushi | 10 | N |
B | 2021-01-11 | sushi | 10 | Y |
B | 2021-01-16 | ramen | 12 | Y |
B | 2021-02-01 | ramen | 12 | Y |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-01 | ramen | 12 | N |
C | 2021-01-07 | ramen | 12 | N |
Rank All The Things
Danny also requires further information about the ranking of customer products, but he purposely does not need the ranking for non-member purchases so he expects null ranking values for the records when customers are not yet part of the loyalty program.
customer_id | order_date | product_name | price | member | ranking |
---|---|---|---|---|---|
A | 2021-01-01 | curry | 15 | N | null |
A | 2021-01-01 | sushi | 10 | N | null |
A | 2021-01-07 | curry | 15 | Y | 1 |
A | 2021-01-10 | ramen | 12 | Y | 2 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
B | 2021-01-01 | curry | 15 | N | null |
B | 2021-01-02 | curry | 15 | N | null |
B | 2021-01-04 | sushi | 10 | N | null |
B | 2021-01-11 | sushi | 10 | Y | 1 |
B | 2021-01-16 | ramen | 12 | Y | 2 |
B | 2021-02-01 | ramen | 12 | Y | 3 |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-07 | ramen | 12 | N | null |
Solution
WITH sales_with_membership AS (
SELECT sales.customer_id,
order_date,
product_name,
price,
(
CASE order_date >= join_date
WHEN TRUE THEN 'Y'
ELSE 'N'
END
) member
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
)
SELECT *,
(
CASE member
WHEN 'N' THEN NULL
ELSE
DENSE_RANK() OVER (PARTITION BY customer_id, member ORDER BY order_date)
END
) ranking
FROM sales_with_membership
ORDER BY customer_id, order_date;
customer_id | order_date | product_name | price | member | ranking |
---|---|---|---|---|---|
A | 2021-01-01 | sushi | 10 | N | null |
A | 2021-01-01 | curry | 15 | N | null |
A | 2021-01-07 | curry | 15 | Y | 1 |
A | 2021-01-10 | ramen | 12 | Y | 2 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
B | 2021-01-01 | curry | 15 | N | null |
B | 2021-01-02 | curry | 15 | N | null |
B | 2021-01-04 | sushi | 10 | N | null |
B | 2021-01-11 | sushi | 10 | Y | 1 |
B | 2021-01-16 | ramen | 12 | Y | 2 |
B | 2021-02-01 | ramen | 12 | Y | 3 |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-07 | ramen | 12 | N | null |
Walkthrough
Most of the data remains the same from the previous bonus question, except for the addition of a new column that stores the ranking of the items per customer.
SELECT sales.customer_id,
order_date,
product_name,
price,
(
CASE order_date >= join_date
WHEN TRUE THEN 'Y'
ELSE 'N'
END
) member
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id;
Now, we rank the rows by partitioning over (customer_id, member)
.
WITH sales_with_membership AS (
SELECT sales.customer_id,
order_date,
product_name,
price,
(
CASE order_date >= join_date
WHEN TRUE THEN 'Y'
ELSE 'N'
END
) member
FROM sales
LEFT OUTER JOIN members ON sales.customer_id = members.customer_id
INNER JOIN menu ON sales.product_id = menu.product_id
)
SELECT *,
(
CASE member
WHEN 'N' THEN NULL
ELSE
DENSE_RANK() OVER (PARTITION BY customer_id, member ORDER BY order_date)
END
) ranking
FROM sales_with_membership
ORDER BY customer_id, order_date;
customer_id | order_date | product_name | price | member | ranking |
---|---|---|---|---|---|
A | 2021-01-01 | sushi | 10 | N | null |
A | 2021-01-01 | curry | 15 | N | null |
A | 2021-01-07 | curry | 15 | Y | 1 |
A | 2021-01-10 | ramen | 12 | Y | 2 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
A | 2021-01-11 | ramen | 12 | Y | 3 |
B | 2021-01-01 | curry | 15 | N | null |
B | 2021-01-02 | curry | 15 | N | null |
B | 2021-01-04 | sushi | 10 | N | null |
B | 2021-01-11 | sushi | 10 | Y | 1 |
B | 2021-01-16 | ramen | 12 | Y | 2 |
B | 2021-02-01 | ramen | 12 | Y | 3 |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-01 | ramen | 12 | N | null |
C | 2021-01-07 | ramen | 12 | N | null |