Lab 10 - SQL

Setup

# install.packages(c("RSQLite", "DBI"))

library(RSQLite)
library(DBI)

# Initialize a temporary in memory database
con <- dbConnect(SQLite(), ":memory:")

# Download tables
actor <- read.csv("https://raw.githubusercontent.com/ivanceras/sakila/master/csv-sakila-db/actor.csv")
rental <- read.csv("https://raw.githubusercontent.com/ivanceras/sakila/master/csv-sakila-db/rental.csv")
customer <- read.csv("https://raw.githubusercontent.com/ivanceras/sakila/master/csv-sakila-db/customer.csv")
payment <- read.csv("https://raw.githubusercontent.com/ivanceras/sakila/master/csv-sakila-db/payment_p2007_01.csv")

# Copy data.frames to database
dbWriteTable(con, "actor", actor)
dbWriteTable(con, "rental", rental)
dbWriteTable(con, "customer", customer)
dbWriteTable(con, "payment", payment)
dbListTables(con)

TIP: You can use the following QUERY to see the structure of a table

PRAGMA table_info(actor)

SQL references:

https://www.w3schools.com/sql/

Exercise 1

Edit the code below to retrieve the actor ID, first name and last name for all actors using the actor table. Sort by last name and then by first name (note that the code chunk below is set up to run SQL code rather than R code).

SELECT
FROM
ORDER by

Exercise 2

Retrieve the actor ID, first name, and last name for actors whose last name equals ‘WILLIAMS’ or ‘DAVIS’.

SELECT 
FROM 
WHERE ___ IN ('WILLIAMS', 'DAVIS')

Exercise 3

Write a query against the rental table that returns the IDs of the customers who rented a film on July 5, 2005 (use the rental.rental_date column, and you can use the date() function to ignore the time component). Include a single row for each distinct customer ID.

SELECT DISTINCT 
FROM 
WHERE date(___) = '2005-07-05'

Exercise 4

Exercise 4.1

Construct a query that retrieves all rows from the payment table where the amount is either 1.99, 7.99, 9.99.

SELECT *
FROM ___
WHERE ___ IN (1.99, 7.99, 9.99)

Exercise 4.2

Construct a query that retrieves all rows from the payment table where the amount is greater then 5.

SELECT *
FROM 
WHERE 

Exercise 4.2

Construct a query that retrieves all rows from the payment table where the amount is greater then 5 and less then 8.

SELECT *
FROM ___
WHERE ___ AND ___

Exercise 5

Retrieve all the payment IDs and their amounts from the customers whose last name is ‘DAVIS’.

SELECT 
FROM 
  INNER JOIN 
WHERE 
AND 

Exercise 6

Exercise 6.1

Use COUNT(*) to count the number of rows in rental.


Exercise 6.2

Use COUNT(*) and GROUP BY to count the number of rentals for each customer_id.


Exercise 6.3

Repeat the previous query and sort by the count in descending order.


Exercise 6.4

Repeat the previous query but use HAVING to only keep the groups with 40 or more.


Exercise 7

Write a query that calculates a number of summary statistics for the payment table using MAX, MIN, AVG and SUM


Exercise 7.1

Modify the above query to do those calculations for each customer_id.


Exercise 7.2

Modify the above query to only keep the customer_ids that have more then 5 payments.


Cleanup

Run the following chunk to disconnect from the connection.

# clean up
dbDisconnect(con)