Skip to content

Latest commit

 

History

History
1138 lines (888 loc) · 23.8 KB

chapter3.md

File metadata and controls

1138 lines (888 loc) · 23.8 KB
title description
Aggregate Functions
This chapter builds on the first two by teaching you how to use aggregate functions to summarize your data and gain useful insights. Additionally, you'll learn about arithmetic in SQL, and how to use aliases to make your results more readable!

Aggregate functions

type: BulletExercise
lang: sql
xp: 100
key: b883e7079f

Often, you will want to perform some calculation on the data in a database. SQL provides a few functions, called aggregate functions, to help you out with this.

For example,

SELECT AVG(budget)
FROM films;

gives you the average value from the budget column of the films table. Similarly, the MAX function returns the highest budget:

SELECT MAX(budget)
FROM films;

The SUM function returns the result of adding up the numeric values in a column:

SELECT SUM(budget)
FROM films;

You can probably guess what the MIN function does! Now it's your turn to try out some SQL functions.

@pre_exercise_code

connect('postgresql', 'films')
set_options(visible_tables = ['films'])

@sample_code


type: NormalExercise
key: 80fd462ae1
xp: 30

@instructions Use the SUM function to get the total duration of all films. @solution

SELECT SUM(duration)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
sum_call = temp.check_field('name').has_equal_ast('Are you calling the `SUM` function?')
sum_args = temp.check_field('args').has_equal_ast('Are you using using `SUM` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    sum_call,
    sum_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 7993b51268
xp: 30

@instructions Get the average duration of all films. @solution

SELECT AVG(duration)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
avg_call = temp.check_field('name').has_equal_ast('Are you calling the `AVG` function?')
avg_args = temp.check_field('args').has_equal_ast('Are you using using `AVG` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    avg_call,
    avg_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: a03aeabbc6
xp: 30

@instructions Get the duration of the shortest film. @solution

SELECT MIN(duration)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
min_call = temp.check_field('name').has_equal_ast('Are you calling the `MIN` function?')
min_args = temp.check_field('args').has_equal_ast('Are you using using `MIN` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    min_call,
    min_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: fabbc619c6
xp: 30

@instructions Get the duration of the longest film. @solution

SELECT MAX(duration)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
max_call = temp.check_field('name').has_equal_ast('Are you calling the `MAX` function?')
max_args = temp.check_field('args').has_equal_ast('Are you using using `MAX` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    max_call,
    max_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

Aggregate functions practice

type: BulletExercise
lang: sql
xp: 100
key: e60103b3f1

Good work. Aggregate functions are important to understand, so let's get some more practice!

@pre_exercise_code

connect('postgresql', 'films')
set_options(visible_tables = ['films'])

@sample_code


type: NormalExercise
key: c8173b7d3e
xp: 30

@instructions Use the SUM function to get the total amount grossed by all films. @solution

SELECT SUM(gross)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
sum_call = temp.check_field('name').has_equal_ast('Are you calling the `SUM` function?')
sum_args = temp.check_field('args').has_equal_ast('Are you using using `SUM` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    sum_call,
    sum_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 24c0ab68ad
xp: 30

@instructions Get the average amount grossed by all films. @solution

SELECT AVG(gross)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
avg_call = temp.check_field('name').has_equal_ast('Are you calling the `AVG` function?')
avg_args = temp.check_field('args').has_equal_ast('Are you using using `AVG` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    avg_call,
    avg_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 19838082cb
xp: 30

@instructions Get the amount grossed by the worst performing film. @solution

SELECT MIN(gross)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
min_call = temp.check_field('name').has_equal_ast('Are you calling the `MIN` function?')
min_args = temp.check_field('args').has_equal_ast('Are you using using `MIN` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    min_call,
    min_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: a49b98de42
xp: 30

@instructions Get the amount grossed by the best performing film. @solution

SELECT MAX(gross)
FROM films;

@hint

SELECT ___(___)
FROM ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
max_call = temp.check_field('name').has_equal_ast('Are you calling the `MAX` function?')
max_args = temp.check_field('args').has_equal_ast('Are you using using `MAX` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    max_call,
    max_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

Combining aggregate functions with WHERE

type: BulletExercise
lang: sql
xp: 100
key: b44bd43288

Aggregate functions can be combined with the WHERE clause to gain further insights from your data.

For example, to get the total budget of movies made in the year 2010 or later:

SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;

Now it's your turn!

@pre_exercise_code

connect('postgresql', 'films')
set_options(visible_tables = ['films'])

@sample_code


type: NormalExercise
key: b986f33a10
xp: 30

@instructions Use the SUM function to get the total amount grossed by all films made in the year 2000 or later. @solution

SELECT SUM(gross)
FROM films
WHERE release_year >= 2000;

@hint

SELECT ___(___)
FROM ___
WHERE ___ >= ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
sum_call = temp.check_field('name').has_equal_ast('Are you calling the `SUM` function?')
sum_args = temp.check_field('args').has_equal_ast('Are you using using `SUM` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

where_clause = sel.check_field('where_clause')

where_release_year = where_clause.has_equal_ast(sql='release_year >= 2000', start='expression', exact=False, msg='Did you check the `release_year` correctly?')

Ex().test_correct(check_result(), [
    from_clause,
    where_release_year,
    sum_call,
    sum_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 65e81175c5
xp: 30

@instructions Get the average amount grossed by all films whose titles start with the letter 'A'. @solution

SELECT AVG(gross)
FROM films
where title LIKE 'A%';

@hint

SELECT ___(___)
FROM ___
WHERE ___ LIKE 'A%';

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
avg_call = temp.check_field('name').has_equal_ast('Are you calling the `AVG` function?')
avg_args = temp.check_field('args').has_equal_ast('Are you using using `AVG` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

where_clause = sel.check_field('where_clause')

left_like = where_clause.check_field('left').has_equal_ast('Are you using `title` with `LIKE`?')
op_like = where_clause.check_field('op').has_equal_ast('Are you using the `LIKE` operator in your `WHERE` clause?')

right_like = where_clause.check_field('right').has_equal_ast("Are you using `LIKE` with `'A%'`?")

Ex().test_correct(check_result(), [
    from_clause,
    left_like,
    op_like,
    right_like,
    avg_call,
    avg_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 3ca90a8536
xp: 30

@instructions Get the amount grossed by the worst performing film in 1994. @solution

SELECT MIN(gross)
FROM films
WHERE release_year = 1994;

@hint

SELECT ___(___)
FROM ___
WHERE ___ = ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
min_call = temp.check_field('name').has_equal_ast('Are you calling the `MIN` function?')
min_args = temp.check_field('args').has_equal_ast('Are you using using `MIN` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

where_clause = sel.check_field('where_clause')

where_release_year = where_clause.has_equal_ast(sql='release_year = 1994', start='expression', exact=False, msg='Did you check the `release_year` correctly?')

Ex().test_correct(check_result(), [
    from_clause,
    where_release_year,
    min_call,
    min_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: d2e630e656
xp: 30

@instructions Get the amount grossed by the best performing film between 2000 and 2012, inclusive. @solution

SELECT MAX(gross)
FROM films
WHERE release_year BETWEEN 2000 AND 2012;

@hint

SELECT ___(___)
FROM ___
WHERE ___ BETWEEN ___ AND ___;

@sct

sel = check_node('SelectStmt')

temp = sel.check_node('Call')
max_call = temp.check_field('name').has_equal_ast('Are you calling the `MAX` function?')
max_args = temp.check_field('args').has_equal_ast('Are you using using `MAX` on the right column?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

where_clause = sel.check_field('where_clause')

between_left = where_clause.check_field('left').has_equal_ast('Are you using `release_year` with `BETWEEN`?')
between_op1 = where_clause.check_field('right', 0).has_equal_ast('Check the first part of your `BETWEEN`!')
between_op2 = where_clause.check_field('right', 1).has_equal_ast('Check the second part of your `BETWEEN`!')

Ex().test_correct(check_result(), [
    from_clause,
    between_left,
    between_op1,
    between_op2,
    max_call,
    max_args,
    test_has_columns(),
    test_ncols(),
    test_error()
])

A note on arithmetic

type: MultipleChoiceExercise
lang: sql
xp: 50
skills: 1
key: 7b8b54b64d

In addition to using aggregate functions, you can perform basic arithmetic with symbols like +, -, *, and /.

So, for example, this gives a result of 12:

SELECT (4 * 3);

However, the following gives a result of 1:

SELECT (4 / 3);

What's going on here?

SQL assumes that if you divide an integer by an integer, you want to get an integer back. So be careful when dividing!

If you want more precision when dividing, you can add decimal places to your numbers. For example,

SELECT (4.0 / 3.0) AS result;

gives you the result you would expect: 1.333.


What is the result of `SELECT (10 / 3);`?

@instructions

  • 2.333
  • 3.333
  • 3
  • 3.0

@hint Run a query in the editor to the right.

@pre_exercise_code

connect('postgresql', 'films')

@sample_code

-- You can test out queries here!

@sct

success_msg = 'Correct!'
msg2 = "Incorrect, try out the query in the editor!"

Ex().test_mc(3,[msg2, msg2, success_msg, msg2])

It's AS simple AS aliasing

type: BulletExercise
lang: sql
xp: 100
key: 9f4b026fe7

You may have noticed in the first exercise of this chapter that the column name of your result was just the name of the function you used. For example,

SELECT MAX(budget)
FROM films;

gives you a result with one column, named max. But what if you use two functions like this?

SELECT MAX(budget), MAX(duration)
FROM films;

Well, then you'd have two columns named max, which isn't very useful!

To avoid situations like this, SQL allows you to do something called aliasing. Aliasing simply means you assign a temporary name to something. To alias, you use the AS keyword, which you've already seen earlier in this course.

For example, in the above example we could use aliases to make the result clearer:

SELECT MAX(budget) AS max_budget,
       MAX(duration) AS max_duration
FROM films;

Aliases are helpful for making results more readable!

@pre_exercise_code

connect('postgresql', 'films')
set_options(visible_tables = ['films'])

@sample_code



type: NormalExercise
key: ec33c2353b
xp: 30

@instructions Get the title and net profit (the amount a film grossed, minus its budget) for all films. Alias the net profit as net_profit.

@solution

SELECT title, gross - budget AS net_profit
FROM films;

@hint

SELECT ___, ___ - ___ AS ___
FROM ___;

@sct

sel = check_node('SelectStmt')

title = test_column('title').has_equal_ast('Did you select the `title` column correctly?')

alias = test_column('net_profit', match='exact', msg='Did you alias your result as `net_profit`?')

alias_eqn = sel.check_node('AliasExpr').check_node('BinaryExpr')

left_eqn = alias_eqn.check_field('left').has_equal_ast('Are you using the `gross` column?')

right_eqn = alias_eqn.check_field('right').has_equal_ast('Are you using the `budget` column?')

op_eqn = alias_eqn.check_field('op').has_equal_ast('Are you subtracting `budget` from `gross`?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    alias_eqn,
    left_eqn,
    op_eqn,
    right_eqn,
    alias,
    title,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 1351c6f6bb
xp: 30

@instructions Get the title and duration in hours for all films. The duration is in minutes, so you'll need to divide by 60.0 to get the duration in hours. Alias the duration in hours as duration_hours.

@solution

SELECT title, duration / 60.0 AS duration_hours
FROM films;

@hint

SELECT ___, ___ / 60.0 AS ___
FROM ___;

@sct


sel = check_node('SelectStmt')

title = test_column('title', msg='Did you select the `title` column correctly?')

alias = test_column('duration_hours', match='exact', msg='Did you alias your result as `duration_hours`?', digits=4)

alias_eqn = sel.check_node('AliasExpr').check_node('BinaryExpr')

left_eqn = alias_eqn.check_field('left').has_equal_ast('Are you using the `duration` column?')

right_eqn = alias_eqn.check_field('right').has_equal_ast('Are you dividing the `duration` column by `60.0`?')

op_eqn = alias_eqn.check_field('op').has_equal_ast('Are you dividing by `60.0`?')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(alias, [
    from_clause,
    left_eqn,
    op_eqn,
    right_eqn,
    alias,
    title,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: 497f8d2a8a
xp: 30

@instructions Get the average duration in hours for all films, aliased as avg_duration_hours.

@solution

SELECT AVG(duration) / 60.0 AS avg_duration_hours  
FROM films;

@hint

SELECT ___(___) / 60.0 AS avg_duration_hours  
FROM ___;

@sct

# TODO: come back to this with better solution once sqlwhat is patched
sel = check_node('SelectStmt')

alias = test_column('avg_duration_hours', match='exact', msg='Did you alias your result as `avg_duration_hours`?', digits=4)

avg1 = test_student_typed('AVG\(duration\)\s+\/\s+60.0', msg='Are you calling `AVG` correctly?')
avg2 = test_student_typed('AVG\(duration\s+\/\s+60.0\)', msg='Are you calling `AVG` correctly?')
avg3 = test_student_typed('AVG\(duration\/60.0\)', msg='Are you calling `AVG` correctly?')
avg4 = test_student_typed('AVG\(duration\/60.0\)', msg='Are you calling `AVG` correctly?')


avg_call = test_or(avg1, avg2, avg3, avg4)

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(alias, [
    from_clause,
    avg_call,
    alias,
    test_has_columns(),
    test_ncols(),
    test_error()
])

Even more aliasing

type: BulletExercise
lang: sql
xp: 100
key: 7e3a93209c

Let's practice your newfound aliasing skills some more before moving on!

Recall: SQL assumes that if you divide an integer by an integer, you want to get an integer back.

This means that the following will erroneously result in 400.0:

SELECT 45 / 10 * 100.0;

This is because 45 / 10 evaluates to an integer (4), and not a decimal number like we would expect.

So when you're dividing make sure at least one of your numbers has a decimal place:

SELECT 45 * 100.0 / 10;

The above now gives the correct answer of 450.0 as now the numerator of the division (45 * 100.0) is a decimal!

@pre_exercise_code

connect('postgresql', 'films')
set_options(visible_tables = ['films', 'people'])

@sample_code

-- get the count(deathdate) and multiply by 100.0
-- then divide by count(*) 

type: NormalExercise
key: e14dc7c1a2
xp: 30

@instructions Get the percentage of people who are no longer alive. Alias the result as percentage_dead. Remember to use 100.0 and not 100! @solution

-- get the count(deathdate) and multiply by 100.0
-- then divide by count(*) 
SELECT COUNT(deathdate) * 100.0 / COUNT(*) AS percentage_dead
FROM people;

@hint

SELECT ___(___) * 100.0 / ___(___) AS percentage_dead
FROM ___;

@sct

sel = check_node('SelectStmt')

alias = test_column('percentage_dead', match='exact', msg='Did you alias your result as `percentage_dead`?')

alias_eqn = sel.check_node('AliasExpr').check_node('BinaryExpr')

left_eqn = alias_eqn.check_node('BinaryExpr')

right_eqn = alias_eqn.check_node('Call').has_equal_ast('Are you dividing by `COUNT(*)`?')

temp = left_eqn.check_field('left')

count_call = temp.check_field('name').has_equal_ast('Are you using the `COUNT` function for the top of your fraction?')

count_args = temp.check_field('args').has_equal_ast('Are you using `COUNT` on the right column?')

op_eqn = left_eqn.check_field('op').has_equal_ast('Are you multiplying `COUNT(deathdate)` by `100.00`?')

right_left_eqn = left_eqn.check_field('right').has_equal_ast('Make sure to multiply the top by `100.0`!')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

Ex().test_correct(check_result(), [
    from_clause,
    count_call,
    count_args,
    op_eqn,
    right_left_eqn,
    right_eqn,
    alias,
    test_has_columns(),
    test_ncols(),
    test_error()
])

type: NormalExercise
key: c2bbd9a806
xp: 30

@instructions Get the number of years between the newest film and oldest film. Alias the result as difference.

@solution

SELECT MAX(release_year) - MIN(release_year)
AS difference
FROM films;

@hint

SELECT ___(___) - ___(___)
AS difference
FROM ___;

@sct

sel = check_node('SelectStmt')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

alias = test_column('difference', match='exact', msg='Did you alias your result as `difference`?')

alias_eqn = sel.check_node('AliasExpr').check_node('BinaryExpr')

left_eqn = alias_eqn.check_field('left')
right_eqn = alias_eqn.check_field('right')

max_call = left_eqn.check_field('name').has_equal_ast('Did you use `MAX` function to get the oldest film?')
max_args = left_eqn.check_field('args').has_equal_ast('Are you using `MAX` on the right column?')

min_call = right_eqn.check_field('name').has_equal_ast('Did you use the `MIN` function to get the newest film?')
min_args = right_eqn.check_field('args').has_equal_ast('Are you using `MIN` on the right column?')

op_eqn = alias_eqn.check_field('op').has_equal_ast('Are you subtracting the most recent year from the least recent year?')

Ex().test_correct(check_result(), [
    from_clause,
    max_call,
    max_args,
    op_eqn,
    min_call,
    min_args,
    alias,
    test_has_columns(),
    test_ncols(),
    test_error()
])


type: NormalExercise
key: f272486b68
xp: 30

@instructions Get the number of decades the films table covers. Alias the result as number_of_decades. The top half of your fraction should be enclosed in parentheses. @solution

SELECT (MAX(release_year) - MIN(release_year)) / 10.0
AS number_of_decades
FROM films;

@hint

SELECT (___(___) - ___(___)) / 10.0
AS number_of_decades
FROM ___;

@sct

sel = check_node('SelectStmt')

from_clause = sel.check_field('from_clause').has_equal_ast('Is your `FROM` clause correct?')

alias = test_column('number_of_decades', match='exact', msg='Did you alias your result as `number_of_decades`?')

alias_eqn = sel.check_node('AliasExpr').check_node('BinaryExpr')

left_eqn = alias_eqn.check_field('left')

max_node = left_eqn.check_field('left')

max_call = max_node.check_field('name').has_equal_ast('Did you use the `MAX` function to get the most recent year?')
max_args = max_node.check_field('args').has_equal_ast('Did you use `MAX` on the right column?')

min_node = left_eqn.check_field('right')

min_call = min_node.check_field('name').has_equal_ast('Did you use the `MIN` function to get the least recent year?')
min_args = min_node.check_field('args').has_equal_ast('Did you use `MIN` on the right column?')

op_eqn = left_eqn.check_field('op').has_equal_ast('Are you subtracting the newest year from the oldest year?')

other_op = alias_eqn.check_field('op').has_equal_ast("Don't forget to divide by `10.0`!")

ten = alias_eqn.check_field('right').has_equal_ast('Did you divide by `10.0`?')


Ex().test_correct(check_result(), [
    from_clause,
    max_call,
    max_args,
    op_eqn,
    min_call,
    min_args,
    other_op,
    ten,
    alias,
    test_has_columns(),
    test_ncols(),
    test_error()
])