-
Notifications
You must be signed in to change notification settings - Fork 42
/
malloy_cardio.malloynb
91 lines (80 loc) · 4.76 KB
/
malloy_cardio.malloynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
>>>markdown
# Malloy Cardio
This set of exercises is sure to get your heartrate up and your brain firing on all cylinders.
Fire up your favorite workout playlist and dive in 👇
>>>markdown
## Step 0: Get Ready
The first step in any Malloy data modeling exercise is to declare your sources. In the following code cell, declare a source for each of the tables in the ecommerce dataset. Refer to the [documentation for the precise syntax](https://malloydata.github.io/documentation/language/source).
>>>malloy
// users table at data/users.parquet
source: users is duckdb.table('data/ecommerce/users.parquet')
// Create a source for the rest of the tables in this dataset:
// products
// order_items
// orders
>>>markdown
## Step 1: Warmup
Simple queries are easy to write in Malloy. The following query looks at the count of users by gender:
>>>malloy
run: users -> {
group_by: gender
aggregate: user_count is count()
}
>>>markdown
Now your turn to try. Write a query that calculates the count of orders by status.
>>>malloy
// Calculate the count of orders by status:
// run: orders -> {
// ...
// }
>>>markdown
## Step 2: Joins
In this data model, `order_items` and `orders` are related fact tables, and the `users` and `products` tables are dimensions that join in. Update the sources above to include these [joins](https://malloydata.github.io/documentation/language/join):
- `orders` has a one-to-many join to `order_items` via `order_id`
- `orders` joins to `users` via `user_id`
- `order_items` joins to `products` via `product_id`
>>>markdown
## Step 3: Using Joins
Now that we have the joins modeled, let's run a query that takes advantage of them. Count the number of orders by user's country.
>>>malloy
// Compute the average age of users who ordered from each category
>>>markdown
## Step 4a: Aggregate Locality
Malloy makes the promise that regardless of what joins you have defined, your aggregate calculations will always be correct. This means you no longer need to worry about SQL "gotchas" like join fan-outs or chasm traps. In Malloy, this concept is called [aggregate locality](https://malloydata.github.io/documentation/language/aggregates#aggregate-locality).
To see this in action, write a query that calculates two quantities:
- The total sales for each product category
- The average age of users who bought products in that product category
After you write the Malloy query, think about how you would accomplish this same task in SQL, and why it might be difficult.
>>>malloy
// Write one query to compute the total sales and average age of purchasing users for each product category
>>>markdown
## Step 4b: Calculating Percent of Total
Malloy makes "[percent of total](https://malloydata.github.io/documentation/language/ungrouped-aggregates)" calculations very easy. For each product category, what is the total revenue, and calculate the percent of overall revenue.
>>>malloy
// Compute revenue for each product category, as well as percent of revenue relative to overall
>>>markdown
Now that you have some useful aggregations, it might be helpful to define them as [measures](https://malloydata.github.io/documentation/language/fields#measures) in the model, so they can be easily reused.
>>>markdown
## Step 4c: Number Formatting
[Add a tag](https://malloydata.github.io/documentation/visualizations/numbers#-percent) to the percent of total field so that the values render as percentages.
>>>markdown
## Step 5a: Nested Query
The above query shows us revenue numbers per product category. Let's drill in deeper with a nested query. Can you add a `nest` to this query to show the top 5 brands by revenue for each category?
>>>malloy
// For each product category, show top 5 brands by revenue
>>>markdown
## Step 5b: Charting
Add a tag to the nested query to [render it as a bar chart](https://malloydata.github.io/documentation/visualizations/bar_charts).
>>>markdown
## Step 6: Saving this query
Update the data model to save the "top brands" query directly onto the source. Now modify what you previously wrote to use this saved query.
>>>malloy
// Update what you previously wrote to use a saved query from the Source
>>>markdown
## Step 7: Refining a Saved Query
Try using a query refinement to take the analysis further. Perhaps you can filter to only look at top brands where the `department` is Men or Women? Or perhaps you want to only look at brands with products above a certain `retail_price` threshold?
>>>malloy
// Update this query again to include a refinement
>>>markdown
## Step 8: Keep Exploring!
There are a handful of other datasets in this repository that are ready for modeling, exploration, and analysis. Take what you've learned here and try to apply it to another dataset in the `data` subfolder. Or better yet, find a dataset relevant to your day-to-day work and use Malloy to model and analyze it!