All these examples are inspired by the MongoDB University courses.
We will use the products collection as an example.
NB: to import data into MongoDB, use this command:
> mongoimport --db dbName --collection collectionName fileName.json
To start, we would like to translate this SQL query to MongoDB:
select manufacturer, count(*) from products group by manufacturer
The equivalent MongoDB query is:
> db.products.aggregate([
{$group:
{
_id:"$manufacturer",
num_products:{$sum:1}
}
}
])
A second example:
select category, count(*) from products group by category
The equivalent MongoDB query is:
> db.products.aggregate([
{$group:
{
_id:"$category",
num_products:{$sum:1}
}
}
])
To execute an aggregation query, MongoDB runs through the documents that reach the $group stage of the pipeline, builds a new set of documents with the _id we specified (manufacturer or category in the previous examples), and then applies the accumulator operator to compute the other fields we created (num_products in this example).
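The grouping step described above can be illustrated with a minimal in-memory sketch in plain JavaScript (Node.js). It is not MongoDB's implementation: the sample documents and the groupCount function are hypothetical, and the code only mimics what {$group: {_id: "$field", num_products: {$sum: 1}}} computes.

```javascript
// Hypothetical sample documents standing in for the products collection.
const products = [
  { name: "Nexus 7", manufacturer: "Google", category: "Tablets" },
  { name: "Galaxy S7", manufacturer: "Samsung", category: "Cell Phones" },
  { name: "Pixel", manufacturer: "Google", category: "Cell Phones" },
];

// Hypothetical helper: group documents by the value of `field` and count them.
// {$sum: 1} adds 1 for every document that falls into a group.
function groupCount(docs, field) {
  const groups = new Map(); // _id value -> running count
  for (const doc of docs) {
    const key = doc[field];
    groups.set(key, (groups.get(key) ?? 0) + 1);
  }
  // One output document per distinct _id, like the documents $group emits.
  return [...groups].map(([id, count]) => ({ _id: id, num_products: count }));
}

console.log(groupCount(products, "manufacturer"));
console.log(groupCount(products, "category"));
```

The Map plays the role of the set of new documents keyed by _id; the accumulator is just the running count.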
Now we would like to translate a more complicated query that groups by more than one key.
SQL query:
select manufacturer, category, count(*) from products group by manufacturer, category
MongoDB query:
> db.products.aggregate([
{$group:
{
_id: {
"manufacturer":"$manufacturer",
"category" : "$category"},
num_products:{$sum:1}
}
}
])
Note that MongoDB accepts a document (a JSON object) as the _id of a $group stage; this is what we use to group by multiple keys.
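A compound _id can be sketched in memory as follows. This is plain JavaScript, not MongoDB's implementation; the sample documents and the groupCountCompound function are hypothetical. Because JavaScript Maps compare objects by reference, the sketch uses the JSON text of the _id sub-document as the lookup key.

```javascript
// Hypothetical sample documents standing in for the products collection.
const products = [
  { name: "Nexus 7", manufacturer: "Google", category: "Tablets" },
  { name: "Galaxy S7", manufacturer: "Samsung", category: "Cell Phones" },
  { name: "Galaxy S8", manufacturer: "Samsung", category: "Cell Phones" },
  { name: "Pixel", manufacturer: "Google", category: "Cell Phones" },
];

// Hypothetical helper mimicking
// {$group: {_id: {manufacturer: "$manufacturer", category: "$category"}, num_products: {$sum: 1}}}
function groupCountCompound(docs, fields) {
  const groups = new Map(); // serialized _id -> output document
  for (const doc of docs) {
    // Build the _id sub-document from the chosen fields, in the given order.
    const id = {};
    for (const f of fields) id[f] = doc[f];
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, num_products: 0 });
    groups.get(key).num_products += 1;
  }
  return [...groups.values()];
}

console.log(groupCountCompound(products, ["manufacturer", "category"]));
```

Each distinct combination of manufacturer and category becomes its own group, exactly like the rows of the SQL GROUP BY.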
The following are some of the stages of the aggregation pipeline, with the ratio of input documents to output documents for each:
- $project -> reshape -> 1:1
- $match -> filter -> n:1
- $group -> aggregate -> n:1
- $sort -> sort -> 1:1
- $skip -> skip -> n:1
- $limit -> limit -> n:1
- $unwind -> normalize -> 1:n
- $out -> output -> 1:1
- $redact -> security (restricts which document content is returned)
- $geoNear -> location-based queries
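The input:output ratios above can be made concrete with a rough analogy to JavaScript array methods. This is only an illustration, not how MongoDB executes stages; the sample documents and variable names are hypothetical.

```javascript
// Hypothetical sample documents.
const docs = [
  { _id: 1, state: "NY", tags: ["a", "b"] },
  { _id: 2, state: "CA", tags: ["c"] },
  { _id: 3, state: "NY", tags: [] },
];

// $match: keeps a subset of the documents (n:1).
const matched = docs.filter(d => d.state === "NY");
// $project: exactly one output document per input document (1:1).
const projected = docs.map(d => ({ state: d.state }));
// $sort: only reorders, the count is unchanged (1:1).
const sorted = [...docs].sort((a, b) => b._id - a._id);
// $skip and $limit: drop or keep a prefix of the stream (n:1).
const skipped = docs.slice(1);
const limited = docs.slice(0, 2);
// $unwind: one output document per array element (1:n).
// A document whose array is empty produces no output, as with $unwind's default.
const unwound = docs.flatMap(d => d.tags.map(t => ({ ...d, tags: t })));

console.log(matched.length, projected.length, sorted[0]._id, skipped.length, limited.length, unwound.length);
```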
- $sum: We would like to calculate the population of each state, knowing the population of each city.
> mongoimport --db agg --collection zips zips.json
> db.zips.aggregate([ {$group: {_id:"$state", population:{$sum:"$pop"} } }])
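A minimal in-memory sketch of {$group: {_id: "$state", population: {$sum: "$pop"}}}, in plain JavaScript rather than MongoDB itself. The sumPopByState function is hypothetical, and apart from the ACMAR document (which appears later on this page) the sample values are made up for illustration.

```javascript
// Hypothetical sample zip documents (only ACMAR's values come from this page).
const zips = [
  { _id: "35004", city: "ACMAR", state: "AL", pop: 6055 },
  { _id: "35005", city: "ADAMSVILLE", state: "AL", pop: 10616 },
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 18913 },
];

// Hypothetical helper: add up "pop" per state.
function sumPopByState(docs) {
  const totals = new Map(); // state -> running total
  for (const d of docs) {
    // $sum ignores non-numeric values, so they contribute nothing here.
    const value = typeof d.pop === "number" ? d.pop : 0;
    totals.set(d.state, (totals.get(d.state) ?? 0) + value);
  }
  return [...totals].map(([state, population]) => ({ _id: state, population }));
}

console.log(sumPopByState(zips));
```

The difference from the counting example is only the argument of $sum: a field path ("$pop") instead of the constant 1.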
- $avg: This query calculates the average population by state (averaged over the zip code documents of each state).
> db.zips.aggregate([{$group:{"_id":"$state" ,"average_pop":{$avg:"$pop"}}}])
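Since every document in zips is one zip code, $avg here yields the mean population per zip code within each state. The in-memory sketch below (plain JavaScript, not MongoDB's implementation; avgPopByState and the sample values are hypothetical) also shows that non-numeric values are skipped rather than counted as zero.

```javascript
// Hypothetical sample zip documents; the last one has no "pop" field.
const zips = [
  { _id: "35004", city: "ACMAR", state: "AL", pop: 6055 },
  { _id: "35005", city: "ADAMSVILLE", state: "AL", pop: 10616 },
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 18913 },
  { _id: "99999", city: "NOWHERE", state: "NY" },
];

// Hypothetical helper mimicking {$group: {_id: "$state", average_pop: {$avg: "$pop"}}}
function avgPopByState(docs) {
  const acc = new Map(); // state -> { sum, n }
  for (const d of docs) {
    const a = acc.get(d.state) ?? { sum: 0, n: 0 };
    acc.set(d.state, a); // every document creates or joins its group
    if (typeof d.pop !== "number") continue; // $avg ignores non-numeric values
    a.sum += d.pop;
    a.n += 1;
  }
  // A group with no numeric values averages to null.
  return [...acc].map(([state, { sum, n }]) => ({ _id: state, average_pop: n > 0 ? sum / n : null }));
}

console.log(avgPopByState(zips));
```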
- $addToSet: The goal of this query is to collect the postal codes of each state.
> db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$addToSet:"$_id"}}}])
- $push: The $push accumulator is similar to $addToSet, with one small difference: $push does not check whether a value is already in the list it builds, so duplicate values are possible.
> db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$push:"$_id"}}}])
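Because every _id in a collection is unique, the $addToSet and $push queries above return the same postal codes (possibly in a different order). The difference only shows up on a field with repeated values, such as city. This in-memory JavaScript sketch (not MongoDB's implementation; pushAndAddToSet and the sample documents are hypothetical) computes both side by side.

```javascript
// Hypothetical sample zip documents: two zip codes share the same city.
const zips = [
  { _id: "10001", city: "NEW YORK", state: "NY" },
  { _id: "10002", city: "NEW YORK", state: "NY" },
  { _id: "11201", city: "BROOKLYN", state: "NY" },
];

// Hypothetical helper: per group, collect values the $push way and the $addToSet way.
// JS Set equality only works like MongoDB's for primitive values such as strings.
function pushAndAddToSet(docs, groupField, valueField) {
  const out = new Map();
  for (const d of docs) {
    const g = out.get(d[groupField]) ?? { pushed: [], set: new Set() };
    g.pushed.push(d[valueField]); // $push: every value, duplicates kept, input order
    g.set.add(d[valueField]);     // $addToSet: each distinct value only once
    out.set(d[groupField], g);
  }
  return [...out].map(([id, g]) => ({ _id: id, pushed: g.pushed, addedToSet: [...g.set] }));
}

console.log(pushAndAddToSet(zips, "state", "_id"));  // same values either way
console.log(pushAndAddToSet(zips, "state", "city")); // $push keeps "NEW YORK" twice
```

Note that MongoDB does not guarantee the order of the array produced by $addToSet, so compare its contents rather than its order.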
- $min and $max: These accumulators return the minimum or maximum value of a field within each group, but the result contains only that value, not the document it came from (for example, not the city with the largest population).
> db.zips.aggregate([{$group: { _id: "$state" , "pop":{$max:"$pop"}}}])
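To see the limitation, and one common workaround, here is an in-memory JavaScript sketch (not MongoDB's implementation; the function names and sample values are hypothetical). The workaround sorts first and then keeps the first document of each group, which mirrors the pipeline [{$sort: {state: 1, pop: -1}}, {$group: {_id: "$state", city: {$first: "$city"}, pop: {$first: "$pop"}}}]; it is a different technique from $max, shown only for comparison.

```javascript
// Hypothetical sample zip documents.
const zips = [
  { _id: "35004", city: "ACMAR", state: "AL", pop: 6055 },
  { _id: "35005", city: "ADAMSVILLE", state: "AL", pop: 10616 },
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 18913 },
];

// Mimics {$group: {_id: "$state", pop: {$max: "$pop"}}}: only the value survives.
function maxPopByState(docs) {
  const out = new Map(); // state -> largest pop seen so far
  for (const d of docs) {
    const cur = out.get(d.state);
    if (cur === undefined || d.pop > cur) out.set(d.state, d.pop);
  }
  return [...out].map(([state, pop]) => ({ _id: state, pop }));
}

// Mimics $sort by state ascending and pop descending, then $group with $first.
function largestByState(docs) {
  const sorted = [...docs].sort((a, b) =>
    a.state === b.state ? b.pop - a.pop : a.state < b.state ? -1 : 1);
  const out = new Map();
  for (const d of sorted) {
    // $first: keep the first document seen in each group, i.e. the largest pop.
    if (!out.has(d.state)) out.set(d.state, { _id: d.state, city: d.city, pop: d.pop });
  }
  return [...out.values()];
}

console.log(maxPopByState(zips));  // no city in the result
console.log(largestByState(zips)); // city of the most populated zip code per state
```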
- Using a double $group: To explain this, we will use a students collection (you can instantiate your database with this code). This is what a student document looks like:
{
"_id" : ObjectId("5828ccd6be23da3363e10849"),
"student_id" : 0,
"scores" : [
{
"type" : "exam",
"score" : 59.14721560654949
},
{
"type" : "quiz",
"score" : 78.62425565750493
},
{
"type" : "homework",
"score" : 31.611900078918353
},
{
"type" : "homework",
"score" : 31.227237906183625
}
],
"class_id" : 86
}
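As we can see, each student has several scores, so the average score by class cannot be computed in one shot. We first $unwind the scores array, average the scores of each student (first $group), and then average those per-student averages by class (second $group). The $unwind stage is needed because, inside $group, $avg ignores an array value such as "$scores.score" instead of averaging its elements.
> db.students.aggregate([{$unwind:"$scores"}, {$group:{_id:{class_id:"$class_id", student_id:"$student_id"}, score:{$avg:"$scores.score"}}}, {$group:{_id:"$_id.class_id", score:{$avg:"$score"}}}])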
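The two-level average can be checked with an in-memory JavaScript sketch (not MongoDB's implementation; classAverages, flatAverage and the sample students are hypothetical). It assumes one document per student, as in the sample document above, and non-empty scores arrays. It also shows why the order matters: averaging per student first weights every student equally, which differs from averaging all scores of the class at once.

```javascript
// Hypothetical students, trimmed to the fields the pipeline uses.
const students = [
  { student_id: 0, class_id: 1, scores: [{ type: "exam", score: 90 }, { type: "quiz", score: 70 }] },
  { student_id: 1, class_id: 1, scores: [
    { type: "exam", score: 60 }, { type: "quiz", score: 60 },
    { type: "homework", score: 60 }, { type: "homework", score: 60 },
  ] },
];

const mean = xs => xs.reduce((s, x) => s + x, 0) / xs.length;

function classAverages(docs) {
  // Like $unwind + first $group: one average per (class, student).
  const perStudent = docs.map(d => ({ class_id: d.class_id, score: mean(d.scores.map(s => s.score)) }));
  // Like the second $group: average the per-student averages within each class.
  const byClass = new Map(); // class_id -> list of student averages
  for (const s of perStudent) {
    const list = byClass.get(s.class_id) ?? [];
    list.push(s.score);
    byClass.set(s.class_id, list);
  }
  return [...byClass].map(([id, xs]) => ({ _id: id, score: mean(xs) }));
}

// For comparison only: a single average over every score in the class.
const flatAverage = docs => mean(docs.flatMap(d => d.scores.map(s => s.score)));

console.log(classAverages(students)); // student averages 80 and 60 -> 70
console.log(flatAverage(students));   // all six scores together -> about 66.67
```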
- $project: This stage allows us to reshape a document.
For example, if we want to transform this document:
{
"city" : "ACMAR",
"loc" : [
-86.51557,
33.584132
],
"pop" : 6055,
"state" : "AL",
"_id" : "35004"
}
into:
{
"city" : "acmar",
"pop" : 6055,
"state" : "AL",
"zip" : "35004"
}
This query is the solution:
> db.zips.aggregate([{$project:{'zip':'$_id','city': {$toLower:"$city"}, 'pop':1, 'state':1, _id:0 }} ])
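Each entry of the $project specification above maps onto one line of this in-memory JavaScript sketch (not MongoDB's implementation; projectZip is a hypothetical name). Fields set to 1 are copied, "$_id" copies a value under a new name, $toLower lowercases, and _id: 0 simply leaves _id out.

```javascript
// Mimics {$project: {zip: "$_id", city: {$toLower: "$city"}, pop: 1, state: 1, _id: 0}}
function projectZip(doc) {
  return {
    city: doc.city.toLowerCase(), // {$toLower: "$city"}
    pop: doc.pop,                 // pop: 1 keeps the field as is
    state: doc.state,             // state: 1 keeps the field as is
    zip: doc._id,                 // zip: "$_id" copies _id into a new field
    // _id: 0, so _id is not copied; loc is not listed, so it is dropped too.
  };
}

const acmar = { city: "ACMAR", loc: [-86.51557, 33.584132], pop: 6055, state: "AL", _id: "35004" };
console.log(projectZip(acmar)); // { city: 'acmar', pop: 6055, state: 'AL', zip: '35004' }
```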
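- $match: Using $match before $group filters the documents before the grouping is executed, so only the matching documents are aggregated. For example, the population and zip codes of each city in the state of New York: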
> db.zips.aggregate([{$match:{state:"NY"}},{$group:{_id: "$city", population: {$sum:"$pop"}, zip_codes: {$addToSet: "$_id"}}},{$project:{ _id: 0, city: "$_id", population: 1, zip_codes:1}}])
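The three stages of the last query can be traced in an in-memory JavaScript sketch (not MongoDB's implementation; nyCityPopulations and the sample values are hypothetical). Filtering first means the grouping only ever touches New York documents; in MongoDB, a $match placed at the start of a pipeline can also use an index.

```javascript
// Hypothetical sample zip documents with made-up populations.
const zips = [
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 100 },
  { _id: "10002", city: "NEW YORK", state: "NY", pop: 200 },
  { _id: "11201", city: "BROOKLYN", state: "NY", pop: 50 },
  { _id: "90001", city: "LOS ANGELES", state: "CA", pop: 300 },
];

function nyCityPopulations(docs) {
  // $match: {state: "NY"}
  const matched = docs.filter(d => d.state === "NY");
  // $group: {_id: "$city", population: {$sum: "$pop"}, zip_codes: {$addToSet: "$_id"}}
  const groups = new Map(); // city -> { population, zip_codes }
  for (const d of matched) {
    const g = groups.get(d.city) ?? { population: 0, zip_codes: new Set() };
    g.population += d.pop;
    g.zip_codes.add(d._id);
    groups.set(d.city, g);
  }
  // $project: {_id: 0, city: "$_id", population: 1, zip_codes: 1}
  return [...groups].map(([city, g]) => ({ city, population: g.population, zip_codes: [...g.zip_codes] }));
}

console.log(nyCityPopulations(zips));
```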