MongoDB

All of these examples are inspired by MongoDB University courses.

Aggregation

Simple Examples

We will use the products collection as an example.
NB: to import data into MongoDB, use this command line:
> mongoimport --db dbName --collection collectionName fileName.json

To start, we would like to translate this SQL query to MongoDB:

select manufacturer, count(*) from products group by manufacturer

The equivalent MongoDB query is:

> db.products.aggregate([
    {$group:
        {
            _id: "$manufacturer",
            num_products: {$sum: 1}
        }
    }
])

A second example:

select category, count(*) from products group by category

The equivalent MongoDB query is:

> db.products.aggregate([
    {$group:
        {
            _id: "$category",
            num_products: {$sum: 1}
        }
    }
])

To execute an aggregation query, MongoDB runs through the documents that reach the pipeline stage ($group here) and builds a new set of documents, one per distinct value of the specified _id (manufacturer or category in the previous examples). For every input document it then applies the aggregation operator ($sum) to the other field that we have created (num_products in this example).
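The grouping step described above can be sketched in plain JavaScript, without a database. This is only an illustration of the idea, not how the server is implemented; the sample documents and the helper name groupCount are made up for the example.

```javascript
// In-memory sketch of {$group: {_id: "$manufacturer", num_products: {$sum: 1}}}.
// The sample documents below are invented for illustration.
const sampleProducts = [
  { name: "iPad", manufacturer: "Apple", category: "Tablets" },
  { name: "iPhone", manufacturer: "Apple", category: "Cell Phones" },
  { name: "Galaxy Tab", manufacturer: "Samsung", category: "Tablets" },
];

// Hypothetical helper: counts documents per distinct value of `field`.
function groupCount(docs, field) {
  const groups = new Map(); // one entry per distinct _id value
  for (const doc of docs) {
    const id = doc[field];
    groups.set(id, (groups.get(id) || 0) + 1); // {$sum: 1} adds 1 per document
  }
  return [...groups].map(([id, n]) => ({ _id: id, num_products: n }));
}

const byManufacturer = groupCount(sampleProducts, "manufacturer");
console.log(byManufacturer);
```

Calling groupCount(sampleProducts, "category") gives the second query's result in the same way. Unlike this sketch, MongoDB does not guarantee the order of the documents that $group returns.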

Aggregation with compound grouping

Now we would like to translate a more complicated query that groups on more than one key.

SQL query:
select manufacturer, category, count(*) from products group by manufacturer, category

MongoDB query:
> db.products.aggregate([
    {$group:
        {
            _id: {
                "manufacturer": "$manufacturer",
                "category": "$category"
            },
            num_products: {$sum: 1}
        }
    }
])

Note that MongoDB accepts an object (a JSON document) as the _id field, and that is what we use to run an aggregation on multiple keys.
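To make the compound key concrete, here is a small in-memory JavaScript sketch, assuming made-up sample documents and a hypothetical helper groupCountBy. A JavaScript Map compares object keys by reference, so the sketch serializes the key object; this is a detail of the sketch, not a claim about how MongoDB compares _id values.

```javascript
// In-memory sketch of grouping on a compound _id (manufacturer + category).
// Sample data is invented for illustration.
const catalog = [
  { manufacturer: "Apple", category: "Tablets" },
  { manufacturer: "Apple", category: "Tablets" },
  { manufacturer: "Apple", category: "Cell Phones" },
  { manufacturer: "Samsung", category: "Tablets" },
];

// Hypothetical helper: counts documents per distinct combination of `fields`.
function groupCountBy(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    const id = {};
    for (const f of fields) id[f] = doc[f]; // build the compound _id object
    const key = JSON.stringify(id);         // serialize so equal objects share a key
    const group = groups.get(key) || { _id: id, num_products: 0 };
    group.num_products += 1;                // {$sum: 1}
    groups.set(key, group);
  }
  return [...groups.values()];
}

const byManufacturerAndCategory = groupCountBy(catalog, ["manufacturer", "category"]);
console.log(byManufacturerAndCategory);
```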

Aggregation Pipeline

The following are the stages of the aggregation pipeline, with the role of each one and its ratio of input documents to output documents:

$project -> reshape -> 1:1
$match -> filter -> n:1
$group -> aggregate -> n:1
$sort -> sort -> 1:1
$skip -> skip -> n:1
$limit -> limit -> n:1
$unwind -> normalize -> 1:n
$out -> output -> 1:1
$redact -> security operator
$geoNear -> location queries
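The input-to-output ratios listed above can be illustrated by modelling each stage as a function from an array of documents to a new array. The helpers below (match, project, sort, skip, limit, unwind, runPipeline) are hypothetical stand-ins written for this sketch, not MongoDB APIs, and the sample documents are made up. The unwind helper assumes the field is always an array.

```javascript
// Each stage is modelled as a function: array of documents in, array of documents out.
const match = (pred) => (docs) => docs.filter(pred);   // n:1, drops documents
const project = (fn) => (docs) => docs.map(fn);        // 1:1, reshapes each document
const sort = (cmp) => (docs) => [...docs].sort(cmp);   // 1:1, reorders only
const skip = (n) => (docs) => docs.slice(n);           // n:1, drops the first n
const limit = (n) => (docs) => docs.slice(0, n);       // n:1, keeps the first n
const unwind = (field) => (docs) =>                    // 1:n, one document per array element
  docs.flatMap((d) => d[field].map((v) => ({ ...d, [field]: v })));

// A pipeline feeds the output of each stage into the next one.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const posts = [
  { title: "a", tags: ["x", "y"] },
  { title: "b", tags: ["y"] },
  { title: "c", tags: ["z", "x", "y"] },
];

const unwound = runPipeline(posts, [unwind("tags")]); // 3 documents become 6

const pipelineResult = runPipeline(posts, [
  unwind("tags"),
  match((d) => d.tags !== "z"),
  sort((p, q) => p.tags.localeCompare(q.tags) || p.title.localeCompare(q.title)),
  skip(1),
  limit(2),
  project((d) => ({ tag: d.tags, title: d.title })),
]);
console.log(unwound.length, pipelineResult);
```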

Aggregation expressions

$sum: calculates the population of each state when we already know the population of each of its zip codes.
$> mongoimport --db agg --collection zips zips.json
> db.zips.aggregate([ {$group: {_id:"$state", population:{$sum:"$pop"} } }])
$avg: calculates the average population per zip code in each state.
> db.zips.aggregate([{$group:{"_id":"$state" ,"average_pop":{$avg:"$pop"}}}])
$addToSet: the goal of this query is to collect the postal codes of each state.
> db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$addToSet:"$_id"}}}])
$push: the $push expression is similar to $addToSet, with one difference: $push does not check whether the value is already in the list being built, so duplicate values are possible.
> db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$push:"$_id"}}}])
$min and $max: these expressions find the minimum or maximum of a field within each group. Note that the result contains only the value itself, not the document it came from.
> db.zips.aggregate([{$group: { _id: "$state" , "pop":{$max:"$pop"}}}])
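As a rough illustration of what these accumulators compute inside a single group, here is a plain JavaScript sketch over made-up documents for one hypothetical state "AA". It collects city names rather than _id values, because _id is unique and would hide the difference between $push and $addToSet.

```javascript
// Made-up documents belonging to a single group (state "AA").
const zipSample = [
  { _id: "00001", city: "ALPHA", state: "AA", pop: 100 },
  { _id: "00002", city: "ALPHA", state: "AA", pop: 250 },
  { _id: "00003", city: "BETA", state: "AA", pop: 50 },
];

const cityNames = zipSample.map((d) => d.city);
const pops = zipSample.map((d) => d.pop);

const pushed = cityNames;                        // like $push: every value, duplicates kept
const asSet = [...new Set(cityNames)];           // like $addToSet: each distinct value once
const sumPop = pops.reduce((a, b) => a + b, 0);  // like $sum
const avgPop = sumPop / pops.length;             // like $avg
const maxPop = Math.max(...pops);                // like $max: the value only, not the city
const minPop = Math.min(...pops);                // like $min

console.log({ pushed, asSet, sumPop, avgPop, maxPop, minPop });
```

In MongoDB the order of the elements produced by $addToSet is unspecified, whereas this sketch happens to keep first-seen order.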
Using Double $group: to explain this, we will use a students collection (you can populate your database with this code). This is what a student document looks like:
{
"_id" : ObjectId("5828ccd6be23da3363e10849"),
"student_id" : 0,
"scores" : [
{
"type" : "exam",
"score" : 59.14721560654949
},
{
"type" : "quiz",
"score" : 78.62425565750493
},
{
"type" : "homework",
"score" : 31.611900078918353
},
{
"type" : "homework",
"score" : 31.227237906183625
}
],
"class_id" : 86
}

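As we can see, the average score per class cannot be computed in one shot, because each student has multiple scores. We first average the scores of each student in each class, then average those per-student averages by class. The scores array is unwound first, because the $avg accumulator in $group does not look inside arrays:
> db.students.aggregate([{$unwind:"$scores"}, {$group:{_id:{class_id:"$class_id", student_id:"$student_id"}, score:{$avg:"$scores.score"}}}, {$group:{_id:"$_id.class_id", score:{$avg:"$score"}}}])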
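The two-step idea (one average per student and class, then an average of those averages per class) can be sketched in plain JavaScript. The documents and the helper names mean, averagePerStudent and averagePerClass are assumptions made for this illustration.

```javascript
// Made-up documents with the same shape as the student document above.
const gradeDocs = [
  { student_id: 0, class_id: 1, scores: [{ type: "exam", score: 60 }, { type: "quiz", score: 80 }] },
  { student_id: 1, class_id: 1, scores: [{ type: "exam", score: 90 }] },
  { student_id: 0, class_id: 2, scores: [{ type: "exam", score: 50 }, { type: "quiz", score: 70 }] },
];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// First grouping: one average per (class_id, student_id) pair.
function averagePerStudent(docs) {
  const groups = new Map();
  for (const d of docs) {
    const key = `${d.class_id}:${d.student_id}`;
    const g = groups.get(key) ||
      { _id: { class_id: d.class_id, student_id: d.student_id }, values: [] };
    for (const s of d.scores) g.values.push(s.score); // every score of the array counts
    groups.set(key, g);
  }
  return [...groups.values()].map((g) => ({ _id: g._id, score: mean(g.values) }));
}

// Second grouping: average the per-student averages by class.
function averagePerClass(perStudent) {
  const groups = new Map();
  for (const d of perStudent) {
    const list = groups.get(d._id.class_id) || [];
    list.push(d.score);
    groups.set(d._id.class_id, list);
  }
  return [...groups].map(([classId, values]) => ({ _id: classId, score: mean(values) }));
}

const perStudent = averagePerStudent(gradeDocs);
const perClass = averagePerClass(perStudent);
console.log(perClass);
```

With this data, class 1 averages to 80 (the mean of 70 and 90). Averaging all three raw scores of class 1 in a single step would give about 76.7 instead, which is why the two groupings are not interchangeable.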

Aggregation Using Projection

The $project stage allows us to reshape a document. For example, suppose we want to transform this document:
{
"city" : "ACMAR",
"loc" : [
-86.51557,
33.584132
],
"pop" : 6055,
"state" : "AL",
"_id" : "35004"
}

into

{
"city" : "acmar",
"pop" : 6055,
"state" : "AL",
"zip" : "35004"
}

This query is the solution:
> db.zips.aggregate([{$project:{'zip':'$_id','city': {$toLower:"$city"}, 'pop':1, 'state':1, _id:0 }} ])
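For comparison, the same reshaping can be written as a plain JavaScript mapping over an array of documents. This is only an in-memory sketch of what the projection does to each document; the function name reshape is made up.

```javascript
// The input document from the example above.
const zipDoc = {
  city: "ACMAR",
  loc: [-86.51557, 33.584132],
  pop: 6055,
  state: "AL",
  _id: "35004",
};

// Hypothetical equivalent of the $project stage: one output document per input document.
const reshape = (d) => ({
  city: d.city.toLowerCase(), // {$toLower: "$city"}
  pop: d.pop,                 // 'pop': 1 keeps the field as is
  state: d.state,             // 'state': 1 keeps the field as is
  zip: d._id,                 // 'zip': '$_id' copies _id under a new name
});                           // loc and _id are not copied, so they are dropped

const reshaped = [zipDoc].map(reshape);
console.log(reshaped);
```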

Filtering data on Aggregation

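Using $match before $group filters the data before the grouping is executed. The following query keeps only the documents for the state of NY, sums the population and collects the zip codes per city, and then uses $project to rename _id to city: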
> db.zips.aggregate([{$match:{state:"NY"}},{$group:{_id: "$city", population: {$sum:"$pop"}, zip_codes: {$addToSet: "$_id"}}},{$project:{ _id: 0, city: "$_id", population: 1, zip_codes:1}}])
