Ahmed Toumi edited this page Nov 14, 2016 · 27 revisions

Mongo DB

All of these examples are inspired by MongoDB University courses.

Aggregation

Simple Examples

We will use the products collection as an example.
NB: to import data into MongoDB, run this command from the system shell (not from the mongo shell):
$> mongoimport --db dbName --collection collectionName fileName.json

To start, we would like to translate this SQL query to MongoDB:

select manufacturer, count(*) from products group by manufacturer

The equivalent aggregation is:

> db.products.aggregate([
    {$group:
      {
        _id:"$manufacturer",
        num_products:{$sum:1}
      }
    }
  ])

A second example :

select category, count(*) from products group by category

The equivalent aggregation is:

> db.products.aggregate([
    {$group:
      {
        _id:"$category",
        num_products:{$sum:1}
      }
    }
  ])

To execute an aggregation query, MongoDB runs through the documents of the collection that feeds the pipeline stage ($group) and builds a new set of documents, one per distinct value of the _id expression (manufacturer or category in the previous examples). For each input document it then applies the aggregation operator to the other field that we have created (num_products in this example): with {$sum:1}, the counter of the matching group is incremented by 1.
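
The walk-through above can be sketched in plain JavaScript. This is only an in-memory illustration of the counting logic, not how MongoDB implements $group; the sample documents and the helper name groupCount are made up for the example. Note also that MongoDB does not guarantee the order of the groups it returns, whereas this sketch keeps first-seen order.

```javascript
// Hypothetical sample documents (not taken from the real products collection).
const products = [
  { name: "iPad", manufacturer: "Apple", category: "Tablets" },
  { name: "iPhone", manufacturer: "Apple", category: "Cell Phones" },
  { name: "Galaxy Tab", manufacturer: "Samsung", category: "Tablets" },
];

// Mimics {$group: {_id: "$<field>", num_products: {$sum: 1}}} in memory.
function groupCount(docs, field) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[field];
    // First time we see this key: create the output document for the group.
    if (!groups.has(key)) {
      groups.set(key, { _id: key, num_products: 0 });
    }
    // {$sum: 1} adds 1 for every input document that falls in the group.
    groups.get(key).num_products += 1;
  }
  return [...groups.values()];
}

const byManufacturer = groupCount(products, "manufacturer");
const byCategory = groupCount(products, "category");
console.log(byManufacturer, byCategory);
```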

Aggregation with compound grouping

Now we would like to translate a more complicated query that combines more than one key for grouping.

SQL Query :
select manufacturer, category, count(*) from products group by manufacturer, category

Mongo Query:
> db.products.aggregate([
    {$group:
      {
        _id: {
          "manufacturer":"$manufacturer",
          "category":"$category"
        },
        num_products:{$sum:1}
      }
    }
  ])

Note that MongoDB accepts a document (a JSON object) as the _id value, and that is what we use to run an aggregation with multiple keys.
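
As a rough illustration of grouping on a compound key, here is a plain JavaScript sketch. It assumes made-up sample documents and a hypothetical helper groupCountByKeys; the JSON.stringify trick used to compare keys is a convenience of this sketch, not something MongoDB does.

```javascript
// Hypothetical sample documents.
const products = [
  { name: "iPad", manufacturer: "Apple", category: "Tablets" },
  { name: "iPad mini", manufacturer: "Apple", category: "Tablets" },
  { name: "iPhone", manufacturer: "Apple", category: "Cell Phones" },
  { name: "Galaxy Tab", manufacturer: "Samsung", category: "Tablets" },
];

// Mimics a $group whose _id is a document built from several fields.
function groupCountByKeys(docs, fields) {
  const groups = new Map();
  for (const doc of docs) {
    // Build the compound _id, e.g. {manufacturer: "Apple", category: "Tablets"}.
    const id = {};
    for (const field of fields) {
      id[field] = doc[field];
    }
    // Objects are compared by reference in a Map, so use a string form as the key.
    const key = JSON.stringify(id);
    if (!groups.has(key)) {
      groups.set(key, { _id: id, num_products: 0 });
    }
    groups.get(key).num_products += 1;
  }
  return [...groups.values()];
}

const counts = groupCountByKeys(products, ["manufacturer", "category"]);
console.log(JSON.stringify(counts, null, 2));
```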

Aggregation Pipeline

The following are the stages of the aggregation pipeline, each with the operation it performs and the ratio of input documents to output documents:

  1. $project -> reshape -> 1:1
  2. $match -> filter -> n:1
  3. $group -> aggregate -> n:1
  4. $sort -> sort -> 1:1
  5. $skip -> skips -> n:1
  6. $limit -> limits -> n:1
  7. $unwind -> normalize -> 1:n
  8. $out -> output -> 1:1
  9. $redact : security operator
  10. $geoNear : to perform location queries
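
The input-to-output ratios in the list can be made concrete with ordinary array operations. The sketch below is an analogy in plain JavaScript, assuming two made-up documents; the array methods only stand in for the stages and are not MongoDB calls.

```javascript
// Hypothetical input documents.
const docs = [
  { _id: 1, tags: ["a", "b"], qty: 5 },
  { _id: 2, tags: ["c"], qty: 0 },
];

// $project-like (1:1): every input document yields exactly one reshaped document.
const projected = docs.map((d) => ({ _id: d._id, qty: d.qty }));

// $match-like (n:1): some documents are filtered out.
const matched = docs.filter((d) => d.qty > 0);

// $unwind-like (1:n): one document per element of the array field.
const unwound = docs.flatMap((d) => d.tags.map((tag) => ({ ...d, tags: tag })));

// $skip then $limit (n:1): drop the first document, keep at most one.
const page = unwound.slice(1).slice(0, 1);

console.log(projected.length, matched.length, unwound.length, page);
```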

Aggregation expressions

  1. $sum We would like to calculate the population of each state, given that we already know the population of each zip code
    $> mongoimport --db agg --collection zips zips.json
    > db.zips.aggregate([ {$group: {_id:"$state", population:{$sum:"$pop"} } }])

  2. $avg This query calculates, for each state, the average population of its zip codes
    > db.zips.aggregate([{$group:{"_id":"$state" ,"average_pop":{$avg:"$pop"}}}])

  3. $addToSet The goal of this query is to collect the postal codes of each state
    > db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$addToSet:"$_id"}}}])

  4. $push The $push expression is similar to $addToSet, with one difference: $push does not check whether the value is already in the list being built, so duplicate values are possible
    > db.zips.aggregate([{$group: { _id: "$state" , "postal_codes":{$push:"$_id"}}}])

  5. $min and $max These expressions look up the minimum or the maximum of a field. Note that the result contains only the value itself, not the document it came from (for example, which zip code has the largest population).
    > db.zips.aggregate([{$group: { _id: "$state" , "pop":{$max:"$pop"}}}])

  6. Using Double $group To explain this we will use a students collection (you can populate your database with this code). This is what a student document looks like:
    {
    "_id" : ObjectId("5828ccd6be23da3363e10849"),
    "student_id" : 0,
    "scores" : [
    {
    "type" : "exam",
    "score" : 59.14721560654949
    },
    {
    "type" : "quiz",
    "score" : 78.62425565750493
    },
    {
    "type" : "homework",
    "score" : 31.611900078918353
    },
    {
    "type" : "homework",
    "score" : 31.227237906183625
    }
    ],
    "class_id" : 86
    }

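As we can see, if we would like to calculate the average score by class, it is not possible to do it in one shot because each student has multiple scores. We first average the scores of each student in each class, then average those per-student results by class. Because scores is an array, and the $avg accumulator in a $group stage treats an array as a non-numeric value, we $unwind it first so that each score becomes its own document:
> db.students.aggregate([
    {$unwind:"$scores"},
    {$group:{_id:{class_id:"$class_id", student_id:"$student_id"}, score:{$avg:"$scores.score"}}},
    {$group:{_id:"$_id.class_id", score:{$avg:"$score"}}}
  ])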
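
To see why two grouping levels are needed, here is a plain JavaScript sketch of the same idea: flatten the scores, average per (class, student), then average those results per class. The sample data and the helper groupAvg are hypothetical and only illustrate the arithmetic; they are not MongoDB calls.

```javascript
// Hypothetical documents shaped like the student document shown above.
const students = [
  { student_id: 0, class_id: 1, scores: [{ type: "exam", score: 60 }, { type: "quiz", score: 80 }] },
  { student_id: 1, class_id: 1, scores: [{ type: "exam", score: 90 }] },
  { student_id: 0, class_id: 2, scores: [{ type: "exam", score: 50 }, { type: "homework", score: 70 }] },
];

// Groups docs by keyOf(doc) and averages valueOf(doc) within each group.
function groupAvg(docs, keyOf, valueOf) {
  const groups = new Map();
  for (const doc of docs) {
    const id = keyOf(doc);
    const key = JSON.stringify(id);
    if (!groups.has(key)) {
      groups.set(key, { _id: id, sum: 0, count: 0 });
    }
    const group = groups.get(key);
    group.sum += valueOf(doc);
    group.count += 1;
  }
  return [...groups.values()].map((g) => ({ _id: g._id, score: g.sum / g.count }));
}

// One document per individual score (what an unwind of the scores array produces).
const flat = students.flatMap((s) => s.scores.map((sc) => ({ ...s, scores: sc })));

// First level: average per (class, student).
const perStudent = groupAvg(
  flat,
  (d) => ({ class_id: d.class_id, student_id: d.student_id }),
  (d) => d.scores.score
);

// Second level: average of the per-student averages, per class.
const perClass = groupAvg(perStudent, (d) => d._id.class_id, (d) => d.score);
console.log(perClass);
```

In this sample, class 1 averages 80 when computed per student (70 and 90), whereas averaging its three raw scores directly would give about 76.7, because the student with more scores would weigh more.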

Aggregation Using Projection

This stage reshapes a document. For example, suppose we want to transform this document
{
"city" : "ACMAR",
"loc" : [
-86.51557,
33.584132
],
"pop" : 6055,
"state" : "AL",
"_id" : "35004"
}

To

{
"city" : "acmar",
"pop" : 6055,
"state" : "AL",
"zip" : "35004"
}

This query is the solution:
> db.zips.aggregate([{$project:{'zip':'$_id','city': {$toLower:"$city"}, 'pop':1, 'state':1, _id:0 }} ])
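
For comparison, the same reshaping can be written as a plain JavaScript function applied to one document. This is a sketch of the transformation only; the function name reshape is made up, and the input is the ACMAR document shown above.

```javascript
// The input document from the example above.
const zip = {
  city: "ACMAR",
  loc: [-86.51557, 33.584132],
  pop: 6055,
  state: "AL",
  _id: "35004",
};

// Mirrors the $project stage: rename _id to zip, lowercase city,
// keep pop and state, and drop everything else (loc, _id).
function reshape(doc) {
  return {
    city: doc.city.toLowerCase(),
    pop: doc.pop,
    state: doc.state,
    zip: doc._id,
  };
}

const reshaped = reshape(zip);
console.log(reshaped);
```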

Filtering data on Aggregation

Using $match before $group filters the data before the grouping is executed, so only the matching documents are grouped.
> db.zips.aggregate([
    {$match:{state:"NY"}},
    {$group:{_id: "$city", population: {$sum:"$pop"}, zip_codes: {$addToSet: "$_id"}}},
    {$project:{ _id: 0, city: "$_id", population: 1, zip_codes:1}}
  ])
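
The three steps of this pipeline (filter, group, reshape) can be sketched in plain JavaScript as follows. The zip documents and their population figures are invented for the illustration, and the Set stands in for $addToSet; none of this is MongoDB API.

```javascript
// Hypothetical zip documents with made-up populations.
const zips = [
  { _id: "10001", city: "NEW YORK", state: "NY", pop: 100 },
  { _id: "10002", city: "NEW YORK", state: "NY", pop: 200 },
  { _id: "12201", city: "ALBANY", state: "NY", pop: 50 },
  { _id: "35004", city: "ACMAR", state: "AL", pop: 6055 },
];

// $match-like step: keep only the NY documents before grouping.
const matched = zips.filter((z) => z.state === "NY");

// $group-like step: one group per city, summing pop and collecting distinct zip codes.
const groups = new Map();
for (const z of matched) {
  if (!groups.has(z.city)) {
    groups.set(z.city, { _id: z.city, population: 0, zip_codes: new Set() });
  }
  const group = groups.get(z.city);
  group.population += z.pop;
  group.zip_codes.add(z._id);
}

// $project-like step: expose _id under the name city and drop _id itself.
const result = [...groups.values()].map((g) => ({
  city: g._id,
  population: g.population,
  zip_codes: [...g.zip_codes],
}));
console.log(result);
```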