MongoDB World 2016 : Advanced Aggregation

Advanced MongoDB
Aggregation
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole

#MDBW16
MongoDB Aggregation Framework
• One of the least understood parts of MongoDB
• Many customers never get to the aggregation framework
• Had limited utility 2.6 (16MB limit)
• The aggregation framework “grew up” in 3.0
• We continue to enhance (3.2 and now 3..4)

#MDBW16
A quick reminder
• A Processing Pipeline
• Design to process large groups of documents in parallel
• Is shard aware
• Can create new data from old

#MDBW16
A Typical Pipeline
Match Project Group Sort
Find Query Select Fields
Combine Fields
Rename fields
Calculate
Group By
Execute accumulators
Rename fields
Sort results

#MDBW16
Description of Data Set
• MOT : Ministry of Transport, then
• VOSA : Vehicle and Operator Services Agency and then
• DVSA : Driver and Vehicle Services Agency
• The test is still called the MOT Test
• It is a test of road worthiness
• Nearly every motorised vehicle must pass an MOT every year
• New cars are exempt for three years

#MDBW16
Introduction to the MOT Public Data Set

#MDBW16
This is a big Collection
> db.test_results.count()
253472477
>
253 million records

#MDBW16
Example Document
{ "_id" : ObjectId("5759ee6e8684975e1098af68"),
"TestID" : 400,
"VehicleID" : "278",
"TestDate" : ISODate("2013-04-23T00:00:00Z"),
"TestClassID" : "4",
"TestType" : "N",
"TestResult" : "P",
"TestMileage" : 99284,
"Postcode" : "E",
"Make" : "AUDI",
"Model" : "A3",
"Colour" : "BLACK",
"FuelType" : "P",
"CylinderCapacity" : 1598,
"FirstUseDate" : ISODate("2003-11-11T00:00:00Z“) }

#MDBW16
We will work with one year : 2013
> db.results_2013.count()
37390457
Still 37.3 million records

#MDBW16
Use $match to filter
cars = { "$match" : { "TestClassID" : { "$eq" : "4" }}}
motorcycles = {
"$match" : {
"$or" : [
{
"TestClassID" : "1"
},
{
"TestClassID" : "2"
}
]
}
}

#MDBW16
Lets make a collection of Cars
removeNulls = { "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}}
carsonly = { "$match" : { "TestClassID" : “4”}}
output = { “$out” : “cars_2013” }
db.results_2013.aggregate( [ removeNulls, carsonly, output ] )

#MDBW16
Now lets create a Summary
For each car of a given make and age:
• The total number of cars
• The average mileage for this collection of cars
• The total number of passes

#MDBW16
We $project to get the data we want
ageinusecs ={ "$subtract" :[ "$TestDate", "$FirstUseDate”]}
ageinyears = { "$divide" :[ ageinusecs,(1000*3600*24*365)]}
floorage = { "$floor" : ageinyears }
ispass = { "$cond" : [{"$eq": ["$TestResult","P"]},1,0]}

#MDBW16
The actual $projection
project =
{ "$project" : { "Make” :1,
"VehicleID" :1,
"TestResult” :1,
"TestDate” :1,
"TestMileage” :1,
"FirstUseDate” :1,
"Age” :floorage,
"pass” :ispass }}

#MDBW16
We Then $group to get the accumulated values
group = { "$group" :
{ "_id" : { "make": "$Make", "age" : "$Age" },
"count" : {"$sum":1} ,
"miles” : {"$avg":"$TestMileage"},
"passes": {"$sum":"$pass” }}}

#MDBW16
Now put it all in pipeline
db.results_2013.aggregate([project,group,out])
out = { "$out" : ”cars_summary" }
The $out operator creates a new collection

#MDBW16
Restrictions
• 16MB result limit if not using cursors or $out
• 100MB in memory limit unless $allowDiskUse is specified
• $geoNear must be the first operator in a pipeline
• $out must be the last operator in the pipeline

#MDBW16
Market Size
$36 Billion
Partners
1,000+
International Offices
15
Global Employees
575+
Downloads Worldwide
15,000,000+
Make a GIANT Impact
www.mongodb.com/careers

MongoDB World 2016 : Advanced Aggregation

MongoDB World 2016 : Advanced Aggregation

More Related Content

What's hot

Viewers also liked

Similar to MongoDB World 2016 : Advanced Aggregation

More from Joe Drumgoole

Recently uploaded

MongoDB World 2016 : Advanced Aggregation

Editor's Notes