Aggregation Framework
Senior Solutions Architect, MongoDB
Rick Houlihan
MongoDB World
Agenda
• What is the Aggregation Framework?
• The Aggregation Pipeline
• Usage and Limitations
• Aggregation and Sharding
• Summary
What is the Aggregation Framework?
Aggregation Framework
Aggregation in a Nutshell
• We're storing our data in MongoDB
• Our applications need ad-hoc queries
• We must have a way to reshape data easily
• You can use the Aggregation Framework for this!
MapReduce is great, but…
• Extremely versatile, powerful
• Overkill for simple aggregation tasks
– Averages
– Summation
– Grouping
– Reshaping
• High level of complexity
• Difficult to program and debug
Aggregation Framework
• Plays nice with sharding
• Executes in native code
– Written in C++
– JSON parameters
• Flexible, functional, and simple
– Operation pipeline
– Computational expressions
Aggregation Pipeline
What is an Aggregation Pipeline?
• A Series of Document Transformations
– Executed in stages
– Original input is a collection
– Output as a document, cursor or a collection
• Rich Library of Functions
– Filter, compute, group, and summarize data
– Output of one stage sent to input of next
– Operations executed in sequential order
$match → $project → $group → $sort
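The idea that each stage's output feeds the next stage can be sketched in plain JavaScript. This is only an illustration of the data flow, not how the server executes a pipeline; `runPipeline` and the three stage functions are hypothetical names standing in for $match, $project and $sort.

```javascript
// Plain-JavaScript sketch of pipeline semantics; not MongoDB's implementation.
// A stage is modelled as a function from an array of documents to a new array.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical stand-ins for $match, $project and $sort.
const matchEnglish = (docs) => docs.filter((d) => d.language === "English");
const projectTitlePages = (docs) =>
  docs.map(({ title, pages }) => ({ title, pages }));
const sortByPages = (docs) => [...docs].sort((a, b) => a.pages - b.pages);

const pipelineInput = [
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
];

// Stages run in the order listed, each consuming the previous stage's output.
const pipelineResult = runPipeline(pipelineInput, [
  matchEnglish,
  projectTitlePages,
  sortByPages,
]);
console.log(pipelineResult);
```

Reordering the stage array changes the result, which is why stage order matters when a pipeline is written.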
Pipeline Operators
• $match
– Filter documents
• $project
– Reshape documents
• $group
– Summarize documents
• $unwind
– Expand documents
• $sort
– Order documents
• $limit / $skip
– Paginate documents
• $redact
– Restrict documents
• $geoNear
– Proximity sort documents
• $let, $map
– Subexpression variables
{
_id: 375,
title: "The Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
Our Example Data
$match
• Filter documents
– Uses existing query syntax
– Can facilitate shard exclusion
– No $where (server-side JavaScript)
Matching Field Values
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{ $match: {
language: "Russian"
}}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
Matching with Query Operators
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{ $match: {
pages: {$gt:1000}
}}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
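To make the filtering behaviour concrete, here is a minimal plain-JavaScript sketch of how a $match-style filter could be evaluated over the three example books. The `matches` helper and its `comparators` table are hypothetical and cover only literal equality and four comparison operators; the real query language is far richer.

```javascript
// Plain-JavaScript sketch of $match semantics; not MongoDB's implementation.
const comparators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
};

// Returns true when every field condition in `query` holds for `doc`.
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    if (condition !== null && typeof condition === "object") {
      // Operator form, e.g. { pages: { $gt: 1000 } }.
      return Object.entries(condition).every(([op, bound]) => {
        if (!(op in comparators)) throw new Error(`unsupported operator ${op}`);
        return comparators[op](doc[field], bound);
      });
    }
    // Literal form, e.g. { language: "Russian" }.
    return doc[field] === condition;
  });
}

const matchBooks = [
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
];

const russianTitles = matchBooks
  .filter((doc) => matches(doc, { language: "Russian" }))
  .map((doc) => doc.title);
const longTitles = matchBooks
  .filter((doc) => matches(doc, { pages: { $gt: 1000 } }))
  .map((doc) => doc.title);

console.log(russianTitles, longTitles);
```

With a threshold of 1000 pages, only the two long books pass the filter; a threshold of 100 would let all three through.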
$project
• Reshape Documents
– Include, exclude or rename fields
– Inject computed fields
– Create sub-document fields
Including and Excluding Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
_id: 0,
title: 1,
language: 1
}}
{
title: "Great Gatsby",
language: "English"
}
Renaming and Computing Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
avgChapterLength: {
$divide: ["$pages", "$chapters"]
},
lang: "$language"
}}
{
_id: 375,
avgChapterLength: 24.2222,
lang: "English"
}
Creating Sub-Document Fields
{
_id: 375,
title: "Great Gatsby",
ISBN: "9781857150193",
available: true,
pages: 218,
chapters: 9,
subjects: [
"Long Island",
"New York",
"1920s"
],
language: "English"
}
{ $project: {
title: 1,
stats: {
pages: "$pages",
language: "$language",
}
}}
{
_id: 375,
title: "Great Gatsby",
stats: {
pages: 218,
language: "English"
}
}
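The three $project examples (renaming, computing, and nesting fields) amount to building a new document from the old one. A rough plain-JavaScript equivalent, assuming a hypothetical `projectStage` function, might look like this; it illustrates the reshaping only and is not the server's code.

```javascript
// Plain-JavaScript sketch of $project-style reshaping; not MongoDB's implementation.
const gatsby = {
  _id: 375,
  title: "Great Gatsby",
  ISBN: "9781857150193",
  available: true,
  pages: 218,
  chapters: 9,
  subjects: ["Long Island", "New York", "1920s"],
  language: "English",
};

// Only the fields named here survive; everything else is dropped.
const projectStage = (doc) => ({
  _id: doc._id, // kept by default unless excluded with _id: 0
  title: doc.title, // included as-is
  lang: doc.language, // renamed field
  avgChapterLength: doc.pages / doc.chapters, // computed field ($divide)
  stats: { pages: doc.pages, language: doc.language }, // sub-document field
});

const projected = projectStage(gatsby);
console.log(projected);
```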
$group
• Group documents by value
– Field reference, object, constant
– Other output fields are computed
• $max, $min, $avg, $sum
• $addToSet, $push
• $first, $last
– Processes all data in memory by default
Calculating An Average
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{ $group: {
_id: "$language",
avgPages: { $avg: "$pages" }
}}
{
_id: "Russian",
avgPages: 1440
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
_id: "English",
avgPages: 653
}
Summing Fields and Counting
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{ $group: {
_id: "$language",
pages: { $sum: "$pages" },
books: { $sum: 1 }
}}
{
_id: "Russian",
pages: 1440,
books: 1
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
_id: "English",
pages: 1306,
books: 2
}
Collecting Distinct Values
{
title: "The Great Gatsby",
pages: 218,
language: "English"
}
{ $group: {
_id: "$language",
titles: { $addToSet: "$title" }
}}
{
_id: "Russian",
titles: ["War and Peace"]
}
{
title: "War and Peace",
pages: 1440,
language: "Russian"
}
{
title: "Atlas Shrugged",
pages: 1088,
language: "English"
}
{
_id: "English",
titles: [
"Atlas Shrugged",
"The Great Gatsby" ]
}
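The $group examples above ($avg, $sum, counting with `$sum: 1`, and $addToSet) can be reproduced in a few lines of plain JavaScript. This is a sketch under the assumption that grouping is always by `language`; `groupByLanguage` is a hypothetical helper, not a MongoDB API.

```javascript
// Plain-JavaScript sketch of $group accumulators; not MongoDB's implementation.
function groupByLanguage(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.language)) {
      groups.set(doc.language, { pages: 0, books: 0, titles: new Set() });
    }
    const group = groups.get(doc.language);
    group.pages += doc.pages; // { $sum: "$pages" }
    group.books += 1; // { $sum: 1 }
    group.titles.add(doc.title); // { $addToSet: "$title" }
  }
  return [...groups.entries()].map(([language, g]) => ({
    _id: language,
    pages: g.pages,
    books: g.books,
    avgPages: g.pages / g.books, // { $avg: "$pages" }
    titles: [...g.titles],
  }));
}

const grouped = groupByLanguage([
  { title: "The Great Gatsby", pages: 218, language: "English" },
  { title: "War and Peace", pages: 1440, language: "Russian" },
  { title: "Atlas Shrugged", pages: 1088, language: "English" },
]);
console.log(grouped);
```

For the English group this gives 218 + 1088 = 1306 pages across 2 books, hence an average of 653.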
$unwind
• Operate on an array field
– Create documents from array elements
• Array replaced by element value
• Missing/empty fields → no output
• Non-array fields → error
– Pipe to $group to aggregate
Collecting Distinct Values
{
title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: [
"Long Island",
"New York",
"1920s"
]
}
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "Long Island" }
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "New York" }
{ title: "The Great Gatsby",
ISBN: "9781857150193",
subjects: "1920s" }
{ $unwind: "$subjects" }
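The $unwind rules listed on the slide (one output document per array element, no output for missing or empty fields, an error for non-array fields) can be sketched as follows. The `unwind` function is a hypothetical plain-JavaScript stand-in that follows those slide rules, not the server's implementation.

```javascript
// Plain-JavaScript sketch of the $unwind rules stated on the slide.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const value = doc[field];
    // Missing field: the document produces no output.
    if (value === undefined || value === null) return [];
    // Non-array field: treated as an error, as the slide describes.
    if (!Array.isArray(value)) {
      throw new TypeError(`field "${field}" is not an array`);
    }
    // One copy of the document per element; an empty array yields nothing.
    return value.map((element) => ({ ...doc, [field]: element }));
  });
}

const unwound = unwind(
  [
    {
      title: "The Great Gatsby",
      ISBN: "9781857150193",
      subjects: ["Long Island", "New York", "1920s"],
    },
  ],
  "subjects"
);
console.log(unwound);
```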
$sort, $limit, $skip
• Sort documents by one or more fields
– Same order syntax as cursors
– Waits for earlier pipeline operator to return
– In-memory unless early and indexed
• Limit and skip follow cursor behavior
Sort All the Documents in the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ $sort: {title: 1} }
{ title: "Animal Farm" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Great Gatsby, The" }
{ title: "Lord of the Flies" }
Limit Documents Through the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ $limit: 5 }
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
Skip Documents in the Pipeline
{ title: "Great Gatsby, The" }
{ title: "Brave New World" }
{ title: "Grapes of Wrath" }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
{ $skip: 3 }
{ title: "Animal Farm" }
{ title: "Lord of the Flies" }
{ title: "Fathers and Sons" }
{ title: "Invisible Man" }
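Combining $sort, $skip and $limit gives simple pagination. The sketch below emulates that combination on the seven example titles in plain JavaScript; `page` and its parameters are hypothetical, and the comparison is a plain code-unit string comparison chosen for the illustration.

```javascript
// Plain-JavaScript sketch of $sort + $skip + $limit pagination.
const shelf = [
  { title: "Great Gatsby, The" },
  { title: "Brave New World" },
  { title: "Grapes of Wrath" },
  { title: "Animal Farm" },
  { title: "Lord of the Flies" },
  { title: "Fathers and Sons" },
  { title: "Invisible Man" },
];

// Ascending comparison on title, like { $sort: { title: 1 } }.
const byTitle = (a, b) => (a.title < b.title ? -1 : a.title > b.title ? 1 : 0);

// pageNumber is 1-based; skip the earlier pages, then keep one page.
function page(docs, pageNumber, pageSize) {
  const sorted = [...docs].sort(byTitle); // $sort
  const skipped = sorted.slice((pageNumber - 1) * pageSize); // $skip
  return skipped.slice(0, pageSize); // $limit
}

const secondPage = page(shelf, 2, 3).map((doc) => doc.title);
console.log(secondPage);
```

The sort has to see every document before the first result can be produced, which is the "waits for earlier pipeline operator to return" point on the slide.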
$redact
• Restrict access to Documents
– Use document fields to define privileges
– Apply conditional queries to validate users
• Field Level Access Control
– $$DESCEND, $$PRUNE, $$KEEP
– Applies to root and subdocument fields
{
_id: 375,
item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
Manufacturer: "Sony",
security: 0,
quantity: 12,
list: 4999,
pricing: {
security: 1,
sale: 2698,
wholesale: {
security: 2,
amount: 2300 }
}
}
$redact Example Data
Query by Security Level
db.catalog.aggregate([
{
$match: {item: /^.*XBR55X900A*/}
},
{
$redact: {
$cond: {
if: { $lte: [ "$security", ?? ] },
then: "$$DESCEND",
else: "$$PRUNE"
}
}
}])
security = 0
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999
}
security = 2
{
"_id" : 375,
"item" : "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
"Manufacturer" : "Sony",
"security" : 0,
"quantity" : 12,
"list" : 4999,
"pricing" : {
"security" : 1,
"sale" : 2698,
"wholesale" : {
"security" : 2,
"amount" : 2300
}
}
}
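The way $$DESCEND and $$PRUNE interact with nested `security` fields can be sketched as a recursive walk. The `redact` function below is a hypothetical plain-JavaScript emulation of the `$lte` condition in the query above, with the caller's clearance passed as `level`; arrays are copied unchanged to keep the sketch short.

```javascript
// Plain-JavaScript sketch of $redact with $$DESCEND / $$PRUNE.
function redact(doc, level) {
  // $$PRUNE: drop this (sub)document when its security exceeds the level.
  if (doc.security > level) return undefined;
  // $$DESCEND: keep scalar fields and re-check each embedded document.
  const kept = {};
  for (const [key, value] of Object.entries(doc)) {
    const isSubDocument =
      value !== null && typeof value === "object" && !Array.isArray(value);
    if (isSubDocument) {
      const child = redact(value, level);
      if (child !== undefined) kept[key] = child;
    } else {
      kept[key] = value;
    }
  }
  return kept;
}

const tv = {
  _id: 375,
  item: "Sony XBR55X900A 55Inch 4K Ultra High Definition TV",
  Manufacturer: "Sony",
  security: 0,
  quantity: 12,
  list: 4999,
  pricing: {
    security: 1,
    sale: 2698,
    wholesale: { security: 2, amount: 2300 },
  },
};

console.log(redact(tv, 0)); // no pricing at all
console.log(redact(tv, 1)); // sale price, but no wholesale
console.log(redact(tv, 2)); // the full document
```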
$geoNear
• Order/Filter Documents by Location
– Requires a geospatial index
– Output includes physical distance
– Must be first aggregation stage
{
"_id" : 10021,
"city" : "NEW YORK",
"loc" : [
-73.958805,
40.768476
],
"pop" : 106564,
"state" : "NY"
}
$geoNear Example Data
Query by Proximity
db.catalog.aggregate([
{
$geoNear : {
near: [ -86.000, 33.000 ],
distanceField: "dist",
maxDistance: .050,
spherical: true,
num: 3
}
}])
{
"_id" : "35089",
"city" : "KELLYTON",
"loc" : [ -86.048397, 32.979068 ],
"pop" : 1584,
"state" : "AL",
"dist" : 0.0007971432165364155
},
{
"_id" : "35010",
"city" : "NEW SITE",
"loc" : [ -85.951086, 32.941445 ],
"pop" : 19942,
"state" : "AL",
"dist" : 0.0012479615347306806
},
{
"_id" : "35072",
"city" : "GOODWATER",
"loc" : [ -86.078149, 33.074642 ],
"pop" : 3813,
"state" : "AL",
"dist" : 0.0017333719627032555
}
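The `dist` values above appear to be angular distances in radians between the query point and each city. As a rough check, assuming a spherical Earth and the haversine formula, the first result can be recomputed in plain JavaScript; `sphericalDistanceRadians` and the 6371 km mean radius are assumptions made for this sketch.

```javascript
// Haversine sketch: angular distance in radians between two [lon, lat] pairs.
const toRadians = (degrees) => (degrees * Math.PI) / 180;

function sphericalDistanceRadians([lon1, lat1], [lon2, lat2]) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) *
      Math.cos(toRadians(lat2)) *
      Math.sin(dLon / 2) ** 2;
  return 2 * Math.asin(Math.sqrt(a));
}

const EARTH_RADIUS_KM = 6371; // assumed mean Earth radius

const queryPoint = [-86.0, 33.0];
const kellyton = [-86.048397, 32.979068];

const kellytonRadians = sphericalDistanceRadians(queryPoint, kellyton);
console.log(kellytonRadians); // close to the "dist" reported for KELLYTON
console.log(kellytonRadians * EARTH_RADIUS_KM); // roughly 5 km
```

Read this way, `maxDistance: .050` would correspond to a search radius of about 0.05 × 6371 ≈ 320 km.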
$let / $map
• Bind variables to subexpressions
– Apply conditional logic
– Define complex calculations
– Operate on array field values
{
"_id" : 1,
"price" : 10,
"tax" : 0.50,
"applyDiscount" : true
}
$let Example Data
Subexpression Calculations
db.sales.aggregate( [
{
$project: {
finalPrice: {
$let: {
vars: {
total: { $cond: {
if: '$applyDiscount',
then: { $multiply: [0.9, '$price'] },
else: '$price'
}
}
},
in: { $add: [ "$$total", '$tax'] }
}}}}])
{ "_id" : 1, "finalPrice" : 9.5 }
{ "_id" : 2, "finalPrice" : 10.25 }
{
"_id" : 1,
"price" : 10,
"tax" : 0.50,
"applyDiscount" : true,
"units" : [ 1, 0, 3, 4, 0, 0, 10, 12, 6, 5 ]
}
$map Example Data
Subexpressions on Arrays
db.sales.aggregate( [ {
$project: {
finalPrice: {
$map: {
input: "$units",
as: "unit",
in: {
$multiply: [ "$$unit", {
$cond: {
if: '$applyDiscount', then: {
$add : [
{ $multiply: [ 0.9, '$price'] }, '$tax' ] },
else: { $add: [ '$price', '$tax' ] }
} } ] } } } } } ] )
{
"_id" : 1,
"finalPrice" :
[ 9.5, 0, 28.5, 38, 0, 0, 95, 114, 57, 47.5 ]
}
{
"_id" : 2,
"finalPrice" :
[ 51.25, 30.75, 20.5, 51.25, 0, 0, 0, 30.75, 41, 71.75 ]
}
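The arithmetic behind the $let and $map results for document 1 can be traced in plain JavaScript. This sketch assumes the discount flag is stored under `applyDiscount`, the field name the pipelines read; `finalUnitPrice` is a hypothetical helper mirroring the $let expression.

```javascript
// Plain-JavaScript trace of the $let and $map expressions for one document.
const sale = {
  _id: 1,
  price: 10,
  tax: 0.5,
  applyDiscount: true, // assumed field name, matching '$applyDiscount'
  units: [1, 0, 3, 4, 0, 0, 10, 12, 6, 5],
};

// $let: bind `total` to the (possibly discounted) price, then add tax.
function finalUnitPrice(doc) {
  const total = doc.applyDiscount ? 0.9 * doc.price : doc.price;
  return total + doc.tax;
}

// $map: apply the same per-unit price to every element of `units`.
const finalPrices = sale.units.map((unit) => unit * finalUnitPrice(sale));

console.log(finalUnitPrice(sale)); // 0.9 * 10 + 0.5
console.log(finalPrices);
```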
Aggregation and Sharding
Sharding
[Diagram: Result, mongos, and four shards. Shard 1 (Primary), Shard 2 and Shard 4 each run $match, $project, $group; Shard 3 is excluded.]
• Workload split between shards
– Shards execute pipeline up to a point
– Primary shard merges cursors and continues processing*
– Use explain to analyze pipeline split
– Early $match may exclude shards
– Potential CPU and memory implications for primary shard host
* Prior to v2.6, second-stage pipeline processing was done by mongos
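One way to picture the split is that each shard produces partial group state and a merging step combines it. The sketch below illustrates this for an average, where each shard must return a sum and a count rather than its own average; the function names and the placement of books on shards are hypothetical, and this is not a description of the server's internal protocol.

```javascript
// Sketch of a shard-side partial $group followed by a merge step.
// Shard side: per-language partial state (sum and count), not a final average.
function partialGroup(docs) {
  const partial = {};
  for (const { language, pages } of docs) {
    if (!partial[language]) partial[language] = { sum: 0, count: 0 };
    partial[language].sum += pages;
    partial[language].count += 1;
  }
  return partial;
}

// Merge side: combine the partial states, then finish the average.
function mergePartials(partials) {
  const merged = {};
  for (const partial of partials) {
    for (const [language, { sum, count }] of Object.entries(partial)) {
      if (!merged[language]) merged[language] = { sum: 0, count: 0 };
      merged[language].sum += sum;
      merged[language].count += count;
    }
  }
  return Object.entries(merged).map(([language, { sum, count }]) => ({
    _id: language,
    avgPages: sum / count,
  }));
}

// Hypothetical placement of the example books on three participating shards.
const shardOutputs = [
  [{ title: "The Great Gatsby", pages: 218, language: "English" }],
  [{ title: "War and Peace", pages: 1440, language: "Russian" }],
  [{ title: "Atlas Shrugged", pages: 1088, language: "English" }],
].map(partialGroup);

const mergedAverages = mergePartials(shardOutputs);
console.log(mergedAverages);
```

The merge step holds state for every group from every shard, which is one reason the merging host can see extra CPU and memory load.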
Usage and Limitations
Usage
• collection.aggregate([…], {<options>})
– Returns a cursor
– Takes an optional document to specify aggregation options
• allowDiskUse, explain
– Use $out to send results to a Collection
• db.runCommand({aggregate:<collection>, pipeline:[…]})
– Returns a document, limited to 16 MB
Collection
db.books.aggregate([
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
])
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
Database Command
db.runCommand({
aggregate: "books",
pipeline: [
{ $project: { language: 1 }},
{ $group: { _id: "$language", numTitles: { $sum: 1 }}}
]
})
{
result : [
{ _id: "Russian", numTitles: 1 },
{ _id: "English", numTitles: 2 }
],
"ok" : 1
}
Limitations
• Pipeline operator memory limits
– Stages limited to 100 MB
– “allowDiskUse” for larger data sets
• Some BSON types unsupported
– Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope
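As a sketch of how the options mentioned above fit together, the snippet below builds a pipeline that ends in $out and passes `allowDiskUse` in the options document. The target collection name `booksByLanguage` is hypothetical, and the `typeof db` guard is only there so the snippet also runs outside the mongo shell, where it just prints what would be sent.

```javascript
// Sketch: a pipeline that writes its results to a collection via $out,
// with allowDiskUse enabled for stages that exceed the memory limit.
const reportPipeline = [
  { $group: { _id: "$language", numTitles: { $sum: 1 } } },
  // $out must be the last stage; "booksByLanguage" is a hypothetical name.
  { $out: "booksByLanguage" },
];

const reportOptions = { allowDiskUse: true };

if (typeof db !== "undefined") {
  // Inside the mongo shell: run against the books collection.
  db.books.aggregate(reportPipeline, reportOptions);
} else {
  // Outside the shell: show the pipeline and options instead.
  console.log(JSON.stringify(reportPipeline), JSON.stringify(reportOptions));
}
```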
Summary
Aggregation Use Cases
Ad-hoc reporting
Real-time Analytics
Transforming Data
Enabling Developers and DBAs
• Do more with MongoDB and do it faster
• Eliminate MapReduce
– Replace pages of JavaScript
– More efficient data processing
• Not just a nice feature
– Enabler for real time big data analytics
Thank You
