Aggregation
Indexing
Profiling
Rohit Kumar
rohit.kumar@tothenew.com
Manish Kapoor
manish.kapoor@tothenew.com
Agenda
● Recap of insertion and finding documents
● Aggregation Framework
o The Aggregation Pipeline
o Aggregation Operators
o Mapping between SQL and Aggregation
o Limitations of the Aggregation Framework
● Map-reduce Queries
● Performance
o Indexes
o Logging slow queries
o Profiling
RECAP
First things first
Aggregation Pipeline
The MongoDB aggregation pipeline consists of stages. Documents pass
through each stage and output of each stage is input of next stage.
Some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline.
What is aggregation pipeline?
Aggregation Pipeline
Aggregation Pipeline Stages
Pipeline Stages $group
{$group: {_id: '$city', names: {$push:”$name”}}}
Group
Pipeline Stages $group
{$group: {_id: '$city', count: {'$sum':1}}}
Group
Pipeline Stages $group
{$group: {_id:”$city”, maxAge:{$max:’$age’}}}
Some more accumulator operators
Pipeline Stages $match
{$match: {age:{$gt:20}}}
Pipeline Stages $limit
{$limit: 10}
Pipeline Stages $sort
{$sort: {age:-1}}
Pipeline Stages $unwind
Limitations of aggregation framework
● Result Size Restrictions
● Memory Restrictions
● Limited use of index
● Sharded collections
Exercises
1. Count number of students in group by age
2. Find name of students group by age
3. Find average age of students group by city
4. Count number of students in group by age belonging to
Chennai.
Map Reduce
What Documentation says?
Map-reduce is a data processing paradigm for
condensing large volumes of data into useful
aggregated results. For map-reduce operations,
MongoDB provides the mapReduce database
command.
Learn Map-Reduce by playing cards
How it works in mongo?
Map-Reduce Command
map function
The map function is responsible for transforming each
input document into zero or more documents.
map function
• In the map function, reference the current document as this within the function.
• The map function should not access the database for any reason.
• The map function should be pure, or have no impact outside of the function (i.e. side
effects.)
• The map function may optionally call emit(key,value) any number of times to create
an output document associating key with value.
The following example will call emit either 0 or 1 time.
The
reduce function
reduce function has the following prototype
reduce function
● The reduce function should not access the database, even to perform read
operations.
● The reduce function should not affect the outside system.
● MongoDB will not call the reduce function for a key that has only a single value. The
values argument is an array whose elements are the value objects that are “mapped”
to the key.
● MongoDB can invoke the reduce function more than once for the same key. In this
case, the previous output from the reduce function for that key will become one of
the input values to the next reduce function invocation for that key.
● The reduce function must return an object whose type must be identical to the type
of the value emitted by the map function.
● The reduce function must be idempotent.
Idempotent function
finalize function
The finalize function has the following prototype
It receives as its arguments a key value and the
reducedValue from the reduce function.
Exercises
Index
Query Performance
db.collection.find().explain(verbosity)
Verbosity: "queryPlanner", "executionStats", and "allPlansExecution".
Creating Index:
db.collectionName.createIndex({name: 1})
db.collection.createIndex( <key and index type specification>, <options> )
Index
Single field Index:
db.collectionName.createIndex({name: 1})
db.collection.createIndex( <key and index type specification>, <options> )
Given the following document in the friends collection:
{
"_id" : ObjectId(...),
"name" : "Alice",
"age" : 27
}
The following command creates an index on the name field:
db.friends.createIndex( { "name" : 1 })
Index
Compound Index:
Index structure holds references to multiple fields within a collection’s documents.
Consider following document in people collection:
{ "_id" : ObjectId(...),
"name" : "John Doe",
"dateOfBirth": Date(..)
"address": {
"street": "Main",
"zipcode": "53511",
"state": "WI"
}
}
db.people.createIndex({'name':1, 'dateOfBirth':1})
MongoDB imposes a limit of 31 fields for any compound index.
Index
Compound Index(cntd):
Prefixes in Compound indexes.
Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound index:
{ "item": 1, "location": 1, "stock":1 }
The index has the following index prefixes:
{ item: 1 }
{ item: 1, location: 1 }
What about find({item: ‘Pen’, ‘stock’:{‘$gt’: 20}})
what about fields item and stock?
MongoDB can also use the index to support a query on item and stock fields since item field corresponds to a prefix.
However, the index would not be as efficient in supporting the query as would be an index on only item and stock.
However, MongoDB cannot use the index to support queries that include the following fields since without the item field, none
of the listed fields correspond to a prefix index:
the location field,
the stock field,
the location and stock field,
.
Index
Index on Embedded field:
Consider following document in people collection:
{
"_id" : ObjectId(...),
"name" : "John Doe",
"address": {
"street": "Main",
"zipcode": "53511",
"state": "WI"
}
}
db.people.createIndex({address.zipcode:1})
Index
Index on Embedded document:
Consider following document in people collection:
{
"_id" : ObjectId("5587afde264fb5b4e25b1556"),
"name" : "John Doe",
"address" : {
"city" : "New York",
"state" : "NY"
}
}
{
"_id" : ObjectId("5587affd264fb5b4e25b1557"),
"name" : "John",
"address" : {
"city" : "New Delhi",
"state" : "ND"
}
}
db.people.createIndex({address:1})
Index
Index on Embedded document(cntd.):
Important: When performing equality matches on embedded documents, field order matters.
i.e.
db.people.find({address:{'city': 'New York', state: 'NY'}}) will find result
{ "_id" : ObjectId("5587afde264fb5b4e25b1556"), "name" : "John Doe", "address" : { "city" :
"New York", "state" : "NY" } }
but
db.people.find({address:{'state':'NY', 'city': 'New York'}}) will not find any result
Logging and Killing Slow Queries
● db.currentOp()
● db.killOp(opId)
Profiling
Enabling Profiling:
db.setProfilingLevel(level)
db.setProfilingLevel(level, slowOpThresholdInMillis)
Profiling Levels:
The following profiling levels are available:
0 - the profiler is off, does not collect any data. mongod always writes operations longer than the slowOpThresholdMs
threshold to its log. This is the default profiler level.
1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds.
You can modify the threshold for “slow” operations with the slowOpThresholdMs runtime option or the setParameter
command. See the Specify the Threshold for Slow Operations section for more information.
2 - collects profiling data for all database operations.
Profiling
Viewing Profiling Data:
db.system.profile.find()
ns(namespace)
ts(timestamp)
millis
user
To return operations from ‘test; collection for ‘mydb’
db.system.profile.find( { ns : 'mydb.test' } ).pretty()
To return operations slower than 5 milliseconds, run a query similar to the following:
db.system.profile.find( { millis : { $gt : 5 } } ).pretty()
Profiling
Viewing Profiling Data(cntd.):
To show five slowest queries run after a certain timestamp
db.system.profile.find({"ts": {"$gt": ISODate("2016-04-29T02:48:42.019Z")}},{ts:1, millis:1, command:1,
query:1, ns:1}).sort({millis:-1}).limit(5).pretty()
Questions
Thanks
References
• http://blog.nahurst.com/visual-guide-to-nosql-systems
• https://docs.mongodb.org/manual/reference/database-profiler/
• http://www.slideshare.net/sebprunier/mongodb-aggregation-framework-in-action
• http://image.slidesharecdn.com/nantesmug-mongodbaggregationframework-
150121023633-conversion-gate02/95/mongodb-aggregation-framework-in-action-9-
638.jpg?cb=1421829478
• https://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
• DB script: https://gist.github.com/rohitbishnoi/6e6d9556ba0569c18f805a585029f5f8

MongoDB Aggregations Indexing and Profiling

  • 1.
  • 2.
    Agenda ● Recap ofinsertion and finding documents ● Aggregation Framework o The Aggregation Pipeline o Aggregation Operators o Mapping between SQL and Aggregation o Limitations of the Aggregation Framework ● Map-reduce Queries ● Performance o Indexes o Logging slow queries o Profiling
  • 3.
  • 4.
    Aggregation Pipeline The MongoDBaggregation pipeline consists of stages. Documents pass through each stage and output of each stage is input of next stage. Some stages may generate new documents or filter out documents. Pipeline stages can appear multiple times in the pipeline. What is aggregation pipeline?
  • 5.
  • 6.
  • 7.
    Pipeline Stages $group {$group:{_id: '$city', names: {$push:”$name”}}} Group
  • 8.
    Pipeline Stages $group {$group:{_id: '$city', count: {'$sum':1}}} Group
  • 9.
    Pipeline Stages $group {$group:{_id:”$city”, maxAge:{$max:’$age’}}} Some more accumulator operators
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
    Limitations of aggregationframework ● Result Size Restrictions ● Memory Restrictions ● Limited use of index ● Sharded collections
  • 15.
    Exercises 1. Count numberof students in group by age 2. Find name of students group by age 3. Find average age of students group by city 4. Count number of students in group by age belonging to Chennai.
  • 16.
  • 17.
    What Documentation says? Map-reduceis a data processing paradigm for condensing large volumes of data into useful aggregated results. For map-reduce operations, MongoDB provides the mapReduce database command.
  • 18.
    Learn Map-Reduce byplaying cards
  • 19.
    How it worksin mongo?
  • 20.
  • 21.
    map function The mapfunction is responsible for transforming each input document into zero or more documents.
  • 22.
    map function • Inthe map function, reference the current document as this within the function. • The map function should not access the database for any reason. • The map function should be pure, or have no impact outside of the function (i.e. side effects.) • The map function may optionally call emit(key,value) any number of times to create an output document associating key with value. The following example will call emit either 0 or 1 time. The
  • 23.
    reduce function reduce functionhas the following prototype
  • 24.
    reduce function ● Thereduce function should not access the database, even to perform read operations. ● The reduce function should not affect the outside system. ● MongoDB will not call the reduce function for a key that has only a single value. The values argument is an array whose elements are the value objects that are “mapped” to the key. ● MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key. ● The reduce function must return an object whose type must be identical to the type of the value emitted by the map function. ● The reduce function must be idempotent.
  • 25.
  • 26.
    finalize function The finalizefunction has the following prototype It receives as its arguments a key value and the reducedValue from the reduce function.
  • 27.
  • 28.
    Index Query Performance db.collection.find().explain(verbosity) Verbosity: "queryPlanner","executionStats", and "allPlansExecution". Creating Index: db.collectionName.createIndex({name: 1}) db.collection.createIndex( <key and index type specification>, <options> )
  • 29.
    Index Single field Index: db.collectionName.createIndex({name:1}) db.collection.createIndex( <key and index type specification>, <options> ) Given the following document in the friends collection: { "_id" : ObjectId(...), "name" : "Alice", "age" : 27 } The following command creates an index on the name field: db.friends.createIndex( { "name" : 1 })
  • 30.
    Index Compound Index: Index structureholds references to multiple fields within a collection’s documents. Consider following document in people collection: { "_id" : ObjectId(...), "name" : "John Doe", "dateOfBirth": Date(..) "address": { "street": "Main", "zipcode": "53511", "state": "WI" } } db.people.createIndex({'name':1, 'dateOfBirth':1}) MongoDB imposes a limit of 31 fields for any compound index.
  • 31.
    Index Compound Index(cntd): Prefixes inCompound indexes. Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound index: { "item": 1, "location": 1, "stock":1 } The index has the following index prefixes: { item: 1 } { item: 1, location: 1 } What about find({item: ‘Pen’, ‘stock’:{‘$gt’: 20}}) what about fields item and stock? MongoDB can also use the index to support a query on item and stock fields since item field corresponds to a prefix. However, the index would not be as efficient in supporting the query as would be an index on only item and stock. However, MongoDB cannot use the index to support queries that include the following fields since without the item field, none of the listed fields correspond to a prefix index: the location field, the stock field, the location and stock field, .
  • 32.
    Index Index on Embeddedfield: Consider following document in people collection: { "_id" : ObjectId(...), "name" : "John Doe", "address": { "street": "Main", "zipcode": "53511", "state": "WI" } } db.people.createIndex({address.zipcode:1})
  • 33.
    Index Index on Embeddeddocument: Consider following document in people collection: { "_id" : ObjectId("5587afde264fb5b4e25b1556"), "name" : "John Doe", "address" : { "city" : "New York", "state" : "NY" } } { "_id" : ObjectId("5587affd264fb5b4e25b1557"), "name" : "John", "address" : { "city" : "New Delhi", "state" : "ND" } } db.people.createIndex({address:1})
  • 34.
    Index Index on Embeddeddocument(cntd.): Important: When performing equality matches on embedded documents, field order matters. i.e. db.people.find({address:{'city': 'New York', state: 'NY'}}) will find result { "_id" : ObjectId("5587afde264fb5b4e25b1556"), "name" : "John Doe", "address" : { "city" : "New York", "state" : "NY" } } but db.people.find({address:{'state':'NY', 'city': 'New York'}}) will not find any result
  • 35.
    Logging and KillingSlow Queries ● db.currentOp() ● db.killOp(opId)
  • 36.
    Profiling Enabling Profiling: db.setProfilingLevel(level) db.setProfilingLevel(level, slowOpThresholdInMillis) ProfilingLevels: The following profiling levels are available: 0 - the profiler is off, does not collect any data. mongod always writes operations longer than the slowOpThresholdMs threshold to its log. This is the default profiler level. 1 - collects profiling data for slow operations only. By default slow operations are those slower than 100 milliseconds. You can modify the threshold for “slow” operations with the slowOpThresholdMs runtime option or the setParameter command. See the Specify the Threshold for Slow Operations section for more information. 2 - collects profiling data for all database operations.
  • 37.
    Profiling Viewing Profiling Data: db.system.profile.find() ns(namespace) ts(timestamp) millis user Toreturn operations from ‘test; collection for ‘mydb’ db.system.profile.find( { ns : 'mydb.test' } ).pretty() To return operations slower than 5 milliseconds, run a query similar to the following: db.system.profile.find( { millis : { $gt : 5 } } ).pretty()
  • 38.
    Profiling Viewing Profiling Data(cntd.): Toshow five slowest queries run after a certain timestamp db.system.profile.find({"ts": {"$gt": ISODate("2016-04-29T02:48:42.019Z")}},{ts:1, millis:1, command:1, query:1, ns:1}).sort({millis:-1}).limit(5).pretty()
  • 39.
  • 40.
  • 41.
    References • http://blog.nahurst.com/visual-guide-to-nosql-systems • https://docs.mongodb.org/manual/reference/database-profiler/ •http://www.slideshare.net/sebprunier/mongodb-aggregation-framework-in-action • http://image.slidesharecdn.com/nantesmug-mongodbaggregationframework- 150121023633-conversion-gate02/95/mongodb-aggregation-framework-in-action-9- 638.jpg?cb=1421829478 • https://docs.mongodb.org/manual/reference/sql-aggregation-comparison/ • DB script: https://gist.github.com/rohitbishnoi/6e6d9556ba0569c18f805a585029f5f8