Beyond The Basics : Part 2
Analytics and the Aggregation Framework
Joe Drumgoole
Director of Developer Advocacy, EMEA
@jdrumgoole
V1.1
3
Beyond The Basics
– Storage Engines
• What storage engines are and how to pick them
– Aggregation Framework
• How to deploy advanced analytics processing right inside the database
– The BI Connector
• How to create visualisations and dashboards from your MongoDB data
– Authentication and Authorisation
• How to secure MongoDB, both on-premise and in the cloud
4
The Aggregation Framework
• An analytics engine for MongoDB
• What is analytics?
• Think of the two types of database, OLTP, OLAP
• OLTP : Online Transaction Processing
– Airline booking,
– ATMs,
– Taxi booking
• OLAP : Online Analytical Processing
– Which tickets make us most money?
– When do we need to refill our ATMs?
– How many cabs do we need to service the West End of London?
5
OLTP – Online Transaction processing
6
OLTP – Online Transaction processing
sum() avg()
7
The Aggregation Framework – A Processing Pipeline
Match Project Group SortLimit
• Think unix pipeline
• The output of one stage is passed to the input of the next stage
• Each stage performs one job
• Stages can be repeated
• Output is a cursor, a new collection or a view
8
Typical Goals of Aggregation Framework
• Columnar Analytics
• Reshaping data
• Unwinding arrays into individual documents
• Linking collections together
• Generating new data from old (collections and views)
9
Pipeline Operators
• $match
Filter documents
• $project
Reshape documents
• $group
Summarize documents
• $out
Create new collections
• $sort
Order documents
• $limit/$skip
Paginate documents
• $lookup
Join two collections together
• $unwind
Expand an array
10
Example Pipeline
Match Project Group Sort Out
• Find content
• Standard query
• Uses indexes
• Reduce doc
count
• Use first
• Select content
• Remove fields
• Add fields
• Reduce doc
size
• Looks at every
doc
• Collect content
• Sum, Avg etc.
• Rewrite _id
• Reduce doc
count
• Looks at every
doc
• Sort on fields
• Several sorts
allowed
• Ascending or
descending
• 100mb limit
• Allow Disk Use
• New collection
• $out overwrites
• Only one per
aggregate
• Last member
11
Example Document
MongoDB Enterprise > db.members.find( { "batchID" : 138,
"member.member_name" : "Joe Drumgoole" },
{ "_id" : 0, "member.chapters" : 0 } ).pretty()
{
"member" : {
"city" : "Dublin",
"events_attended" : 19,
"last_access_time" : ISODate("2017-04-25T12:40:55Z"),
"country" : "Ireland",
"member_id" : 99473492,
"is_organizer" : true,
"photo_thumb_url" : "https://secure.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg",
"location" : {
"type" : "Point",
"coordinates" : [
-6.25,
53.33000183105469
]
},
"member_name" : "Joe Drumgoole",
"join_time" : ISODate("2013-10-30T17:05:31Z")
},
"timestamp" : ISODate("2017-04-26T10:13:54.079Z"),
"batchID" : 138
}
12
Meetup Data
13
Group Document
MongoDB Enterprise > db.groups.findOne({ "batchID": 138 },
{ "group.photos" : 0, "group.topics" : 0, "group.location" : 0, "group.description" : 0, "group.organizers" : 0,
"group.category" : 0, "_id" : 0 } )
{
"batchID" : 138,
"timestamp" : ISODate("2017-04-26T10:12:06.388Z"),
"group" : {
"rsvps_per_event" : 39.285701751708984,
"repeat_rsvpers" : 62,
"upcoming_events" : 0,
"gender_female" : 0.039500001817941666,
"pro_join_date" : ISODate("2017-04-10T18:11:49Z"),
"id" : 10209022,
"city" : "Gent",
"member_count" : 399,
"average_age" : 35.15570068359375,
"status" : "Active",
"founded_date" : ISODate("2013-09-11T14:05:30Z"),
"urlname" : "mongodb-belgium",
"gender_male" : 0.9473999738693237,
"name" : "MongoDB Belgium",
"last_event" : ISODate("2015-06-16T18:00:00Z"),
"country" : "Belgium",
"gender_unknown" : 0.013199999928474426,
"past_events" : 7,
"gender_other" : 0,
"past_rsvps" : 275
}
}
14
Let’s Query for Groups
JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats groups --url mongodb-belgium DublinMUG
Processing : ['mongodb-belgium', 'DublinMUG']
pro
db.groups.aggregate( [
{"$match": {"batchID": 138, "group.urlname": {"$in": ["mongodb-belgium", "DublinMUG"]}}},
{"$project": {"founded": "$group.founded_date", "urlname": "$group.urlname", "_id": 0, "members":
"$group.member_count"}},
])
{'founded': '11-Sep-2013 14:05', 'members': 399, 'urlname': u'mongodb-belgium'}
{'founded': '14-Mar-2012 17:40', 'members': 847, 'urlname': u'DublinMUG'}
Wrote 2 records
JD10Gen:apps jdrumgoole$
15
Attendee Doc
MongoDB Enterprise > db.attendees.findOne( { "batchID" : 138 },
{ "info.event.description" : 0, "_id" : 0, "info.event.group" : 0,
"info.event.venue" : 0, "info.event.rating" : 0 } )
{
"info" : {
"attendee" : {
"status" : "attended",
"member" : {
"name" : "Former member"
},
"rsvp" : {
"response" : "yes",
}
},
"event" : {
"status" : "past",
"event_url" : "https://www.meetup.com/mongodb-belgium/events/162104572/",
"created" : ISODate("2014-01-23T09:10:20Z"),
"rsvp_limit" : 75,
"updated" : ISODate("2014-03-20T11:49:00Z"),
"visibility" : "public",
"yes_rsvp_count" : 75,
"time" : ISODate("2014-03-18T18:00:00Z"),
"headcount" : 0,
"id" : "162104572",
"name" : "MongoDB Belgium #1: the kickoff"
}
},
"timestamp" : ISODate("2017-04-26T10:14:30.129Z"),
"batchID" : 138
}
16
Search for New Members
JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats newmembers --url DublinMUG --sort join_date --format
csv --direction ascending --limit 10
Processing : ['DublinMUG']
Sorting on 'join_date' direction = 'ascending'
db.members.aggregate([
{"$match": {"batchID": 138}},
{"$unwind": "$member.chapters"},
{"$match": {"member.chapters.urlname": {"$in": ["DublinMUG"]}}},
{"$project": {"join_date": "$member.join_time", "_id": 0, "group": "$member.chapters.urlname", "name":
"$member.member_name"}},
{"$limit": 10}])
group,name,join_date
DublinMUG,Gosia,17-Apr-2017 12:51
DublinMUG,Luke Shiels,15-Apr-2017 14:00
DublinMUG,Silvia Sirbu,11-Apr-2017 12:00
DublinMUG,Steeve P.,04-Apr-2017 09:47
DublinMUG,Dafei W,30-Mar-2017 11:36
DublinMUG,Ross Norman,13-Mar-2017 11:30
DublinMUG,Grzegorz F.,08-Mar-2017 10:25
DublinMUG,Lucas Sacramento,07-Mar-2017 11:05
DublinMUG,David Blount,06-Mar-2017 12:33
DublinMUG,Luca Ballerini,06-Mar-2017 10:41
Wrote 10 records
JD10Gen:apps jdrumgoole$
17
Search for New Members This Year
JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats newmembers --url DublinMUG --sort join_date --format csv --
direction ascending --limit 10 --start 1-Jan-2017
Processing : ['DublinMUG']
Sorting on 'join_date' direction = 'ascending'
db.members.aggregate([
{"$match": {"batchID": 138}},
{"$unwind": "$member.chapters"},
{"$match": {"member.chapters.urlname": {"$in": ["DublinMUG"]}}},
{"$match": {"member.join_time": {"$gte": "2017-01-01T00:00:00"}}},
{"$project": {"join_date": "$member.join_time", "_id": 0, "group": "$member.chapters.urlname", "name":
"$member.member_name"}},
{"$limit": 10}])
group,name,join_date
DublinMUG,Gosia,17-Apr-2017 12:51
DublinMUG,Luke Shiels,15-Apr-2017 14:00
DublinMUG,Silvia Sirbu,11-Apr-2017 12:00
DublinMUG,Steeve P.,04-Apr-2017 09:47
DublinMUG,Dafei W,30-Mar-2017 11:36
DublinMUG,Ross Norman,13-Mar-2017 11:30
DublinMUG,Grzegorz F.,08-Mar-2017 10:25
DublinMUG,Lucas Sacramento,07-Mar-2017 11:05
DublinMUG,David Blount,06-Mar-2017 12:33
DublinMUG,Luca Ballerini,06-Mar-2017 10:41
Wrote 10 records
18
Turn an Aggregation into a View
• Only supported on MongoDB 3.4
• Views are a non-materialised view on a collection
MongoDB Enterprise > db.createView( "batch138",
"members",
[ { "$match" : { "batchID" : 138 }} ] )
{ "ok" : 1 }
MongoDB Enterprise >
• A view persists and will return new results each time a find is run
• A view looks just like a collection
• Must turn 3.4 compatibility on
MongoDB Enterprise > db.adminCommand( { setFeatureCompatibilityVersion: "3.4"} )
19
Useful Links
• The Aggregation Python class
https://github.com/jdrumgoole/mongodb_utils/blob/master/mongodb_utils/agg.py
• Aggregation docs
https://docs.mongodb.com/manual/aggregation/
• MongoDB Views in 3.4
https://docs.mongodb.com/manual/core/views/
20
Q&A

Beyond the Basics 2: Aggregation Framework

  • 2.
    Beyond The Basics: Part 2 Analytics and the Aggregation Framework Joe Drumgoole Director of Developer Advocacy, EMEA @jdrumgoole V1.1
  • 3.
    3 Beyond The Basics –Storage Engines • What storage engines are and how to pick them – Aggregation Framework • How to deploy advanced analytics processing right inside the database – The BI Connector • How to create visualisations and dashboards from your MongoDB data – Authentication and Authorisation • How to secure MongoDB, both on-premise and in the cloud
  • 4.
    4 The Aggregation Framework •An analytics engine for MongoDB • What is analytics? • Think of the two types of database, OLTP, OLAP • OLTP : Online Transaction Processing – Airline booking, – ATMs, – Taxi booking • OLAP : Online Analytical Processing – Which tickets make us most money? – When do we need to refill our ATMs? – How many cabs do we need to service the West End of London?
  • 5.
    5 OLTP – OnlineTransaction processing
  • 6.
    6 OLTP – OnlineTransaction processing sum() avg()
  • 7.
    7 The Aggregation Framework– A Processing Pipeline Match Project Group SortLimit • Think unix pipeline • The output of one stage is passed to the input of the next stage • Each stage performs one job • Stages can be repeated • Output is a cursor, a new collection or a view
  • 8.
    8 Typical Goals ofAggregation Framework • Columnar Analytics • Reshaping data • Unwinding arrays into individual documents • Linking collections together • Generating new data from old (collections and views)
  • 9.
    9 Pipeline Operators • $match Filterdocuments • $project Reshape documents • $group Summarize documents • $out Create new collections • $sort Order documents • $limit/$skip Paginate documents • $lookup Join two collections together • $unwind Expand an array
  • 10.
    10 Example Pipeline Match ProjectGroup Sort Out • Find content • Standard query • Uses indexes • Reduce doc count • Use first • Select content • Remove fields • Add fields • Reduce doc size • Looks at every doc • Collect content • Sum, Avg etc. • Rewrite _id • Reduce doc count • Looks at every doc • Sort on fields • Several sorts allowed • Ascending or descending • 100mb limit • Allow Disk Use • New collection • $out overwrites • Only one per aggregate • Last member
  • 11.
    11 Example Document MongoDB Enterprise> db.members.find( { "batchID" : 138, "member.member_name" : "Joe Drumgoole" }, { "_id" : 0, "member.chapters" : 0 } ).pretty() { "member" : { "city" : "Dublin", "events_attended" : 19, "last_access_time" : ISODate("2017-04-25T12:40:55Z"), "country" : "Ireland", "member_id" : 99473492, "is_organizer" : true, "photo_thumb_url" : "https://secure.meetupstatic.com/photos/member/e/5/0/1/thumb_255178625.jpeg", "location" : { "type" : "Point", "coordinates" : [ -6.25, 53.33000183105469 ] }, "member_name" : "Joe Drumgoole", "join_time" : ISODate("2013-10-30T17:05:31Z") }, "timestamp" : ISODate("2017-04-26T10:13:54.079Z"), "batchID" : 138 }
  • 12.
  • 13.
    13 Group Document MongoDB Enterprise> db.groups.findOne({ "batchID": 138 }, { "group.photos" : 0, "group.topics" : 0, "group.location" : 0, "group.description" : 0, "group.organizers" : 0, "group.category" : 0, "_id" : 0 } ) { "batchID" : 138, "timestamp" : ISODate("2017-04-26T10:12:06.388Z"), "group" : { "rsvps_per_event" : 39.285701751708984, "repeat_rsvpers" : 62, "upcoming_events" : 0, "gender_female" : 0.039500001817941666, "pro_join_date" : ISODate("2017-04-10T18:11:49Z"), "id" : 10209022, "city" : "Gent", "member_count" : 399, "average_age" : 35.15570068359375, "status" : "Active", "founded_date" : ISODate("2013-09-11T14:05:30Z"), "urlname" : "mongodb-belgium", "gender_male" : 0.9473999738693237, "name" : "MongoDB Belgium", "last_event" : ISODate("2015-06-16T18:00:00Z"), "country" : "Belgium", "gender_unknown" : 0.013199999928474426, "past_events" : 7, "gender_other" : 0, "past_rsvps" : 275 } }
  • 14.
    14 Let’s Query forGroups JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats groups --url mongodb-belgium DublinMUG Processing : ['mongodb-belgium', 'DublinMUG'] pro db.groups.aggregate( [ {"$match": {"batchID": 138, "group.urlname": {"$in": ["mongodb-belgium", "DublinMUG"]}}}, {"$project": {"founded": "$group.founded_date", "urlname": "$group.urlname", "_id": 0, "members": "$group.member_count"}}, ]) {'founded': '11-Sep-2013 14:05', 'members': 399, 'urlname': u'mongodb-belgium'} {'founded': '14-Mar-2012 17:40', 'members': 847, 'urlname': u'DublinMUG'} Wrote 2 records JD10Gen:apps jdrumgoole$
  • 15.
    15 Attendee Doc MongoDB Enterprise> db.attendees.findOne( { "batchID" : 138 }, { "info.event.description" : 0, "_id" : 0, "info.event.group" : 0, "info.event.venue" : 0, "info.event.rating" : 0 } ) { "info" : { "attendee" : { "status" : "attended", "member" : { "name" : "Former member" }, "rsvp" : { "response" : "yes", } }, "event" : { "status" : "past", "event_url" : "https://www.meetup.com/mongodb-belgium/events/162104572/", "created" : ISODate("2014-01-23T09:10:20Z"), "rsvp_limit" : 75, "updated" : ISODate("2014-03-20T11:49:00Z"), "visibility" : "public", "yes_rsvp_count" : 75, "time" : ISODate("2014-03-18T18:00:00Z"), "headcount" : 0, "id" : "162104572", "name" : "MongoDB Belgium #1: the kickoff" } }, "timestamp" : ISODate("2017-04-26T10:14:30.129Z"), "batchID" : 138 }
  • 16.
    16 Search for NewMembers JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats newmembers --url DublinMUG --sort join_date --format csv --direction ascending --limit 10 Processing : ['DublinMUG'] Sorting on 'join_date' direction = 'ascending' db.members.aggregate([ {"$match": {"batchID": 138}}, {"$unwind": "$member.chapters"}, {"$match": {"member.chapters.urlname": {"$in": ["DublinMUG"]}}}, {"$project": {"join_date": "$member.join_time", "_id": 0, "group": "$member.chapters.urlname", "name": "$member.member_name"}}, {"$limit": 10}]) group,name,join_date DublinMUG,Gosia,17-Apr-2017 12:51 DublinMUG,Luke Shiels,15-Apr-2017 14:00 DublinMUG,Silvia Sirbu,11-Apr-2017 12:00 DublinMUG,Steeve P.,04-Apr-2017 09:47 DublinMUG,Dafei W,30-Mar-2017 11:36 DublinMUG,Ross Norman,13-Mar-2017 11:30 DublinMUG,Grzegorz F.,08-Mar-2017 10:25 DublinMUG,Lucas Sacramento,07-Mar-2017 11:05 DublinMUG,David Blount,06-Mar-2017 12:33 DublinMUG,Luca Ballerini,06-Mar-2017 10:41 Wrote 10 records JD10Gen:apps jdrumgoole$
  • 17.
    17 Search for NewMembers This Year JD10Gen:apps jdrumgoole$ ./mug_analytics_main.py --stats newmembers --url DublinMUG --sort join_date --format csv -- direction ascending --limit 10 --start 1-Jan-2017 Processing : ['DublinMUG'] Sorting on 'join_date' direction = 'ascending' db.members.aggregate([ {"$match": {"batchID": 138}}, {"$unwind": "$member.chapters"}, {"$match": {"member.chapters.urlname": {"$in": ["DublinMUG"]}}}, {"$match": {"member.join_time": {"$gte": "2017-01-01T00:00:00"}}}, {"$project": {"join_date": "$member.join_time", "_id": 0, "group": "$member.chapters.urlname", "name": "$member.member_name"}}, {"$limit": 10}]) group,name,join_date DublinMUG,Gosia,17-Apr-2017 12:51 DublinMUG,Luke Shiels,15-Apr-2017 14:00 DublinMUG,Silvia Sirbu,11-Apr-2017 12:00 DublinMUG,Steeve P.,04-Apr-2017 09:47 DublinMUG,Dafei W,30-Mar-2017 11:36 DublinMUG,Ross Norman,13-Mar-2017 11:30 DublinMUG,Grzegorz F.,08-Mar-2017 10:25 DublinMUG,Lucas Sacramento,07-Mar-2017 11:05 DublinMUG,David Blount,06-Mar-2017 12:33 DublinMUG,Luca Ballerini,06-Mar-2017 10:41 Wrote 10 records
  • 18.
    18 Turn an Aggregationinto a View • Only supported on MongoDB 3.4 • Views are a non-materialised view on a collection MongoDB Enterprise > db.createView( "batch138", "members", [ { "$match" : { "batchID" : 138 }} ] ) { "ok" : 1 } MongoDB Enterprise > • A view persists and will return new results each time a find is run • A view looks just like a collection • Must turn 3.4 compatibility on MongoDB Enterprise > db.adminCommand( { setFeatureCompatibilityVersion: "3.4"} )
  • 19.
    19 Useful Links • TheAggregation Python class https://github.com/jdrumgoole/mongodb_utils/blob/master/mongodb_utils/agg.py • Aggregation docs https://docs.mongodb.com/manual/aggregation/ • MongoDB Views in 3.4 https://docs.mongodb.com/manual/core/views/
  • 20.