Data as Documents: Overview and intro to MongoDB

DATA AS DOCUMENTS
Mitch Pirtle
BigDive 2013
Turin, Italy

ABOUT ME
•Moved from NYC to TO in 2011
•Recovering Joomla! founder
•CTO @soundaymusic
•Use primarily PHP (Lithium), Node.js
•MongoDB Master

ABOUT THIS TALK
•Background on database history
•Impact from the Web
•Emerging solutions and technologies
•Hands-on session
•Close with Q&A

IN THE BEGINNING
• Data was simple.
• Performance was simpler.
• Scale was a rare need.

BIRTH OF RELATIONAL DATA
• Applications got more complex.
• Many apps sharing one database pushed logic into the data tier.
• “Business rules” was the king buzzword.

BIRTH OF WEB
• Very complex architecture
• Very high scale requirements
• Rapid application development

WRONG TOOL RIGHT JOB?
•Relational databases were great for data consistency and features, but...
•Very hard to scale horizontally
•Impedance mismatch with modern apps

ALTERNATIVES
• Key / Value
• Documents
• Memory-only*

KEY / VALUE
•EXAMPLES: Memcached, Voldemort, Cassandra, Dynamo, Hibari, Riak
•No schema needed
•Blazing fast
•Minimal features

DOCUMENT
•EXAMPLES: MongoDB, SimpleDB,
ElasticSearch, OrientDB
•Rich datatypes matching modern apps
•More features
•Mostly JSON based

MONGODB
•Document database, uses JSON
•Many user/developer features
•Many deployment features
•Designed specifically for modern scale challenges and programming languages

REDIS
•Key-value database
•Extended data types
•Many features
•Similar facilities for scale and
performance

VOLDEMORT
•Key-value
•Extreme scale

HADOOP
•Framework, not really a database
•Born from Google’s MapReduce and distributed file system efforts

DEDICATED SYSTEMS
•Low cost, simple to set up
•Great performance
•Difficult to scale
•Require constant management

TETHERED CLOUD
•Takes a dedicated environment and extends it with cloud infrastructure for scale
•Extremely flexible
•Even more management and administration

FULL CLOUD
•High initial effort
•Much simpler to manage long term
•Extreme scale
•Possibility for equally extreme cost
savings*

WORKING WITH SQL
• Crap, now I need an ORM!
• Disconnect between relational data and object languages
• Tons of debugging fun!

WORKING WITH MONGODB
• Simplifies data access
• Simplifies code
• Fewer execution steps make for faster and lighter apps
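
To make the "fewer execution steps" point concrete, here is a minimal sketch in plain JavaScript of the folding work an ORM does for you. The rows, field names, and the `rowsToDocuments` helper are hypothetical and only illustrate the idea.

```javascript
// Hypothetical rows, shaped like the result of a SQL JOIN of books and tags.
const rows = [
  { book_id: 1, title: "Book 1", author: "Ima Writer", tag: "awesome" },
  { book_id: 1, title: "Book 1", author: "Ima Writer", tag: "ok" },
  { book_id: 2, title: "Book 2", author: "Heesan Author", tag: "good" },
];

// Fold the flat rows back into one nested object per book.
function rowsToDocuments(inputRows) {
  const byId = new Map();
  for (const row of inputRows) {
    if (!byId.has(row.book_id)) {
      byId.set(row.book_id, { title: row.title, author: row.author, tags: [] });
    }
    byId.get(row.book_id).tags.push(row.tag);
  }
  return [...byId.values()];
}

const documents = rowsToDocuments(rows);
console.log(JSON.stringify(documents[0]));
```

A document database stores the nested shape directly, so this folding step (and its inverse on write) drops out of the application.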

COMMON TERMS
•database <-> database
•table <-> collection
•row <-> document
•column <-> field

DOCUMENT DESIGN
MongoDB documents are BSON:
• strings
• integers
• arrays
• objects
• dates
• boolean
• regex
• symbol
• javascript
• ObjectID
• timestamps
• GridFS

DATATYPE: OBJECTID
•MongoDB’s ObjectID is a 12-byte BSON type, composed of Unix seconds since the epoch (4 bytes), a machine identifier (3 bytes), a process id (2 bytes), and a counter that starts from a random value (3 bytes).
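
The byte layout can be checked by hand. The sketch below is plain JavaScript, not a driver API: `decodeObjectId` is a hypothetical helper, and it assumes the four-field layout listed on this slide (newer server versions replace the machine and process bytes with a random value, so treat those two fields as era-specific). The sample value is an ObjectID generated in a mongo shell session.

```javascript
// Split a 24-character hex ObjectID into the fields listed on the slide.
function decodeObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  return {
    // bytes 0-3: seconds since the Unix epoch
    timestamp: new Date(parseInt(hex.slice(0, 8), 16) * 1000),
    // bytes 4-6: machine identifier (kept as hex)
    machine: hex.slice(8, 14),
    // bytes 7-8: process id (kept as hex to avoid byte-order assumptions)
    pid: hex.slice(14, 18),
    // bytes 9-11: counter
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = decodeObjectId("51b73dff884498553b746046");
console.log(parts.timestamp.toISOString()); // 2013-06-11T15:10:55.000Z
```

The first four bytes are why an ObjectID can report its own creation time without a separate field.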

DATATYPE: OBJECTID
ObjectId("4ee75a9c318b9d2c640001a6")

DATATYPE: OBJECTID
•ObjectID is not a string. Always reference them as ObjectId("..."), because comparisons against a plain string will not match.

DATATYPE: OBJECTID
> x = ObjectId()
ObjectId("51b73dff884498553b746046")
> x.getTimestamp()
ISODate("2013-06-11T15:10:55Z")

DATATYPE: DATE
•MongoDB’s Date is a signed 64-bit integer that counts milliseconds since the Unix epoch. Negative values represent dates before 1970.

DATATYPE: DATE
> when = new Date()
ISODate("2013-06-11T15:18:30.241Z")
> when.toString()
Tue Jun 11 2013 17:18:30 GMT+0200 (CEST)
> when.getMonth()
5
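
The same behaviour is visible with the plain JavaScript Date object, which the shell builds on. This is a small sketch rather than anything MongoDB-specific; it uses the UTC accessors so the output does not depend on the local time zone.

```javascript
// A date is a signed count of milliseconds since 1970-01-01T00:00:00Z.
const when = new Date("2013-06-11T15:18:30.241Z");
const millis = when.getTime(); // integer milliseconds since the epoch

// Negative values land before 1970: this is exactly one day before the epoch.
const before1970 = new Date(-86400000);

// Months are zero-based, so June is 5.
const month = when.getUTCMonth();

console.log(millis, before1970.toISOString(), month);
```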

DATATYPE: GRIDFS
•MongoDB’s GridFS is a facility that allows you to store binary files within the database, and allows you to extend them with JSON metadata.

(OK, this part is easier on the command line. More on this later in this class.)

COMMON TASKS
• find(), findOne()
• findAndModify()
• ensureIndex()
• drop()
• insert()
• update()
• update() with { upsert: true }
• save()
• remove()
• stats()

INDEXES
•MongoDB’s indexes support a variety
of types and needs
•Indexing overview

INDEX TYPES
• Standard (_id)
• Secondary
• Subdocuments
• Embedded fields
• Compound
• ASC and DESC keys
• Multikeys
• Unique
• Sparse
• Hashed

INDEX CREATION
•Standard:
db.people.ensureIndex( { zipcode: 1 } )
•Background:
db.people.ensureIndex( { zipcode: 1 }, { background: true } )
•Background Sparse:
db.people.ensureIndex( { zipcode: 1 }, { background: true, sparse: true } )
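
What the sparse option changes is easiest to see on a toy index. The sketch below is plain JavaScript and is not how MongoDB stores indexes; the `people` data and the `buildIndex` helper are hypothetical. It only shows the rule that a sparse index skips documents lacking the indexed field, while a regular index records them under null.

```javascript
// Hypothetical people collection; one document has no zipcode field.
const people = [
  { _id: 1, name: "Ada", zipcode: "10001" },
  { _id: 2, name: "Bo" },
  { _id: 3, name: "Cy", zipcode: "10001" },
];

// Build a toy index: field value -> list of _id values.
function buildIndex(docs, field, sparse) {
  const index = new Map();
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    // Sparse: documents without the field get no entry at all.
    if (!hasField && sparse) continue;
    const key = hasField ? doc[field] : null;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const regularIndex = buildIndex(people, "zipcode", false);
const sparseIndex = buildIndex(people, "zipcode", true);
console.log(regularIndex.size, sparseIndex.size);
```

A sparse index is smaller when the field is rare, at the cost of not covering queries that need to see documents where the field is missing.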

GRIDFS
•Drivers support GridFS with helper methods, as does the mongofiles command line tool that is distributed with MongoDB.
•Crazy, whack-daddy fast.
•Dead simple to use.
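
Under the hood, GridFS keeps one file document (with your metadata) plus a series of chunk documents that hold the bytes. The sketch below is plain JavaScript that only mimics that shape: `toGridFsDocs` is a hypothetical helper, the id, filename, and metadata are made up, and the chunk size is tiny for illustration (the real default is far larger).

```javascript
// Split a file's bytes into one file document and numbered chunk documents.
function toGridFsDocs(fileId, filename, bytes, chunkSize, metadata) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    // Each chunk points back at its file and records its position n.
    chunks.push({ files_id: fileId, n, data: bytes.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, length: bytes.length, chunkSize, metadata };
  return { file, chunks };
}

const { file, chunks } = toGridFsDocs(
  "demo-id", "hello.txt", Buffer.from("hello gridfs"), 5, { owner: "mitch" }
);
console.log(file.length, chunks.length);
```

Reading a file back is the reverse: fetch the chunks for a `files_id`, sort by `n`, and concatenate the data.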

NOTE: MongoDB provides many
command line tools to work with your
database. They are listed and
documented in great detail online.

HOW MONGODB SCALES
•Vertically: Replication
•Horizontally: Sharding

REPLICATION
•MongoDB’s Replica Sets keep copies of your data on several members: one primary accepts writes, and secondaries replicate it for redundancy and can serve reads
•Many tutorials and procedures

REPLICATION
[Diagram: a replica set with members M1, M2, M3, hidden member H1, and delayed members D1, D2]
(M)ember
(H)idden
(D)elayed
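
A replica set like the one on this slide is described by a configuration document. The sketch below builds one as a plain JavaScript object; the set name and hostnames are hypothetical, and `slaveDelay` is the option name from the releases current at the time of this talk (later versions renamed it). In a mongo shell connected to one member, an object like this would be handed to rs.initiate().

```javascript
// Hypothetical set name and hostnames; member roles follow the slide.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "m1.example.net:27017" },
    { _id: 1, host: "m2.example.net:27017" },
    { _id: 2, host: "m3.example.net:27017" },
    // Hidden: can never become primary and is invisible to application reads.
    { _id: 3, host: "h1.example.net:27017", priority: 0, hidden: true },
    // Delayed: applies writes one hour late, useful as a rolling safety net.
    { _id: 4, host: "d1.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
    { _id: 5, host: "d2.example.net:27017", priority: 0, hidden: true, slaveDelay: 3600 },
  ],
};

// Only members without priority 0 can be elected primary.
const electable = config.members.filter((m) => m.priority !== 0);
console.log(electable.length);
```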

AGGREGATION FRAMEWORK
•Aggregation Framework provides GROUP BY-like functionality without map reduce
•Many examples
•Detailed reference

{
    "_id" : ObjectId("51b833cd884498553b746047"),
    "title" : "Book 1",
    "author" : "Ima Writer",
    "tags" : [
        "awesome",
        "ok",
        "lousy",
        "ok",
        "meh",
        "meh"
    ]
}
{
    "_id" : ObjectId("51b833ee884498553b746048"),
    "title" : "Book 2",
    "author" : "Heesan Author",
    "tags" : [
        "awesome",
        "ok",
        "lousy",
        "awesome",
        "good",
        "good"
    ]
}

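db.articles.aggregate([
    // keep only the fields the later stages need
    { $project : { author : 1, tags : 1 } },
    // emit one document per tag
    { $unwind : "$tags" },
    // collect the distinct authors for each tag
    { $group : {
        _id : { tags : "$tags" },
        authors : { $addToSet : "$author" }
    } }
]);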

{
    "result" : [
        {
            "_id" : {
                "tags" : "good"
            },
            "authors" : [
                "Heesan Author"
            ]
        },
        {
            "_id" : {
                "tags" : "meh"
            },
            "authors" : [
                "Heesan Author",
                "Ima Writer"
            ]
        }
    ],
    "ok" : 1
}
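
To see what each pipeline stage does, the three stages can be imitated with ordinary array operations. This is a plain JavaScript sketch, not the server's implementation, and it runs over only the two sample books shown earlier, so its groups reflect just those two documents.

```javascript
const articles = [
  { title: "Book 1", author: "Ima Writer",
    tags: ["awesome", "ok", "lousy", "ok", "meh", "meh"] },
  { title: "Book 2", author: "Heesan Author",
    tags: ["awesome", "ok", "lousy", "awesome", "good", "good"] },
];

// $project: keep only author and tags.
const projected = articles.map(({ author, tags }) => ({ author, tags }));

// $unwind: one output document per array element.
const unwound = projected.flatMap(({ author, tags }) =>
  tags.map((tag) => ({ author, tags: tag }))
);

// $group with $addToSet: collect the distinct authors per tag.
const groups = new Map();
for (const { author, tags } of unwound) {
  if (!groups.has(tags)) groups.set(tags, new Set());
  groups.get(tags).add(author);
}

const result = [...groups].map(([tag, authors]) => ({
  _id: { tags: tag },
  authors: [...authors],
}));
console.log(JSON.stringify(result, null, 2));
```

The Set is what gives $addToSet its behaviour: an author who used a tag twice still appears once in that tag's group.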

SHARDING
•MongoDB’s Sharding allows you to
scale your data beyond one physical
machine:
- need more RAM
- need more CPU
- need more disk

SHARDING DEPLOYMENT
[Diagram: a sharded cluster with S1, S2, S3; M1, M2, M3; and C1]
(C)onfig
(S)hard server (mongos)
(M)ongo shard node (mongod)
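
The job of the routing layer can be pictured as a range lookup. The sketch below is plain JavaScript and only an illustration of the idea, not MongoDB's internals: the chunk table, its boundaries, and the `routeTo` helper are hypothetical, with shard names borrowed from this slide.

```javascript
// Hypothetical chunk table: each chunk covers shard-key values in [min, max)
// and lives on exactly one shard.
const chunkTable = [
  { min: "", max: "h", shard: "M1" },
  { min: "h", max: "q", shard: "M2" },
  { min: "q", max: "\uffff", shard: "M3" },
];

// Find the chunk whose range contains the shard-key value.
function routeTo(shardKeyValue) {
  const chunk = chunkTable.find(
    (c) => c.min <= shardKeyValue && shardKeyValue < c.max
  );
  if (!chunk) throw new Error("no chunk covers " + shardKeyValue);
  return chunk.shard;
}

console.log(routeTo("alice"), routeTo("mitch"), routeTo("zoe"));
```

Queries that include the shard key can go to a single shard this way; queries without it have to be sent to every shard.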

MAP REDUCE
•MongoDB’s mapReduce performs
complex aggregation operations
•Many examples
•Even more fun than regex!
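
As a taste of the pattern, here is a plain JavaScript sketch that counts tags with a map step and a reduce step. The `runMapReduce` driver is a hypothetical stand-in for the server, and unlike MongoDB's mapReduce (where the document is bound to `this` and `emit` is a global) the document and emit callback are passed as arguments.

```javascript
const books = [
  { title: "Book 1", tags: ["awesome", "ok", "lousy", "ok", "meh", "meh"] },
  { title: "Book 2", tags: ["awesome", "ok", "lousy", "awesome", "good", "good"] },
];

// Map: emit one (tag, 1) pair per tag.
function map(doc, emit) {
  for (const tag of doc.tags) emit(tag, 1);
}

// Reduce: combine all values emitted for one key.
function reduce(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Stand-in driver: group emitted values by key, then reduce each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  for (const doc of docs) {
    mapFn(doc, (key, value) => {
      if (!emitted.has(key)) emitted.set(key, []);
      emitted.get(key).push(value);
    });
  }
  const out = {};
  for (const [key, values] of emitted) out[key] = reduceFn(key, values);
  return out;
}

const counts = runMapReduce(books, map, reduce);
console.log(counts);
```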

Map Reduce is covered in detail in a later class at BIGDIVE
