DATA AS DOCUMENTS
Mitch Pirtle
BigDive 2013
Turin, Italy
ABOUT ME
•Moved from NYC to TO in 2011
•Recovering Joomla! founder
•CTO @soundaymusic
•Use primarily PHP (Lithium), Node.js
•MongoDB Master
ABOUT THIS TALK
•Background on database history
•Impact from the Web
•Emerging solutions and technologies
•Hands-on session
•Close with Q&A
Are you done with lunch?
IN THE BEGINNING
• Data was simple.
• Performance was simpler.
• Scale was a rare need.
BIRTH OF RELATIONAL DATA
• Applications got more complex.
• Many apps sharing one database pushed logic into the data tier.
• “Business rules” was the king buzzword.
BIRTH OF WEB
• Very complex architecture
• Very high scale requirements
• Rapid application development
WRONG TOOL RIGHT JOB?
•Relational databases were great for data consistency and features, but...
•Very hard to scale out
•Impedance mismatch with modern apps
ALTERNATIVES
• Key / Value
• Documents
• Memory-only*
KEY VALUE
•EXAMPLES: Memcached, Voldemort, Cassandra, Dynamo, Hibari, Riak
•No schema needed
•Blazing fast
•Minimal features
DOCUMENT
•EXAMPLES: MongoDB, SimpleDB, ElasticSearch, OrientDB
•Rich datatypes matching modern apps
•More features
•Mostly JSON based
EXAMPLE PLATFORMS
MONGODB
•Document database, uses JSON-style documents (stored as BSON)
•Many user/developer features
•Many deployment features
•Designed specifically for modern scale challenges and programming languages
REDIS
•Key-value database
•Extended data types
•Many features
•Similar facilities for scale and performance
VOLDEMORT
•Key-value
•Extreme scale
HADOOP
•Framework, not really a database
•Born from Google’s MapReduce and distributed file system efforts
DEPLOYMENT OVERVIEW
DEDICATED SYSTEMS
•Low cost, simple to set up
•Great performance
•Difficult to scale
•Require constant management
TETHERED CLOUD
•Takes a dedicated environment and extends it with cloud infrastructure for scale
•Extremely flexible
•Even more management and administration
FULL CLOUD
•High initial effort
•Much simpler to manage long term
•Extreme scale
•Possibility for equally extreme cost savings*
DEVELOPERS!
(hang on a minute)
DEVELOPERS!
(much better)
(ok now to get serious)
WORKING WITH SQL
• Crap, now I need an ORM!
• Disconnect between relational data and object languages
• Tons of debugging fun!
WORKING WITH MONGODB
• Simplifies data access
• Simplifies code
• Fewer execution steps make for faster and lighter apps
COMMON TERMS
•database <-> database
•table <-> collection
•row <-> document
•column <-> field
WHAT IS JSON?
DOCUMENT DESIGN
MongoDB documents are BSON:
• strings
• integers
• arrays
• objects
• dates
• boolean
• regex
• symbol
• javascript
• ObjectID
• timestamps
• GridFS
DATA TYPE: OBJECTID
•MongoDB’s ObjectID is a 12-byte BSON type, composed of Unix seconds since the epoch (4 bytes), a machine identifier (3 bytes), a process id (2 bytes), and a counter that starts from a random value (3 bytes).
DATA TYPE: OBJECTID
ObjectId("4ee75a9c318b9d2c640001a6")
DATA TYPE: OBJECTID
•ObjectID is not a string. Always reference them as ObjectId("..."), or your comparisons will not work.
DATA TYPE: OBJECTID
> x = ObjectId()
ObjectId("51b73dff884498553b746046")
> x.getTimestamp()
ISODate("2013-06-11T15:10:55Z")
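The getTimestamp() call works because the first 4 bytes of an ObjectID hold the creation time in seconds. Here is a minimal sketch in plain JavaScript (no MongoDB needed), assuming the byte layout described on the earlier slide. decodeObjectId is a hypothetical helper made up for this example; it is not part of the shell or any driver.

```javascript
// Hypothetical helper: split a 24-character hex ObjectID into its parts.
// Assumes the layout from the slide: 4-byte timestamp, 3-byte machine id,
// 2-byte process id, 3-byte counter.
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hex characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return {
    timestamp: new Date(seconds * 1000), // seconds -> milliseconds
    machine: hex.slice(8, 14),
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const parts = decodeObjectId("51b73dff884498553b746046");
console.log(parts.timestamp.toISOString()); // 2013-06-11T15:10:55.000Z
```

The decoded timestamp matches what the shell printed for the same ObjectID.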
DATA TYPE: DATE
•MongoDB’s Date is a signed 64-bit integer that counts milliseconds since the Unix epoch. Negative values represent dates before 1970.
DATA TYPE: DATE
> when = new Date()
ISODate("2013-06-11T15:18:30.241Z")
> when.toString()
Tue Jun 11 2013 17:18:30 GMT+0200 (CEST)
> when.getMonth()
5
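A short sketch of the same ideas in plain JavaScript, since the shell's Date is the ordinary JavaScript Date. It shows the millisecond count behind the date, a negative value landing before 1970, and why getMonth() printed 5 for June (months are zero-based). The variable names are made up for this example.

```javascript
// Dates are milliseconds since 1970-01-01T00:00:00Z.
const when = new Date("2013-06-11T15:18:30.241Z");
console.log(when.getTime()); // 1370963910241

// Negative values are dates before 1970.
const before = new Date(-1000);
console.log(before.toISOString()); // 1969-12-31T23:59:59.000Z

// Months are zero-based: 5 means June. The UTC getter avoids
// depending on the machine's local time zone.
console.log(when.getUTCMonth()); // 5
```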
DATA TYPE: GRIDFS
•MongoDB’s GridFS is a facility that allows you to store binary files within the database, and allows you to extend them with JSON metadata.
(OK, this part is easier on the command line. More on this later in this class.)
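Until then, here is a rough idea of what GridFS does under the hood: it stores one document describing the file (by default in fs.files) and splits the contents into chunk documents (by default in fs.chunks). The sketch below only builds those documents in memory. toGridFsDocs, the numeric file id, the tiny chunk size, and the metadata are all assumptions made for illustration; real drivers use an ObjectID and a much larger chunk size.

```javascript
// Hypothetical helper: build GridFS-style documents for a file in memory.
// Nothing is written to a database here.
function toGridFsDocs(fileId, filename, data, chunkSize, metadata) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    // Each chunk records which file it belongs to and its position n.
    chunks.push({ files_id: fileId, n, data: data.subarray(offset, offset + chunkSize) });
  }
  const file = { _id: fileId, filename, length: data.length, chunkSize, metadata };
  return { file, chunks };
}

const stored = toGridFsDocs(
  1,                             // placeholder id; a real one would be an ObjectID
  "hello.txt",
  Buffer.from("hello gridfs"),   // 12 bytes of file content
  5,                             // demo chunk size in bytes
  { owner: "mitch" }             // your own metadata rides along with the file
);
console.log(stored.chunks.length); // 3
```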
COMMON TASKS
• find(), findOne()
• findAndModify()
• ensureIndex()
• drop()
• insert()
• update()
• update() with { upsert: true }
• save()
• remove()
• stats()
INDEXES
•MongoDB’s indexes support a variety of types and needs
•Indexing overview
INDEX TYPES
• Standard (_id)
• Secondary
• Subdocuments
• Embedded fields
• Compound
• ASC and DESC keys
• Multikeys
• Unique
• Sparse
• Hashed
INDEX CREATION
•Standard:
db.people.ensureIndex( { zipcode: 1 } )
•Background:
db.people.ensureIndex( { zipcode: 1 }, { background: true } )
•Background Sparse:
db.people.ensureIndex( { zipcode: 1 }, { background: true, sparse: true } )
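What does sparse actually change? A sparse index skips documents that do not have the indexed field, while a regular index files them under a null key. The sketch below models that with a plain array; it is a toy, not how MongoDB stores indexes, and it does not model the background option. buildIndex and the sample people are made up for this example.

```javascript
// Hypothetical in-memory model of an ascending single-field index.
// With sparse: true, documents that lack the field get no index entry;
// otherwise they are indexed under a null key.
function buildIndex(docs, field, options = {}) {
  const entries = [];
  docs.forEach((doc, position) => {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && options.sparse) return;
    entries.push({ key: hasField ? doc[field] : null, position });
  });
  // Ascending order, as in { zipcode: 1 }; null keys sort first.
  entries.sort((a, b) => {
    if (a.key === b.key) return 0;
    if (a.key === null) return -1;
    if (b.key === null) return 1;
    return a.key < b.key ? -1 : 1;
  });
  return entries;
}

const people = [
  { name: "Ann", zipcode: "10121" },
  { name: "Bob" }, // no zipcode
  { name: "Cy", zipcode: "10001" },
];

const dense = buildIndex(people, "zipcode");
const sparse = buildIndex(people, "zipcode", { sparse: true });
console.log(dense.length, sparse.length); // 3 2
```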
GRIDFS
•Drivers support GridFS with helper methods, as well as the mongofiles command line tool that is distributed with MongoDB.
•Crazy, whack-daddy fast.
•Dead simple to use.
(drop to console)
NOTE: MongoDB provides many command line tools to work with your database. They are listed and documented in great detail online.
HOW MONGODB SCALES
•Replication: redundancy and read scaling
•Sharding: horizontal scaling across machines
REPLICATION
•MongoDB’s Replica Sets give you one primary that accepts all writes, plus secondaries for redundancy and read performance
•Many tutorials and procedures
REPLICATION
[Diagram: replica set with members M1, M2, M3, hidden member H1, and delayed members D1, D2]
Legend: (M)ember, (H)idden, (D)elayed
AGGREGATION FRAMEWORK
•Aggregation Framework provides GROUP BY like functionality without map reduce
•Many examples
•Detailed reference
{
  "_id" : ObjectId("51b833cd884498553b746047"),
  "title" : "Book 1",
  "author" : "Ima Writer",
  "tags" : [
    "awesome",
    "ok",
    "lousy",
    "ok",
    "meh",
    "meh"
  ]
}
{
  "_id" : ObjectId("51b833ee884498553b746048"),
  "title" : "Book 2",
  "author" : "Heesan Author",
  "tags" : [
    "awesome",
    "ok",
    "lousy",
    "awesome",
    "good",
    "good"
  ]
}
db.articles.aggregate(
  { $project : {
    author : 1,
    tags : 1
  } },
  { $unwind : "$tags" },
  { $group : {
    _id : { tags : "$tags" },
    authors : { $addToSet : "$author" }
  } }
);
{
  "result" : [
    {
      "_id" : {
        "tags" : "good"
      },
      "authors" : [
        "Heesan Author"
      ]
    },
    {
      "_id" : {
        "tags" : "meh"
      },
      "authors" : [
        "Sheesan Author",
        "Ima Writer"
      ]
    }
  ],
  "ok" : 1
}
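To see what each stage of that pipeline does, here is a rough plain-JavaScript walk-through (no server needed). It is a sketch, not how MongoDB implements aggregation. It runs on only the two sample documents shown earlier, so its groups need not match the output printed on the slide, which may come from a larger collection. All variable names are made up for illustration.

```javascript
const books = [
  { title: "Book 1", author: "Ima Writer",
    tags: ["awesome", "ok", "lousy", "ok", "meh", "meh"] },
  { title: "Book 2", author: "Heesan Author",
    tags: ["awesome", "ok", "lousy", "awesome", "good", "good"] },
];

// $project: keep only author and tags.
const projected = books.map(({ author, tags }) => ({ author, tags }));

// $unwind: one output document per element of the tags array.
const unwound = projected.flatMap(({ author, tags }) =>
  tags.map((tag) => ({ author, tags: tag }))
);

// $group with $addToSet: collect the distinct authors for each tag.
const groups = new Map();
for (const { author, tags } of unwound) {
  if (!groups.has(tags)) groups.set(tags, new Set());
  groups.get(tags).add(author);
}

const result = [...groups].map(([tag, authors]) => ({
  _id: { tags: tag },
  authors: [...authors],
}));
console.log(JSON.stringify(result, null, 2));
```

Note how $unwind turns 2 documents into 12, one per tag, and how $addToSet drops the repeats so each author shows up once per tag.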
SHARDING
•MongoDB’s Sharding allows you to scale your data beyond one physical machine:
- need more RAM
- need more CPU
- need more disk
SHARDING DEPLOYMENT
[Diagram: config server C1; shard servers S1, S2, S3; shard nodes M1, M2, M3]
Legend: (C)onfig, (S)hard server (mongos), (M)ongo shard node (mongod)
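How does a mongos know where a document lives? Roughly: a sharded collection is split into chunks by ranges of a shard key, the config server keeps track of which shard owns which chunk, and mongos looks the key up. The sketch below is a simplified model of that lookup, not MongoDB's code. The numeric shard key, the range boundaries, and the names chunkTable and routeToShard are assumptions for illustration.

```javascript
// Hypothetical chunk table, the kind of metadata a config server holds.
// Each chunk covers a half-open range [min, max) of shard key values.
const chunkTable = [
  { min: -Infinity, max: 1000, shard: "M1" },
  { min: 1000, max: 2000, shard: "M2" },
  { min: 2000, max: Infinity, shard: "M3" },
];

// What a router does in spirit: find the chunk whose range covers the key.
function routeToShard(shardKeyValue) {
  const chunk = chunkTable.find(
    (c) => shardKeyValue >= c.min && shardKeyValue < c.max
  );
  if (!chunk) throw new Error("no chunk covers " + shardKeyValue);
  return chunk.shard;
}

console.log(routeToShard(42));   // M1
console.log(routeToShard(1000)); // M2
console.log(routeToShard(5000)); // M3
```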
MAP REDUCE
•MongoDB’s mapReduce performs complex aggregation operations
•Many examples
•Even more fun than regex!
Map Reduce is covered in detail in a later class at BigDive
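As a small taste before that class, here is the map/reduce idea in plain JavaScript: map emits key/value pairs for each document, the values are grouped by key, and reduce folds each group into one value. This is a sketch of the concept, not MongoDB's implementation. The mapReduce function below is a hypothetical stand-in for the server, and unlike the real shell (where emit is a global) it passes emit to map as an argument. The sample documents are made up.

```javascript
// Hypothetical stand-in for the server: run map over every document,
// group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // As in MongoDB, `this` inside map is the current document.
  for (const doc of docs) map.call(doc, emit);
  return [...groups].map(([key, values]) => ({
    _id: key,
    // A key with a single value skips reduce.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const sampleDocs = [
  { author: "Ima Writer", tags: ["awesome", "ok", "lousy", "ok", "meh", "meh"] },
  { author: "Heesan Author", tags: ["awesome", "ok", "lousy", "awesome", "good", "good"] },
];

// Count how often each tag is used.
const tagCounts = mapReduce(
  sampleDocs,
  function (emit) { this.tags.forEach((tag) => emit(tag, 1)); },
  (key, values) => values.reduce((sum, n) => sum + n, 0)
);
console.log(tagCounts);
```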
QUESTIONS AND ANSWERS
THANK YOU

Data as Documents: Overview and intro to MongoDB