Python and MongoDB: The Perfect Match
Andreas Jung, www.zopyx.com
Trainer: Andreas Jung
- Python developer since 1993
- Python, Zope & Plone development
- Specialized in electronic publishing
- Director of the Zope Foundation
- Author of dozens of add-ons for Python, Zope and Plone
- Co-founder of the German Zope User Group (DZUG)
- Member of the Plone Foundation
- Using MongoDB since 2009
Agenda (45 minutes per slot)
- Introduction to MongoDB
- Using MongoDB
- Using MongoDB from Python with PyMongo
- (PyMongo extensions/ORM-ish layers or Q/A)
Things not covered in this tutorial
- Geospatial indexing
- Map-reduce
- Details on scaling (sharding, replica sets)
Part 1/4
Introduction to MongoDB:
- Concepts of MongoDB
- Architecture
- How MongoDB compares with relational databases
- Scalability
MongoDB is...
an open-source, high-performance, schema-less, document-oriented database
Let's agree on the following or leave...
- MongoDB is cool
- MongoDB is not the multi-purpose, one-size-fits-all database
- MongoDB is an additional tool for the software developer
- MongoDB is not a replacement for RDBMS in general
- Use the right tool for each task
And...
Don't ask me how to do JOINs in MongoDB
Oh, SQL: let's have some fun first
- A SQL statement walks into a bar and sees two tables. He walks up and says: "Hello, may I join you?"
- A SQL injection walks into a bar and starts to quote something but suddenly stops, drops a table and dashes out.
The history of MongoDB
- 10gen founded in 2007
- Started as a cloud alternative to GAE (Google App Engine): app engine plus database, JavaScript as implementation language
- 2008: focusing on the database part: MongoDB
- 2009: first MongoDB release
- 2011: MongoDB 1.8
  - Major deployments
  - A fast-growing community
  - Fast adoption for large projects
  - 10gen growing
Major MongoDBdeployments
MongoDB is schema-less
- JSON-style data store
- Each document can have its own schema
- Documents inside a collection usually share a common schema by convention
{'name': 'kate', 'age': 12}
{'name': 'adam', 'height': 180}
{'q': 1234, 'x': ['foo', 'bar']}
Terminology: RDBMS vs. MongoDB
Characteristics of MongoDB (I)
- High performance
- Rich query language (similar to SQL)
- Map-Reduce (if you really need it)
- Secondary indexes
- Geospatial indexing
- Replication
- Auto-sharding (partitioning of data)
- Many platforms, drivers for many languages
Characteristics of MongoDB (II)
- No transaction support, only atomic operations
- Default: "fire-and-forget" mode for high throughput
- "Safe mode": wait for server confirmation, checking for errors
Typical performance characteristics
Decent commodity hardware:
- Up to 100,000 reads/writes per second (fire-and-forget)
- Up to 50,000 reads/writes per second (safe mode)
Your mileage may vary, depending on:
- RAM
- Speed of the IO system
- CPU
- Client-side driver & application
Functionality vs. Scalability
MongoDB: Pros & Cons
Durability
- Default: fire-and-forget (use safe mode instead)
- Changes are kept in RAM (!)
- Fsync to disk every 60 seconds (default)
Deployment options:
- Standalone installation: use journaling (V 1.8+)
- Replicated: use replica set(s)
Differences from Typical RDBMS
- Memory-mapped data
  - All data in memory (if it fits), synced to disk periodically
- No joins
  - Reads have greater data locality
  - No joins between servers
- No transactions
  - Improves performance of various operations
  - No transactions between servers
Replica Sets
- Cluster of N servers
- Only one node is 'primary' at a time
  - This is equivalent to master
  - The node where writes go
- Primary is elected by consensus
- Automatic failover
- Automatic recovery of failed nodes
Replica Sets - Writes
- A write is only 'committed' once it has been replicated to a majority of nodes in the set
- Before this happens, reads to the set may or may not see the write
- On failover, data which is not 'committed' may be dropped (but not necessarily)
  - If dropped, it will be rolled back from all servers which wrote it
- For improved durability, use getLastError/w
  - Other criteria: block writes when nodes go down or slaves get too far behind
  - Or, to reduce latency, reduce getLastError/w
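The majority rule above can be made concrete with a toy model. This is a sketch, not MongoDB's replication code: the member names are hypothetical and an acknowledgement is just a member name in a set.

```python
# Toy model of "a write is committed once a majority of members have it".
# Not MongoDB's implementation; member names are hypothetical.

def majority(set_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return set_size // 2 + 1


def is_committed(acknowledged_by: set, members: list) -> bool:
    """True once a majority of the set's members have applied the write."""
    acks = acknowledged_by & set(members)  # ignore names outside the set
    return len(acks) >= majority(len(members))


members = ["member1", "member2", "member3"]
print(majority(len(members)))                          # 2
print(is_committed({"member3"}, members))              # False: primary only
print(is_committed({"member3", "member2"}, members))   # True: 2 of 3
```

Raising getLastError/w towards the majority size (or the full set size) is what makes a client wait until a write reaches this "committed" state.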
Replica Sets - Nodes
- Nodes monitor each other's heartbeats
  - If the primary can't see a majority of nodes, it relinquishes primary status
  - If a majority of nodes notice there is no primary, they elect a primary using these criteria:
    - Node priority
    - Node data's freshness
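A rough sketch of the election rule, assuming candidates are ranked by priority first and data freshness second (the real election protocol has more conditions). `Member`, `optime` and the member names are illustrative; a priority of 0 stands in for a member that may never become primary.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Member:
    name: str
    priority: int  # 0 = may never become primary (a "passive" member)
    optime: int    # position of the member's last write; higher = fresher


def elect_primary(members: List[Member], reachable: Set[str]) -> Optional[Member]:
    """Pick a new primary among reachable members, or None without a majority."""
    visible = [m for m in members if m.name in reachable]
    if len(visible) < len(members) // 2 + 1:
        return None  # a minority partition must not elect a primary
    candidates = [m for m in visible if m.priority > 0]
    if not candidates:
        return None
    # Rank by priority first, then by data freshness (simplified criteria).
    return max(candidates, key=lambda m: (m.priority, m.optime))


members = [Member("member1", 1, 1), Member("member2", 1, 2), Member("member3", 1, 3)]
print(elect_primary(members, {"member1", "member2"}).name)  # member2
print(elect_primary(members, {"member1"}))                   # None
```

With Member 3 down, the two remaining members still form a majority of three and elect the fresher Member 2; a single isolated member elects nobody.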
Replica Sets - Nodes (failover walkthrough)
1. Three members: Member 1, Member 2, Member 3
2. Member 1: SECONDARY {a:1} | Member 2: SECONDARY {a:1} {b:2} | Member 3: PRIMARY {a:1} {b:2} {c:3}
3. Member 1: SECONDARY {a:1} | Member 2: PRIMARY {a:1} {b:2} | Member 3: DOWN {a:1} {b:2} {c:3}
4. Member 1: SECONDARY {a:1} {b:2} | Member 2: PRIMARY {a:1} {b:2} | Member 3: RECOVERING {a:1} {b:2} {c:3}
5. Member 1: SECONDARY {a:1} {b:2} | Member 2: PRIMARY {a:1} {b:2} | Member 3: SECONDARY {a:1} {b:2}
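The loss of {c:3} on Member 3 can be modelled as: keep the longest run of writes shared with the new primary, roll back the rest, then copy what the primary has. This is a simplified sketch; `resync` is a hypothetical helper and writes are plain dicts in order.

```python
from typing import Dict, List, Tuple


def resync(node_ops: List[Dict], primary_ops: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (data after resync, writes rolled back) for a recovering member."""
    common = 0
    while (common < len(node_ops) and common < len(primary_ops)
           and node_ops[common] == primary_ops[common]):
        common += 1
    rolled_back = node_ops[common:]        # writes the new primary never received
    return list(primary_ops), rolled_back  # then copy everything from the primary


member3 = [{"a": 1}, {"b": 2}, {"c": 3}]  # old primary, was DOWN
member2 = [{"a": 1}, {"b": 2}]            # new primary
data, lost = resync(member3, member2)
print(data)  # [{'a': 1}, {'b': 2}]
print(lost)  # [{'c': 3}]
```

{c:3} had reached only one of three members, so it was never "committed" in the majority sense and is the write that gets rolled back.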
Replica Sets - Node Types
- Standard: can be primary or secondary
- Passive: will be secondary but never primary
- Arbiter: will vote on primary, but won't replicate data
SlaveOk
db.getMongo().setSlaveOk();
- Syntax varies by driver
- Writes go to the master, reads to a slave
- The slave will be picked arbitrarily
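A sketch of the routing rule above, assuming one primary and a list of secondaries; `pick_server` is a hypothetical helper, not a driver API.

```python
import random
from typing import List


def pick_server(operation: str, primary: str, secondaries: List[str],
                slave_ok: bool) -> str:
    """Writes always go to the primary; reads may go to a secondary if slave_ok."""
    if operation == "write" or not slave_ok or not secondaries:
        return primary
    return random.choice(secondaries)  # "picked arbitrarily"


secondaries = ["member1", "member2"]
print(pick_server("write", "member3", secondaries, slave_ok=True))  # member3
print(pick_server("read", "member3", secondaries, slave_ok=False))  # member3
print(pick_server("read", "member3", secondaries, slave_ok=True))   # member1 or member2
```

Because a secondary may lag behind, a read routed this way can miss the most recent writes.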
Sharding Architecture
Shard
- A replica set
- Manages a well-defined range of shard keys
Shard
- Distribute data across machines
- Reduce data per machine
  - Better able to fit in RAM
- Distribute write load across shards
- Distribute read load across shards, and across nodes within shards
Shard Key
{ user_id: 1 }
{ lastname: 1, firstname: 1 }
{ tag: 1, timestamp: -1 }
{ _id: 1 }  (this is the default)
Mongos
Routes data to/from shards
db.users.find( { user_id: 5000 } )
db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )
db.users.find( { hometown: 'Seattle' } )
db.users.find( { hometown: 'Seattle' } ).sort( { user_id: 1 } )
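How mongos decides which shards a query needs can be sketched with a hypothetical chunk table for the shard key {user_id: 1} (the ranges and shard names are made up). A query on the shard key goes to one shard or a few; a query without it is sent to every shard. Sorting on the shard key, as in the last query above, does not avoid that; mongos then merges the sorted results.

```python
import math
from typing import Dict, List

# Hypothetical chunk table for shard key {user_id: 1}; ranges are [lo, hi).
CHUNKS = [
    (-math.inf, 3000, "shard0"),
    (3000, 5000, "shard1"),
    (5000, math.inf, "shard2"),
]


def route(query: Dict, key: str = "user_id") -> List[str]:
    """Return the shards mongos would need to contact for find(query)."""
    if key not in query:
        # No shard key in the query: scatter-gather to every shard.
        return sorted({shard for _, _, shard in CHUNKS})
    cond = query[key]
    if not isinstance(cond, dict):
        # Equality on the shard key: exactly one chunk holds the value.
        return [shard for lo, hi, shard in CHUNKS if lo <= cond < hi]
    low = cond.get("$gt", cond.get("$gte", -math.inf))
    high = cond.get("$lt", cond.get("$lte", math.inf))
    # Range query: every chunk overlapping [low, high] (conservative at the edges).
    return sorted({shard for lo, hi, shard in CHUNKS if lo <= high and low < hi})


print(route({"user_id": 5000}))                        # ['shard2']
print(route({"user_id": {"$gt": 4000, "$lt": 6000}}))  # ['shard1', 'shard2']
print(route({"hometown": "Seattle"}))                  # ['shard0', 'shard1', 'shard2']
```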
Differences from Typical RDBMS
- Memory-mapped data
  - All data in memory (if it fits), synced to disk periodically
- No joins
  - Reads have greater data locality
  - No joins between servers
- No transactions
  - Improves performance of various operations
  - No transactions between servers
- A weak authentication and authorization model
Part 2/4
Using MongoDB
- Starting MongoDB
- Using the interactive Mongo console
- Basic database operations
Getting started... the server
wget http://fastdl.mongodb.org/osx/mongodb-osx-x86_64-1.8.1.tgz
tar xfz mongodb-osx-x86_64-1.8.1.tgz
cd mongodb-osx-x86_64-1.8.1
mkdir /tmp/db
bin/mongod --dbpath /tmp/db
Pick up your OS-specific package from http://www.mongodb.org/downloads
Take care of the 32-bit vs. 64-bit version
Getting started... the console
bin/mongo
mongod listens to port 27017 by default
HTTP interface on port 28017
> help
> db.help()
> db.some_collection.help()
Datatypes...
Remember: MongoDB is schema-less
MongoDB supports JSON + some extra types
A small address database
Person:
- firstname
- lastname
- birthday
- city
- phone
Inserting
> db.foo.insert(document)
> db.foo.insert({'firstname' : 'Ben'})
- Every document has an "_id" field
- "_id" is inserted automatically if not present
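A minimal model of the "_id" rule, assuming a collection is just a Python list; a simple counter stands in for MongoDB's ObjectId generator. As PyMongo does, it adds the "_id" to the very dict that was passed in.

```python
import itertools
from typing import Dict, List

_next_id = itertools.count(1)  # stand-in for MongoDB's ObjectId generator


def insert(collection: List[Dict], document: Dict):
    """Append a document, adding an "_id" first if it does not have one."""
    if "_id" not in document:
        document["_id"] = next(_next_id)
    collection.append(document)
    return document["_id"]


people = []
ben = {"firstname": "Ben"}
new_id = insert(people, ben)
print("_id" in ben)                                          # True
print(insert(people, {"_id": "kate", "firstname": "Kate"}))  # kate
```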
Querying
> db.foo.find(query_expression)
> db.foo.find({'firstname' : 'Ben'})
- Queries are expressed in JSON notation, using JSON/BSON objects
- Query expressions are combined using AND (by default)
http://www.mongodb.org/display/DOCS/Querying
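The implicit AND can be sketched as a filter over plain dicts. This sketch only handles exact equality on top-level fields; the sample people follow the address-database fields and are invented.

```python
from typing import Dict, List


def matches(document: Dict, query: Dict) -> bool:
    """Every field of the query must be present and equal (implicit AND)."""
    return all(field in document and document[field] == value
               for field, value in query.items())


def find(collection: List[Dict], query: Dict) -> List[Dict]:
    return [doc for doc in collection if matches(doc, query)]


people = [  # invented sample data
    {"firstname": "Ben", "lastname": "Miller", "city": "Berlin"},
    {"firstname": "Ben", "lastname": "Jones", "city": "Bonn"},
    {"firstname": "Kate", "lastname": "Smith", "city": "Berlin"},
]
print(len(find(people, {"firstname": "Ben"})))                       # 2
print([p["lastname"] for p in find(people, {"firstname": "Ben",
                                            "city": "Berlin"})])     # ['Miller']
print(len(find(people, {})))                                         # 3
```

An empty query has no conditions to fail, which is why find({}) returns every document.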
Querying with sorting
> db.foo.find({}).sort({'firstname' : 1, 'age' : -1})
- Sorting specification in JSON notation
- 1 = ascending, -1 = descending
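Python has no built-in multi-key sort with mixed directions, but a stable sort applied once per key, from the least to the most significant, yields the same order as {'firstname': 1, 'age': -1}. This sketch assumes every document has every sort field; PyMongo likewise takes the sort spec as a list of (key, direction) pairs.

```python
from operator import itemgetter
from typing import Dict, List, Tuple


def sort_documents(docs: List[Dict], spec: List[Tuple[str, int]]) -> List[Dict]:
    """Sort like .sort(); spec is [(field, 1 or -1), ...], most significant first."""
    result = list(docs)
    # list.sort() is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last combines them.
    for field, direction in reversed(spec):
        result.sort(key=itemgetter(field), reverse=(direction == -1))
    return result


docs = [
    {"firstname": "Ben", "age": 30},
    {"firstname": "Adam", "age": 20},
    {"firstname": "Ben", "age": 40},
]
for doc in sort_documents(docs, [("firstname", 1), ("age", -1)]):
    print(doc["firstname"], doc["age"])
# Adam 20
# Ben 40
# Ben 30
```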
Advanced querying
$all, $exists, $mod, $ne, $in, $nin, $nor, $or, $size, $type
http://www.mongodb.org/display/DOCS/Advanced+Queries
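A simplified evaluator for some of these operators on a single field shows what each one tests. `eval_condition` is hypothetical; $or, $nor and $type are left out, and MongoDB's handling of arrays is richer than this.

```python
from typing import Any, Dict


def eval_condition(document: Dict, field: str, condition: Dict[str, Any]) -> bool:
    """Check one field against operators such as {"$in": [...], "$ne": ...}."""
    present = field in document
    value = document.get(field)
    for op, arg in condition.items():
        if op == "$exists":
            ok = present == bool(arg)
        elif op == "$ne":
            ok = value != arg
        elif op == "$in":
            ok = present and value in arg
        elif op == "$nin":
            ok = not present or value not in arg
        elif op == "$size":
            ok = isinstance(value, list) and len(value) == arg
        elif op == "$all":
            ok = isinstance(value, list) and all(item in value for item in arg)
        elif op == "$mod":
            divisor, remainder = arg
            ok = isinstance(value, int) and value % divisor == remainder
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False  # all operators on a field must hold (AND)
    return True


person = {"firstname": "Ben", "age": 42, "colors": ["red", "blue"]}
print(eval_condition(person, "age", {"$in": [30, 42]}))              # True
print(eval_condition(person, "age", {"$mod": [10, 2]}))              # True
print(eval_condition(person, "phone", {"$exists": False}))           # True
print(eval_condition(person, "colors", {"$all": ["red", "green"]}))  # False
```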
Updating
> db.foo.update(criteria, obj, upsert, multi)
- update() updates only one document by default (pass multi = true to update all matches)
- upsert = true: if no document matches, insert it
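The multi and upsert flags can be modelled with a $set-style update (MongoDB only allows multi together with modifier operators, not with a plain replacement object). `update` is a hypothetical helper over a Python list; criteria are limited to top-level equality, and the parameter order follows the shell's (upsert, multi).

```python
from typing import Dict, List


def update(collection: List[Dict], criteria: Dict, set_fields: Dict,
           upsert: bool = False, multi: bool = False) -> int:
    """Model of db.foo.update(criteria, {$set: set_fields}, upsert, multi).

    Returns the number of documents matched and updated, or 1 for an upsert.
    """
    hits = [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]
    if not hits:
        if upsert:
            # Nothing matched: insert a new document built from criteria + changes.
            collection.append({**criteria, **set_fields})
            return 1
        return 0
    if not multi:
        hits = hits[:1]  # default: only the first matching document is changed
    for doc in hits:
        doc.update(set_fields)
    return len(hits)


people = [{"firstname": "Ben", "city": "Berlin"}, {"firstname": "Ben", "city": "Bonn"}]
print(update(people, {"firstname": "Ben"}, {"city": "Hamburg"}))               # 1
print(update(people, {"firstname": "Ben"}, {"city": "Hamburg"}, multi=True))   # 2
print(update(people, {"firstname": "Kate"}, {"city": "Munich"}))               # 0
print(update(people, {"firstname": "Kate"}, {"city": "Munich"}, upsert=True))  # 1
print(len(people))                                                             # 3
```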
Updating: modifier operations
$inc, $set, $unset, $push, $pushAll, $addToSet, $pop, $pull, $pullAll, $rename, $bit
http://www.mongodb.org/display/DOCS/Updating
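A sketch of what several of these modifiers do to a single document, using plain dicts and lists. `apply_modifiers` is hypothetical and skips $pushAll, $pullAll and $bit; $pop follows the convention that 1 removes the last element and -1 the first.

```python
from typing import Any, Dict


def apply_modifiers(doc: Dict[str, Any], update: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Apply {"$op": {field: arg}} modifiers to one document, in place."""
    for op, fields in update.items():
        for field, arg in fields.items():
            if op == "$set":
                doc[field] = arg
            elif op == "$unset":
                doc.pop(field, None)
            elif op == "$inc":
                doc[field] = doc.get(field, 0) + arg
            elif op == "$push":
                doc.setdefault(field, []).append(arg)
            elif op == "$addToSet":
                values = doc.setdefault(field, [])
                if arg not in values:
                    values.append(arg)
            elif op == "$pop":
                values = doc.get(field, [])
                if values:
                    values.pop(-1 if arg == 1 else 0)  # 1: last, -1: first
            elif op == "$pull":
                if field in doc:
                    doc[field] = [v for v in doc[field] if v != arg]
            elif op == "$rename":
                if field in doc:
                    doc[arg] = doc.pop(field)
            else:
                raise ValueError(f"modifier not covered by this sketch: {op}")
    return doc


person = {"firstname": "Ben", "visits": 1, "tags": ["friend"]}
apply_modifiers(person, {"$inc": {"visits": 2}, "$push": {"tags": "work"},
                         "$addToSet": {"tags": "friend"}, "$set": {"city": "Berlin"}})
print(person)
# {'firstname': 'Ben', 'visits': 3, 'tags': ['friend', 'work'], 'city': 'Berlin'}
```

The server applies modifiers in place, which is why they avoid the load/modify/save round trip of rewriting whole documents from the client.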
Removing
db.foo.remove({})                          // remove all
db.foo.remove({'firstname' : 'Ben'})       // remove by key
db.foo.remove({'_id' : ObjectId(...)})     // remove by _id
Atomic removal (locks the database):
db.foo.remove( { age: 42, $atomic : true } )
http://www.mongodb.org/display/DOCS/Removing
Indexes
- Work similarly to indexes in relational databases
db.foo.ensureIndex({age: 1}, {background: true})
- One query, one index
Compound indexes:
db.foo.ensureIndex({age: 1, firstname: -1})
- The order of fields in a compound index matters
http://www.mongodb.org/display/DOCS/Indexes
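For equality queries, a rough rule of thumb is that a compound index such as {age: 1, firstname: -1} helps only for a leading prefix of its fields, while the order of fields inside the query document itself does not matter. `usable_prefix` is a hypothetical sketch of that rule; the real query optimizer also weighs ranges and sorts.

```python
from typing import List, Set


def usable_prefix(index_fields: List[str], query_fields: Set[str]) -> List[str]:
    """Leading index fields an equality query can use (simplified rule)."""
    used = []
    for field in index_fields:
        if field not in query_fields:
            break  # a gap in the leading fields ends the usable prefix
        used.append(field)
    return used


index = ["age", "firstname"]  # like ensureIndex({age: 1, firstname: -1})
print(usable_prefix(index, {"firstname", "age"}))  # ['age', 'firstname']
print(usable_prefix(index, {"age"}))               # ['age']
print(usable_prefix(index, {"firstname"}))         # []
```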
Embedded documents
- MongoDB docs = JSON/BSON-like
- Embedded documents are similar to nested dicts in Python
db.foo.insert({firstname: 'Ben', data: {a: 1, b: 2, c: 3}})
db.foo.find({'data.a': 1})
- Dotted notation for reaching into embedded documents
- Use quotes around dotted names
- Indexes work on embedded documents
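Dotted notation behaves like walking nested dicts one key at a time. `get_dotted` is a hypothetical helper for nested dicts only; MongoDB also reaches into arrays, which this sketch ignores.

```python
from typing import Any, Dict


def get_dotted(document: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "data.a" through nested dicts."""
    current: Any = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


ben = {"firstname": "Ben", "data": {"a": 1, "b": 2, "c": 3}}
print(get_dotted(ben, "data.a"))       # 1
print(get_dotted(ben, "data.z"))       # None
print(get_dotted(ben, "firstname.x"))  # None: "Ben" is not a dict
```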
Arrays (1/2)
- Like (nested) lists in Python
db.foo.insert({colors: ['green', 'blue', 'red']})
db.foo.find({colors: 'red'})
- Use indexes
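Why {colors: 'red'} matches the array: a scalar query value matches an array field if any element equals it. A simplified sketch (`field_matches` is hypothetical); an array query value is compared as a whole, element order included.

```python
from typing import Any


def field_matches(value: Any, wanted: Any) -> bool:
    """Equality test with simplified MongoDB array semantics."""
    if isinstance(value, list) and not isinstance(wanted, list):
        return wanted in value  # scalar query value: match any array element
    return value == wanted      # otherwise the whole value must be equal


colors = ["green", "blue", "red"]
print(field_matches(colors, "red"))                     # True
print(field_matches(colors, "yellow"))                  # False
print(field_matches(colors, ["green", "blue", "red"]))  # True: exact array
print(field_matches(colors, ["red", "green", "blue"]))  # False: order differs
```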
Arrays (2/2): matching arrays
db.bar.insert({users: [
                  {name: 'Hans', age: 42},
                  {name: 'Jim', age: 30}
              ]})
db.bar.find({users: {'$elemMatch': {age: {$gt: 42}}}})
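$elemMatch requires a single array element to satisfy all conditions at once. The sketch contrasts it with checking each condition independently, possibly against different elements, which is roughly what dotted fields such as 'users.age' do. With the sample data, $gt: 42 matches nothing because Hans is exactly 42. The helper names are hypothetical.

```python
import operator
from typing import Any, Dict, List

OPS = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt,
       "$lte": operator.le, "$ne": operator.ne}


def cond_ok(value: Any, cond: Any) -> bool:
    """A plain value means equality; a dict holds comparison operators."""
    if isinstance(cond, dict):
        return all(OPS[op](value, arg) for op, arg in cond.items())
    return value == cond


def elem_match(array: List[Dict], criteria: Dict) -> bool:
    """$elemMatch: ONE element must satisfy ALL criteria."""
    return any(all(f in elem and cond_ok(elem[f], c) for f, c in criteria.items())
               for elem in array)


def each_somewhere(array: List[Dict], criteria: Dict) -> bool:
    """Each criterion may be satisfied by a DIFFERENT element."""
    return all(any(f in elem and cond_ok(elem[f], c) for elem in array)
               for f, c in criteria.items())


users = [{"name": "Hans", "age": 42}, {"name": "Jim", "age": 30}]
print(elem_match(users, {"age": {"$gt": 42}}))                     # False
print(elem_match(users, {"age": {"$gte": 42}}))                    # True
print(elem_match(users, {"name": "Jim", "age": {"$gt": 40}}))      # False
print(each_somewhere(users, {"name": "Jim", "age": {"$gt": 40}}))  # True
```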
Part 3/4
Using MongoDB from Python: PyMongo
- Installing PyMongo
- Using PyMongo
Installing and testing PyMongo
Install pymongo:
virtualenv --no-site-packages pymongo
cd pymongo
bin/easy_install pymongo
Start MongoDB:
mkdir /tmp/db
mongod --dbpath /tmp/db
Start Python:
bin/python
>>> import pymongo
>>> conn = pymongo.Connection('localhost', 27017)
Part 4/4
- High-level PyMongo frameworks
  - Mongokit
  - Mongoengine
  - MongoAlchemy
- Migration from SQL to MongoDB
- Q/A
- Looking at a real world project done with Pyramid and MongoDB?
- Let's talk about...
Mongokit (1/3)
- Schema validation (using simple Python types for the declaration)
- Dotted notation
- Nested and complex schema declaration
- Untyped field support
- Required fields validation
- Default values
- Custom validators
- Cross-database document references
- Random query support (returns a random document from the database)
- Inheritance and polymorphism support
- Versioned document support (in beta stage)
- Partial auth support (brings a simple User model)
- Operators for validation (currently: OR, NOT and IS)
- Simple web framework integration
- Import/export to JSON
- i18n support
- GridFS support
- Document migration support
Mongokit (2/3)
import datetime
import pymongo
from mongokit import Document

class BlogPost(Document):
    structure = {
        'title': unicode,
        'body': unicode,
        'author': pymongo.objectid.ObjectId,
        'created_at': datetime.datetime,
        'tags': [unicode],
    }
    required_fields = ['title', 'author', 'created_at']

blog_post = BlogPost()
blog_post['title'] = u'my blog post'
blog_post['author'] = pymongo.objectid.ObjectId()  # placeholder for a real user's _id
blog_post['created_at'] = datetime.datetime.utcnow()
blog_post.save()
Mongokit (3/3)
- Speed and performance impact
- Mongokit is always behind the most current PyMongo versions
- One-man developer show
http://namlook.github.com/mongokit/
Mongoengine (1/2)
MongoEngine is a Document-Object Mapper (think ORM, but for document databases) for working with MongoDB from Python. It uses a simple declarative API, similar to the Django ORM.
http://mongoengine.org/
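Mongoengine (2/2)
import datetime
from mongoengine import (connect, Document, StringField, ReferenceField,
                         DateTimeField, ListField)

connect('blog')  # example database name

class User(Document):  # minimal stand-in for the referenced User model
    name = StringField()

class BlogPost(Document):
    title = StringField(required=True)
    body = StringField()
    author = ReferenceField(User)
    created_at = DateTimeField(required=True)
    tags = ListField(StringField())

blog_post = BlogPost(title='my blog post', created_at=datetime.datetime.utcnow())
blog_post.save()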
MongoAlchemy (1/2)
MongoAlchemy is a layer on top of the Python MongoDB driver which adds client-side schema definitions, an easier to work with and programmatic query language, and a Document-Object mapper which allows Python objects to be saved and loaded into the database in a type-safe way.
An explicit goal of this project is to be able to perform as many operations as possible without having to perform a load/save cycle, since doing so is both significantly slower and more likely to cause data loss.
http://mongoalchemy.org/
MongoAlchemy (2/2)
from mongoalchemy.document import Document, DocumentField
from mongoalchemy.fields import *
from datetime import datetime
from pprint import pprint

class Event(Document):
    name = StringField()
    children = ListField(DocumentField('Event'))
    begin = DateTimeField()
    end = DateTimeField()

    def __init__(self, name, parent=None):
        Document.__init__(self, name=name)
        self.children = []
        if parent is not None:
            parent.children.append(self)
From SQL toMongoDB
The CAP theorem
- Consistency
- Availability
- Tolerance to network Partitions
Pick two...
ACID versus BASE
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically Available, Soft state, Eventually consistent

Python mongo db-training-europython-2011