Upload
boxed-ice
View
1.301
Download
5
Embed Size (px)
DESCRIPTION
Building a queueing system in MongoDB and monitoring your cluster. Presentation by David Mytton at MongoSF May 2011 and MongoDB London User Group July 2011.
Citation preview
MongoDB Queuing & Monitoring
www.flickr.com/photos/triplexpresso/496995086/
Queuing
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
☺ Atomicity
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
☺ Atomicity
☃ Speed
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
☺ Atomicity
☃ Speed
☹ GC
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
www.flickr.com/photos/triplexpresso/496995086/
Queuing
☺ Redundancy
☺ Known
It’s a little different,but not entirely new.
www.flickr.com/photos/comedynose/4388430444/
Keep it in RAM. Obviously.
http://www.flickr.com/photos/comedynose/4388430444/
How do you know?
> db.stats(){! "collections" : 3,! "objects" : 379970142,! "avgObjSize" : 146.4554114991488,! "dataSize" : 55648683504,! "storageSize" : 61795435008,! "numExtents" : 64,! "indexes" : 1,! "indexSize" : 21354514128,! "fileSize" : 100816388096,! "ok" : 1}
51GB
19GB
http://www.flickr.com/photos/comedynose/4388430444/
Where should it go?
What? Should it be in memory?
Indexes Always
Data If you can
How you’ll know
1) Slow queries
Thu Oct 14 17:01:11 [conn7410] update sd.apiLog query: { c: "android/setDeviceToken", a: 1466, u: "blah", ua: "Server Density Android" } 51926ms
www.flickr.com/photos/tonivc/2283676770/
How you’ll know
2) Timeouts
cursor timed out (20000 ms)
How you’ll know
3) Disk i/o spikes
www.flickr.com/photos/daddo83/3406962115/
Watch your storage
1) Pre-alloc
Watch your storage
2) Sharding maxSize
Watch your storage
3) Logging
--quiet
db.runCommand("logRotate");
killall -SIGUSR1 mongod
Watch your storage
4) Journaling
david@rs2b ~: ls -alh /mongodbdata/journal/total 538Mdrwxrwxr-x 2 david david 29 Mar 20 16:50 .drwx------ 4 david david 4.0K Mar 13 09:50 ..-rw------- 1 david david 538M Mar 20 17:00 j._862-rw------- 1 david david 88 Mar 20 17:00 lsn
db.serverStatus()
1) Used connections
db.serverStatus()
www.flickr.com/photos/armchaircaver/2061231069/
2) Available connections
db.serverStatus()
Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [mongosMain] Listener: accept() returns -1 errno:24 Too many open files Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2335] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:32 [conn2268] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 } MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2d:27018 socket exception Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2c:27018 socket exception Fri Nov 19 17:24:32 [conn2268] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:32 [conn2268] checkmaster: caught exception rs2a:27018 socket exception Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2330] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2327] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2126] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:33 [conn2343] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1b") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs1c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2b") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2332] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2d:27018 Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2d") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] reconnect rs2d:27018 failed Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2c:27018 Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2c") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] reconnect rs2c:27018 failed Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:34 [conn2343] trying reconnect to rs2a:27018 Fri Nov 19 17:24:34 [conn2343] getaddrinfo("rs2a") failed: No address associated with hostname Fri Nov 19 17:24:34 [conn2343] reconnect rs2a:27018 failed Fri Nov 19 17:24:34 [conn2343] MessagingPort say send() errno:9 Bad file descriptor (NONE) Fri Nov 19 17:24:35 [conn2343] checkmaster: rs2b:27018 { setName: "set2", ismaster: false, secondary: true, hosts: [ "rs2b:27018", "rs2d:27018", "rs2c:27018", "rs2a:27018" ], arbiters: [ "rs2arbiter:27018" ], primary: "rs2a:27018", maxBsonObjectSize: 8388608, ok: 1.0 } MessagingPort say send() errno:9 Bad file descriptor (NONE)
connPoolStats> db.runCommand("connPoolStats"){! "hosts" : {! ! "config1:27019" : {! ! ! "available" : 2,! ! ! "created" : 6! ! },! ! "set1/rs1a:27018,rs1b:27018" : {! ! ! "available" : 1,! ! ! "created" : 249! ! },
...! },! "totalAvailable" : 5,! "totalCreated" : 1002,! "numDBClientConnection" : 3490,! "numAScopedConnection" : 3,}
3) Index counters
db.serverStatus()
"indexCounters" : {! ! "btree" : {! ! ! "accesses" : 15180175,! ! ! "hits" : 15178725,! ! ! "misses" : 1450,! ! ! "resets" : 0,! ! ! "missRatio" : 0.00009551932! ! }! },
4) Op counters
db.serverStatus()
www.flickr.com/photos/cosmic_bandita/2395369614/
5) Background flushing
db.serverStatus()
Picture is unrelated! Mmm, ice cream.
6) Dur
db.serverStatus()
rs.status()
www.ex-astris-scientia.org/inconsistencies/ent_vs_tng.htm (yes it’s a replicator from Star Trek)
{! "_id" : 1,! "name" : "rs3b:27018",! "health" : 1,! "state" : 2,! "stateStr" : "SECONDARY",! "uptime" : 1886098,! "optime" : {! ! "t" : 1291252178000,! ! "i" : 13! },! "optimeDate" : ISODate("2010-12-02T01:09:38Z"), "lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")},
1) myState
rs.status()
Value Meaning0 Starting up (phase 1)1 Primary2 Secondary3 Recovering4 Fatal error5 Starting up (phase 2)6 Unknown state7 Arbiter8 Down
en.wikipedia.org/wiki/State_of_matter
2) Optime
rs.status()
www.flickr.com/photos/robbie73/4244846566/
"optimeDate" : ISODate("2010-12-02T01:09:38Z")
3) Heartbeat
rs.status()
www.flickr.com/photos/drawblindfaith/3400981091/
"lastHeartbeat" : ISODate("2010-12-02T01:09:38Z")
mongostat
1) faults
mongostat
Picture is unrelated! Snowmobile in Norway.
2) locked
mongostat
www.flickr.com/photos/bbusschots/4541573665/
3) index miss
mongostat
www.flickr.com/photos/gareandkitty/276471187/
4) queues
mongostat
5) Diagnostics
mongostat
Current operations
www.flickr.com/photos/jeffhester/2784666811/
db.currentOp();{! ! ! "opid" : "shard1:299939199",! ! ! "active" : true,! ! ! "lockType" : "write",! ! ! "waitingForLock" : false,! ! ! "secs_running" : 15419,! ! ! "op" : "remove",! ! ! "ns" : "sd.metrics",! ! ! "query" : {! ! ! ! "accId" : 1391,! ! ! ! "tA" : {! ! ! ! ! "$lte" : ISODate("2010-11-24T19:53:00Z")! ! ! ! }! ! ! },! ! ! "client" : "10.121.12.228:44426",! ! ! "desc" : "conn"! ! },
Monitoring tools
Server Density
Monitoring tools
www.mongomonitor.com
plugins.serverdensity.com
App store for sysadmins
Recap
Keep it in RAM
Recap
Keep it in RAM
Watch your storage
Recap
Keep it in RAM
Watch your storage
db.serverStatus()
rs.status()
Recap