collectd & PostgreSQL

       Mark Wong
 markwkm@postgresql.org
 mark.wong@myemma.com
         PDXPUG


    November 17, 2011
My Story



     • How did I get to collectd?
     • What is collectd?
     • Hacking collectd
     • Using collectd with Postgres
     • Visualizing the data




Brief background




     • Working at a little company called Emma http://myemma.com
     • Collect performance data from production systems




What did we have?



     • A database with over 1 million database objects
           • >500,000 tables
           • >1,000,000 indexes

      • Tables alone generate 11,000,000 data points per sample (roughly 20+
        statistics per table in pg_stat_all_tables across the >500,000 tables)




What did we try?


  Only free things:
      • Cacti http://www.cacti.net/
      • Ganglia http://ganglia.info/
      • Munin http://munin-monitoring.org/
      • Reconnoiter https://labs.omniti.com/labs/reconnoiter
      • Zenoss http://community.zenoss.org/




What doesn’t work

  All of these depend on RRDtool, which can’t handle more than hundreds of
  thousands of metrics (see “Application Buffer-Cache Management for
  Performance: Running the World’s Largest MRTG” by David Plonka, Archit
  Gupta, and Dale Carder, LISA 2007):
      • Cacti
      • Ganglia
      • Munin
      • Reconnoiter
      • Zenoss



Reconnoiter almost worked for us

  Pros:
      • Write your own SQL queries to collect data from Postgres
      • Used Postgres instead of RRDtool for storing data
      • JavaScript based on-the-fly charting
      • Support for integrating many other health and stats collection solutions
  Cons:
      • Data collection still couldn’t keep up; maybe it needed more tuning
      • Faster hardware? (we were running on VMs)
      • More hardware? (scale out the MQ processes)


Couldn’t bring myself to try anything else



      • Hands were tied; no resources were available to help move forward.
      • Could we build something lightweight?
      • Played with collectd (http://collectd.org/) while evaluating
          Reconnoiter




What is collectd?



          collectd is a daemon which collects system performance
          statistics periodically and provides mechanisms to store the
          values in a variety of ways, for example in RRD files.

  http://collectd.org/




Does this look familiar?




  Note: RRDtool is an option, not a requirement
What is special about collectd?

  From their web site:
      • it’s written in C for performance and portability
      • includes optimizations and features to handle hundreds of thousands
        of data sets
      • PostgreSQL plugin enables querying the database
      • Can collect most operating system statistics (I say “most” because I
        don’t know whether anything is missing)
      • Over 90 total plugins
         http://collectd.org/wiki/index.php/Table_of_Plugins


collectd data description

      • time - when the data was collected
      • interval - frequency of data collection
      • host - server hostname
      • plugin - collectd plugin used
      • plugin instance - additional plugin information
      • type - type of data collected for set of values
      • type instance - unique identifier of the metric
      • dsnames - names for the values collected
      • dstypes - type of data for values collected (e.g. counter, gauge, etc.)
      • values - array of values collected

PostgreSQL plugin configuration
  Define custom queries in collectd.conf:

  LoadPlugin postgresql
  <Plugin postgresql>
     <Query magic>
         Statement "SELECT magic FROM wizard;"
         <Result>
             Type gauge
             InstancePrefix "magic"
             ValuesFrom magic
         </Result>
     </Query>
  ...

. . . per database.

...
   <Database bar>
       Interval 60
       Service "service_name"
       Query backend # predefined
       Query magic_tickets
   </Database>
</Plugin>


Full details at
http://collectd.org/wiki/index.php/Plugin:PostgreSQL
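
Assembled, a minimal complete configuration might look like the sketch below. This is illustrative only, using the query and database names from the fragments above and wiring the <Database> block to the magic query defined earlier:

  LoadPlugin postgresql
  <Plugin postgresql>
      <Query magic>
          Statement "SELECT magic FROM wizard;"
          <Result>
              Type gauge
              InstancePrefix "magic"
              ValuesFrom magic
          </Result>
      </Query>
      <Database bar>
          Interval 60
          Service "service_name"
          Query backend # predefined
          Query magic
      </Database>
  </Plugin>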

Hurdles



  More metadata:
      • Need a way to save schema, table, and index names; can’t differentiate
        stats between tables and indexes
      • Basic support of meta data in collectd but mostly unused
      • How to store data in something other than RRDtool




Wanted: additional metadata


  Hack the PostgreSQL plugin to create metadata for:
      • database - database name (maybe not needed; same as
        plugin instance)
      • schemaname - schema name
      • tablename - table name
      • indexname - index name
      • metric - e.g. blks_hit, blks_read, seq_scan, etc.




Another database query for collecting a table statistic



  <Query table_stats>
      Statement "SELECT schemaname, relname, seq_scan FROM pg_stat_all_tables;"
  </Query>




Identify the data



  <Result>
      Type counter
      InstancePrefix "seq_scan"
      InstancesFrom "schemaname" "relname"
      ValuesFrom "seq_scan"
  </Result>




Meta data specific parameters


  <Database postgres>
      Host "localhost"
      Query table_stats
      SchemanameColumn 0
      TablenameColumn 1
  </Database>



  Note: The database name is set from what is specified in the <Database>
  tag if it is not retrieved by the query.

Example data

     • time: 2011-10-20 18:04:17-05
     • interval: 300
     • host: pong.int
     • plugin: postgresql
     • plugin instance: sandbox
     • type: counter
     • type instance: seq_scan-pg_catalog-pg_class
     • dsnames: {value}
     • dstypes: {counter}
     • values: {249873}

Example metadata



     • database: sandbox
     • schemaname: pg_catalog
     • tablename: pg_class
     • indexname:
     • metric: seq_scan




Now what?



  Hands were tied (I think I mentioned that earlier); open-sourced the work to date:

      • collectd forked with patches
        https://github.com/mwongatemma/collectd
      • YAMS https://github.com/myemma/yams




Yet Another Monitoring System




Switching hats and boosting code




  Used extracurricular time, and equipment donated to the Postgres project by
  Sun, IBM, and HP, to continue proving out the collectd changes.




How am I going to move the data?

  Options from available write plugins; guess which I used:
      • Carbon - Graphite’s storage API to Whisper
        http://collectd.org/wiki/index.php/Plugin:Carbon
      • CSV http://collectd.org/wiki/index.php/Plugin:CSV
      • Network - Send/Receive to other collectd daemons
        http://collectd.org/wiki/index.php/Plugin:Network
      • RRDCacheD http://collectd.org/wiki/index.php/Plugin:RRDCacheD
      • RRDtool http://collectd.org/wiki/index.php/Plugin:RRDtool
      • SysLog http://collectd.org/wiki/index.php/Plugin:SysLog
      • UnixSock http://collectd.org/wiki/index.php/Plugin:UnixSock
      • Write HTTP - PUTVAL (plain text), JSON
        http://collectd.org/wiki/index.php/Plugin:Write_HTTP

Process of elimination

  If RRDtool (written in C) can’t handle massive volumes of data, a Python
  RRD-like database (Whisper) probably can’t either:
       • Carbon
       • CSV
       • Network
       • RRDCacheD
       • RRDtool
       • SysLog
       • UnixSock
       • Write HTTP - PUTVAL (plain text), JSON

Process of elimination


  Writing to other collectd daemons or just locally doesn’t seem useful at the
  moment:
      • CSV
      • Network
      • SysLog
      • UnixSock
      • Write HTTP - PUTVAL (plain text), JSON




Process of elimination



  Let’s try CouchDB’s RESTful JSON API!
       • CSV
       • SysLog
       • Write HTTP - PUTVAL (plain text), JSON




Random: What Write HTTP PUTVAL data looks like
  Note: Each PUTVAL record is a single line; each is broken across two lines
  here to fit onto the slide.

  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_octets
      interval=10 1251533299:197141504:175136768
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_ops
      interval=10 1251533299:10765:12858
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_time
      interval=10 1251533299:5:140
  PUTVAL leeloo.lan.home.verplant.org/disk-sda/disk_merged
      interval=10 1251533299:4658:29899


Random: What the Write HTTP JSON data looks like
  Note: Write HTTP packs as much data as it can into a 4KB buffer.
   [ {
       "values": [197141504, 175136768],
       "dstypes": ["counter", "counter"],
       "dsnames": ["read", "write"],
       "time": 1251533299,
       "interval": 10,
       "host": "leeloo.lan.home.verplant.org",
       "plugin": "disk",
       "plugin_instance": "sda",
       "type": "disk_octets",
       "type_instance": ""
     }, ... ]
I didn’t know anything about CouchDB at the time



      • Query interface not really suited for retrieving data to visualize
      • Insert performance not suited to millions of metrics over short
        intervals (the same data can be inserted into Postgres several
        orders of magnitude faster)




Now where am I going to put the data?



  Hoping that the Write HTTP plugin is still a good choice:
      • Write an ETL
            • Table partitioning logic; creation of partition tables
            • Transform JSON data into INSERT statements
      • Use Postgres




Database design
                Table "collectd.value_list"
       Column      |           Type           | Modifiers
  -----------------+--------------------------+-----------
   time            | timestamp with time zone | not null
   interval        | integer                  | not null
   host            | character varying(64)    | not null
   plugin          | character varying(64)    | not null
   plugin_instance | character varying(64)    |
   type            | character varying(64)    | not null
   type_instance   | character varying(64)    |
   dsnames         | character varying(512)[] | not null
   dstypes         | character varying(8)[]   | not null
   values          | numeric[]                | not null
Take advantage of partitioning




  At least table inheritance, in Postgres’ case; partition the data by plugin




Child table
               Table "collectd.vl_postgresql"
       Column      |           Type            | Modifiers
  -----------------+--------------------------+-----------
   ...
   database        | character varying(64)     | not null
   schemaname      | character varying(64)     |
   tablename       | character varying(64)     |
   indexname       | character varying(64)     |
   metric          | character varying(64)     | not null
  Check constraints:
      "vl_postgresql_plugin_check" CHECK (plugin::text =
                                           ’postgresql’::text)
  Inherits: value_list
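
A minimal DDL sketch that would produce this layout, reconstructed from the two \d listings above (not necessarily the project’s exact DDL):

  CREATE TABLE collectd.value_list (
      time            timestamp with time zone NOT NULL,
      interval        integer                  NOT NULL,
      host            character varying(64)    NOT NULL,
      plugin          character varying(64)    NOT NULL,
      plugin_instance character varying(64),
      type            character varying(64)    NOT NULL,
      type_instance   character varying(64),
      dsnames         character varying(512)[] NOT NULL,
      dstypes         character varying(8)[]   NOT NULL,
      "values"        numeric[]                NOT NULL  -- quoted: VALUES is reserved
  );

  -- Child table per plugin; inherits all columns above and adds the metadata.
  CREATE TABLE collectd.vl_postgresql (
      database   character varying(64) NOT NULL,
      schemaname character varying(64),
      tablename  character varying(64),
      indexname  character varying(64),
      metric     character varying(64) NOT NULL,
      CONSTRAINT vl_postgresql_plugin_check
          CHECK (plugin::text = 'postgresql'::text)
  ) INHERITS (collectd.value_list);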
How much partitioning?


  Lots of straightforward options:
      • Date
      • Database
      • Schema
      • Table
      • Index
      • Metric
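
For example, splitting vl_postgresql further by date could mean one hypothetical child table per day (name and range illustrative; the plugin CHECK is inherited from vl_postgresql):

  CREATE TABLE collectd.vl_postgresql_20111020 (
      -- range constraint so the planner can skip this child for other days
      CHECK ("time" >= '2011-10-20' AND "time" < '2011-10-21')
  ) INHERITS (collectd.vl_postgresql);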




Back to the ETL


  Parameters were set for the fastest path to a working prototype:
      • Keep using HTTP POST (the Write HTTP plugin) for the HTTP protocol
        and JSON
      • Use Python for its built-in HTTP server and JSON parsing (Emma is
        primarily a Python shop)
      • Use SQLAlchemy/psycopg2




Back again to the ETL

  Python didn’t perform; the combination of JSON parsing, data transformation,
  and INSERT performance was still several orders of magnitude below
  acceptable levels. The replacement stack:
       • redis to queue data to transform
       • lighttpd for the HTTP interface
       • fastcgi C program to push things to redis
       • multi-threaded C program using libpq for the Postgres API
             • pop data out of redis
             • table partition creation logic
             • transform JSON data into INSERT statements (sketched below)
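
A sketch of that last transform, turning the disk_octets record from the JSON slide earlier into an INSERT against value_list ("values" is quoted because VALUES is a reserved word in SQL):

  INSERT INTO collectd.value_list
      (time, interval, host, plugin, plugin_instance,
       type, type_instance, dsnames, dstypes, "values")
  VALUES
      (to_timestamp(1251533299), 10, 'leeloo.lan.home.verplant.org',
       'disk', 'sda', 'disk_octets', '',
       '{read,write}', '{counter,counter}', '{197141504,175136768}');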


Success?




     • Table statistics for 1 million tables are collected in approximately 12 minutes.
     • Is that acceptable?
     • Can we go faster?




If you don’t have millions of data points


  Easier ways to visualize the data:
       • RRDtool
       • RRDtool compatible front-ends
         http://collectd.org/wiki/index.php/List_of_front-ends
       • Graphite with the Carbon and Whisper combo
         http://graphite.wikidot.com/
       • Reconnoiter




        __      __
       /  \~~~/  \  . o O ( Thank you! )
 ,----(      oo    )
/      \__     __/
/|         (\  |(
^ \   /___\  /\ |
   |__|    |__|-"




Acknowledgements

  Hayley Jane Wakenshaw

              __      __
             /  \~~~/  \
       ,----(      oo    )
      /      \__     __/
      /|         (\  |(
      ^ \   /___\  /\ |
         |__|    |__|-"



License



  This work is licensed under a Creative Commons Attribution 3.0 Unported
  License. To view a copy of this license, (a) visit
  http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a
  letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco,
  California, 94105, USA.




