Debian Conference 2013 - Munin
When in doubt, just graph it!
Steve Schnepp
Munin Project Lead
Agenda
- A Brief History
- Design principles
- New features in 2.0
- Scalability (master/nodes/data)
- Limitations of 2.0
- Roadmap of 2.2
A Brief History
- 2002 - Born as LRRD
- 2007 - Hacked zooming for 1.2
- 2010-2011 - Slowly took over leadership
- 2012 - Released 2.0
- 2.0 marked its 10th anniversary!
- In wheezy since Sept 2012
- 2013 - Released 2.1
- 2.1 is unstable
- This means internals will change between minor versions
- Oct 2013 is the target for 2.2
Design Principles
"Simple things should be simple, complex things should be possible." -- Alan Kay
- Very easy to use
- Sane out-of-box behaviour
- Complete plug-and-play
- Our users: mostly the 1 server+node type...
- ... but some are running bigger installs
- This is the growing market
New features in 2.0
- Full CGI implementation
- Native SSH transport
- Avoids opening new ports
- Secure, and usually better integrated than TLS
- Async proxy
- Loose connections
- Speeds up polls
- Various update rates
Scalability
- Scaling the master
- Handling more munin-nodes
- Scaling the nodes
- Handling a huge number of plugins
- Handling slow plugins
- Scaling the data
- Keep more RRD data
- Increase RRD precision (sub 5 min)
Scaling the master (1/2)
- Use FastCGI
- Use RRDcacheD
- to escape the I/O hell
- ... even on SSD!
- never read from the RRD files in cron
- Have RAM. Lots of it.
- RRDcacheD can make use of big buffers
- Multiply the number of workers...
- ... but do not swap. Just limit the workers
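Why a caching daemon eases the I/O load can be shown with a toy model. The sketch below is not RRDcacheD's code: the class name, the flush_after parameter and the file names are all hypothetical. It only illustrates the general idea of buffering updates in RAM and writing each file once per batch instead of once per poll.

```python
from collections import defaultdict


class BatchingWriteCache:
    """Toy write-behind cache: buffer updates per file, flush them in batches."""

    def __init__(self, flush_after=300):
        self.flush_after = flush_after    # seconds a file may stay dirty (hypothetical)
        self.pending = defaultdict(list)  # path -> buffered (timestamp, value) updates
        self.first_dirty = {}             # path -> timestamp of oldest buffered update
        self.disk_writes = 0              # number of simulated file writes

    def update(self, path, timestamp, value):
        """Buffer one update in memory; nothing touches the disk here."""
        self.pending[path].append((timestamp, value))
        self.first_dirty.setdefault(path, timestamp)

    def flush_due(self, now):
        """Write out every file whose oldest buffered update is old enough."""
        due = [p for p, t in self.first_dirty.items() if now - t >= self.flush_after]
        for path in due:
            self.pending.pop(path)
            del self.first_dirty[path]
            self.disk_writes += 1         # one write covers the whole batch
        return due


cache = BatchingWriteCache(flush_after=1800)
updates = 0
for step in range(6):                     # six 5-minute polls
    now = step * 300
    for path in ("load.rrd", "cpu.rrd"):  # hypothetical file names
        cache.update(path, now, 1.0)
        updates += 1
    cache.flush_due(now)
cache.flush_due(1800)

print(f"{updates} updates, {cache.disk_writes} disk writes")
```

The trade-off is the one the slide names: larger buffers mean fewer writes, but they need RAM, and buffered data is only safe once it has been flushed.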
Scaling the master (2/2)
- Beware of shared hardware
- Munin loves to annihilate any hardware
- It is designed to be highly scalable...
- ... but not in a very efficient manner
- Use the async proxy
- Enables a very fast collection
- Lowers the number of update workers needed
- Avoids data loss when munin-update is too slow
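The data-loss point can be sketched as a node-side spool. This is an illustration under assumptions, not the async proxy's implementation: the Spool class, its retention parameter and the sample layout are hypothetical. The node records timestamped samples on its own schedule, and the master later asks for everything newer than what it last saw.

```python
class Spool:
    """Toy node-side spool: keeps timestamped samples until the master collects them."""

    def __init__(self, retention=86400):
        self.retention = retention  # seconds of history to keep (hypothetical default)
        self.samples = []           # list of (timestamp, plugin, value)

    def record(self, timestamp, plugin, value):
        """Store one sample and drop anything older than the retention window."""
        self.samples.append((timestamp, plugin, value))
        cutoff = timestamp - self.retention
        self.samples = [s for s in self.samples if s[0] > cutoff]

    def fetch_since(self, last_seen):
        """Return every sample strictly newer than last_seen."""
        return [s for s in self.samples if s[0] > last_seen]


spool = Spool()
for ts in (0, 300, 600, 900):       # the node keeps sampling every 5 minutes
    spool.record(ts, "load", 0.5)

# The master was slow and last collected at t=300: it still gets 600 and 900.
missed = spool.fetch_since(300)
print(missed)
```

Because collection is decoupled from sampling, a late master only delays the data instead of losing it, as long as it comes back within the retention window.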
Scaling the node
- Handling a huge number of plugins
- The async proxy has the --fork option. It enables fetching all
- Handling slow plugins
- The plugins can poll themselves ...
- ... or just use the --fork option :)
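The effect of fetching plugins concurrently rather than one after another can be sketched as follows. This is an assumption-laden illustration: threads stand in for forked processes, and make_plugin, fetch_serial and fetch_parallel are hypothetical helpers, not part of Munin.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_plugin(name, delay):
    """Build a fake plugin that takes `delay` seconds to answer (hypothetical)."""
    def run():
        time.sleep(delay)
        return name
    return run


def fetch_serial(plugins):
    """Run the plugins one after another: total time is the sum of all delays."""
    return [plugin() for plugin in plugins]


def fetch_parallel(plugins):
    """Run the plugins concurrently: total time is roughly the slowest delay."""
    with ThreadPoolExecutor(max_workers=len(plugins)) as pool:
        futures = [pool.submit(plugin) for plugin in plugins]
        return [f.result() for f in futures]


plugins = [make_plugin(n, 0.1) for n in ("cpu", "load", "slow_db")]

start = time.perf_counter()
serial_names = fetch_serial(plugins)
serial_elapsed = time.perf_counter() - start

start = time.perf_counter()
parallel_names = fetch_parallel(plugins)
parallel_elapsed = time.perf_counter() - start

print(f"serial: {serial_elapsed:.2f}s, parallel: {parallel_elapsed:.2f}s")
```

With many plugins, or one slow plugin, the serial run is bounded by the sum of all plugin times, while the concurrent run is bounded by the slowest plugin.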
Scaling the data
- Keep more data in the RRD
- Configured via custom graph_data_size (on RRD create)
- Handled automatically by RRD
- Very fast, but can use quite a lot of space.
- Increase RRD precision (sub 5 min)
- Called supersampling: the plugin polls itself and sends all the collected data back on each poll
- The async proxy can also be used for that. It should work out of the box just by setting a different update_rate
- Always use the default RRA precision, so that 1 px in the default graphs maps to 1 RRA step
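The space cost of keeping more data or polling faster can be estimated with a little arithmetic. This is a rough sketch under stated assumptions: it counts 8 bytes per stored value (RRD stores doubles), ignores file headers and any additional archives, and the 2-day retention, 10 s rate and 400 px graph width are example figures, not Munin defaults taken from the slides.

```python
def rra_rows(retention_seconds, update_rate, steps_per_row=1):
    """Rows needed to cover retention_seconds at one row per steps_per_row polls."""
    row_seconds = update_rate * steps_per_row
    return -(-retention_seconds // row_seconds)  # ceiling division


def rra_bytes(rows, data_sources=1, bytes_per_value=8):
    """Approximate payload size of one archive, headers excluded."""
    return rows * data_sources * bytes_per_value


def seconds_per_pixel(graph_span_seconds, graph_width_px):
    """Time covered by one pixel column of a graph."""
    return graph_span_seconds / graph_width_px


DAY = 86400

rows_5min = rra_rows(2 * DAY, update_rate=300)  # 2 days at the 5 min rate
rows_10s = rra_rows(2 * DAY, update_rate=10)    # same span at a 10 s rate

print(rows_5min, rra_bytes(rows_5min))
print(rows_10s, rra_bytes(rows_10s))

# A hypothetical 400 px wide graph covering 120000 s gives 300 s per pixel,
# which is exactly one 5-minute step per pixel.
print(seconds_per_pixel(120000, 400))
```

Going from 5 minutes to 10 seconds multiplies the row count by 30 for the same time span, which is where the "quite a lot of space" comes from.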
Limitations of 2.0
- The HTML CGI is still very ugly
- Using Storable is very slow on reload
- The UI itself doesn't really scale
- The node "namespace" is essentially "flat"
- The UI is very static, not what one expects in 2013
- Comparison pages are useless on large installs
- It also lacks proper ACLs. No filtering either.
Roadmap of 2.2
- Move from Storable to SQL
- DBI-based : SQLite by default, PostgreSQL possible
- Enables dynamic HTML UI, and ACL
- ... will require a deep rewrite of core code
- RRD will stay as RRD. Only meta-data is concerned
- Full async-aware updates
- No more mandatory 5 min polls, but that will remain the default.
- Nodes can push their data directly to the master
- Real-time monitoring!
- Collectd, beware! We are coming your way.
- New, full HTML5 UI
- Grouping of nodes. Custom and auto-hinted
- Node & graphs aliasing
Feels like a Xmas list. Let's make it happen.