Plugin:Write Redis/Design

From collectd Wiki

Now that we have a C-based Redis plugin (it queries information about the server), how would we go about storing collectd metrics in a Redis database? A first version of a Write Redis plugin is available from the ff/redis branch, in case you want to try different schemata for yourself.

Problem description

collectd collects performance metrics periodically and stores them. Each performance value is identified by an identifier which can easily be converted to a string. This string could serve as a unique key. Typical installation sizes handle between 100 (a single server) and 100,000 (a data center worth of servers) metrics.

For each identifier, data is received and written periodically. The default update interval is ten seconds, so there will be between 10 and 10,000 updates per second.
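The update rates follow directly from the installation sizes and the interval. A minimal sketch of that arithmetic, assuming every metric is written exactly once per interval (the function name is made up for illustration):

```python
def updates_per_second(num_metrics, interval_seconds=10.0):
    """Average write rate when every metric is updated once per interval."""
    return num_metrics / interval_seconds


# Installation sizes taken from the problem description above.
for num_metrics in (100, 100_000):
    print(num_metrics, "metrics ->", updates_per_second(num_metrics), "updates/s")
```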

Usually the problem isn't the amount of data to be written (1–2 MByte/s) but the quasi-random access pattern that results from updating the appropriate files on disk. The de facto standard method for storing data, the RRDtool plugin, suffers badly from this problem; see Inside the RRDtool plugin.

Consumers (graphing front-ends) want to use an identifier and a time-range to query all values in that range to draw a graph of the performance in the requested interval.

Possible solutions

I've identified a couple of possible methods for storing metrics using Redis. Unfortunately I'm not an expert on Redis, so I'm unsure which is the best overall method. Any feedback is highly welcome.

Unique keys

Redis is essentially a key-value store. So the easiest way would be to create one unique key for each value_list_t that is stored:

"collectd/host.example.com/cpu-0/cpu-idle/1281432171" = "593333"

This would probably be very efficient since each value is only written once and never updated. Apparently Redis is very good at handling a high number of keys. However, handling this "schema" is probably a nightmare: extracting the data again or discarding old values means finding all keys that belong to a metric first, which is probably next to impossible.
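To make the trade-off concrete, here is a rough sketch of the unique-key schema. A plain Python dict stands in for the Redis keyspace, and the helper names unique_key and keys_in_range are invented for this illustration; they are not part of the plugin.

```python
def unique_key(host, plugin, type_instance, timestamp, prefix="collectd"):
    """Build one key per value: <prefix>/<host>/<plugin>/<type>/<time>."""
    return f"{prefix}/{host}/{plugin}/{type_instance}/{timestamp}"


def keys_in_range(store, series_prefix, start, end):
    """Find a series' keys in [start, end] by scanning every key in the store."""
    hits = []
    for key in store:
        series, _, timestamp = key.rpartition("/")
        if series == series_prefix and start <= int(timestamp) <= end:
            hits.append(key)
    return sorted(hits)


# A plain dict stands in for the Redis keyspace (assumption for this sketch).
store = {}
store[unique_key("host.example.com", "cpu-0", "cpu-idle", 1281432171)] = "593333"
store[unique_key("host.example.com", "cpu-0", "cpu-idle", 1281432554)] = "629530"

print(keys_in_range(store, "collectd/host.example.com/cpu-0/cpu-idle",
                    1281432000, 1281432300))
```

Writing is a single assignment per value, but reading a time range touches every key in the store, which is the maintenance problem described above.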

Lists

A better schema for extraction and maintenance could be to use lists as values (rather than strings). So we could use, for example:

"collectd/host.example.com/cpu-0/cpu-idle" = ["1281432171:593333", "1281432554:629530", …]

This would reduce the number of keys to the number of identifiers actually handled by collectd, but I'm uncertain about Redis' performance when storing lists.[0]
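The list schema could look roughly like the following sketch. Again a dict stands in for Redis; rpush here only mimics appending to the end of a list, and query_range is a hypothetical client-side helper that filters after reading the complete list.

```python
store = {}  # stands in for Redis: key -> list of "time:value" strings


def rpush(key, timestamp, value):
    """Append one "time:value" entry to the end of the list at key."""
    store.setdefault(key, []).append(f"{timestamp}:{value}")


def query_range(key, start, end):
    """Return (time, value) pairs in [start, end] by scanning the whole list."""
    result = []
    for entry in store.get(key, []):
        timestamp, _, value = entry.partition(":")
        if start <= int(timestamp) <= end:
            result.append((int(timestamp), value))
    return result


KEY = "collectd/host.example.com/cpu-0/cpu-idle"
rpush(KEY, 1281432171, "593333")
rpush(KEY, 1281432554, "629530")
print(query_range(KEY, 1281432500, 1281432600))
```

There is now one key per identifier, but a time-range query still has to walk the list, because lists are indexed by position and not by timestamp.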

Sorted sets

An interesting alternative to lists might be sorted sets (zset). The score required for sorting the entries would be the timestamp. The huge advantage of using a sorted set is the ZRANGEBYSCORE command which could be used to query all values within a given time-range, making it trivial to request data for graphing. Example (adding a value):

ZADD "collectd/host.example.com/cpu-0/cpu-idle" 1281432171 "593333"
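The behaviour of a time-range query on a sorted set can be sketched with a tiny in-memory stand-in. The SortedSet class below is an assumption made for illustration and only mimics the two commands mentioned above (members ordered by score, inclusive range bounds); it is not a Redis client.

```python
class SortedSet:
    """Tiny in-memory stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, score, member):
        # Like ZADD: a member exists only once; re-adding it updates its score.
        self._scores[member] = float(score)

    def zrangebyscore(self, low, high):
        # Like ZRANGEBYSCORE: members with low <= score <= high, by score.
        hits = [(s, m) for m, s in self._scores.items() if low <= s <= high]
        return [member for _, member in sorted(hits)]


cpu_idle = SortedSet()  # plays the role of "collectd/host.example.com/cpu-0/cpu-idle"
cpu_idle.zadd(1281432171, "593333")
cpu_idle.zadd(1281432554, "629530")

# All values a graphing front-end would need for the requested interval.
print(cpu_idle.zrangebyscore(1281432000, 1281432300))
```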

Footnotes

  • [0] The wiki talks about an O(1) complexity for RPUSH, but I doubt the operations will be CPU-bound.

Feedback

Thomas S.:

Redis lists are implemented as doubly linked lists, so insertion and deletion are very fast. It also makes sense that functions like LINDEX are O(n).

I have to do something similar and found a solution using sorted sets. The problem with the other solutions is querying Redis for a range of values.

Here is what I'll try to implement:

hosts        -> [host]
host         -> (metric_available)
metrics      -> {timestamp value}

Example:

hosts = [example.com, example2.com]
example.com -> (example.com/cpu-0/cpu-idle, example.com/cpu-0/cpu-blabla)
example2.com -> (example2.com/memory/used, example2.com/memory/free)
example.com/cpu-0/cpu-idle -> {123456789: 1, 12345688: 2}
example.com/cpu-0/cpu-blabla  -> {123456789: 100, 12345688: 150}
example2.com/memory/used -> {123456789: 512, 12345688: 1024}
example2.com/memory/free -> {123456789: 1024, 12345688: 512}


I think this schema is relatively easy to implement.
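A rough sketch of how the three levels of this schema fit together, using plain Python containers in place of the Redis list, sets and hashes. The names record, metrics_of and values_of are made up for this illustration.

```python
hosts = []        # list of known hosts
metrics_of = {}   # host -> set of metric names available for that host
values_of = {}    # metric name -> {timestamp: value}


def record(host, metric, timestamp, value):
    """Store one value and keep both index levels up to date."""
    if host not in hosts:
        hosts.append(host)
    metrics_of.setdefault(host, set()).add(metric)
    values_of.setdefault(metric, {})[timestamp] = value


# Values taken from the example above.
record("example.com", "example.com/cpu-0/cpu-idle", 123456789, 1)
record("example.com", "example.com/cpu-0/cpu-blabla", 123456789, 100)
record("example2.com", "example2.com/memory/used", 123456789, 512)
record("example2.com", "example2.com/memory/free", 123456789, 1024)

print(hosts)
print(sorted(metrics_of["example.com"]))
print(values_of["example2.com/memory/used"])
```

A front-end can walk from hosts to metrics to values without knowing any key names in advance.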

EDIT: Sorted sets are unusable because you cannot have the same value twice. So I'll try with a list, like your example here.

Hi Thomas and thanks for your input :)
I have implemented a basic plugin which uses ZSETs (sorted sets) to store the values. Each entry is a string consisting of the time and the value(s), for example "1281685461:123:234". Including the time serves two purposes: first, the floating-point "score" doesn't need to be converted back to an integer; second, the entries become unique.
Currently there is no additional "index" of the values, but I guess that adding them to a set wouldn't be a problem and might help front-ends. Regards, —octo 07:50, 13 August 2010 (UTC)
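The entry format and the uniqueness argument can be illustrated with a short sketch. The helper names are invented here and are not taken from the plugin source; a dict mapping member to score stands in for a sorted set, and the second timestamp is an arbitrary value chosen ten seconds after the first.

```python
def encode_member(timestamp, values):
    """Join the time and the value(s) with colons: "time:value[:value...]"."""
    return ":".join(str(part) for part in (timestamp, *values))


def decode_member(member):
    """Split an entry back into an integer time and a list of value strings."""
    timestamp, *values = member.split(":")
    return int(timestamp), values


print(encode_member(1281685461, [123, 234]))
print(decode_member("1281685461:123:234"))

# Two readings with the same value but different times.
readings = [(1281432171, 593333), (1281432181, 593333)]

plain = {}     # member is the bare value: the second reading replaces the first
prefixed = {}  # member includes the time: both readings are kept
for timestamp, value in readings:
    plain[str(value)] = timestamp
    prefixed[encode_member(timestamp, [value])] = timestamp

print(len(plain), len(prefixed))
```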

Jason D:

Would anyone else find value in supporting pub/sub? We currently use the write_http output plugin to send metrics to an aggregator service (which in turn feeds them to Graphite). Unfortunately we have performance issues with that plugin and would just as soon write out to Redis (which is also consumed by said aggregator).

  • Have you considered the AMQP plugin? It should do pub/sub very efficiently and flexibly. —octo 06:48, 10 August 2012 (UTC)

Goran:

Jason, what kind of performance problems? I am also planning to use something similar: I want to forward collectd data to batsd (a Ruby version of statsd) that uses Redis. So I figured I would try Redis directly first.

Please leave your comments here.