It's not working. This page tries to help with the most common pitfalls:
No data appears on the server
- Are RRD-files created on the server at all?
- If so, the data was successfully received by the server at least once. You should probably continue with the section "Graphs are empty" below.
- Are the packets actually sent by the client and received by the server?
tcpdump -i eth0 -p -n -s 1500 udp port 25826
- On the client, you should see outgoing packets at roughly the frequency specified by the interval setting. To avoid wasting bandwidth on packet overhead, collectd buffers multiple measurements until a UDP packet is close to full, so if you have few sensors it might take multiple intervals before a packet is sent.
- If you only have a few active sensors and a long interval, restarting collectd after at least one interval has passed should flush any gathered measurements, giving you a packet.
- If you do not see incoming data on the server, this is a problem with your network, not with collectd. Check that firewalls allow communication on the appropriate port. See that servers use the correct interface to send data.
- Is the receiving socket opened by the server?
netstat -lnp | grep collectd
- If not, the configuration of the Network plugin is incorrect or something went wrong during start-up.
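The socket check above can also be scripted. The following is a minimal sketch, assuming Linux net-tools style `netstat` output in which the first column is the protocol and the fourth column is the local address. The function name `udp_port_open` is made up for this example, and 25826 is the port used in the tcpdump command above.

```shell
# Hypothetical helper: reads a netstat-style listing on stdin and succeeds
# if a UDP socket is bound to the given port.
# Assumption: column 1 is the protocol, column 4 the local address ("addr:port").
udp_port_open() {
    awk -v port="$1" '
        $1 ~ /^udp/ && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage (run on the server):
#   netstat -lnu | udp_port_open 25826 && echo "receiving socket is open"
```

The function only inspects text, so it can be tried on saved `netstat` output from another machine as well.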
Graphs are empty
I'm using the RRDtool plugin to record incoming data in RRD-files on the server side. The files are created by the daemon but when I create graphs from the files they are empty.
- Is the last modification timestamp (mtime) of the RRD-files changing?
watch -n 10 ls -l "$RRD_FILE"
- Are the files re-created if you delete or rename the files?
- Is the last update field in the RRD-file changing?
while sleep 10; do rrdtool info "$RRD_FILE" | grep last_update; done
- If the RRD-files are getting modified and (re)created correctly, check your types.db(5) file. If it is inconsistent between clients and server – especially if the data source type differs – then the server may not record any data. The Network plugin uses the types.db to parse network traffic.
- The interval between received packets might be too long, causing the server-side graphing software to consider the node offline. Check for UDP packet loss and review the settings of the graphing software. For testing, set the client's interval configuration option to a small value (e.g., two seconds).
- Make sure that only one client sends data for any given RRD file. The most common cause for two clients updating the same file is that they are using the same host name. In all likelihood this is a very bad idea.
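To check the types.db consistency mentioned in the list above, the client's and the server's files can be compared after normalizing them. This is a minimal sketch: the function names are hypothetical, and it assumes that lines starting with `#` are comments and that whitespace differences between fields do not matter.

```shell
# Hypothetical helper: print a types.db file in a canonical form
# (comment and blank lines dropped, fields joined by single spaces, sorted).
normalize_types_db() {
    awk 'NF && $1 !~ /^#/ { $1 = $1; print }' "$1" | sort
}

# Hypothetical helper: show the definitions that differ between two
# types.db files. Returns 0 if the normalized files match, non-zero otherwise.
compare_types_db() {
    client_tmp=$(mktemp) || return 2
    server_tmp=$(mktemp) || return 2
    normalize_types_db "$1" > "$client_tmp"
    normalize_types_db "$2" > "$server_tmp"
    status=0
    diff "$client_tmp" "$server_tmp" || status=$?
    rm -f "$client_tmp" "$server_tmp"
    return "$status"
}

# Usage, with a copy of the client's file fetched to the server
# (both paths are placeholders):
#   compare_types_db /path/to/client-types.db /path/to/server-types.db
```

Any line printed by `diff` is a type whose definition differs or exists on one side only. A differing data source type, such as COUNTER on one side and DERIVE on the other, is the case to look for first.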
(Huge) Spikes in RRD files and graphs
This is most commonly associated with the COUNTER data source type. Both the DERIVE and COUNTER data source types (DSTs) divide the change between two reads by the time between the reads. See the data source page for a full discussion of the topic.
The main difference between the DERIVE and COUNTER data source types is how they handle the case when the new value is smaller than the old value. The DERIVE DST interprets this decrease as a negative rate and, if the minimum value is set to zero, discards this value. The COUNTER DST, on the other hand, assumes that the 32-bit or 64-bit value overflowed and calculates a (positive) rate accordingly.
The problem is that sometimes the COUNTER DST is too clever for its own good: if the counter is reset (i.e. forced back to zero), the false assumption that an overflow has taken place results in huge values being computed. Assume, for example, that the old value is 5 billion (5 ⋅ 10^9). Then the counter was reset to zero and has increased to 42 since then. Because the old value is greater than 2^32 - 1, a 64-bit overflow is assumed. Thus, the new rate is calculated from an increase of 42 + 2^64 - 5 ⋅ 10^9, which is roughly 18 ⋅ 10^18 (18 quintillion).
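The arithmetic in the example above can be reproduced with a small shell function. This is only a sketch of the logic as described in the text, not RRDtool's actual implementation. The function name `counter_increase` is hypothetical, and because awk computes in floating point, large results are approximate.

```shell
# Hypothetical helper: the increase a COUNTER-style calculation would assume
# between two reads, following the description in the text.
# awk uses floating point, so this only illustrates the magnitude.
counter_increase() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        old += 0; new += 0
        if (new >= old)            diff = new - old          # normal case
        else if (old <= 2^32 - 1)  diff = new + 2^32 - old   # assume 32-bit wrap
        else                       diff = new + 2^64 - old   # assume 64-bit wrap
        printf "%.2e\n", diff
    }'
}

counter_increase 5000000000 42   # the reset example from the text, prints 1.84e+19
```

Dividing this increase by the time between the two reads gives the rate, which is why a single counter reset shows up as one enormous spike.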
Just as a minimum value of zero keeps negative rates out when using DERIVE, a maximum value can be configured. Huge spikes only make it into the RRD file if no maximum value has been set or if the maximum value was set too high. The trick is to set this maximum value and force RRDtool to throw away all offending rates.
file="/path/to/your.rrd"
rrdtool tune "$file" --maximum value:1234
rrdtool dump "$file" | rrdtool restore --range-check - "$file--FIXED"
mv "$file--FIXED" "$file"
The above uses rrdtune(1) to set the maximum rate of the data source named "value" to 1234. The offending values are then removed by dumping the file and restoring it with the range check enabled.
"Value is too old" errors using perl via EXEC plugin
You may get a lot of "value is too old" errors when running a Perl script via the Exec plugin if the script provides values using the PUTVAL command.
The issue turned out to be output buffering.
This one line at the beginning of the Perl script fixes the issue:
# If set to nonzero, forces a flush after every write or print:
$| = 1;