Plugin:mcelog
Mcelog plugin | |
---|---|
Type: | read |
Callbacks: | config, init, read, shutdown |
Status: | supported |
First version: | 5.8 |
Copyright: | 2016–2017 Intel Corporation |
License: | MIT license |
Manpage: | collectd.conf(5) |
The mcelog plugin sends notifications and statistics about Machine Check Exceptions (MCE) when they occur. It relies on the mcelog Linux utility to detect that an exception has occurred. mcelog follows a client/server model, and its daemon logs and accounts for exceptions as they occur. The plugin uses mcelog's client protocol to find out when an exception has occurred. The goal of this feature is to expose the Reliability, Availability and Serviceability (RAS) metrics and events provided by the platform to higher-level fault management applications. The plugin does the following:
- Checks mcelog server liveness and reports a failure if the server is not running or if it fails.
- Retrieves aggregated corrected and uncorrected memory errors through the client protocol and submits them as events/statistics.
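The liveness check in the first bullet can be sketched in C, the language collectd itself is written in. This is a minimal sketch under stated assumptions: the function name `mcelog_socket_is_alive` is hypothetical and not taken from the plugin's source, and it only tests whether `connect()` on the mcelog client socket succeeds. It does not speak the mcelog client protocol.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper (not the plugin's real API): returns 1 if a server
 * accepts connections on the UNIX socket at `path`, 0 otherwise.
 * Only connect() is attempted; no mcelog protocol messages are exchanged. */
int mcelog_socket_is_alive(const char *path)
{
    struct sockaddr_un addr;
    size_t len;
    int fd;
    int ok;

    if (path == NULL)
        return 0;

    len = strlen(path);
    if (len == 0 || len >= sizeof(addr.sun_path))
        return 0; /* empty or too long to fit in sun_path */

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len + 1); /* length checked above */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return 0;

    /* A refused or missing socket means the daemon is not reachable. */
    ok = connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0;
    close(fd);
    return ok;
}
```

Called as `mcelog_socket_is_alive("/var/run/mcelog-client")`, it returns 0 when the daemon is not running or the socket path is wrong, which is the condition the plugin reports as a failure.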
mcelog must be configured to run on the platform in daemon mode, with logging enabled. For a full description of the available options, please refer to the collectd.conf(5) manual page.
Synopsis
<Plugin mcelog>
  <Memory>
    McelogClientSocket "/var/run/mcelog-client"
    PersistentNotification false
  </Memory>
</Plugin>
This will change once the branch "feat_mcelog_mem_notification_level" is merged, after which the configuration looks as follows (for now, if all options are commented out, the plugin defaults to the client socket):
# <Plugin mcelog>
#   <Memory>
#     McelogClientSocket "/var/run/mcelog-client"
#     PersistentNotification false
#   </Memory>
#   McelogLogfile "/var/log/mcelog"
# </Plugin>
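The `<Memory>` options in the Synopsis can be modelled as a small C struct. Everything here is hypothetical: the struct and function names are not collectd's API (collectd parses `collectd.conf` with its own configuration parser), and the behaviour encoded in `mcelog_should_notify` is an assumption based on our reading of collectd.conf(5), namely that with `PersistentNotification true` a notification is sent on every read cycle, and with `false` only when the corrected/uncorrected error counts change.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Hypothetical in-memory view of the <Memory> block options. */
struct mcelog_config {
    char client_socket[108];      /* McelogClientSocket (sun_path is 108 bytes on Linux) */
    bool persistent_notification; /* PersistentNotification */
};

/* Fill in the values shown in the Synopsis example. */
void mcelog_config_defaults(struct mcelog_config *cfg)
{
    snprintf(cfg->client_socket, sizeof(cfg->client_socket), "%s",
             "/var/run/mcelog-client");
    cfg->persistent_notification = false;
}

/* Apply one option; returns 0 on success, -1 on an unknown key or bad value.
 * Keys are matched case-insensitively; booleans accept only true/false here. */
int mcelog_config_set(struct mcelog_config *cfg, const char *key, const char *value)
{
    if (strcasecmp(key, "McelogClientSocket") == 0) {
        if (strlen(value) >= sizeof(cfg->client_socket))
            return -1; /* would not fit in a UNIX socket address */
        snprintf(cfg->client_socket, sizeof(cfg->client_socket), "%s", value);
        return 0;
    }
    if (strcasecmp(key, "PersistentNotification") == 0) {
        if (strcasecmp(value, "true") == 0)
            cfg->persistent_notification = true;
        else if (strcasecmp(value, "false") == 0)
            cfg->persistent_notification = false;
        else
            return -1;
        return 0;
    }
    return -1;
}

/* Assumed semantics: notify every cycle when persistent, otherwise only
 * when either error count differs from the previous read. */
bool mcelog_should_notify(const struct mcelog_config *cfg,
                          long prev_corrected, long cur_corrected,
                          long prev_uncorrected, long cur_uncorrected)
{
    if (cfg->persistent_notification)
        return true;
    return prev_corrected != cur_corrected || prev_uncorrected != cur_uncorrected;
}
```

Keeping the notification decision separate from parsing means the statistics can still be dispatched on every read while only the notifications are filtered.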
Parameters
None yet
Metrics
Metric/Feature Name | Data Type | Format Example | Internal Collectd Version | Description | Dependencies | Limitations | Comments |
---|---|---|---|---|---|---|---|
Memory corrected errors | Int | 51522 | None | Number of corrected memory errors since system boot | | | Gets metrics from the mcelog daemon. |
Memory corrected errors in 24 hours | Int | 51522 | None | Number of corrected memory errors in the previous 24 hours | | | Gets metrics from the mcelog daemon. |
Memory uncorrected errors | Int | 51522 | None | Number of uncorrected memory errors since system boot | | | Gets metrics from the mcelog daemon. |
Memory uncorrected errors in 24 hours | Int | 51522 | None | Number of uncorrected memory errors in the previous 24 hours | | | Gets metrics from the mcelog daemon. |
Socket | Int | 0 | None | Socket number the error occurred on | | | Gets metrics from the mcelog daemon. |
Channel | Char | 0 | None | Memory channel the error occurred on; each channel is populated with one or more DIMM modules | | | Gets metrics from the mcelog daemon. |
Memory DIMM | Char | B1 | None | Memory DIMM holding the memory used by the cores on which the errors occurred | | | Gets metrics from the mcelog daemon. |
Memory Slot | Char | 1 | None | Memory slot holding the memory used by the cores on which the errors occurred | | | Gets metrics from the mcelog daemon. |
CPU ID | Int | 0 | Future | CPU ID of the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | |
Memory Page | Hex | 0x12345 | Future | Memory page corresponding to the memory used by the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Memory Offset | Hex | 0x0 | Future | Memory offset within the page. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Memory Row | Hex | 0x12345 | | | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Memory Grain | Int | 8 | Future | Byte granularity of the error (the error grain). Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Error Syndrome | Hex | 0x6ce3 | Future | Memory syndrome corresponding to the memory used by the cores on which the errors occurred. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Error Type | Text | | Future | Error type. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Error code | Integer | 0101:0090 | Future | Error code reported by EDAC. Will be added to the new EDAC plugin. | | | Not part of Collectd. Currently available with kernel EDAC logs. |
Logging | Log path | | | Configurable logging path | | | Not part of Collectd. Currently available with kernel EDAC logs. |
dimmX or rankX directory info | Varying | | Future | Expose the interface files provided by sysfs through the mcX/dimmX or mcX/rankX directories | | | Not part of Collectd. Currently available with kernel EDAC logs. |
csrowX directory info | Varying | | Future | Expose the interface files provided by sysfs through the mcX/csrowX directories | | | Not part of Collectd. Currently available with kernel EDAC logs. |
RAS interrupts | Count on each core | [CoreID]:[InterruptCount] | Future | Expose the RAS-related interrupts on cores of interest via Collectd | | | Discussion is open on whether this information can be exposed through the plugin. |
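The first eight rows of the table (corrected and uncorrected totals and 24-hour counts, keyed by socket, channel and DIMM) are what the plugin obtains from the mcelog daemon. The following is a minimal parsing sketch, assuming the daemon's dump output for one memory location looks like the block in the code comment; that format is an assumption to check against your mcelog version, and `struct mem_rec` and `parse_mcelog_dump` are hypothetical names, not the plugin's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed dump format for one location (verify against your mcelog):
 *   SOCKET 0 CHANNEL 1 DIMM any
 *   corrected memory errors:
 *   \t51522 total
 *   \t12 in 24h
 *   uncorrected memory errors:
 *   \t3 total
 *   \t0 in 24h
 */
struct mem_rec {
    int socket;
    int channel;
    char dimm[16]; /* e.g. "0" or "any" */
    long corrected_total, corrected_24h;
    long uncorrected_total, uncorrected_24h;
};

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Parse the first location block of `text` into *rec.
 * Returns 0 if a SOCKET/CHANNEL/DIMM header was found, -1 otherwise. */
int parse_mcelog_dump(const char *text, struct mem_rec *rec)
{
    enum { NONE, CORRECTED, UNCORRECTED } section = NONE;
    int have_header = 0;
    const char *line = text;
    char buf[128];

    assert(text != NULL && rec != NULL);
    memset(rec, 0, sizeof(*rec));

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        size_t copy = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;
        int sock, chan;
        char dimm[16], word[16], period[16];
        long n;

        /* Copy the current line into a NUL-terminated buffer. */
        memcpy(buf, line, copy);
        buf[copy] = '\0';
        word[0] = period[0] = '\0';

        if (sscanf(buf, "SOCKET %d CHANNEL %d DIMM %15s", &sock, &chan, dimm) == 3) {
            if (have_header)
                break; /* next location begins: keep only the first block */
            rec->socket = sock;
            rec->channel = chan;
            snprintf(rec->dimm, sizeof(rec->dimm), "%s", dimm);
            have_header = 1;
        } else if (starts_with(buf, "corrected memory errors:")) {
            section = CORRECTED;
        } else if (starts_with(buf, "uncorrected memory errors:")) {
            section = UNCORRECTED;
        } else if (sscanf(buf, " %ld %15s %15s", &n, word, period) >= 2) {
            int is_total = strcmp(word, "total") == 0;
            int is_24h = strcmp(word, "in") == 0 && strcmp(period, "24h") == 0;
            if (section == CORRECTED) {
                if (is_total) rec->corrected_total = n;
                else if (is_24h) rec->corrected_24h = n;
            } else if (section == UNCORRECTED) {
                if (is_total) rec->uncorrected_total = n;
                else if (is_24h) rec->uncorrected_24h = n;
            }
        }

        line += len;
        if (*line == '\n')
            line++;
    }
    return have_header ? 0 : -1;
}
```

The counters map onto the four implemented error rows, and `socket`, `channel` and `dimm` identify where the errors occurred; a full dump contains one such block per location, so a real reader would loop over all blocks rather than stop at the first.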
Example Graph
None yet.