Difference between revisions of "Plugin:DCPMM"

From collectd Wiki
(Add DCPMM plugin page)
 
(Update DCPMM plugin page with info on metrics)
Line 14: Line 14:
  
 
== Synopsis ==
 
== Synopsis ==
 +
{{See|[[Plugin:DCPMM/Config]]}}
  
 
  <Plugin "dcpmm">
 
  <Plugin "dcpmm">
Line 22: Line 23:
 
  </Plugin>
 
  </Plugin>
  
For a description of all options, see {{Manpage|collectd.conf|5|plugin_dcpmm}}
+
=== Parameters ===
  
 +
{|class="wikitable"
 +
! Name
 +
! Description
 +
! Comment
 +
|-
 +
| Interval
 +
| Interval, in seconds, at which the metrics are collected.
 +
| Defaults to the global Interval value. Setting it overrides the global Interval for the dcpmm plugin only; other plugins are not affected.
 +
|-
 +
| CollectHealth
 +
| Health information metrics will be collected if set to true
 +
| Default value is false.
 +
|-
 +
| CollectPerfMetrics
 +
| Memory performance metrics will be collected if set to true
 +
| Default value is true.
 +
|-
 +
| EnableDispatchAll
 +
| Reserved for enabling simultaneous collection of health and memory performance metrics in a future release.
 +
| Currently unused; must always be set to false.
 +
|}
 +
 +
== Metrics ==
 +
The DCPMM plugin collects either health metrics or performance metrics; collecting both sets simultaneously is not currently supported.
 +
 +
=== Health Information Metrics ===
 +
The health information metrics are the following:
 +
 +
{|class="wikitable"
 +
! Metric
 +
! Description
 +
|-
 +
| health_status
 +
| <nowiki>Overall health summary (0: normal | 1: non-critical | 2: critical | 3: fatal).</nowiki>
 +
|-
 +
| lifespan_remaining
 +
| The module’s remaining life as a percentage of its factory-expected life span.
 +
|-
 +
| lifespan_used
 +
| The module’s used life as a percentage of its factory-expected life span.
 +
|-
 +
| power_on_time
 +
| The total time the DIMM has been powered on over its lifetime, in seconds.
 +
|-
 +
| uptime
 +
| The current uptime of the DIMM for the current power cycle in seconds.
 +
|-
 +
| last_shutdown_time
 +
| The time the system was last shut down, represented as epoch time (seconds).
 +
|-
 +
| media_temperature
 +
| The media’s current temperature in degrees Celsius.
 +
|-
 +
| controller_temperature
 +
| The controller’s current temperature in degrees Celsius.
 +
|-
 +
| max_media_temperature
 +
| The media’s highest reported temperature in degrees Celsius.
 +
|-
 +
| max_controller_temperature
 +
| The controller’s highest reported temperature in degrees Celsius.
 +
|-
 +
| tsc_cycles
 +
| The number of TSC (time stamp counter) cycles during each interval.
 +
|-
 +
| epoch
 +
| The timestamp in seconds at which the metrics are collected from DCPMM DIMMs.
 +
|}
 +
 +
=== Memory Performance Metrics ===
 +
The memory performance metrics are the following:
 +
{|class="wikitable"
 +
! Metric
 +
! Description
 +
|-
 +
| total_bytes_read
 +
| Number of bytes transacted by the read operations.
 +
|-
 +
| total_bytes_written
 +
| Number of bytes transacted by the write operations.
 +
|-
 +
| read_64B_ops_rcvd
 +
| Number of read operations performed to the physical media at 64-byte granularity.
 +
|-
 +
| write_64B_ops_rcvd
 +
| Number of write operations performed to the physical media at 64-byte granularity.
 +
|-
 +
| media_read_ops
 +
| Number of read operations performed to the physical media.
 +
|-
 +
| media_write_ops
 +
| Number of write operations performed to the physical media.
 +
|-
 +
| host_reads
 +
| Number of read operations received from the CPU (memory controller).
 +
|-
 +
| host_writes
 +
| Number of write operations received from the CPU (memory controller).
 +
|-
 +
| read_hit_ratio
 +
| Measures the efficiency of the buffer in the read path. Range of 0.0 - 1.0.
 +
|-
 +
| write_hit_ratio
 +
| Measures the efficiency of the buffer in the write path. Range of 0.0 - 1.0.
 +
|-
 +
| tsc_cycles
 +
| The number of TSC (time stamp counter) cycles during each interval.
 +
|-
 +
| epoch
 +
| The timestamp in seconds at which the metrics are collected from DCPMM DIMMs.
 +
|}
 
== Example Graph ==
 
== Example Graph ==
 
{{No Example Graph}}
 
{{No Example Graph}}
Line 30: Line 142:
  
 
* [https://github.com/intel/intel-pmwatch libpmwapi]   
 
* [https://github.com/intel/intel-pmwatch libpmwapi]   
 +
 +
== Caveats ==
 +
* Health metrics and performance metrics cannot be collected simultaneously.
 +
 +
== History ==
 +
* {{Version|5.11}} New plugin for Intel Optane DC Persistent Memory (DCPMM) added.
  
 
== See also ==
 
== See also ==
  
 
* [[Plugin:DCPMM/tests]]
 
* [[Plugin:DCPMM/tests]]
* [https://wiki.opnfv.org/display/fastpath/DCPMM DCPMM plugin high level design document]
+
* [https://wiki.opnfv.org/display/fastpath/DCPMM DCPMM plugin metric list]
 +
* [https://wiki.opnfv.org/display/fastpath/Collectd+DCPMM+Plugin+HLD DCPMM plugin high level design document]
  
 
[[Category:Plugins]]
 
[[Category:Plugins]]
 
[[Category:Plugins requiring privileges]]
 
[[Category:Plugins requiring privileges]]
 
{{DEFAULTSORT:DCPMM}}
 
{{DEFAULTSORT:DCPMM}}

Revision as of 16:30, 18 March 2020

DCPMM plugin
Type: read
Callbacks: init, config, read, shutdown
Status: supported
First version: 5.11
Copyright: 2019 Intel Corporation
Hari TG
License: MIT license
Manpage: collectd.conf(5)
List of Plugins

The dcpmm plugin collects performance and health statistics for Intel(R) Optane(TM) DC Persistent Memory (DCPMM). It requires root privileges to collect these statistics.

Synopsis

→ See: Plugin:DCPMM/Config
<Plugin "dcpmm">
  Interval 10.0
  CollectHealth false
  CollectPerfMetrics true
  EnableDispatchAll false
</Plugin>

Parameters

{|class="wikitable"
! Name
! Description
! Comment
|-
| Interval
| Interval, in seconds, at which the metrics are collected.
| Defaults to the global Interval value. Setting it overrides the global Interval for the dcpmm plugin only; other plugins are not affected.
|-
| CollectHealth
| Health information metrics will be collected if set to true.
| Default value is false.
|-
| CollectPerfMetrics
| Memory performance metrics will be collected if set to true.
| Default value is true.
|-
| EnableDispatchAll
| Reserved for enabling simultaneous collection of health and memory performance metrics in a future release.
| Currently unused; must always be set to false.
|}
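
The parameters above allow only one metric set to be enabled at a time. As a minimal sketch of how a deployment script might keep the two flags consistent, the hypothetical helper below (`render_dcpmm_config` is not part of collectd) renders a `<Plugin "dcpmm">` block for one mode and always leaves `EnableDispatchAll` set to false:

```python
def render_dcpmm_config(mode, interval=10.0):
    """Return a <Plugin "dcpmm"> block for mode 'health' or 'perf'.

    Hypothetical helper for illustration only; it is not part of collectd.
    """
    if mode not in ("health", "perf"):
        raise ValueError("mode must be 'health' or 'perf'")
    collect_health = mode == "health"
    lines = [
        '<Plugin "dcpmm">',
        f"  Interval {interval:.1f}",
        # Exactly one of the two collection flags is true.
        f"  CollectHealth {str(collect_health).lower()}",
        f"  CollectPerfMetrics {str(not collect_health).lower()}",
        # Documented as unused for now; it must stay false.
        "  EnableDispatchAll false",
        "</Plugin>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_dcpmm_config("health"))
```

Calling `render_dcpmm_config("perf")` produces the same block as the synopsis; `"health"` flips the two collection flags.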

Metrics

The DCPMM plugin collects either health metrics or performance metrics; collecting both sets simultaneously is not currently supported.

Health Information Metrics

The health information metrics are the following:

{|class="wikitable"
! Metric
! Description
|-
| health_status
| <nowiki>Overall health summary (0: normal | 1: non-critical | 2: critical | 3: fatal).</nowiki>
|-
| lifespan_remaining
| The module’s remaining life as a percentage of its factory-expected life span.
|-
| lifespan_used
| The module’s used life as a percentage of its factory-expected life span.
|-
| power_on_time
| The total time the DIMM has been powered on over its lifetime, in seconds.
|-
| uptime
| The current uptime of the DIMM for the current power cycle in seconds.
|-
| last_shutdown_time
| The time the system was last shut down, represented as epoch time (seconds).
|-
| media_temperature
| The media’s current temperature in degrees Celsius.
|-
| controller_temperature
| The controller’s current temperature in degrees Celsius.
|-
| max_media_temperature
| The media’s highest reported temperature in degrees Celsius.
|-
| max_controller_temperature
| The controller’s highest reported temperature in degrees Celsius.
|-
| tsc_cycles
| The number of TSC (time stamp counter) cycles during each interval.
|-
| epoch
| The timestamp in seconds at which the metrics are collected from DCPMM DIMMs.
|}
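
When consuming these values downstream, the numeric `health_status` code and the epoch-based fields usually need decoding. The sketch below is one way to do that; the function names are hypothetical, and it assumes that "epoch" means seconds since 1970-01-01 UTC, which this page does not state explicitly:

```python
from datetime import datetime, timezone

# Code-to-label mapping taken from the health_status row above.
HEALTH_STATUS = {0: "normal", 1: "non-critical", 2: "critical", 3: "fatal"}


def describe_health(status_code):
    """Map a health_status value to its label; unknown codes give 'unknown'."""
    return HEALTH_STATUS.get(int(status_code), "unknown")


def epoch_to_utc(seconds):
    """Convert an epoch value to an aware UTC datetime.

    Assumption: the value counts seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(describe_health(2))
    print(epoch_to_utc(0).isoformat())
```

The same conversion applies to `last_shutdown_time` under the same assumption.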

Memory Performance Metrics

The memory performance metrics are the following:

{|class="wikitable"
! Metric
! Description
|-
| total_bytes_read
| Number of bytes transacted by the read operations.
|-
| total_bytes_written
| Number of bytes transacted by the write operations.
|-
| read_64B_ops_rcvd
| Number of read operations performed to the physical media at 64-byte granularity.
|-
| write_64B_ops_rcvd
| Number of write operations performed to the physical media at 64-byte granularity.
|-
| media_read_ops
| Number of read operations performed to the physical media.
|-
| media_write_ops
| Number of write operations performed to the physical media.
|-
| host_reads
| Number of read operations received from the CPU (memory controller).
|-
| host_writes
| Number of write operations received from the CPU (memory controller).
|-
| read_hit_ratio
| Measures the efficiency of the buffer in the read path. Range of 0.0 - 1.0.
|-
| write_hit_ratio
| Measures the efficiency of the buffer in the write path. Range of 0.0 - 1.0.
|-
| tsc_cycles
| The number of TSC (time stamp counter) cycles during each interval.
|-
| epoch
| The timestamp in seconds at which the metrics are collected from DCPMM DIMMs.
|}
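
A common use of the byte counters is deriving throughput. The sketch below computes bytes per second from two samples, using the `epoch` field as the time base. It assumes the byte counters are cumulative totals, which this page does not confirm; if the plugin reports per-interval counts instead, divide a single value by the interval length. The function name and the sample dictionaries are hypothetical:

```python
def bytes_per_second(prev, curr, key="total_bytes_read"):
    """Return the average byte rate between two samples.

    Assumption: samples are dicts holding an 'epoch' timestamp in seconds
    and a cumulative byte counter stored under `key`.
    """
    elapsed = curr["epoch"] - prev["epoch"]
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (curr[key] - prev[key]) / elapsed


if __name__ == "__main__":
    earlier = {"epoch": 100, "total_bytes_read": 1000}
    later = {"epoch": 110, "total_bytes_read": 6000}
    print(bytes_per_second(earlier, later))
```

Passing `key="total_bytes_written"` gives the write rate in the same way.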

Example Graph

None yet. Add one now!

Dependencies

  • libpmwapi (https://github.com/intel/intel-pmwatch)

Caveats

  • Health metrics and performance metrics cannot be collected simultaneously.

History

  • 5.11 New plugin for Intel Optane DC Persistent Memory (DCPMM) added.

See also

  • Plugin:DCPMM/tests
  • DCPMM plugin metric list (https://wiki.opnfv.org/display/fastpath/DCPMM)
  • DCPMM plugin high level design document (https://wiki.opnfv.org/display/fastpath/Collectd+DCPMM+Plugin+HLD)