Minutes from 2020-07-06
From collectd Wiki
Revision as of 01:36, 7 July 2020 by Sranganath
Collectd 6.0 & Open Metrics work (by Octo)
- Changed the core data structures in the Collectd 6.0 branch, which caused a good amount of work
- Data structures are a copy of the Prometheus protobuf. There is a structure called metric family that holds the metric name and a metric_type that indicates gauge, counter, etc. A metric family can contain many metrics; each metric holds the actual value, the sampling time & interval (an addition to the Prometheus format), and the labels (in meta data). Implications:
- DispatchValue now takes a metric family instead
- It allows plugins like CPU/memory to send unrelated metrics to the daemon in one call so they can have the same timestamp.
- Currently plugins can populate the time field before calling the dispatch function; if it is unset, the dispatch function fills in the timestamp. This might result in a millisecond-range difference between when the metric value was read and when it is sent out by the dispatch function.
- Two differences with this approach:
- OpenMetrics uses timestamps in milliseconds since epoch; collectd's timestamps are much more precise than that. We need to divide by 1000 to send to Prometheus
- Collectd used to store 'interval', but it's not something we can export to Prometheus
- Functions that work today already:
- dispatch logic, read/write plugin logic, converting counters to rates
- Metadata associated with cache entry, that allows cache to be read, works
- Ported CPU plugin, write_stackdriver plugin & write_log plugin; includes formatting for JSON, Graphite, Stackdriver
- Still need to update tests for these to ensure they work
- Code that looks up metrics is still to be finalized along with the design approach; this affects the aggregation plugin
- Quite a few decisions to be taken -
- csv plugin is easy to migrate but need to figure out filesystem path for metric labels.
- Write plugins are more or less converted, but read plugins need to be looked into for the naming schema
- Need to check for accuracy of compatibility layer that converts valuelist to metric family type
- About 173 plugins can potentially be built in today's Collectd; some of them are barely used and could be removed, for example the ascent plugin
- Octo to put together a single document on design decisions; the trade-off of one huge document vs. many small ones is that all the info is consolidated in one place.
- Collectd 6 branch: merging
- Will clean up git history; once it's in the collectd 6 branch, more people can migrate plugins. Then will ask everyone to contribute, either by collaborating in a doc or in code
- Looking into Memory plugins. Figuring out a way to expose a lot of metrics at the same time and how to deal with them
- Ensuring backward compatibility is subtle and expect changes
- Release at the end of the year might be realistic; depends on how many people are open to migrating the 173 plugins.
- Tried to identify as many plugins as possible; need to fix a lot of libraries. Updating write_http plugins is fairly straightforward with changes to metrics formatting. Not quite ready for everyone to jump in and port plugins, but would be a nice thing to consider
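The metric family layout and the timestamp handling discussed above can be sketched roughly as follows. This is only an illustration in Python, not the actual C code in the 6.0 branch: the names `Metric`, `MetricFamily`, `dispatch` and `to_prometheus_ms` are hypothetical, and the sketch assumes (to match the "divide by 1000" note) that timestamps are held as integer microseconds since epoch, which is not necessarily collectd's internal time representation.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical names and units throughout; not collectd's real definitions.


@dataclass
class Metric:
    value: float
    labels: Dict[str, str] = field(default_factory=dict)
    time_us: Optional[int] = None       # sampling time, microseconds since epoch (assumed unit)
    interval_s: Optional[float] = None  # collectd-specific addition, not exported to Prometheus


@dataclass
class MetricFamily:
    name: str
    metric_type: str                    # e.g. "gauge" or "counter"
    metrics: List[Metric] = field(default_factory=list)


def dispatch(family: MetricFamily, now_us: Optional[int] = None) -> MetricFamily:
    """Fill in the timestamp of every metric whose time field is unset.

    All metrics without a timestamp in one call receive the same value,
    so metrics sent together share a timestamp.
    """
    if now_us is None:
        now_us = time.time_ns() // 1000
    for metric in family.metrics:
        if metric.time_us is None:
            metric.time_us = now_us
    return family


def to_prometheus_ms(time_us: int) -> int:
    """Reduce the (assumed) microsecond timestamp to milliseconds since epoch."""
    return time_us // 1000
```

A plugin that sets the time field itself keeps its own sampling time; anything left unset is stamped at dispatch time, which is where the small read-versus-dispatch difference mentioned above comes from.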
Porting effort for 6.0
- Continued maintaining the 4.x branch for a relatively long time, eventually stopped maintaining about 5 years
- But 4.11 bug fixes kept coming in for about 6 months after the 5.0 release, after which it ended up dead
- So 6.0 will be released but 5.x will still be supported, taking only major security releases.
- For the 6.0 release, create a separate directory for less frequently used plugins (e.g. teamspeak plugin). CPU/memory plugins could be in the core plugin list
- Can 6.0 be released without all plugins being ported?
- Need to be open for people to port additional plugins to 6.0 version, only accept plugins where people are invested in porting the plugins
- Florian will send a doc listing the 6.0 features already written and those still to be done
Go collectd changes for 6.0
- 6.0 changes will break go-collectd framework. Will maintain different branches, will check to see if the packages could be kept backward compatible
- API needs to be stabilized; currently porting key plugins in C for 6.0, need to come up with a stable API for go-collectd. Once there, don't expect huge changes; then a 6.0 branch in go-collectd could be created which passes metric families. Go data structures for the plain text protocol need to be updated
Interns on distribution metrics
- 3 interns started today at Google (2nd or 4th semester of bachelor's), looking to research a solution and write a design document on distribution metrics. They will present the doc in the next call. Working on the 6.0 branch, as a new feature in 6.0.
- Example usage of distribution metrics: consider the latency of a web service, with 1 metric every 10 sec and every metric covering 2000 requests. The naïve approach is to calculate over all requests happening in the 10 seconds. If we want to use latency as a Service Level Indicator, we want to calculate the 95th percentile of all metrics. This is what distribution metrics allow us to do: a distribution over a certain range of metrics.
- Sunku put together a google doc that details the feature list of go-collectd that are completed and yet to be written
- To send out the google doc to the mailing list
- Volunteers are welcome to join the efforts.
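To make the latency example above more concrete, here is a minimal Python sketch of a bucketed distribution with a percentile estimate. It is not the interns' design (which had not been written at the time of this call); the `Distribution` class, its bucket bounds and the nearest-rank estimate are all assumptions chosen for illustration.

```python
import bisect
import math
from typing import List, Sequence


class Distribution:
    """Hypothetical bucketed distribution: counts observations per bucket."""

    def __init__(self, upper_bounds: Sequence[float]):
        # Inclusive upper bound of each bucket, in ascending order.
        self.upper_bounds: List[float] = sorted(upper_bounds)
        # One extra bucket at the end catches values above the last bound.
        self.counts: List[int] = [0] * (len(self.upper_bounds) + 1)
        self.total = 0

    def observe(self, value: float) -> None:
        # Index of the first bucket whose upper bound is >= value.
        idx = bisect.bisect_left(self.upper_bounds, value)
        self.counts[idx] += 1
        self.total += 1

    def percentile_upper_bound(self, p: float) -> float:
        """Upper bound of the bucket containing the p-th percentile (nearest rank)."""
        if self.total == 0:
            raise ValueError("no observations")
        if not 0 < p <= 100:
            raise ValueError("p must be in (0, 100]")
        rank = max(1, math.ceil(p * self.total / 100))
        cumulative = 0
        for bound, count in zip(self.upper_bounds + [math.inf], self.counts):
            cumulative += count
            if cumulative >= rank:
                return bound
        return math.inf  # not reached: cumulative equals total >= rank
```

With one such distribution per 10-second interval, each of the 2000 request latencies is passed to `observe`, and `percentile_upper_bound(95)` gives a bucket-resolution estimate of the 95th percentile instead of a single aggregate value. The precision of the estimate depends entirely on how the bucket bounds are chosen.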