BossaBox

This is the playbook for engineering-playbook

Metrics

Overview

Metrics provide a near real-time stream of data, informing operators and stakeholders about the functions the system is performing as well as its health. Unlike logging and tracing, metric data tends to be more efficient to transmit and store.

Collection Methods

Metric collection approaches fall into two broad categories: push metrics & pull metrics. Push metrics means that the originating component sends the data to a remote service or agent. Azure Monitor and Etsy’s statsd are examples of push metrics. Some strengths with push metrics include:

Some trade-offs with this approach:

In the case of pull metrics, each originating component publishes an endpoint for the metric agent to connect to and gather measurements. Prometheus and its ecosystem of tools are an example of pull style metrics. Benefits experienced using a pull metrics setup may involve:

Items of concern to some may include:

Best Practices

When should I use metrics instead of logs?

Logs vs Metrics vs Traces covers some high level guidance on when to utilize metric data and when to use log data. Both have a valuable part to play in creating observable systems.

What should be tracked?

System critical measurements that relate to the application/machine health, which are usually excellent alert candidates. Work with your engineering and devops peers to identify the metrics, but they may include:

Important business-related measurements, which drive reporting to stakeholders. Consult with the various stakeholders of the component, but some examples may include:

Dimension Labels

Modern metric systems today usually define a single time series metric as the combination of the name of the metric and its dictionary of dimension labels. Labels are an excellent way to distinguish one instance of a metric, from another while still allowing for aggregation and other operations to be performed on the set for analysis. Some common labels used in metrics may include:

Note: Since dimension labels are used for aggregations and grouping operations, do not use unique strings or those with high cardinality as the value of a label. The value of the label is significantly diminished for reporting and in many cases has a negative performance hit on the metric system used to track it.