Metrics are numerical values that quantify a particular behavioural aspect of your software, stored in a time-series database so they can be observed over a period of time. Most software emits metrics, whether your service runs in a pod or it is the Kubernetes cluster itself.
To put this into context, the throughput (requests per minute) or the average response time of your API calls are metrics you will be familiar with and have probably noticed on dashboards.
In the observability and monitoring context, metrics are the primary indicators of the performance of any system, application, software, or tool. They quantify system performance, resource usage, errors, and other key aspects to provide insight into a system's behaviour and health.
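As a rough sketch of what "emitting metrics into a time-series store" can look like in practice (assuming a Python service instrumented with the prometheus_client library; the metric names here are illustrative, not a standard):

```python
# A minimal sketch: a handler that emits a request counter and a latency histogram,
# exposed on /metrics for a time-series system (e.g. Prometheus) to scrape.
import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("http_requests_total", "Total HTTP requests handled")
LATENCY = Histogram("http_request_latency_seconds", "Request latency in seconds")

def handle_request():
    REQUESTS.inc()            # one more request -> throughput when viewed over time
    with LATENCY.time():      # records how long this request took
        time.sleep(0.05)      # placeholder for real work

if __name__ == "__main__":
    start_http_server(8000)   # expose the /metrics endpoint
    while True:
        handle_request()
```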
What are the common metrics in the Observability and Monitoring context?
Latency
Measures the time it takes for a system to respond to requests, helping assess performance and user experience. It tells us how quickly the system does its work, so we can see whether it is fast or slow. A tiny sketch of measuring it follows below.
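For illustration, a minimal sketch that times a single HTTP call (the URL is a placeholder; real systems aggregate many such samples into percentiles like p50/p95/p99):

```python
# Measure the latency of one request by timing it end to end.
import time
import urllib.request

start = time.perf_counter()
urllib.request.urlopen("https://example.com/api/health")  # placeholder endpoint
latency_ms = (time.perf_counter() - start) * 1000
print(f"latency: {latency_ms:.1f} ms")
```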
Error-Rate
Tracks the percentage of requests or transactions that result in errors, indicating system reliability. It shows how often the system fails, which helps us know whether it is working well. See the sketch below.
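A minimal sketch of the calculation, using illustrative counts rather than real data:

```python
# Error rate = failed requests / total requests over some window, as a percentage.
failed_requests = 12
total_requests = 4800

error_rate = (failed_requests / total_requests) * 100
print(f"error rate: {error_rate:.2f}%")   # -> 0.25%
```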
Throughput
Measures the number of requests or transactions processed per unit of time, reflecting system workload. This tells us how much work the system gets through in a given period, showing how busy it is. A small worked example follows below.
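A minimal sketch of the calculation, again with illustrative numbers:

```python
# Throughput = requests processed per unit of time (here, per minute).
requests_processed = 4800
window_seconds = 300            # a five-minute measurement window

throughput_rpm = requests_processed / (window_seconds / 60)
print(f"throughput: {throughput_rpm:.0f} requests/minute")   # -> 960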
Resource Utilisation
Monitors the usage of system resources like CPU, memory, and disk space to prevent bottlenecks.
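As a sketch, assuming the third-party psutil package is installed, a collector might sample CPU, memory, and disk utilisation like this:

```python
# Sample current resource utilisation at one point in time.
import psutil

cpu_pct = psutil.cpu_percent(interval=1)      # CPU usage over a 1-second sample
mem_pct = psutil.virtual_memory().percent     # memory currently in use
disk_pct = psutil.disk_usage("/").percent     # root filesystem usage

print(f"cpu={cpu_pct}% mem={mem_pct}% disk={disk_pct}%")
```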
Saturation
Reflects how "full" a resource is, i.e. how close it is to its capacity, helping ensure optimal resource allocation and capacity planning. A small sketch follows below.
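A minimal sketch of one way to express saturation, using a queue as the resource; the depth and capacity values are illustrative, not measured:

```python
# Saturation as how full a work queue is relative to its capacity.
queue_depth = 180      # items currently waiting
capacity = 200         # maximum the queue can hold before work is delayed or dropped

saturation = queue_depth / capacity
print(f"saturation: {saturation:.0%}")   # -> 90%, approaching capacity
```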