How to improve the efficiency of infrastructure monitoring

ServerBee Blog
2 min read · Jun 13, 2024

Full monitoring coverage. What is it for?

Insufficient monitoring coverage can leave problems with infrastructure performance, security, and reliability unnoticed for a long time. Even with well-configured monitoring, some metrics can temporarily drop out of view under specific circumstances, which creates additional challenges. The critical case is an incident that we discover only after the fact, with minimal information about the event or after a significant delay. Without accurate and complete data, resolving the issue takes considerable time, which affects application quality and, in some cases, causes substantial losses. How can we address and prevent this?

Internal monitoring of Kubernetes

If you work with Kubernetes, you have likely installed Prometheus and set up several exporters to monitor the status of essential services and nodes within the cluster:

a) Node exporter (exposes hardware and operating system metrics provided by the kernel, such as CPU, memory, and disk usage);

b) Postgres or MongoDB exporters (expose metrics of the respective database);

c) IPsec exporter (reports the state of configured IPsec tunnels);

d) S.M.A.R.T. exporter (tracks the health of disk drives using S.M.A.R.T. data);

e) Blackbox exporter (continuously probes all or selected URLs), and so on.
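One way to notice that an exporter has silently dropped out of view is to ask Prometheus which scrape targets currently report `up == 0`. The sketch below is a minimal illustration, not a recommended setup: it assumes Prometheus is reachable at `http://localhost:9090` (a placeholder address), and the helper names `fetch_up` and `down_targets` are made up for this example. It relies only on the standard instant-query endpoint `/api/v1/query`.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder address; point this at your own Prometheus.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_up(base_url=PROMETHEUS_URL, timeout=5):
    """Run the instant query `up` against the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": "up"})
    url = f"{base_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def down_targets(payload):
    """Return sorted (job, instance) pairs whose `up` sample equals 0."""
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')!r}")
    down = []
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, value = sample["value"]  # the value arrives as a string
        if float(value) == 0:
            down.append((labels.get("job", "?"), labels.get("instance", "?")))
    return sorted(down)
```

Calling `down_targets(fetch_up())` from a cron job or a small script would list the exporters that Prometheus can no longer scrape. In practice the same condition is usually expressed as an alerting rule inside Prometheus; the script form is only meant to make the idea concrete.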

External monitoring — third-party services for application performance

While Prometheus offers a wide range of exporters, external monitoring might be a better option for availability checks. Prometheus and its exporters usually run in the same Kubernetes cluster as the application, so a cluster failure takes down not only the application but also the monitoring that should report it. To cover this case and still receive immediate notifications, use external endpoint monitoring tools. Services such as Pingdom, Uptime Robot, Freshping, and others keep checking your endpoints from outside the cluster and continue to work when the cluster does not.
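To make the idea of an external check concrete, here is a rough sketch of what such a service does at its simplest: request a public URL from a machine outside the cluster and record whether it answered. This is an illustration under assumptions, not how Pingdom or similar services are implemented. The function name `probe` and the injectable `opener` parameter are invented for this example, and any URL you pass in is your own.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch `url` once and report whether the endpoint looks healthy.

    `opener` is injectable so the logic can be exercised without a network.
    """
    started = time.monotonic()
    error = None
    try:
        with opener(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:  # DNS, refused, timeout
        status = None
        error = str(exc)
    return {
        "url": url,
        "ok": status is not None and 200 <= status < 400,
        "status": status,
        "error": error,
        "seconds": round(time.monotonic() - started, 3),
    }
```

Run on a schedule from a host that does not share infrastructure with the cluster, a check like this keeps reporting even when Prometheus itself is down. Hosted services add what the sketch lacks: multiple probe locations, retries, and notification channels.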

ServerBee Blog

We specialize in scalable DevOps solutions. We help companies support critical software applications and infrastructure on AWS, GCP, Azure, and even bare metal.