Exceeding the infrastructure budget: what eats your money?

ServerBee Blog
3 min read · Jun 21, 2024

New versions of application code or its modules are typically created in development or testing environments. These environments change more often than production but carry lower and less varied loads. As a result, a new version may run stably in testing, and even pass certain speed and load tests, yet fail frequently in production. The cause may be suboptimal code, insufficient testing under high load, or something else. Either way, it often leads to failures and consumes a significant part of the budget, because the application demands more system resources and more powerful infrastructure over an extended period.

In our experience, the most common reasons for exceeding the budget are failing to follow programming best practices (suboptimal code) and inadequate testing under high load. Both can and should be fixed. Moreover, the fix is rarely a sweeping overhaul; it is mostly a matter of smart resource management.

For example, an application may open a new database session for each operation without closing the old ones. This may not cause problems on a local computer or in development and testing environments, but under real conditions the number of open connections can reach hundreds of thousands or even millions. On-premises, database operations will be blocked once the resource limit is reached. On scalable cloud services, the same leak turns into considerable overspending instead, which happens frequently.
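One common way to keep the number of open connections bounded is a fixed-size pool that always takes connections back. The sketch below is only an illustration: `BoundedPool` and its methods are hypothetical names, and SQLite from the Python standard library stands in for whatever database you actually run.

```python
import queue
import sqlite3
from contextlib import contextmanager


class BoundedPool:
    """Hypothetical minimal pool: exactly `size` connections exist, never more."""

    def __init__(self, db_path: str, size: int) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be used by other threads.
            self._idle.put(sqlite3.connect(db_path, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        # Wait for a free connection instead of opening a new one;
        # raises queue.Empty if none is returned within `timeout` seconds.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # The connection goes back to the pool even if the caller raised.
            self._idle.put(conn)

    def idle_count(self) -> int:
        return self._idle.qsize()

    def close(self) -> None:
        # Close every idle connection; call this once on shutdown.
        while not self._idle.empty():
            self._idle.get_nowait().close()


if __name__ == "__main__":
    pool = BoundedPool(":memory:", size=2)
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())
    pool.close()
```

With a pool like this, a burst of traffic makes callers wait (or fail fast on the timeout) rather than silently opening more sessions. Production drivers and frameworks usually ship their own pooling, which is preferable to a hand-rolled one.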

A similar situation may arise if query parameters are not limited. A high-performance database without limits will run queries quickly, consuming as many resources as they need. Even if you don’t use parallel queries, unbounded queries issued at high frequency will also push the infrastructure past its capacity.
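A minimal sketch of limiting query parameters is to clamp the page size a caller requests before it reaches the database. The `orders` table, the cap of 100 rows, and the function names are assumptions made up for this example, and SQLite is used only because it ships with Python.

```python
import sqlite3
from typing import Optional

MAX_PAGE_SIZE = 100  # assumed cap; choose one that fits your workload
DEFAULT_PAGE_SIZE = 20


def clamp_page_size(requested: Optional[int]) -> int:
    """Return a page size between 1 and MAX_PAGE_SIZE."""
    if requested is None:
        return DEFAULT_PAGE_SIZE
    return max(1, min(requested, MAX_PAGE_SIZE))


def fetch_orders(conn: sqlite3.Connection, page_size: Optional[int] = None, offset: int = 0):
    """Fetch one bounded page of a hypothetical `orders` table."""
    limit = clamp_page_size(page_size)
    # Parameters are bound, not interpolated, so the limit cannot be bypassed via the query text.
    cursor = conn.execute(
        "SELECT id, total FROM orders ORDER BY id LIMIT ? OFFSET ?",
        (limit, max(0, offset)),
    )
    return cursor.fetchall()
```

Even if a client asks for 10,000 rows, `fetch_orders` returns at most `MAX_PAGE_SIZE` of them, so a single request cannot pull an unbounded result set.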

Even a single bad commit can keep CPU load close to its maximum. Typical causes include choosing the wrong sorting algorithm, adding features without considering their impact on system resources (especially features that involve complex, resource-intensive computation), and memory management issues that trigger excessive garbage collection and reduce overall system performance.
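To get a feel for how much the choice of sorting algorithm matters, the sketch below counts comparisons for a textbook bubble sort and for Python's built-in `sorted`. Both helper functions are illustrative names invented for this example, not code from any particular project.

```python
import functools


def bubble_sort(items):
    """Textbook bubble sort without early exit; returns (sorted_list, comparisons)."""
    data = list(items)
    comparisons = 0
    n = len(data)
    for i in range(n):
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return data, comparisons


def builtin_sort(items):
    """Built-in sorted() with a counting comparator; returns (sorted_list, comparisons)."""
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        return (a > b) - (a < b)

    ordered = sorted(items, key=functools.cmp_to_key(compare))
    return ordered, comparisons


if __name__ == "__main__":
    data = list(range(1000, 0, -1))  # 1,000 items in reverse order
    _, slow = bubble_sort(data)
    _, fast = builtin_sort(data)
    print(f"bubble sort: {slow} comparisons, built-in sort: {fast} comparisons")
```

This bubble sort always performs n(n-1)/2 comparisons, which is 499,500 for 1,000 items, while the built-in sort needs far fewer. On a hot code path that difference shows up directly as CPU time.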

Insufficient limits on certain resources and incorrect configuration also significantly increase infrastructure costs.

For example, a particular database solution requires a certain amount of RAM. If the configuration doesn’t provide enough memory, the database starts actively using the disk system and CPU instead. This affects both the speed of the infrastructure and its longevity.

The human factor. Sometimes editing certain variables in auto-scaling configurations or enabling additional options increases the number of CPU threads, for example from 8 to 32. If this happens unintentionally or without oversight, it becomes a problem that steadily eats your budget, so you should regularly review the configuration and keep all unnecessary options disabled.
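Such a regular check can be partly automated by comparing the live settings against an approved baseline. The sketch below assumes you can export both as plain dictionaries; the setting names (`cpu_threads`, `max_instances`, `burst_mode`, and so on) are hypothetical and not tied to any specific cloud provider.

```python
from typing import Any, Dict, Tuple

MISSING = "<unset>"  # marker for a setting present on only one side

# Hypothetical approved baseline for one service.
APPROVED = {"cpu_threads": 8, "max_instances": 4, "debug_logging": False}


def find_drift(approved: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {setting: (approved_value, live_value)} for every setting that differs."""
    drift = {}
    for key in sorted(approved.keys() | live.keys()):
        expected = approved.get(key, MISSING)
        actual = live.get(key, MISSING)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    live = {"cpu_threads": 32, "max_instances": 4, "debug_logging": False, "burst_mode": True}
    for name, (expected, actual) in find_drift(APPROVED, live).items():
        print(f"{name}: approved={expected!r}, live={actual!r}")
```

Running a comparison like this on a schedule, or as a step in the deployment pipeline, surfaces a change such as 8 to 32 threads before it has been billed for a whole month.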

Code from ChatGPT can also eat your money.

Developers now actively use AI to get code snippets for automating routine operations. But you shouldn’t trust AI-provided snippets too much. Firstly, the AI has to interpret your request, and its decisions and snippets may not be optimal for your specific case. Secondly, while such code may be acceptable for quickly testing new modules and functions, AI doesn’t take into account context such as infrastructure tariff limits and budgets, the environment, and the conditions in which the code will be executed. Therefore, when using AI, you should critically evaluate the examples, verify them against programming best practices, and check that the code is relevant to your conditions and environment.

This is our list of the most common pain points that lead to excessive infrastructure spending. What has your experience been? Please share your opinion in the comments.

ServerBee Blog

We specialize in scalable DevOps solutions. We help companies support critical software applications and infrastructure on AWS, GCP, Azure, and even bare metal.