Taming the cloud: Essential SaaS strategies for managing costs and reducing waste

May 16, 2024

For anyone managing a SaaS platform, cloud costs can often feel like a sizable and unpredictable utility bill that shows up at the end of the month. I’ve seen them leave finance teams in absolute shock. What’s the path forward?

CapEx → OpEx & COGS

Transitioning from server farms (CapEx) to cloud services (OpEx and COGS) over the past ~15 years has fundamentally changed how businesses think about spending on compute. While traditional servers represented a fixed, upfront investment, cloud services are more like renting an apartment — with ongoing costs that get counted directly against operations or goods sold.

Typical SaaS companies find around 5-10% of their ARR flowing into cloud expenses. In my own experience across various SaaS orgs, it’s been in the 6-8% range.

On the bill itself, it’s common to find that 25-50% of it is spent on OpEx (e.g. telemetry tools that help monitor and triage system issues), with the rest primarily representing COGS (e.g. compute needed to keep an application running for a set of customers).
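To make those ratios concrete, here is a rough back-of-the-envelope sketch. The ARR figure and the exact percentages are made-up assumptions picked from within the ranges above, not benchmarks, and the function name is purely illustrative.

```python
def split_cloud_bill(arr, cloud_share_of_arr, opex_share_of_bill):
    """Estimate annual cloud spend and its OpEx/COGS split.

    All inputs are illustrative assumptions, not benchmarks.
    """
    cloud_spend = arr * cloud_share_of_arr
    opex = cloud_spend * opex_share_of_bill
    cogs = cloud_spend - opex
    return {"cloud_spend": cloud_spend, "opex": opex, "cogs": cogs}


# Hypothetical company: $20M ARR, 7% of ARR on cloud, 30% of the bill is OpEx.
estimate = split_cloud_bill(20_000_000, 0.07, 0.30)
for name, dollars in estimate.items():
    print(f"{name}: ${dollars:,.0f}")
```

With these inputs, that works out to about $1.4M of annual cloud spend, roughly $420K of it OpEx and $980K of it COGS.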

Strategies to reduce your fixed costs

Effectively managing cloud costs is a balancing act between flexibility and monetary restraint. Fixed costs here are the ones tied to the cloud resources you tend to use, at minimum, every month. These are the common strategies for reducing them:

Cost savings plans
This strategy is typically low effort, low risk, and high reward.
Many cloud providers offer flexible cost savings plans for server instances that provide significant discounts in exchange for commitment to a certain level of spending. These plans are a smart move for businesses with predictable usage patterns around their server instances as they allow access to lower rates while retaining a great deal of flexibility. However, be warned - if there’s a chance you might need to significantly reduce your cloud spend soon, committing to a high spend might trap you into an expensive arrangement.
Reservations
This strategy is typically moderate effort, moderate risk, and high reward.
For companies with stable and predictable long-term needs, committing to reserved server instances can yield higher discounts compared to flexible savings plans. While this strategy locks you into specific types of server instances - as opposed to savings plans which provide more flexibility over time - the cost benefits are more substantial. This strategy is best suited for mature organizations that have clear visibility into their future requirements.
Consolidation of systems
This strategy is typically high effort, low risk, and high reward.
By optimizing resource usage through techniques such as “bin packing” and implementing multi-tenancy architectures, you can significantly reduce wasted capacity and improve overall efficiency. This approach not only maximizes the utilization of existing resources but also decreases the need for additional spending on underutilized services.
Enterprise Discount Plans (EDPs)
This strategy is typically moderate effort, moderate risk, and moderate reward.
EDPs are tailored to rapidly growing startups and offer discounts based on anticipated growth in cloud service usage. While they can provide substantial cost savings for businesses in a high-growth phase, they also come with the risk of locking you into higher future spending. It can’t be overstated how important it is to carefully project future usage to avoid overcommitment here.
Resellers and Arbitrage
This strategy is typically moderate effort, moderate risk, and moderate reward.
Engaging with resellers who buy cloud resources in bulk and then pass on the savings to their customers can be another avenue to reduce costs. These players leverage enterprise discount plans and other mechanisms to offer services at reduced rates. However, the effectiveness of this strategy can vary, and sometimes the administrative and support complexities might not justify the potential savings.
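For readers who haven’t run into “bin packing” before, here is a minimal sketch of one common heuristic for it, first-fit decreasing. It is only meant to show the idea behind consolidation, not how any particular scheduler or cloud provider does it. The workload names, vCPU sizes, and host capacity are all hypothetical, and real placement also has to weigh memory, isolation, and availability.

```python
def first_fit_decreasing(demands, capacity):
    """Pack workloads onto equally sized hosts using the first-fit decreasing heuristic.

    demands: mapping of workload name -> resource units needed (e.g. vCPUs)
    capacity: resource units available on each host
    Returns a list of hosts, each a list of workload names.
    """
    hosts = []  # each entry: [remaining_capacity, [workload names]]
    # Place the largest workloads first; small ones then fill the leftover gaps.
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{name} needs {need}, more than one host holds")
        for host in hosts:
            if host[0] >= need:
                host[0] -= need
                host[1].append(name)
                break
        else:
            # No existing host had room, so open a new one.
            hosts.append([capacity - need, [name]])
    return [names for _, names in hosts]


# Hypothetical workloads, sized in vCPUs, placed on 16-vCPU hosts.
workloads = {"api": 8, "worker": 6, "cron": 2, "search": 10, "cache": 4, "admin": 2}
placement = first_fit_decreasing(workloads, capacity=16)
print(len(placement), "hosts:", placement)
```

In this toy example the six workloads fit on two fully used hosts, compared with six hosts if each workload ran on its own.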

The variability dilemma

Once we move past the fixed costs, it becomes clear that there can be significant variability in monthly bills as well. I’ve seen orgs range from 10-30% variable cost per month. How to reason about it?

This variability - usually driven by some combination of auto-scaling, redundant system deployments for upgrades and migrations, and R&D - is a double-edged sword. On one hand, it allows flexibility for the infrastructure team and promotes reliability for customers. On the other, it can spiral into wasteful spending if not monitored closely.
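One crude way to put a number on this is to treat your lowest bill over a recent window as the fixed floor and count everything above it as variable. That is a simplifying assumption on my part (a better baseline would come from your committed and always-on resources), and the monthly figures below are hypothetical.

```python
def variable_share(monthly_bills):
    """Return each month's variable share, treating the lowest bill as the fixed floor."""
    floor = min(monthly_bills.values())
    return {month: (bill - floor) / bill for month, bill in monthly_bills.items()}


# Hypothetical monthly bills in dollars.
bills = {"Jan": 100_000, "Feb": 112_000, "Mar": 125_000, "Apr": 104_000}
shares = variable_share(bills)
for month, share in shares.items():
    print(f"{month}: {share:.0%} variable")
```

Tracking this share month over month gives you a starting point for asking which part of the swing was intentional and which part was not.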

Here’s where waste usually sneaks in:

“Permanent temporaries”: Resources intended for short-term use may linger longer than necessary, inadvertently inflating costs. Whether it’s infrastructure left running after delayed updates or R&D tools unused during project pauses, each scenario raises the question: Is it cheaper to leave it running until needed again or rebuild it at that time? What is the cost of your employees’ time?
Maintenance lapses: Often, there’s no dedicated team managing these costs full-time, leading to suboptimal cost management practices.
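The “leave it running or rebuild it” question can be roughed out with simple arithmetic. Here is a sketch, assuming you can estimate the resource’s hourly cost, how long it will sit idle, and how many engineer-hours a rebuild takes. Every number and name below is hypothetical.

```python
def cheaper_to_keep_running(hourly_cost, idle_hours, rebuild_hours, loaded_hourly_rate):
    """Compare the cost of idling a resource against tearing it down and rebuilding later.

    Returns (keep_running, idle_cost, rebuild_cost).
    """
    idle_cost = hourly_cost * idle_hours
    rebuild_cost = rebuild_hours * loaded_hourly_rate
    return idle_cost <= rebuild_cost, idle_cost, rebuild_cost


# Hypothetical: a $3/hour environment idle for 6 weeks,
# versus 8 engineer-hours at a $120/hour loaded rate to rebuild it.
keep, idle_cost, rebuild_cost = cheaper_to_keep_running(3, 6 * 7 * 24, 8, 120)
print(f"idle: ${idle_cost:,.0f}, rebuild: ${rebuild_cost:,.0f}, keep running: {keep}")
```

Here the idle environment costs $3,024 against a $960 rebuild, so tearing it down wins. Shorten the pause or raise the rebuild effort and the answer flips, which is why it’s worth running the numbers rather than guessing.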

Strategies for dealing with variability

To combat these challenges and avoid poor cloud hygiene, consider the following:

Detailed tagging: It’s crucial to map costs back to specific vectors of spend - region, tenancy, environment, initiative, product line, etc. Cloud platforms allow for this type of tagging, but it requires some additional tooling and discipline to get right, especially for containerized resources. This is the biggest effort you will undertake in dealing with variability, but it’s your only shot at getting this right. Just do it.
Tag probable waste: The first time you ever do this, create a tag that indicates likely waste. This way, when you’re done inventorying your system, you can narrow in on what is likely to be waste, make decisions quickly, and get your bill back in shape.
Regular audits: Have your infrastructure team conduct weekly “glances” at and monthly reviews of cloud expenditures.
Proactive ticket management: Ask that infrastructure teams include potential cost implications in their ticket workflows. This way, costs can be predicted to the extent possible, and otherwise investigated if a spike pops up. Note that you’re not looking for perfection here; there is a cost to estimation precision, too. You’re just looking for a basic level of diligence.
An accountable party: Assign someone the task of tracking and reporting on potential future anomalies and actual anomalies from the recent past. Remember, not all anomalies are a function of waste! The point of this role is not to eliminate anomalies, but rather to a) help better model them and b) help more quickly catch and remediate them when proven to be wasteful.
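To show what tagging buys you, here is a small sketch that rolls costs up by a tag and pulls out anything marked as likely waste. The line-item shape, the tag keys, and the "probable-waste" tag name are all assumptions made for illustration. Real billing exports differ by provider, so treat this as the shape of the idea rather than a working integration.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of one tag; items missing the tag land under 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["tags"].get(tag_key, "untagged")] += item["cost"]
    return dict(totals)


def probable_waste(line_items, waste_tag="probable-waste"):
    """Return the items carrying the (hypothetical) waste tag, most expensive first."""
    flagged = [i for i in line_items if i["tags"].get(waste_tag) == "true"]
    return sorted(flagged, key=lambda i: i["cost"], reverse=True)


# Hypothetical line items, shaped loosely like a billing export.
items = [
    {"resource": "api-prod", "cost": 4200.0, "tags": {"environment": "prod", "product": "core"}},
    {"resource": "api-staging", "cost": 900.0, "tags": {"environment": "staging", "product": "core"}},
    {"resource": "ml-sandbox", "cost": 1500.0, "tags": {"environment": "dev", "probable-waste": "true"}},
    {"resource": "old-migration-db", "cost": 650.0, "tags": {"probable-waste": "true"}},
]

by_environment = cost_by_tag(items, "environment")
waste = probable_waste(items)
print(by_environment)
print([i["resource"] for i in waste])
```

The "untagged" bucket is worth watching on its own: if it is large, the tagging discipline isn’t there yet and the rest of the breakdown can’t be trusted.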

Go forward

While cloud costs are indeed variable and difficult to model with ultra-precision - similar to a utility bill - it is in fact possible for a SaaS organization to get its arms around the scope of potential variability in play, and react quite quickly if and when that fails. Investing even a small amount into process, tooling, and a cultural acceptance of the balance between ‘good’ and ‘bad’ variability will go a long way.
