Spending $5,000 a Month on Kubernetes? Up to 30% of That May Be Recoverable Waste
When your Kubernetes bill crosses $5K/month, the waste hiding inside it becomes expensive enough to matter. Here is what you are paying for that is not doing any work — and how to find it without disrupting production.
Your Kubernetes bill just crossed $5,000 a month. Congratulations — and also: you probably have a problem you haven't looked at yet.
At lower spend levels, waste is a rounding error. A forgotten PVC here, an idle deployment there — the numbers are too small to justify the operational time to find them. But at $5K/month and above, the math changes. A cluster carrying 20–30% in recoverable waste is burning $1,000–$1,500 every month on resources that are not serving any workload, running any job, or holding any data that anyone needs.
The waste does not announce itself. It accumulates quietly: a StatefulSet gets deleted but its PersistentVolumeClaims stay behind. A load test namespace from three sprints ago still exists. A LoadBalancer service is still provisioned for an application that was decommissioned. Each item looks small in isolation. Together, they are why your cloud bill keeps climbing even when your engineering team insists they have not deployed anything significant.
This post breaks down where that waste comes from, how to find it, and what a structured audit looks like for a cluster at your scale.
The $5K Threshold: Why It Matters
Below a certain spend level, optimization is a judgment call. Above it, it becomes an operational responsibility.
A cluster spending $5,000/month is almost certainly running workloads across multiple teams, multiple namespaces, and a long deployment history. That history is where waste lives. Every deployment that was not cleaned up after its environment was decommissioned. Every database PVC that was not deleted when the team migrated to a managed service. Every node group that was scaled up for a traffic spike and never scaled back down.
The older the cluster and the more teams sharing it, the more waste it carries — because cleanup responsibilities fall between team boundaries. DevOps assumes the application team cleaned up their namespace. The application team assumes DevOps handles infrastructure teardown. The result is neither team cleans it up, and the bill grows.
What $1,500/Month in Waste Actually Looks Like
In a $5K/month cluster, here is a realistic waste inventory — not a theoretical one:
Orphaned PersistentVolumeClaims: Three migrations ago, a team moved their PostgreSQL StatefulSet to a managed database. The StatefulSet is gone. The three 200 GB PVCs it left behind are still provisioned on gp3 EBS volumes. That is 600 GB × $0.08/GB = $48/month. Multiplied across four other similar migrations: $240/month in storage nobody is using.
Stale LoadBalancer services: A microservice was sunset six months ago. The LoadBalancer service in front of it was never deleted. The cloud provider is still provisioning the external load balancer. At roughly $18/month per load balancer, three of these left behind = $54/month.
Idle node groups: A node group was provisioned for a batch processing workload that now runs on Fargate. The node group still has two on-demand instances running in case the batch job needs them — but it has not run there in four months. Two m5.xlarge nodes × $0.192/hr = $276/month in compute serving zero workloads.
Stale namespaces from ended projects: A compliance audit required a staging environment that mirrored production for 60 days. The audit passed eight months ago. The staging namespace is still running a partial copy of the application, including its database, its cache, and its internal services. Rough monthly cost of that namespace: $400–$600.
Unlabeled resources no team claims: Several deployments in the default namespace have no owner labels. Nobody knows who created them. Combined resource requests suggest $80–$100/month in compute allocation that cannot be attributed or cleaned up without an investigation.
Running total: roughly $1,050–$1,270/month recoverable — inside a single $5K/month cluster.
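The tally is simple enough to script. A minimal sketch in shell, using only the dollar figures from the inventory above:

```shell
# Tally the waste inventory above (all dollar figures are the estimates from the text)
pvc_storage=240        # orphaned PVCs left behind by five migrations
stale_lbs=54           # three abandoned LoadBalancers at ~$18/month each
idle_nodes=276         # two m5.xlarge on-demand nodes serving no workloads
ns_low=400;  ns_high=600    # leftover compliance-audit staging namespace
own_low=80;  own_high=100   # unattributed deployments in the default namespace

low=$((pvc_storage + stale_lbs + idle_nodes + ns_low + own_low))
high=$((pvc_storage + stale_lbs + idle_nodes + ns_high + own_high))
echo "Recoverable: \$${low}-\$${high}/month"   # prints: Recoverable: $1050-$1270/month
```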
This is not an unusual scenario. It is what most clusters above $5K look like when someone finally looks carefully.
Why It Is Hard to See Without a Structured Audit
Standard cost monitoring tools show you spend trends. They tell you your namespace costs are up 12% month-over-month. They do not tell you which specific PVC in which specific namespace is responsible, who created it, when, and whether it is safe to delete.
The gap between "costs are rising" and "here is the specific resource to delete and why it is safe to do so" is the audit gap. Most teams live in that gap for months before doing something about it.
The reasons:
- Dashboards aggregate. A $240/month storage line item across 40 PVCs is invisible until you look at PVCs specifically. Dashboards show the total; the audit finds the individual items.
- Teams do not see what they do not own. If your team does not have a namespace, you do not see what is running there. Cross-namespace orphan detection requires a cluster-wide view.
- Nobody is incentivized to own cleanup. The team that would do the cleanup is also the team shipping features. Cleanup has no sprint ticket and no deployment date.
- Automated cleanup is risky. Even if you know waste exists, deleting it without validation risks taking down something that was still in use. So cleanup gets deferred until someone has time to validate carefully — which means it gets deferred indefinitely.
What a Structured Audit Covers
A thorough cost audit at the $5K+ tier covers seven categories:
| Category | What to Look For | Why It Is Expensive |
|---|---|---|
| Orphaned PVCs and Released PVs | Claims not mounted by any pod; PVs in Released state | Full provisioned-size disk billing continues |
| Stale namespaces | Namespaces with no running pods, age > 30 days | All resource costs within the namespace accumulate |
| LoadBalancer services with no endpoints | spec.type=LoadBalancer + empty subsets | Cloud LB provisioned and billed per service |
| Idle Deployments | spec.replicas == 0 for 30+ days | Potential LB costs; reserved node capacity blocked |
| Overprovisioned node groups | Nodes at < 30% CPU/memory for 2+ weeks | Compute cost without proportional workload value |
| Completed Jobs without TTL | Jobs with completionTime set, no TTL cleanup | Minor cost, high hygiene debt |
| Resources without owner labels | Any resource class missing owner or team labels | Cleanup blocked; cost attribution broken |
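As one example of what detection looks like in practice, the idle-Deployment check from the table is a one-line jq filter over the API output. A sketch, assuming jq is installed — the JSON here is an inlined sample standing in for real `kubectl get deployments --all-namespaces -o json` output:

```shell
# Inlined sample of the `kubectl get deployments --all-namespaces -o json` shape;
# a real audit would pipe live kubectl output instead.
deploys='{"items":[
  {"metadata":{"name":"api","namespace":"prod"},"spec":{"replicas":3}},
  {"metadata":{"name":"legacy-worker","namespace":"batch"},"spec":{"replicas":0}}
]}'

# Deployments scaled to zero replicas: idle candidates per the table above.
# The 30-day age condition still needs checking against .metadata.creationTimestamp.
echo "$deploys" | jq -r \
  '.items[] | select(.spec.replicas == 0) | "\(.metadata.namespace)/\(.metadata.name)"'
```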
For the full detection commands for each category, see the Kubernetes cost audit checklist for EKS, GKE, and AKS.
The Audit-First Principle: Find Before You Delete
The instinct when you find $1,500/month in waste is to delete it immediately. Resist this.
In a $5K/month cluster, some of those resources that look idle are not. A PVC that has not been mounted in 30 days might belong to a StatefulSet in a crash loop that the team is actively debugging. A deployment with zero replicas might be a cron-driven scaler that will scale it up at the end of the month. A LoadBalancer service might be referenced by an external DNS entry that an external partner is still using.
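As a sketch of that validation for the PVC case, the check is a join: only claims referenced by no pod volume become review candidates — candidates, not deletions. The JSON below is an inlined sample standing in for live `kubectl get pvc`/`kubectl get pods` output, and jq is assumed available; note that a crash-looping StatefulSet still has pods referencing its claims, so this cross-check would exclude them.

```shell
# Inlined samples of the `kubectl get pvc/pods --all-namespaces -o json` shapes.
pvcs='{"items":[
  {"metadata":{"name":"pg-data-0","namespace":"db"}},
  {"metadata":{"name":"cache-data","namespace":"web"}}
]}'
pods='{"items":[
  {"metadata":{"namespace":"web"},
   "spec":{"volumes":[{"persistentVolumeClaim":{"claimName":"cache-data"}}]}}
]}'

# Claim names referenced by at least one pod volume
used=$(echo "$pods" | jq -r '[.items[].spec.volumes[]?.persistentVolumeClaim.claimName] | unique | .[]')

# PVCs no pod references — flag these for owner review, not automatic deletion
echo "$pvcs" | jq -r --arg used "$used" \
  '.items[] | .metadata.name as $n
   | select(($used | split("\n") | index($n)) == null)
   | "\(.metadata.namespace)/\(.metadata.name)"'
```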
The right sequence is:
1. Audit — run a read-only scan to get the full inventory
2. Quantify — estimate monthly cost per finding
3. Review — send findings to resource owners for confirmation
4. Act — delete what is confirmed safe, with a record of the approval
This sequence adds 24–48 hours of review time. It prevents incidents. At $5K/month, an incident caused by accidental deletion of a production data store costs significantly more than a two-day delay.
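The quantify step lends itself to the same kind of scripting. A minimal sketch, assuming the gp3 rate of $0.08/GB-month used earlier; the finding names are made up for illustration, and a real run would take sizes from the audit output:

```shell
# Hypothetical audit findings, as namespace/name=size-in-GB pairs
findings="db/pg-data-0=200 db/pg-data-1=200 db/pg-data-2=200"
rate_cents=8   # $0.08 per GB-month for gp3, in cents to keep integer math

total=0
for f in $findings; do
  size=${f#*=}                    # strip the "namespace/name=" prefix
  cost=$((size * rate_cents))     # monthly cost of this PVC, in cents
  total=$((total + cost))
  printf '%s  ~$%d.%02d/month\n' "${f%%=*}" $((cost / 100)) $((cost % 100))
done
printf 'Total  ~$%d.%02d/month\n' $((total / 100)) $((total % 100))
```

For the three 200 GB claims from the earlier example, this reports ~$16.00 each and ~$48.00 total, matching the $48/month figure above.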
For more on this approach, see read-only Kubernetes cost optimization: how to find waste without installing agents.
Running the Audit: Manual vs Automated
Manual approach — possible with kubectl and jq, but time-intensive at scale:
```bash
# Find PVCs not in Bound state
kubectl get pvc --all-namespaces -o json | \
  jq '.items[] | select(.status.phase != "Bound") |
      {name: .metadata.name, namespace: .metadata.namespace, size: .spec.resources.requests.storage}'

# Find Endpoints objects with no subsets (services, including LoadBalancers, with no backing pods)
kubectl get endpoints --all-namespaces -o json | \
  jq '.items[] | select(.subsets == null) | {name: .metadata.name, namespace: .metadata.namespace}'

# Find stale namespaces (no pods at all): list namespaces with running pods,
# then print the namespaces that are not on that list
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[].metadata.namespace' | sort -u > /tmp/active_ns.txt
kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | grep -Fxvf /tmp/active_ns.txt
```
Running this manually across one cluster takes a few hours. Across multiple clusters, it becomes a project. And the output needs to be aggregated, deduplicated, and estimated for cost before it becomes actionable.
Automated approach — KorPro runs the same checks across all your clusters in minutes, groups findings by namespace and resource type, and shows estimated monthly savings per finding without requiring write access or cloud credentials. See how the KorPro inspector detects waste in under 5 minutes.
What Happens After the Audit
Once findings are reviewed and confirmed, cleanup is fast. Storage deletion, service removal, and namespace teardown are low-risk once you have confirmed there is nothing active inside them. The audit is the work — the cleanup is the easy part.
For teams managing multiple clusters, the savings compound. A single cluster finding $1,000–$1,500/month in recoverable waste, multiplied across three or five clusters, is a material reduction in cloud spend — with no change to engineering capacity or application performance.
The cost recovery use case page covers how KorPro structures findings for teams making this case internally.
Find Out What's Wasting Money in Your Cluster
KorPro audits your cluster with read-only access — no agents, no cloud credentials — and surfaces exactly what is recoverable and what it costs.
Ready to Clean Up Your Clusters?
KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.
Related Articles
Read-Only Kubernetes Cost Optimization: How to Find Waste Without Installing Agents
Security-conscious platform teams can discover significant Kubernetes waste using only read-only cluster access — no agents, no cloud credentials, no write permissions required. Here is how the audit-first model works.
How MSPs Recover Margin from Unused Kubernetes Resources Across Customer Clusters
MSPs and cloud service providers managing Kubernetes for customers absorb infrastructure waste that erodes margin and complicates billing. Here is how to identify and recover that waste across customer clusters without creating operational risk.
Kubernetes Cost Audit Checklist for EKS, GKE, and AKS
A practical Kubernetes cost audit checklist covering idle workloads, orphaned storage, stale namespaces, and ownership gaps across EKS, GKE, and AKS. Built for platform teams who need to recover real spend.
Written by
KorPro Team