Spending $5,000 a Month on Kubernetes? Up to 30% of That May Be Recoverable Waste
When your Kubernetes bill crosses $5K/month, the waste hiding inside it becomes expensive enough to matter. Here is what you are paying for that is not doing any work — and how to find it without disrupting production.
Your Kubernetes bill just crossed $5,000 a month. Congratulations — and also: you probably have a problem you haven't looked at yet.
At lower spend levels, waste is a rounding error. A forgotten PVC here, an idle deployment there — the numbers are too small to justify the operational time to find them. But at $5K/month and above, the math changes. A cluster carrying 20–30% in recoverable waste is burning $1,000–$1,500 every month on resources that are not serving any workload, running any job, or holding any data that anyone needs.
The waste does not announce itself. It accumulates quietly: a StatefulSet gets deleted but its PersistentVolumeClaims stay behind. A load test namespace from three sprints ago still exists. A LoadBalancer service is still provisioned for an application that was decommissioned. Each item looks small in isolation. Together, they are why your cloud bill keeps climbing even when your engineering team insists they have not deployed anything significant.
This post breaks down where that waste comes from, how to find it, and what a structured audit looks like for a cluster at your scale.
The $5K Threshold: Why It Matters
Below a certain spend level, optimization is a judgment call. Above it, it becomes an operational responsibility.
A cluster spending $5,000/month is almost certainly running workloads across multiple teams, multiple namespaces, and a long deployment history. That history is where waste lives. Every deployment that was not cleaned up after its environment was decommissioned. Every database PVC that was not deleted when the team migrated to a managed service. Every node group that was scaled up for a traffic spike and never scaled back down.
The older the cluster and the more teams sharing it, the more waste it carries — because cleanup responsibilities fall between team boundaries. DevOps assumes the application team cleaned up their namespace. The application team assumes DevOps handles infrastructure teardown. The result is neither team cleans it up, and the bill grows.
What $1,500/Month in Waste Actually Looks Like
In a $5K/month cluster, here is a realistic waste inventory — not a theoretical one:
Orphaned PersistentVolumeClaims: Three migrations ago, a team moved their PostgreSQL StatefulSet to a managed database. The StatefulSet is gone. The three 200 GB PVCs it left behind are still provisioned on gp3 EBS volumes. That is 600 GB × $0.08/GB = $48/month. Multiplied across four other similar migrations: $240/month in storage nobody is using.
Stale LoadBalancer services: A microservice was sunset six months ago. The LoadBalancer service in front of it was never deleted. The cloud provider is still provisioning the external load balancer. At roughly $18/month per load balancer, three of these left behind = $54/month.
Idle node groups: A node group was provisioned for a batch processing workload that now runs on Fargate. The node group still has two on-demand instances running in case the batch job needs them — but it has not run there in four months. Two m5.xlarge nodes × $0.192/hr = $276/month in compute serving zero workloads.
Stale namespaces from ended projects: A compliance audit required a staging environment that mirrored production for 60 days. The audit passed eight months ago. The staging namespace is still running a partial copy of the application, including its database, its cache, and its internal services. Rough monthly cost of that namespace: $400–$600.
Unlabeled resources no team claims: Several deployments in the default namespace have no owner labels. Nobody knows who created them. Combined resource requests suggest $80–$100/month in compute allocation that cannot be attributed or cleaned up without an investigation.
Running total: roughly $1,050–$1,270/month recoverable — inside a single $5K/month cluster.
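The tally is simple enough to script. A minimal sketch in shell, using only the dollar figures from the inventory above:

```shell
# Tally the waste inventory above (all dollar figures are the estimates from the text)
pvc_storage=240        # orphaned PVCs left behind by five migrations
stale_lbs=54           # three abandoned LoadBalancers at ~$18/month each
idle_nodes=276         # two m5.xlarge on-demand nodes serving no workloads
ns_low=400;  ns_high=600    # leftover compliance-audit staging namespace
own_low=80;  own_high=100   # unattributed deployments in the default namespace

low=$((pvc_storage + stale_lbs + idle_nodes + ns_low + own_low))
high=$((pvc_storage + stale_lbs + idle_nodes + ns_high + own_high))
echo "Recoverable: \$${low}-\$${high}/month"   # prints: Recoverable: $1050-$1270/month
```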
This is not an unusual scenario. It is what most clusters above $5K look like when someone finally looks carefully.
Why It Is Hard to See Without a Structured Audit
Standard cost monitoring tools show you spend trends. They tell you your namespace costs are up 12% month-over-month. They do not tell you which specific PVC in which specific namespace is responsible, who created it, when, and whether it is safe to delete.
The gap between "costs are rising" and "here is the specific resource to delete and why it is safe to do so" is the audit gap. Most teams live in that gap for months before doing something about it.
The reasons:
- Dashboards aggregate. A $240/month storage line item across 40 PVCs is invisible until you look at PVCs specifically. Dashboards show the total; the audit finds the individual items.
- Teams do not see what they do not own. If your team does not have a namespace, you do not see what is running there. Cross-namespace orphan detection requires a cluster-wide view.
- Nobody is incentivized to own cleanup. The team that would do the cleanup is also the team shipping features. Cleanup has no sprint ticket and no deployment date.
- Automated cleanup is risky. Even if you know waste exists, deleting it without validation risks taking down something that was still in use. So cleanup gets deferred until someone has time to validate carefully — which means it gets deferred indefinitely.
What a Structured Audit Covers
A thorough cost audit at the $5K+ tier covers seven categories:
| Category | What to Look For | Why It Is Expensive |
|---|---|---|
| Orphaned PVCs and Released PVs | Claims not mounted by any pod; PVs in Released state | Full provisioned-size disk billing continues |
| Stale namespaces | Namespaces with no running pods, age > 30 days | All resource costs within the namespace accumulate |
| LoadBalancer services with no endpoints | spec.type=LoadBalancer + empty subsets | Cloud LB provisioned and billed per service |
| Idle Deployments | spec.replicas == 0 for 30+ days | Potential LB costs; reserved node capacity blocked |
| Overprovisioned node groups | Nodes at < 30% CPU/memory for 2+ weeks | Compute cost without proportional workload value |
| Completed Jobs without TTL | Jobs with completionTime set, no TTL cleanup | Minor cost, high hygiene debt |
| Resources without owner labels | Any resource class missing owner or team labels | Cleanup blocked; cost attribution broken |
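As one example of what detection looks like in practice, the idle-Deployment check from the table is a one-line jq filter over the API output. A sketch, assuming jq is installed — the JSON here is an inlined sample standing in for real `kubectl get deployments --all-namespaces -o json` output:

```shell
# Inlined sample of the `kubectl get deployments --all-namespaces -o json` shape;
# a real audit would pipe live kubectl output instead.
deploys='{"items":[
  {"metadata":{"name":"api","namespace":"prod"},"spec":{"replicas":3}},
  {"metadata":{"name":"legacy-worker","namespace":"batch"},"spec":{"replicas":0}}
]}'

# Deployments scaled to zero replicas: idle candidates per the table above.
# The 30-day age condition still needs checking against .metadata.creationTimestamp.
echo "$deploys" | jq -r \
  '.items[] | select(.spec.replicas == 0) | "\(.metadata.namespace)/\(.metadata.name)"'
```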
For the full detection commands for each category, see the Kubernetes cost audit checklist for EKS, GKE, and AKS.
The Audit-First Principle: Find Before You Delete
The instinct when you find $1,500/month in waste is to delete it immediately. Resist this.
In a $5K/month cluster, some of those resources that look idle are not. A PVC that has not been mounted in 30 days might belong to a StatefulSet in a crash loop that the team is actively debugging. A deployment with zero replicas might be a cron-driven scaler that will scale it up at the end of the month. A LoadBalancer service might be referenced by an external DNS entry that an external partner is still using.
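As a sketch of that validation for the PVC case, the check is a join: only claims referenced by no pod volume become review candidates — candidates, not deletions. The JSON below is an inlined sample standing in for live `kubectl get pvc`/`kubectl get pods` output, and jq is assumed available; note that a crash-looping StatefulSet still has pods referencing its claims, so this cross-check would exclude them.

```shell
# Inlined samples of the `kubectl get pvc/pods --all-namespaces -o json` shapes.
pvcs='{"items":[
  {"metadata":{"name":"pg-data-0","namespace":"db"}},
  {"metadata":{"name":"cache-data","namespace":"web"}}
]}'
pods='{"items":[
  {"metadata":{"namespace":"web"},
   "spec":{"volumes":[{"persistentVolumeClaim":{"claimName":"cache-data"}}]}}
]}'

# Claim names referenced by at least one pod volume
used=$(echo "$pods" | jq -r '[.items[].spec.volumes[]?.persistentVolumeClaim.claimName] | unique | .[]')

# PVCs no pod references — flag these for owner review, not automatic deletion
echo "$pvcs" | jq -r --arg used "$used" \
  '.items[] | .metadata.name as $n
   | select(($used | split("\n") | index($n)) == null)
   | "\(.metadata.namespace)/\(.metadata.name)"'
```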
The right sequence is:
1. Audit — run a read-only scan to get the full inventory
2. Quantify — estimate monthly cost per finding
3. Review — send findings to resource owners for confirmation
4. Act — delete what is confirmed safe, with a record of the approval
This sequence adds 24–48 hours of review time. It prevents incidents. At $5K/month, an incident caused by accidental deletion of a production data store costs significantly more than a two-day delay.
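The quantify step lends itself to the same kind of scripting. A minimal sketch, assuming the gp3 rate of $0.08/GB-month used earlier; the finding names are made up for illustration, and a real run would take sizes from the audit output:

```shell
# Hypothetical audit findings, as namespace/name=size-in-GB pairs
findings="db/pg-data-0=200 db/pg-data-1=200 db/pg-data-2=200"
rate_cents=8   # $0.08 per GB-month for gp3, in cents to keep integer math

total=0
for f in $findings; do
  size=${f#*=}                    # strip the "namespace/name=" prefix
  cost=$((size * rate_cents))     # monthly cost of this PVC, in cents
  total=$((total + cost))
  printf '%s  ~$%d.%02d/month\n' "${f%%=*}" $((cost / 100)) $((cost % 100))
done
printf 'Total  ~$%d.%02d/month\n' $((total / 100)) $((total % 100))
```

For the three 200 GB claims from the earlier example, this reports ~$16.00 each and ~$48.00 total, matching the $48/month figure above.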
For more on this approach, see read-only Kubernetes cost optimization: how to find waste without installing agents.
Running the Audit: Manual vs Automated
Manual approach — possible with kubectl and jq, but time-intensive at scale:
```bash
# Find PVCs not in Bound state
kubectl get pvc --all-namespaces -o json | \
  jq '.items[] | select(.status.phase != "Bound") |
      {name: .metadata.name, namespace: .metadata.namespace, size: .spec.resources.requests.storage}'

# Find Endpoints objects with no subsets (services, including LoadBalancers, with no backing pods)
kubectl get endpoints --all-namespaces -o json | \
  jq '.items[] | select(.subsets == null) | {name: .metadata.name, namespace: .metadata.namespace}'

# Find stale namespaces (no pods at all): list namespaces with running pods,
# then print the namespaces that are not on that list
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[].metadata.namespace' | sort -u > /tmp/active_ns.txt
kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | \
  tr ' ' '\n' | grep -Fxvf /tmp/active_ns.txt
```
Running this manually across one cluster takes a few hours. Across multiple clusters, it becomes a project. And the output needs to be aggregated, deduplicated, and estimated for cost before it becomes actionable.
Automated approach — KorPro runs the same checks across all your clusters in minutes, groups findings by namespace and resource type, and shows estimated monthly savings per finding without requiring write access or cloud credentials. See how the KorPro inspector detects waste in under 5 minutes.
What Happens After the Audit
Once findings are reviewed and confirmed, cleanup is fast. Storage deletion, service removal, and namespace teardown are low-risk once you have confirmed there is nothing active inside them. The audit is the work — the cleanup is the easy part.
For teams managing multiple clusters, the savings compound. A single cluster finding $1,000–$1,500/month in recoverable waste, multiplied across three or five clusters, is a material reduction in cloud spend — with no change to engineering capacity or application performance.
The cost recovery use case page covers how KorPro structures findings for teams making this case internally.
Find Out What's Wasting Money in Your Cluster
KorPro audits your cluster with read-only access — no agents, no cloud credentials — and surfaces exactly what is recoverable and what it costs.
Ready to Clean Up Your Clusters?
KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.
Related Articles
Read-Only Kubernetes Cost Optimization: How to Find Waste Without Installing Agents
Security-conscious platform teams can discover significant Kubernetes waste using only read-only cluster access — no agents, no cloud credentials, no write permissions required. Here is how the audit-first model works.
How MSPs Recover Margin from Unused Kubernetes Resources Across Customer Clusters
MSPs and cloud service providers managing Kubernetes for customers absorb infrastructure waste that erodes margin and complicates billing. Here is how to identify and recover that waste across customer clusters without creating operational risk.
Kubernetes Cost Audit Checklist for EKS, GKE, and AKS
A practical Kubernetes cost audit checklist covering idle workloads, orphaned storage, stale namespaces, and ownership gaps across EKS, GKE, and AKS. Built for platform teams who need to recover real spend.
Written by
KorPro Team