Kubernetes Cost Recovery: How to Find and Reclaim Wasted Cloud Spend
Most Kubernetes teams are overspending by 20–40% and don't know it. This guide shows you exactly where the waste hides, how to quantify it, and how to recover it — with real kubectl commands and real cost estimates.
The average Kubernetes cluster wastes between 20% and 40% of its monthly cloud bill on resources that are allocated but not doing any useful work.
That figure comes up consistently across FinOps audits and platform engineering reviews. And it is almost never obvious from looking at the bill. Your cloud cost dashboard shows spend by service, region, and cluster. It does not show you the orphaned PVC from a StatefulSet that was deleted four months ago. It does not flag the staging namespace that nobody has touched since the last product launch. It does not tell you that 60% of your pod CPU requests are going unused.
Cost recovery — finding that waste and reclaiming it — is not a one-time cleanup exercise. It is a discipline. This guide covers where the waste hides, how to quantify it with real dollar figures, and how to recover it systematically.
Why Kubernetes Clusters Accumulate Waste
Kubernetes makes resource creation fast and resource tracking hard. A developer can spin up a Deployment, a Service, two ConfigMaps, a Secret, and a PersistentVolumeClaim in a few minutes. When the experiment is done, they delete the Deployment — and leave everything else running.
This is not carelessness. It is a structural problem.
Kubernetes does not automatically clean up related resources unless owner references are explicitly set. A Service and its ConfigMap do not know they belong to a Deployment that no longer exists. They just keep running. The PVC keeps holding an EBS volume. The LoadBalancer Service keeps provisioning a cloud load balancer. The billing continues.
Several factors accelerate the accumulation:
- Fast-moving teams. Sprint cycles create and abandon resources frequently. Nobody owns cleanup as a task.
- No TTL on resources. Kubernetes has no built-in mechanism to expire a namespace, a ConfigMap, or a PVC after inactivity (finished Jobs can set ttlSecondsAfterFinished, but that is the exception).
- Helm upgrades that orphan old releases. When a chart generates names with a hash or version suffix, upgrading creates new resources without reliably removing old ones.
- Partial deletions. Developers delete the Deployment. The Service, Secrets, and PVCs stay.
- Namespace sprawl. Feature branches, pull-request environments, and debugging namespaces that were never torn down.
The longer a cluster runs under active development without systematic auditing, the more waste accumulates. A cluster that is 18 months old with a team of 8 engineers typically has hundreds of orphaned resources. At realistic per-resource costs, that translates to real money.
Common Waste Sources: What They Cost
Orphaned ConfigMaps, Secrets, and Services
ConfigMaps and Secrets stored in etcd carry negligible direct compute cost — but that is not why you should care about them. Orphaned Secrets containing database passwords, API keys, or TLS certificates from decommissioned services are an unnecessary attack surface. In environments subject to SOC 2, HIPAA, or PCI-DSS, undocumented credentials are an audit finding. Remediation — incident response, policy review, audit fees — costs far more than any cloud storage savings.
Orphaned Services of type ClusterIP have no direct billing impact, but Services of type LoadBalancer are a different matter. Each one provisions a cloud load balancer that bills you whether or not any pods sit behind it. On AWS with a Network Load Balancer: roughly $16/month. On GKE: roughly $18/month. On AKS: roughly $18/month.
On clusters with six or more months of active development, finding 8 to 15 orphaned LoadBalancer Services is not unusual. At $18 each, that is $144 to $270/month from load balancers serving zero traffic. See how to find orphaned Kubernetes resources for a full audit workflow by resource type.
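To enumerate candidates on your own cluster, a filter along these lines lists every LoadBalancer Service with its external address (assumes `jq` is installed; the namespace and Service name in the trailing comment are illustrative):

```bash
# List every LoadBalancer Service with its external address; cross-check
# each one against its Endpoints before assuming it serves traffic
kubectl get svc -A -o json | jq -r '
  .items[]
  | select(.spec.type == "LoadBalancer")
  | "\(.metadata.namespace)/\(.metadata.name) -> \(.status.loadBalancer.ingress[0].hostname // .status.loadBalancer.ingress[0].ip // "pending")"'

# Then, for each candidate (names illustrative):
#   kubectl get endpoints -n staging old-api
```

An empty ENDPOINTS column on the follow-up check means the load balancer is billing you for zero traffic.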
Unused PVCs Holding Expensive Storage
PersistentVolumeClaims are the most expensive category of Kubernetes waste by total dollar volume. When a StatefulSet or a Deployment using persistent storage is deleted, its PVCs survive by default: Kubernetes never deletes a claim just because the workload referencing it is gone, a deliberate safeguard against data loss. As long as the PVC exists, the underlying PersistentVolume keeps its cloud disk provisioned, and a StorageClass with a Retain reclaim policy goes further still, leaving the volume behind even after the PVC is deleted. The net effect: every orphaned PVC keeps billing you for storage indefinitely.
Cloud storage rates across major providers:
| Provider | Disk Type | Cost per GB/month |
|---|---|---|
| AWS (EKS) | gp3 SSD | $0.08 |
| AWS (EKS) | io1 SSD | $0.125 |
| GCP (GKE) | Standard SSD | $0.17 |
| Azure (AKS) | Premium SSD | $0.135 |
| Azure (AKS) | Standard SSD | $0.10 |
A cluster with 15 orphaned PVCs averaging 80 GB each costs $96–$204/month in storage for volumes serving no application. Over a year: $1,152–$2,448. For data that nobody is reading.
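A rough way to size your own exposure is to total the requested PVC capacity and multiply by your provider's rate. This sketch assumes sizes are denominated in Gi and uses the AWS gp3 rate from the table above; adjust both for your environment:

```bash
# Sum requested capacity across all PVCs and estimate the monthly bill.
# Assumes sizes are expressed in Gi and AWS gp3 pricing ($0.08/GB-month);
# substitute your provider's rate and disk class.
kubectl get pvc -A -o json \
  | jq -r '.items[].spec.resources.requests.storage' \
  | sed 's/Gi$//' \
  | awk '{ total += $1 } END { printf "Total: %d Gi, ~$%.2f/month at $0.08/GB\n", total, total * 0.08 }'
```

This totals all PVCs, not just orphaned ones, so treat the figure as an upper bound on storage exposure.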
Over-Provisioned Pods
Over-provisioning is subtler than orphaned resources, but it is often the largest single waste category by dollar volume. Kubernetes schedules pods based on resource requests, not actual usage. If your requests are set too high — which they almost always are, because engineers set conservative estimates at deploy time and rarely revisit them — you are reserving node capacity that goes unused.
Industry data consistently shows that 30–50% of requested CPU and memory goes unused in typical production clusters. A pod with requests: cpu: 500m, memory: 512Mi that actually uses 150m CPU and 180Mi memory is reserving 350m CPU and 332Mi memory on a node that could be running other workloads. Multiply that across a 50-pod deployment and you have the equivalent of several full nodes reserved but idle.
On a typical m5.large instance on AWS ($0.096/hour, ~$70/month), a cluster running 10 nodes at 35% waste is paying for 3.5 nodes that do nothing — roughly $245/month in avoidable cost, just from over-provisioning on a single cluster.
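The arithmetic is easy to rerun with your own numbers. A minimal sketch, where the node count, per-node price, and waste fraction are illustrative inputs to replace with figures from your own cluster:

```bash
# Back-of-the-envelope avoidable spend from over-provisioning.
# All three inputs are illustrative; substitute your own measurements.
nodes=10; monthly_per_node=70; waste_fraction=0.35
awk -v n="$nodes" -v p="$monthly_per_node" -v w="$waste_fraction" \
  'BEGIN { printf "~$%.0f/month avoidable across %d nodes\n", n * p * w, n }'
```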
Idle Dev and Staging Namespaces
Dev and staging namespaces are the most consistently overlooked cost source. They are created for a purpose (testing, a feature branch, a release candidate environment), that purpose ends, and the namespace stays. With Deployments still running at full replica count. With PVCs still holding data. With LoadBalancer Services still routing to pods that exist but serve no traffic.
These namespaces are particularly expensive because they often mirror production configuration — same resource requests, same persistent storage, sometimes the same node types. A staging environment that is a 50% replica of production costs 50% of production. If production costs $20,000/month and staging has been idle for 90 days, that is $30,000 in recoverable spend.
Manual Recovery: kubectl Commands That Surface Waste
If you want to audit a cluster manually before investing in tooling, these commands give you a starting point.
Find ConfigMaps and Secrets outside system namespaces:
```bash
kubectl get configmaps -A | grep -v kube-system
kubectl get secrets -A | grep -v kube-system | grep -v "kubernetes.io/service-account-token"
```
Cross-reference the output against your running workloads. Any ConfigMap or Secret with no corresponding Deployment, StatefulSet, or CronJob is a cleanup candidate.
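One way to build the "in use" side of that cross-reference is to extract every ConfigMap a pod actually consumes, whether mounted as a volume, injected via envFrom, or read by an individual env var (assumes `jq` is installed):

```bash
# List every ConfigMap a running pod actually references: mounted as a
# volume, injected via envFrom, or read by a single env var
kubectl get pods -A -o json | jq -r '
  .items[] | . as $pod
  | (.spec.volumes[]?.configMap.name,
     .spec.containers[].envFrom[]?.configMapRef.name,
     .spec.containers[].env[]?.valueFrom.configMapKeyRef.name)
  | select(. != null)
  | "\($pod.metadata.namespace)/\(.)"' | sort -u
```

Any ConfigMap absent from this list is unreferenced by currently scheduled pods; remember that CronJobs and suspended workloads will not appear here.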
Find PVCs not bound to any running pod:
```bash
kubectl get pvc -A | grep -v Bound
```
PVCs that are not Bound are strong orphan candidates. For Bound PVCs, check whether any running pod is actually mounting them:
```bash
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | . as $pod | .spec.volumes[]? | select(.persistentVolumeClaim) |
    "\($pod.metadata.namespace)/\(.persistentVolumeClaim.claimName)"' | sort -u
```
Any PVC not in this list is mounted by nothing — and may be deletable.
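Before deleting anything, it is also worth checking the reclaim policy on each bound PersistentVolume, since that determines whether the cloud disk disappears along with the PVC:

```bash
# For each PV, show the claim it backs and its reclaim policy: with
# Delete, removing the PVC also removes the cloud disk; with Retain,
# the disk (and its bill) survives and must be cleaned up separately
kubectl get pv -o json | jq -r '
  .items[]
  | "\(.metadata.name) claim=\(.spec.claimRef.name // "none") policy=\(.spec.persistentVolumeReclaimPolicy)"'
```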
Find over-provisioned pods:
```bash
kubectl top pods -A
```
Compare actual CPU and memory usage against declared requests:
```bash
kubectl get pods -A -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name) \(.spec.containers[].resources.requests)"'
```
When actual usage consistently runs at 20–30% of requests across a Deployment, you have an over-provisioning problem.
Find idle namespaces:
```bash
kubectl get namespaces
kubectl get pods -n <namespace> --field-selector=status.phase=Running
```
Any non-system namespace with zero running pods that still has Services, PVCs, or Secrets is a candidate for full teardown.
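The two commands above can be combined into a loop that walks every non-system namespace and flags the idle ones. A sketch; extend the skip list in the case statement to match your own system namespaces:

```bash
# Walk every non-system namespace; flag those with zero running pods
# that still hold billable or risky resources
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  case "$ns" in kube-*|default) continue ;; esac
  running=$(kubectl get pods -n "$ns" \
    --field-selector=status.phase=Running --no-headers 2>/dev/null | wc -l)
  if [ "$running" -eq 0 ]; then
    echo "idle namespace: $ns"
    kubectl get svc,pvc,secrets -n "$ns" --no-headers 2>/dev/null
  fi
done
```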
Why Manual Doesn't Scale
The kubectl approach above works for a single cluster, a few namespaces, and a few hours of your time. It does not scale.
The core problem is that orphan detection is not a list operation — it is a graph problem. A ConfigMap that looks unused might be mounted by a CronJob that only runs at month-end. A Service with no matching pods might be a valid headless Service for StatefulSet DNS. A PVC with no current mount might be referenced by a suspended workload. The naive approach — "no pods reference this, so it's orphaned" — generates false positives that lead to production incidents when the wrong thing gets deleted.
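As a concrete guard against one of those false positives, headless Services can be identified up front so they are never flagged for deletion (assumes `jq` is installed):

```bash
# Before flagging a Service with no matching pods, rule out the headless
# pattern (clusterIP: None) that StatefulSets rely on for stable DNS
kubectl get svc -A -o json | jq -r '
  .items[]
  | select(.spec.clusterIP == "None")
  | "\(.metadata.namespace)/\(.metadata.name) is headless; keep it"'
```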
Proper orphan detection requires a dependency graph: every resource linked to every other resource it references, with liveness checks to determine whether the reference is active or stale. Building that graph manually for one cluster takes hours. Running it weekly across five clusters is a part-time job. And you still need to track cost attribution, prioritize findings, and maintain a deletion history.
Beyond accuracy, there is the continuity problem. A one-time audit cleans up what exists today. By next month, new orphans have accumulated. Manual auditing is a recurring cost — developer time spent on work that does not ship features — not a one-time investment. See the FinOps guide to Kubernetes waste for a deeper look at why cost dashboards alone don't surface this category of spend.
Automated Cost Recovery: How Scanning Tools Work
Automated scanning tools solve both the accuracy and continuity problems. They deploy inside each cluster — typically as a CronJob — build the full resource dependency graph on a schedule, and continuously identify resources that are orphaned, over-provisioned, or idle.
A well-implemented scanner does several things that manual kubectl commands cannot:
- Transitive orphan detection. If Resource B is orphaned because Resource A (its parent workload) is also orphaned, a naive scan flags only A. A dependency graph flags both, in the correct deletion order.
- Cross-namespace analysis. ClusterRoles, ClusterRoleBindings, and PersistentVolumes operate at the cluster level. Namespace-scoped audits miss cross-namespace dependencies.
- Liveness checks. Not just "does a reference exist" but "is the referencing workload actually running." A Deployment scaled to zero still references its ConfigMaps and Secrets — those references are stale.
- Cost attribution. Each finding tagged with a dollar estimate so you can prioritize by recovery value, not by resource count.
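The liveness check in particular is easy to illustrate. A sketch that finds Deployments scaled to zero, whose ConfigMap and Secret references are live syntactically but stale in practice:

```bash
# Deployments scaled to zero still "reference" their ConfigMaps and
# Secrets, but those references are stale for cleanup purposes
kubectl get deploy -A -o json | jq -r '
  .items[]
  | select(.spec.replicas == 0)
  | "\(.metadata.namespace)/\(.metadata.name) (replicas: 0)"'
```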
This translates directly to cost recovery optimization strategies that are actionable — a ranked list of what to delete first, with confidence levels and estimated monthly savings per item.
What to Expect: Recovery Benchmarks
Based on typical cluster profiles:
| Cluster Age | Typical Orphan Rate | Expected Monthly Recovery |
|---|---|---|
| Under 3 months | 5–10% of resources | $50–$200 |
| 3–12 months | 15–25% of resources | $200–$800 |
| 1–2 years | 25–40% of resources | $500–$2,000 |
| Over 2 years | 35–50% of resources | $1,000–$5,000+ |
The 20–40% total cloud spend reduction figure is realistic for organizations that have never run a systematic cleanup, include over-provisioning improvements alongside orphan removal, and operate multiple clusters. A company running 8 clusters at an average age of 18 months, with a combined cloud bill of $150,000/month, can realistically recover $30,000–$60,000/year on a first cleanup pass.
The caveat: these numbers require actually doing the work — identifying waste, validating it is safe to remove, and deleting it with proper process. A scan that surfaces findings but never triggers cleanup is not cost recovery. It is a more expensive version of not knowing.
Run a Free KorPro Scan — No Cloud Credentials Required
KorPro's Inspector agent deploys inside your cluster via Helm and scans entirely within your infrastructure. It does not require cloud provider credentials, IAM role grants, or network access to your cloud account. Everything stays inside your cluster.
```bash
helm install korpro-inspector oci://ghcr.io/kortechnologies/charts/korpro-inspector \
  --namespace korpro-system \
  --create-namespace \
  --set licenseKey="<your-key>"
```
The first scan completes in 1–3 minutes. The dashboard shows every orphaned resource with its type, namespace, cost estimate, and recommended deletion order. Over-provisioned pods are flagged with actual vs. requested usage side by side.
The free tier covers one cluster with no credit card required.
Run a free KorPro scan at app.korpro.io — see your cluster's recoverable waste in under 5 minutes.
If you are managing multiple clusters or want a guided assessment of your FinOps posture, visit the KorPro cost recovery use case page to see how teams structure ongoing cleanup programs.
The waste is already there. The only question is when you go find it.
Ready to Clean Up Your Clusters?
KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.
Related Articles
Kubernetes in Production: Real Use Cases and Their Hidden Cost Implications
Every Kubernetes use case generates waste differently. This post maps the specific orphan patterns — GPUs left running, per-tenant namespace debt, ephemeral CI namespaces that never cleaned up — so you know what to look for in your own clusters.
Kubernetes Glossary: 28 Essential Terms Explained [2026]
Complete Kubernetes glossary — 28 essential terms from Pod to PersistentVolume, defined clearly for developers and platform engineers.
How to Audit Kubernetes Costs by Namespace (Step-by-Step Guide)
Most teams track total cluster spend but never see which namespaces are driving waste. This step-by-step guide shows platform engineers how to audit Kubernetes costs by namespace using kubectl — from idle workloads to orphaned PVCs and stale secrets.
Written by
KorPro Team