
Kubernetes in Production: Real Use Cases and Their Hidden Cost Implications

Every Kubernetes use case generates waste differently. This post maps the specific orphan patterns — GPUs left running, per-tenant namespace debt, ephemeral CI namespaces that never get cleaned up — so you know what to look for in your own clusters.

KorPro Team
April 16, 2026
10 min read
Kubernetes, Production, Use Cases, Cost Optimization, FinOps, Cloud Costs

You already know how Kubernetes works. You don't need an explainer on what a Deployment is.

What you might not have mapped out explicitly is how each type of production workload generates its own distinct pattern of waste — not the waste from poor resource requests, but the structural residue that accumulates as your platform evolves: orphaned resources, stale data, namespaces that outlived their purpose, credentials no workload references anymore.

Every K8s use case leaves behind a different kind of debris. This post maps that debris so you know what to look for in your own clusters. For a broader treatment of cleanup strategies, see our Kubernetes cost optimization guide.

Microservices Platforms: The Slow Accumulation

Microservices are where most production K8s clusters start. The waste pattern here is incremental rather than dramatic — no single event creates a huge bill, but the accumulation is relentless.

ConfigMap and Secret sprawl. Each service gets its own ConfigMap and Secret per environment at creation. Each time you onboard a new service, add two to four objects per namespace. When you decommission a service, the Deployment gets removed but the ConfigMaps and Secrets frequently stay. Nobody's kubectl delete-ing the config; they're focused on getting the new thing running. A platform with 80 services and two years of churn might have 40–60% more ConfigMaps and Secrets than it has active workloads. Individually these are nearly free, but they expand your security surface area and make audits painful — especially for secrets. See our Kubernetes secrets detection guide for how to trace which secrets actually have live references.
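The reference check itself is straightforward to sketch. The Python below is illustrative, not KorPro's implementation: the dicts are simplified stand-ins for what the Kubernetes API returns, and the function names are our own. It collects every Secret name that pod specs touch via volumes, env, envFrom, or imagePullSecrets, and flags the rest:

```python
def referenced_secrets(pod_specs):
    """Collect every Secret name a list of simplified pod specs touches."""
    refs = set()
    for spec in pod_specs:
        for vol in spec.get("volumes", []):
            if "secret" in vol:
                refs.add(vol["secret"]["secretName"])
        for container in spec.get("containers", []):
            for env in container.get("env", []):
                key_ref = env.get("valueFrom", {}).get("secretKeyRef")
                if key_ref:
                    refs.add(key_ref["name"])
            for source in container.get("envFrom", []):
                if "secretRef" in source:
                    refs.add(source["secretRef"]["name"])
        for pull_secret in spec.get("imagePullSecrets", []):
            refs.add(pull_secret["name"])
    return refs

def orphaned_secrets(secret_names, pod_specs):
    """Secrets that exist in the namespace but no pod spec references."""
    return sorted(set(secret_names) - referenced_secrets(pod_specs))

pods = [
    {"volumes": [{"name": "creds", "secret": {"secretName": "db-creds"}}],
     "containers": [{"envFrom": [{"secretRef": {"name": "api-keys"}}]}]},
]
secrets = ["db-creds", "api-keys", "old-tls-cert", "retired-webhook"]
print(orphaned_secrets(secrets, pods))  # ['old-tls-cert', 'retired-webhook']
```

The hard part in a real cluster isn't this loop; it's enumerating every reference path (CSI secret providers, webhook configs, operator CRDs) so a "no references" verdict is trustworthy.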

Orphaned Services. Services often get created alongside a Deployment and deleted separately — or not at all. A Service without backing pods still costs nothing directly, but it holds a cluster IP, may have been pointed to by Ingress rules, and creates noise that makes kubectl get svc increasingly useless for operators trying to understand the cluster's actual topology. On cloud providers that auto-provision external load balancers, every orphaned LoadBalancer-type Service costs $15–25/month.
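The topology check behind this is a selector match: a Service is orphaned when its selector matches no pod's labels. A minimal illustrative sketch, again over simplified dicts rather than a real client API:

```python
def selector_matches(selector, labels):
    """True if every key/value in the Service selector appears in the pod labels."""
    return all(labels.get(key) == value for key, value in selector.items())

def orphaned_services(services, pods):
    """Services whose selector matches no running pod."""
    orphans = []
    for svc in services:
        selector = svc.get("selector")
        if not selector:
            continue  # selector-less Services (ExternalName, manual Endpoints) need review, not auto-flagging
        if not any(selector_matches(selector, pod["labels"]) for pod in pods):
            orphans.append(svc["name"])
    return orphans

services = [
    {"name": "checkout", "selector": {"app": "checkout"}},
    {"name": "legacy-payments", "selector": {"app": "payments-v1"}},
]
pods = [{"labels": {"app": "checkout", "tier": "web"}}]
print(orphaned_services(services, pods))  # ['legacy-payments']
```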

Sidecar over-provisioning. When you're running a service mesh or centralized logging, every pod carries sidecars. Sidecars that made sense for a high-traffic service get cloned into low-traffic internal tools with the same resource requests. If your Envoy sidecar requests 100m CPU and 128Mi memory per pod, and you have 300 pods running services that see 10 requests/hour, you're reserving node capacity for work that isn't happening.
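Using the paragraph's own numbers (100m CPU and 128Mi memory per sidecar, 300 near-idle pods), the reserved capacity totals up quickly:

```python
pods = 300
sidecar_cpu_millicores = 100
sidecar_memory_mib = 128

reserved_cpu_cores = pods * sidecar_cpu_millicores / 1000  # 30.0 cores
reserved_memory_gib = pods * sidecar_memory_mib / 1024     # 37.5 GiB
print(reserved_cpu_cores, reserved_memory_gib)
```

That is 30 cores and 37.5 GiB of requests held for traffic that isn't arriving, roughly the capacity of several mid-size nodes.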

ML and Data Pipelines: The Big-Ticket Waste

ML workloads generate the most expensive individual waste events. The economics are different from microservices — fewer resources, but each one can cost thousands per month.

GPU nodes left running after jobs complete. The pattern is well-understood but still happens constantly: a training job finishes, the pod completes, but the node isn't released because the cluster autoscaler waits for its scale-down delay (default: 10 minutes, but commonly set to 30–60 minutes on GPU nodes to avoid thrash). If something goes wrong with the scale-down — a DaemonSet pod blocking eviction, a non-evictable pod from a previous run, a misconfigured PodDisruptionBudget — the node stays. A p4d.24xlarge on AWS costs $32.77/hour on demand. One node stuck for a weekend because a team went offline is $1,572 in waste. Scale that to a platform running dozens of jobs per week.
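The weekend figure is just rate times hours, and the same arithmetic prices any stuck-node incident:

```python
hourly_rate = 32.77  # p4d.24xlarge on-demand rate cited above
stuck_hours = 48     # one weekend
incident_cost = hourly_rate * stuck_hours
print(f"${incident_cost:,.2f}")  # $1,572.96
```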

Large PVCs holding stale training data. Training jobs typically write checkpoints and intermediate artifacts to PVCs. When the experiment is abandoned, the PVC stays bound because it has no TTL and nobody explicitly deleted it. A single 500Gi SSD volume on GCP costs $85/month. A data science team running 20 parallel experiments over a year will have accumulated multiple terabytes of PVC storage from work nobody is actively using. This doesn't show up as "waste" in cost allocation — it shows up as a storage line item attributed to the data team's namespace.
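A PVC no pod mounts is detectable the same way an orphaned Service is. The sketch below is illustrative (simplified shapes, our own field names); the $0.17/GiB-month rate is the GCP SSD price implied by the $85 figure above:

```python
GCP_SSD_PER_GIB_MONTH = 0.17  # pd-ssd list price used throughout this post

def stale_pvcs(pvcs, pods):
    """PVCs that no pod currently mounts."""
    mounted = {
        vol["persistentVolumeClaim"]["claimName"]
        for pod in pods
        for vol in pod.get("volumes", [])
        if "persistentVolumeClaim" in vol
    }
    return [pvc for pvc in pvcs if pvc["name"] not in mounted]

def monthly_cost_usd(pvcs):
    return sum(pvc["size_gib"] for pvc in pvcs) * GCP_SSD_PER_GIB_MONTH

pvcs = [
    {"name": "exp-07-checkpoints", "size_gib": 500},
    {"name": "serving-cache", "size_gib": 100},
]
pods = [{"volumes": [{"persistentVolumeClaim": {"claimName": "serving-cache"}}]}]
stale = stale_pvcs(pvcs, pods)
print([p["name"] for p in stale], monthly_cost_usd(stale))
```

Note that "mounted" is necessary but not sufficient for "in use": a PVC referenced by a zero-replica Deployment's pod template passes this check while still being waste, which is the cascading case covered at the end of this post.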

Namespace sprawl per experiment. One namespace per experiment was a popular isolation pattern, but it generates sprawl when experiments aren't cleaned up. Namespaces are free, but each carries RBAC objects, LimitRanges, ResourceQuotas, and experiment-specific Secrets and ConfigMaps. Fifty abandoned experiment namespaces is fifty orphaned resource collections, each requiring manual investigation before deletion.

Multi-Tenant SaaS: Churn Creates Namespace Debt

If you're running a SaaS product where each tenant gets their own namespace (or set of namespaces), customer churn generates Kubernetes waste directly. This is one of the more predictable waste patterns once you see it, but it's easy to miss because the cost is diffuse.

Per-tenant namespace accumulation. Off-boarding removes application workloads but often leaves the namespace and its non-application objects. Namespace deletion is treated as high-stakes, so teams delay it — which means namespaces from churned tenants accumulate. A multi-tenant platform with 15% annual churn and 500 tenants retires 75 namespaces per year. After three years without systematic cleanup, you're carrying 150–200 zombie namespaces.
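The churn arithmetic compounds simply. The 150–200 figure assumes some ad hoc cleanup along the way, because the raw accumulation is higher:

```python
tenants = 500
annual_churn_percent = 15
years = 3

retired_per_year = tenants * annual_churn_percent // 100  # 75 namespaces/year
retired_total = retired_per_year * years                  # 225 before any cleanup
print(retired_per_year, retired_total)
```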

Secrets accumulation from churned tenants. Each tenant had credentials: database connection strings, API keys, webhook secrets. These get stored as Kubernetes Secrets, often with tenant-specific names. After churn, these Secrets have no live workload referencing them — but they remain, holding credentials that may still be valid at the upstream service. This isn't just a cost issue; it's a security issue. Orphaned Secrets with live credentials expand your blast radius if the cluster is compromised.

Idle tenant namespaces. Some customers reduce usage without churning — downgrading from paid to freemium. Their namespaces stay alive even though the workload is genuinely idle, often with Deployments scaled to zero replicas. Across 30 such namespaces the residue adds up, and the cascading orphans pattern bites hard here: a zero-replica Deployment looks "active" to a naive scanner but drives real cost through the PVCs and Services still attached to it.

CI/CD Workloads: Pipelines That Leave Traces

CI/CD on Kubernetes is efficient when pipelines clean up after themselves. When they don't — and they often don't — you end up with a layer of ephemeral-turned-permanent resources.

Ephemeral namespaces from crashed pipelines. The pattern: pipeline creates a namespace, deploys a test environment, runs tests, deletes the namespace. This works when the pipeline succeeds. When the pipeline fails at step three — deployment crash, test failure, runner timeout — the namespace persists with whatever resources were created. If your pipeline creates 50 test namespaces per day and 5% of runs fail without cleanup, you're accumulating 2–3 abandoned namespaces per day. After a month, that's 60–90 orphaned namespaces with running (or crash-looping) pods consuming CPU and memory.

Build cache PVCs that grow unbounded. Caching node_modules, Maven artifacts, or Docker layers on a PVC is a real speedup. But nobody sets a retention policy on the volume contents — it grows, auto-expands when the storage class allows it, and silently accumulates. A cache PVC that started at 50Gi ends up at 200Gi after a year. Ten services with separate SSD cache volumes on GCP ($0.17/Gi/month) is $340/month on caches that could be pruned aggressively with no build impact.
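The pruning math from that example, using the same ~$0.17/GiB-month GCP SSD rate (cache sizes are the paragraph's illustrative numbers):

```python
caches = 10
grown_gib = 200   # what each cache PVC has drifted to
pruned_gib = 50   # what aggressive pruning would hold it at
rate_per_gib_month = 0.17

current_cost = caches * grown_gib * rate_per_gib_month        # $340/month
pruned_cost = caches * pruned_gib * rate_per_gib_month        # $85/month
monthly_savings = current_cost - pruned_cost                  # $255/month
print(current_cost, pruned_cost, monthly_savings)
```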

Test ConfigMaps and Secrets from failed runs. Test environments need configuration — database connection strings, mock credentials, feature flags. When the namespace isn't cleaned up after a failed run, neither are they. And when resources were created in a shared namespace as a workaround for permission scoping, they outlast every pipeline that created them.

Internal Developer Platforms: The Environment Debt Problem

IDPs built on Kubernetes give developers self-service access to environments. The model works well for getting environments up; it rarely solves the problem of getting them down.

Dev and staging environments that outlive their purpose. An engineer spins up an environment to test a feature branch. The feature ships, the branch merges, the PR closes — the namespace doesn't. Without an automated lifecycle hook connecting "PR closed" to "namespace deleted," environments accumulate. A platform team with 50+ active developers can easily carry 80–100 environments that haven't seen meaningful traffic in weeks. The compute cost — even 2–4 pods per environment — runs $500–2,000/month in aggregate.

Duplicated configurations. Templating environments at scale means configuration drift. A team copies a base ConfigMap, modifies it for their use, and the copy stops tracking the source. You end up with 15 versions of the same configuration across the cluster. Before you can safely remove any of them, someone has to understand the divergence — which compounds the cleanup cost.

Over-allocated resource quotas per team. ResourceQuotas get set at namespace creation based on what teams ask for, not what they use. Actual usage ends up at 20–30% of quota, but capacity planning happens against the quota ceiling, not against observed demand. On a shared cluster, this means you're running more nodes than the actual workload requires. Nobody reprices quotas because it requires cross-team coordination and nobody wants to be accountable for breaking someone's headroom.
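Surfacing the gap is cheap to compute: peak observed usage over a sampling window, divided by quota. Illustrative sketch with hypothetical numbers and our own function name:

```python
def quota_utilization(quota_cpu_cores, cpu_usage_samples):
    """Peak observed CPU usage as a fraction of the namespace quota."""
    return max(cpu_usage_samples) / quota_cpu_cores

# Hypothetical team: asked for 64 cores at onboarding, peaks at 16.
utilization = quota_utilization(64, [9.5, 12.0, 16.0, 11.2])
print(f"{utilization:.0%}")  # 25%
```

Even a report this crude, run per namespace, gives the platform team the evidence they need to start the repricing conversation.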

The Cross-Cutting Pattern

Every use case looks different on the surface, but the underlying pattern is the same:

  1. Resources are created as part of normal operations.
  2. The workload that motivated the resources is removed or becomes idle.
  3. The resources persist — either because cleanup isn't automated, or because cleanup failed silently.
  4. Over time, the ratio of active resources to total resources drifts downward.

The difference between use cases is in which resources accumulate and how expensive they are individually. For microservices, it's ConfigMaps and Secrets — cheap individually, but a security surface and audit headache. For ML workloads, it's PVCs and occasionally GPU nodes — expensive immediately. For multi-tenant SaaS and IDPs, it's namespaces and the entire resource tree hanging off them.

The waste is distributed, which is what makes it hard to act on. No single orphaned resource is alarming. It's the aggregate across namespaces and clusters that becomes meaningful — and most tooling surfaces resources one at a time.

A scanner that understands dependency chains rather than just direct references is the difference between finding 20% of your waste and finding 80% of it. A Secret referenced by a zero-replica Deployment isn't being used. A PVC attached to a Job that completed six months ago isn't being used. Detecting this requires evaluating liveness at the workload level and propagating that status through the dependency graph — which is what cascading orphan detection does.
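That propagation step can be sketched minimally. This is illustrative Python over a toy data model (not KorPro's actual graph): mark workloads dead by their own liveness test, then walk every resource reachable from a dead workload:

```python
def cascading_orphans(workload_liveness, depends_on):
    """workload_liveness: {workload: bool}. depends_on: {node: [child resources]}.
    Returns every resource reachable from a non-live workload."""
    orphans = set()
    for workload, live in workload_liveness.items():
        if live:
            continue  # zero replicas, completed Jobs, etc. count as not live
        stack = list(depends_on.get(workload, []))
        while stack:
            resource = stack.pop()
            if resource in orphans:
                continue
            orphans.add(resource)
            stack.extend(depends_on.get(resource, []))  # keep walking the chain
    return orphans

liveness = {"deploy/reports": False, "deploy/api": True}
graph = {
    "deploy/reports": ["secret/reports-db", "pvc/reports-scratch"],
    "deploy/api": ["secret/api-keys"],
}
print(sorted(cascading_orphans(liveness, graph)))
# ['pvc/reports-scratch', 'secret/reports-db']
```

A plain reference check would keep secret/reports-db, because the zero-replica Deployment still references it; propagating liveness through the graph is what catches it.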

See Your Cluster's Actual Waste Profile

If any of these patterns sound familiar, you're probably carrying more orphaned resources than you realize. The question isn't whether the waste exists — it does in every production cluster at sufficient scale and age. The question is how much, and in which namespaces.

KorPro detects waste across all of these patterns — microservices sprawl, ML pipeline residue, multi-tenant namespace debt, CI/CD cleanup failures, and IDP environment accumulation — and shows you cost impact per resource and per cluster. It builds a dependency graph rather than doing simple reference checks, so it catches cascading orphans that other tools miss.

Run a free KorPro scan at app.korpro.io to see your cluster's waste profile. You'll get a breakdown by resource type, namespace, and estimated monthly cost — a map of what each of your use cases has left behind.

For teams that want to operationalize cleanup rather than run it once, KorPro's cleaner clusters workflow gives you continuous scanning, Slack alerting, and safe deletion workflows so the residue doesn't build back up.

Stop Wasting Kubernetes Resources

Ready to Clean Up Your Clusters?

KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.

Written by

KorPro Team
