How MSPs Recover Margin from Unused Kubernetes Resources Across Customer Clusters

Managed service providers running Kubernetes infrastructure for customers face a cost problem that most tooling ignores. Cloud waste in customer clusters is not just a customer problem — it is a margin problem. When an MSP absorbs infrastructure costs as part of a managed service, every orphaned PVC, idle node group, and forgotten staging namespace reduces the margin on that engagement. When infrastructure is billed back to the customer, unattributed waste creates billing disputes and erodes trust.

The challenge is doing something about it without creating operational risk. Automated cleanup on a customer cluster — without a validated finding and explicit approval — is exactly the kind of incident that damages relationships. The right model is structured discovery: audit the waste, quantify the impact, review with the customer, and clean up with confidence.

This post is for MSPs and cloud service providers managing Kubernetes infrastructure across multiple customer accounts. It covers where margin leaks, why existing tools do not scale to multi-tenant managed service workflows, and how to run structured cost recovery audits across a customer fleet.

Where MSP Margin Leaks in Kubernetes Environments

Kubernetes waste in MSP-managed clusters tends to follow predictable patterns:

Orphaned storage from environment migrations. When an MSP migrates a customer workload from one cluster or namespace to another, the original PersistentVolumeClaims are often left behind. The underlying cloud disks — EBS, GCP Persistent Disk, Azure Managed Disk — continue billing. In long-lived managed environments, storage orphans from three or four previous migrations can accumulate to meaningful monthly cost.

LoadBalancer services from decommissioned workloads. Customers request new environments, test deployments, and proof-of-concept setups. When those engagements end, the LoadBalancer services often remain. Each one provisions a cloud load balancer that bills independently of whether traffic flows through it.

Stale namespaces from ended project phases. MSP customers frequently run project-specific environments: a UAT namespace for a release, a load test namespace for a performance review, a staging namespace that duplicated production for a compliance audit. When the project phase ends, the namespace stays.

Idle node groups for seasonal or deprecated workloads. A customer's Q4 traffic spike required a larger node group. Q4 is over, but the node group was never scaled back. A microservice was deprecated but the node group that ran it is still provisioned.

Unclaimed resources from customer churn. When a customer off-boards, the off-boarding process may not fully clean up all cloud resources. Clusters that were supposed to be deleted may still have storage volumes, load balancers, or even running nodes in an orphaned state.

The Scale Problem: Why Standard Tools Do Not Work for MSPs

Most Kubernetes cost optimization tools are designed for a single organization running its own clusters. The operational model assumes:

You control the cluster and can install what you need
You have cloud provider credentials for billing API access
You have a single unified view of your infrastructure
The team approving cleanup is the same team running the workloads

For MSPs, none of these assumptions hold cleanly:

Customer clusters may have security policies restricting what can be installed
Cloud provider credentials belong to the customer account, not the MSP
Infrastructure spans dozens or hundreds of clusters across multiple customers
Cleanup approval involves both the MSP operations team and the customer stakeholder

Tools that require DaemonSet installation, cloud billing API access, or cluster-admin RBAC are hard to deploy in customer environments without friction. Tools that only cover one cluster at a time do not scale to a fleet.

The gap is a fleet-level audit tool that requires minimal permissions, produces per-cluster findings, and supports the discovery-then-approval workflow that managed service operations require.

A Practical MSP Cost Recovery Workflow

A structured workflow for MSP cost recovery has four phases: audit, quantify, review, and recover.

Phase 1: Audit — Find the Waste

Run a read-only audit against each customer cluster. The minimum RBAC required is a ClusterRole with get, list, and watch permissions on pods, services, endpoints, PVCs, PVs, namespaces, ConfigMaps, Secrets, Deployments, StatefulSets, Jobs, CronJobs, Ingresses, and StorageClasses.
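As a rough sketch, the role described above could be written as the following manifest, printed by a small shell function so it can be reviewed before anything is applied. The role name `msp-readonly-audit` and the function name are hypothetical placeholders; the resource list mirrors the one in the paragraph above.

```bash
# Print a read-only ClusterRole manifest for the audit.
# The role name "msp-readonly-audit" is a placeholder, not a KorPro convention.
print_audit_clusterrole() {
  cat <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: msp-readonly-audit
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints", "persistentvolumeclaims", "persistentvolumes", "namespaces", "configmaps", "secrets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["batch"]
    resources: ["jobs", "cronjobs"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["networking.k8s.io"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
EOF
}

# Usage (review the output first, then apply it with the customer's approval):
#   print_audit_clusterrole | kubectl apply -f -
```

The role still has to be bound to a user or service account with a ClusterRoleBinding. Read access to Secrets is sensitive, so some customers may ask for that resource to be removed from the list.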

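bash
# Age-based namespace check: list namespaces created before a cutoff date.
# Age is only a hint; confirm there is no recent activity before calling a namespace stale.
kubectl get namespaces -o json | \
  jq '.items[] | select(.metadata.creationTimestamp < "2026-01-01") | {name: .metadata.name, created: .metadata.creationTimestamp}'

# Find PVCs that are not in Bound state (Pending or Lost).
# Bound PVCs that no pod mounts need a separate check against pod volumes.
kubectl get pvc --all-namespaces -o json | \
  jq '.items[] | select(.status.phase != "Bound") | {name: .metadata.name, namespace: .metadata.namespace, phase: .status.phase, size: .spec.resources.requests.storage}'

# List LoadBalancer services, then list Endpoints objects with no addresses.
# A LoadBalancer service whose name and namespace appear in both lists has no backends.
kubectl get services --all-namespaces -o json | \
  jq '.items[] | select(.spec.type == "LoadBalancer") | {name: .metadata.name, namespace: .metadata.namespace}'
kubectl get endpoints --all-namespaces -o json | \
  jq '.items[] | select(.subsets == null or (.subsets | length == 0)) | {name: .metadata.name, namespace: .metadata.namespace}'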

This is the same read-only audit approach described in read-only Kubernetes cost optimization. No write access required, no cloud credentials required.
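A PVC can also be Bound, and therefore billing for a disk, while no pod mounts it. A minimal sketch of that check, assuming `jq` is installed and that the PVC and pod lists have been saved to files first. The function name `find_unmounted_pvcs` and the file names are illustrative.

```bash
# find_unmounted_pvcs PVC_JSON_FILE POD_JSON_FILE
# Prints "namespace/name<TAB>requested size" for every Bound PVC that no pod references.
find_unmounted_pvcs() {
  jq -r -n --slurpfile pvcs "$1" --slurpfile pods "$2" '
    # Collect namespace/claim keys referenced by any pod volume.
    [ $pods[0].items[]
      | .metadata.namespace as $ns
      | (.spec.volumes // [])[]
      | select(.persistentVolumeClaim != null)
      | "\($ns)/\(.persistentVolumeClaim.claimName)" ] as $used
    # Emit Bound PVCs whose key is not in that set.
    | $pvcs[0].items[]
    | select(.status.phase == "Bound")
    | "\(.metadata.namespace)/\(.metadata.name)" as $key
    | select(any($used[]; . == $key) | not)
    | "\($key)\t\(.spec.resources.requests.storage)"'
}

# Usage (read-only):
#   kubectl get pvc  --all-namespaces -o json > pvcs.json
#   kubectl get pods --all-namespaces -o json > pods.json
#   find_unmounted_pvcs pvcs.json pods.json
```

Pods that have finished but still exist count as references here, and a PVC used only by a suspended CronJob or a Deployment scaled to zero will be flagged, which is one reason the findings go through customer review before anything is deleted.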

Phase 2: Quantify — Estimate the Monthly Impact

For each finding, estimate the monthly cost. Storage waste is the most straightforward to price; the other categories use the cost signals listed here:

Waste Category	Estimated Cost Signal	MSP Margin Impact
Orphaned PVCs (Unbound or unmounted)	Provisioned GB × storage class rate	Direct cloud cost absorbed or billed
Released PVs	Provisioned GB × storage class rate	Same as above
LoadBalancer services with no endpoints	~$15–$20 per LB per month (base charge; varies by provider and region)	Cloud LB cost per service
Idle node groups	Node SKU hourly rate × hours	Compute cost per idle node
Stale namespaces with no active pods	Sum of all resource costs within namespace	Accumulated across all resource types
Completed Jobs not garbage-collected	No ongoing compute cost (finished pods no longer hold their resource requests); API object clutter	Minor, but signals hygiene debt
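
The storage rows reduce to simple arithmetic: provisioned GiB multiplied by the monthly rate for the storage class. A small sketch, assuming sizes use the binary suffixes Mi, Gi, or Ti; the function names are hypothetical and the rate in the usage lines is an illustrative figure, not a quoted provider price.

```bash
# to_gib QUANTITY
# Convert a Kubernetes storage quantity such as 512Mi, 100Gi, or 2Ti to GiB.
# Only the Mi, Gi, and Ti suffixes are handled in this sketch.
to_gib() {
  awk -v q="$1" 'BEGIN {
    n = q; sub(/[A-Za-z]+$/, "", n)   # numeric part
    u = q; sub(/^[0-9.]+/, "", u)     # unit suffix
    if (u == "Ti") f = 1024
    else if (u == "Gi") f = 1
    else if (u == "Mi") f = 1 / 1024
    else { print "unsupported unit: " u > "/dev/stderr"; exit 1 }
    printf "%.2f\n", n * f
  }'
}

# monthly_storage_cost QUANTITY RATE_PER_GIB_MONTH
# Estimated monthly cost = provisioned GiB x rate.
monthly_storage_cost() {
  gib=$(to_gib "$1") || return 1
  awk -v g="$gib" -v r="$2" 'BEGIN { printf "%.2f\n", g * r }'
}

# Usage (0.10 per GiB-month is a made-up rate; substitute the customer's actual rate):
#   monthly_storage_cost 100Gi 0.10
#   monthly_storage_cost 2Ti 0.10
```

Using the provisioned size rather than the used size matches how cloud block storage is normally billed, which is why the table prices orphaned volumes by provisioned GB.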

For clusters with dozens of findings, grouping by namespace and presenting a per-namespace summary makes the customer conversation easier.
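
One way to build that per-namespace summary is a short `awk` aggregation. This sketch assumes findings have already been written as tab-separated lines of namespace, resource, and estimated monthly cost; that layout and the function name are assumptions made for illustration.

```bash
# summarize_by_namespace
# Reads tab-separated findings on stdin: NAMESPACE <TAB> RESOURCE <TAB> EST_MONTHLY_COST
# Prints one line per namespace: NAMESPACE <TAB> FINDING_COUNT <TAB> TOTAL_MONTHLY_COST
summarize_by_namespace() {
  awk -F'\t' '
    { count[$1]++; total[$1] += $3 }
    END { for (ns in total) printf "%s\t%d\t%.2f\n", ns, count[ns], total[ns] }
  ' | LC_ALL=C sort
}

# Usage:
#   summarize_by_namespace < findings.tsv
```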

Phase 3: Review — Validate With the Customer

Before deleting anything from a customer cluster, present findings for review. A well-structured findings report shows:

Resource name and namespace
Resource type and why it was flagged
Estimated monthly cost
Suggested action (delete, right-size, label and retain)
Owner label if present; if missing, flag as an ownership gap

This step is important for two reasons. First, MSP operations teams cannot know every workload's context — a PVC that appears orphaned may be intentionally retained for a recovery scenario the customer has not communicated. Second, getting explicit customer sign-off on cleanup creates a clear record that protects the MSP from disputes if something unexpected happens post-cleanup.
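
The report fields listed above map naturally onto one row per finding. A minimal sketch that emits tab-separated rows and marks a missing owner as an ownership gap; the column order, the function names, and the `OWNERSHIP GAP` marker are illustrative choices rather than a fixed report format.

```bash
# print_findings_header
# Column headings matching the report fields described above.
print_findings_header() {
  printf 'namespace\tname\ttype\treason\test_monthly_cost\tsuggested_action\towner\n'
}

# format_finding NAMESPACE NAME TYPE REASON EST_MONTHLY_COST SUGGESTED_ACTION [OWNER]
# An empty or missing OWNER is reported as an ownership gap.
format_finding() {
  owner="${7:-}"
  [ -n "$owner" ] || owner="OWNERSHIP GAP"
  printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$owner"
}

# Usage:
#   print_findings_header
#   format_finding staging data-old PersistentVolumeClaim "Bound, not mounted by any pod" 4.00 delete ""
```

A tab-separated file opens directly in a spreadsheet, which tends to suit a customer review meeting better than raw JSON.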

Phase 4: Recover — Clean Up With Confidence

Once findings are confirmed, cleanup is straightforward. For storage:

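bash
# Delete a confirmed orphaned PVC.
# With reclaim policy Delete, the bound PV and its cloud disk are removed as well;
# with Retain, the PV moves to Released and the disk keeps billing.
kubectl delete pvc <pvc-name> -n <namespace>

# Delete a Released PV after confirming the PVC is gone.
# For a Retain-policy PV this removes only the Kubernetes object; the underlying
# cloud disk must then be deleted in the customer's cloud account.
kubectl delete pv <pv-name>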

For LoadBalancer services with no endpoints:

bash
kubectl delete service <service-name> -n <namespace>

For stale namespaces that are confirmed empty and approved for deletion:

bash
kubectl delete namespace <namespace-name>

Document each deletion with the approval reference (ticket number, email thread, or signed-off report). For recurring hygiene, schedule the audit-review-recover cycle quarterly.
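
The approval record can be enforced in the cleanup step itself. A hedged sketch of a wrapper that refuses to act without an approval reference, runs a server-side dry run first, and deletes only when `CONFIRM=yes` is set. The function name, the `CONFIRM` and `AUDIT_LOG` variables, and the log format are assumptions for illustration; it is intended for namespaced resources such as PVCs and services.

```bash
KUBECTL="${KUBECTL:-kubectl}"                  # override to point at a wrapper if needed
AUDIT_LOG="${AUDIT_LOG:-cleanup-audit.log}"    # hypothetical log file name

# approved_delete KIND NAME NAMESPACE APPROVAL_REF
approved_delete() {
  kind="$1"; name="$2"; ns="$3"; ref="$4"
  if [ -z "$ref" ]; then
    echo "refusing to delete $kind/$name: no approval reference given" >&2
    return 1
  fi
  # Server-side dry run: the API server validates the request without deleting anything.
  "$KUBECTL" delete "$kind" "$name" -n "$ns" --dry-run=server || return 1
  if [ "${CONFIRM:-no}" = "yes" ]; then
    "$KUBECTL" delete "$kind" "$name" -n "$ns" || return 1
    # Record timestamp, resource, namespace, and approval reference.
    printf '%s\t%s/%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      "$kind" "$name" "$ns" "$ref" >> "$AUDIT_LOG"
  fi
}

# Usage:
#   approved_delete pvc <pvc-name> <namespace> <ticket-number>               # dry run only
#   CONFIRM=yes approved_delete pvc <pvc-name> <namespace> <ticket-number>   # delete and log
```

Defaulting to a dry run means a mistyped name or namespace surfaces as an error before anything is removed.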

Presenting Cost Recovery Value to Customers

Cost recovery audits have a secondary value beyond margin: they are a demonstrable deliverable. Many MSP customers cannot easily see the infrastructure hygiene work happening on their behalf. A findings report — showing specific resources, their waste cost, and the cleanup performed — makes the value of managed infrastructure operations concrete.

Framing matters. A report that shows "we identified and removed 14 orphaned PVCs, 3 LoadBalancer services with no active backends, and 7 stale namespaces, recovering an estimated $X/month in cloud spend" is more compelling than a general statement about cost optimization.

For MSPs that charge based on infrastructure cost, reducing customer cloud spend also directly improves the customer's total cost of ownership on the engagement — a strong retention signal.

Scaling Audits Across a Customer Fleet

Running manual kubectl commands across a fleet of 20, 50, or 100 customer clusters is not a sustainable workflow. Each cluster requires a kubeconfig, a separate RBAC grant, and manual aggregation of findings.
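
To make the aggregation burden concrete, here is a rough sketch of what a hand-rolled fleet loop looks like for a single check (PVCs that are not Bound). It assumes one kubeconfig context per customer cluster; the function name and the context names in the usage lines are hypothetical.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# audit_fleet CONTEXT...
# Prints CONTEXT <TAB> NAMESPACE <TAB> PVC <TAB> STATUS for every PVC that is not Bound.
audit_fleet() {
  for ctx in "$@"; do
    # Skip clusters that cannot be reached instead of aborting the whole run.
    out=$("$KUBECTL" --context "$ctx" get pvc --all-namespaces --no-headers 2>/dev/null) || {
      echo "warning: could not query context $ctx" >&2
      continue
    }
    # Columns with --all-namespaces --no-headers: NAMESPACE NAME STATUS ...
    printf '%s\n' "$out" |
      awk -v c="$ctx" 'NF >= 3 && $3 != "Bound" { print c "\t" $1 "\t" $2 "\t" $3 }'
  done
}

# Usage:
#   audit_fleet customer-a-prod customer-b-prod
#   audit_fleet $(kubectl config get-contexts -o name)
```

Every additional check needs its own loop, parsing, and error handling, and the combined output still has to be priced and grouped per customer.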

KorPro is designed for fleet-scale audits. It connects to multiple clusters, runs the same resource hygiene checks across all of them, and surfaces findings by cluster and namespace. MSP operations teams can review the full fleet's waste inventory from a single interface — without cloud provider credentials, without agents installed on customer clusters, and without write access to any of them.

See how the KorPro Inspector works and the multi-cluster cost recovery use case for how fleet audits are structured.

For additional context on multi-cluster management patterns, see multi-cloud Kubernetes best practices.

Talk to Us About Partner Use Cases

KorPro works with MSPs and service providers who manage Kubernetes infrastructure for customers. If you want to discuss how fleet audits fit your managed service workflow, we want to hear about your setup.

Talk to our team about partner use cases | Contact us
