VerticalPodAutoscaler (VPA)
A controller that recommends or automatically adjusts CPU and memory resource requests for Pods based on observed usage.
Also known as: VPA
What is VerticalPodAutoscaler?
The VerticalPodAutoscaler (VPA) is a Kubernetes extension (not built-in — requires separate installation) that analyzes historical CPU and memory usage of Pods and recommends right-sized resource requests. It addresses the most common cause of cluster cost waste: resource requests set by developers that don't match actual runtime usage, either because they were guessed, copied from defaults, or set conservatively to avoid OOMKills.
VPA operates in three modes. Recommendation mode (updateMode: Off) analyzes usage and writes recommendations to the VPA object's status field without making any changes — ideal for gradually discovering right-sized values before applying them. Auto mode (updateMode: Auto) actually updates Pod resource requests by evicting and restarting Pods with new specs — this causes brief disruptions and should be tested carefully. Initial mode sets requests only at Pod creation time, not for already-running Pods.
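In recommendation mode, the values VPA computes can be read back from the object's status field. A minimal sketch, assuming a VPA object named web-api-vpa exists in the production namespace and the VPA CRDs are installed:

```shell
# Inspect the recommendations VPA wrote to the status field
kubectl describe vpa web-api-vpa -n production

# Or pull just the recommendation block as JSON
kubectl get vpa web-api-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```

The recommendation block reports lowerBound, target, and upperBound values per container; the target is what Auto mode would apply on the next eviction.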
VPA uses the Metrics Server for real-time CPU/memory data and the VPA recommender component's own historical database for percentile-based recommendations. The recommender targets the 90th-percentile usage for requests and adds an OOM buffer for memory. VPA cannot currently resize requests without restarting the Pod, though in-place Pod resource resize (KEP-1287) has been in alpha since Kubernetes 1.27 and aims to eventually eliminate the restart requirement.
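The percentile targeting described above can be sketched in a few lines. This is a simplified illustration, not the real recommender (which maintains a decaying histogram over a multi-day window); the 15% safety margin and the floor-interpolation percentile index are assumptions made for the sketch:

```python
def recommend_request(samples, percentile=0.90, safety_margin=0.15):
    """Right-size a resource request from observed usage samples.

    Picks the value at the given percentile of observed usage and
    adds a safety margin on top (both figures are illustrative).
    """
    ordered = sorted(samples)
    # index at the requested percentile (floor interpolation)
    idx = int(percentile * (len(ordered) - 1))
    return ordered[idx] * (1 + safety_margin)

# e.g. CPU usage samples in cores, with one transient spike
cpu_samples = [0.08, 0.09, 0.10, 0.11, 0.10, 0.12, 0.09, 0.10, 0.30, 0.11]
recommended = recommend_request(cpu_samples)
```

Note how the percentile target ignores the one-off 0.30-core spike, which is why percentile-based requests come out far lower than requests sized to peak usage.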
Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off" # Recommendation only — no automatic restarts
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
Cost & Waste Implications
VPA recommendations consistently show that Pods request 3–10x more CPU and memory than they actually use. A team running 50 Pods with requests of 1 CPU / 2Gi each when actual usage is 100m CPU / 200Mi could right-size to ~10% of current requests, reducing the node count needed to run the workload by roughly 80%. VPA in recommendation mode is zero-risk and delivers the data needed to make this change safely.
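The savings arithmetic in the paragraph above works out as follows; the 1.5x headroom multiplier is an illustrative assumption layered on top of observed usage, not a VPA default:

```python
pods = 50
requested_cpu_per_pod = 1.0   # cores requested today
observed_cpu_per_pod = 0.1    # cores actually used (100m)
headroom = 1.5                # illustrative safety buffer over observed usage

current_requests = pods * requested_cpu_per_pod               # 50 cores
rightsized_requests = pods * observed_cpu_per_pod * headroom  # 7.5 cores
reduction = 1 - rightsized_requests / current_requests

print(f"requests shrink from {current_requests:.0f} to "
      f"{rightsized_requests:.1f} cores ({reduction:.0%} reduction)")
```

Because nodes are sized to fit the sum of requests, an ~85% drop in requested cores translates to roughly the same drop in schedulable capacity needed, consistent with the ~80% node-count reduction cited above once per-node system overhead is accounted for.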
How KorPro Helps
KorPro integrates with VPA recommendations and independently analyzes pod metrics to surface rightsizing opportunities, showing current requests, actual usage, and estimated monthly savings per workload.
Related Terms
HorizontalPodAutoscaler (HPA)
Scaling: A controller that automatically scales the replica count of a Deployment or StatefulSet based on observed metrics.
Resource Requests and Limits
Configuration: Per-container declarations of guaranteed CPU/memory (requests) and hard maximums (limits) that drive scheduling and enforcement.
Cluster Autoscaler
Scaling: A component that automatically adds nodes when Pods are unschedulable and removes nodes when they are underutilized.
Kubernetes Cost Optimization
FinOps: The practice of reducing Kubernetes infrastructure spend while maintaining performance and reliability.