VerticalPodAutoscaler (VPA)
A controller that recommends or automatically adjusts CPU and memory resource requests for Pods based on observed usage.
Also known as: VPA
What is VerticalPodAutoscaler?
The VerticalPodAutoscaler (VPA) is a Kubernetes extension (not built-in — requires separate installation) that analyzes historical CPU and memory usage of Pods and recommends right-sized resource requests. It addresses the most common cause of cluster cost waste: resource requests set by developers that don't match actual runtime usage, either because they were guessed, copied from defaults, or set conservatively to avoid OOMKills.
VPA operates in three modes. Recommendation mode (updateMode: Off) analyzes usage and writes recommendations to the VPA object's status field without making any changes — ideal for gradually discovering right-sized values before applying them. Auto mode (updateMode: Auto) actually updates Pod resource requests by evicting and restarting Pods with new specs — this causes brief disruptions and should be tested carefully. Initial mode sets requests only at Pod creation time, not for already-running Pods.
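In recommendation mode, the values VPA computes can be read back from the object's status field. A minimal sketch, assuming a VPA object named web-api-vpa exists in the production namespace and the VPA CRDs are installed:

```shell
# Inspect the recommendations VPA wrote to the status field
kubectl describe vpa web-api-vpa -n production

# Or pull just the recommendation block as JSON
kubectl get vpa web-api-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations}'
```

The recommendation block reports lowerBound, target, and upperBound values per container; the target is what Auto mode would apply on the next eviction.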
VPA uses the Metrics Server for real-time CPU/memory data and the VPA recommender component's own historical database for percentile-based recommendations. The recommender targets the 90th-percentile usage for requests and adds an OOM buffer for memory. VPA cannot currently resize requests without restarting the Pod, though in-place Pod resource resize (KEP-1287) has been in alpha since Kubernetes 1.27 and aims to eventually eliminate the restart requirement.
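The percentile targeting described above can be sketched in a few lines. This is a simplified illustration, not the real recommender (which maintains a decaying histogram over a multi-day window); the 15% safety margin and the floor-interpolation percentile index are assumptions made for the sketch:

```python
def recommend_request(samples, percentile=0.90, safety_margin=0.15):
    """Right-size a resource request from observed usage samples.

    Picks the value at the given percentile of observed usage and
    adds a safety margin on top (both figures are illustrative).
    """
    ordered = sorted(samples)
    # index at the requested percentile (floor interpolation)
    idx = int(percentile * (len(ordered) - 1))
    return ordered[idx] * (1 + safety_margin)

# e.g. CPU usage samples in cores, with one transient spike
cpu_samples = [0.08, 0.09, 0.10, 0.11, 0.10, 0.12, 0.09, 0.10, 0.30, 0.11]
recommended = recommend_request(cpu_samples)
```

Note how the percentile target ignores the one-off 0.30-core spike, which is why percentile-based requests come out far lower than requests sized to peak usage.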
Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off" # Recommendation only — no automatic restarts
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi
Cost & Waste Implications
VPA recommendations consistently show that Pods request 3–10x more CPU and memory than they actually use. A team running 50 Pods with requests of 1 CPU / 2Gi each when actual usage is 100m CPU / 200Mi could right-size to ~10% of current requests, reducing the node count needed to run the workload by roughly 80%. VPA in recommendation mode is zero-risk and delivers the data needed to make this change safely.
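The savings arithmetic in the paragraph above works out as follows; the 1.5x headroom multiplier is an illustrative assumption layered on top of observed usage, not a VPA default:

```python
pods = 50
requested_cpu_per_pod = 1.0   # cores requested today
observed_cpu_per_pod = 0.1    # cores actually used (100m)
headroom = 1.5                # illustrative safety buffer over observed usage

current_requests = pods * requested_cpu_per_pod               # 50 cores
rightsized_requests = pods * observed_cpu_per_pod * headroom  # 7.5 cores
reduction = 1 - rightsized_requests / current_requests

print(f"requests shrink from {current_requests:.0f} to "
      f"{rightsized_requests:.1f} cores ({reduction:.0%} reduction)")
```

Because nodes are sized to fit the sum of requests, an ~85% drop in requested cores translates to roughly the same drop in schedulable capacity needed, consistent with the ~80% node-count reduction cited above once per-node system overhead is accounted for.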
How KorPro Helps
KorPro integrates with VPA recommendations and independently analyzes pod metrics to surface rightsizing opportunities, showing current requests, actual usage, and estimated monthly savings per workload.
Related Terms
HorizontalPodAutoscaler (HPA)
Scaling: A controller that automatically scales the replica count of a Deployment or StatefulSet based on observed metrics.
Resource Requests and Limits
Configuration: Per-container declarations of guaranteed CPU/memory (requests) and hard maximums (limits) that drive scheduling and enforcement.
Cluster Autoscaler
Scaling: A component that automatically adds nodes when Pods are unschedulable and removes nodes when they are underutilized.
Kubernetes Cost Optimization
FinOps: The practice of reducing Kubernetes infrastructure spend while maintaining performance and reliability.