HorizontalPodAutoscaler (HPA)
A controller that automatically scales the replica count of a Deployment or StatefulSet based on observed metrics.
Also known as: HPA
What is HorizontalPodAutoscaler?
The HorizontalPodAutoscaler (HPA) is a Kubernetes controller that watches a target workload (a Deployment, StatefulSet, or any resource exposing the /scale subresource) and adjusts its replica count to hold one or more metrics at a target value. The classic metric is average CPU utilization across the workload's Pods (e.g., 'keep average CPU at 70%'): when traffic rises and CPU exceeds the target, HPA adds replicas; when traffic drops, HPA removes them, never going below minReplicas.
HPA v2 (stable since Kubernetes 1.23) supports multiple metrics simultaneously: CPU, memory, custom metrics (exposed through the custom.metrics.k8s.io API, for example by the Prometheus Adapter), and external metrics (such as SQS queue depth, exposed through the external.metrics.k8s.io API). Scale-down is rate-limited by default: HPA waits through a configurable stabilization window (300 seconds by default) before reducing replicas, which prevents flapping. Scale-up has no stabilization window by default (0 seconds), so the controller can react quickly to load spikes.
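As a sketch, a v2 metrics list can mix a resource metric with a custom per-Pod metric. The metric name `http_requests_per_second` below is an assumption: it must be served by a custom-metrics adapter (such as the Prometheus Adapter), not by Kubernetes itself.

```yaml
# Fragment of an HPA v2 spec combining a resource metric with a
# custom per-Pod metric. Assumes a custom-metrics adapter exposes
# "http_requests_per_second" for the target Pods.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70
- type: Pods
  pods:
    metric:
      name: http_requests_per_second  # assumed adapter-provided metric
    target:
      type: AverageValue
      averageValue: "100"
```

With multiple metrics, HPA computes a desired replica count for each metric independently and uses the highest of them.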
HPA and VPA should not both act on CPU or memory for the same workload at the same time; if HPA scales replicas on a metric that VPA is simultaneously resizing, the two controllers fight each other. The recommended pattern is to let HPA own the replica count based on CPU/memory utilization, and run VPA in recommendation-only mode so its suggested resource requests can inform right-sizing without triggering conflicting changes.
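For example, a VPA can run alongside an HPA in recommendation-only mode. This sketch assumes the VPA components from the Kubernetes autoscaler project are installed in the cluster; the object name `web-api-vpa` is illustrative.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-api-vpa          # illustrative name
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  updatePolicy:
    updateMode: "Off"        # recommend only; never evict or rewrite Pods
```

With updateMode "Off", VPA publishes its recommendations in the object's status but never modifies running Pods, so it cannot conflict with the HPA managing the same Deployment.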
Example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
Cost & Waste Implications
Without HPA, Deployments are statically sized for peak load and run at that replica count 24/7. Consider a service that needs 10 replicas for 8 peak hours a day but only 2 replicas for the remaining 16 hours: static peak sizing costs 240 replica-hours per day, while scaling dynamically costs 112 (10 x 8 + 2 x 16), a saving of roughly 53%. HPA is one of the most impactful and low-risk cost optimization techniques available in Kubernetes.
How KorPro Helps
KorPro identifies Deployments without HPAs that have variable CPU/memory utilization patterns, quantifying the estimated savings from implementing autoscaling versus running at static peak capacity.
Related Terms
Deployment (Workloads)
A controller that manages a ReplicaSet to keep a specified number of identical Pod replicas running and handles rolling updates.
VerticalPodAutoscaler (VPA) (Scaling)
A controller that recommends or automatically adjusts CPU and memory resource requests for Pods based on observed usage.
Cluster Autoscaler (Scaling)
A component that automatically adds nodes when Pods are unschedulable and removes nodes when they are underutilized.
Resource Requests and Limits (Configuration)
Per-container declarations of guaranteed CPU/memory (requests) and hard maximums (limits) that drive scheduling and enforcement.