Resource Requests and Limits
Per-container declarations of guaranteed CPU/memory (requests) and hard maximums (limits) that drive scheduling and enforcement.
What are Resource Requests and Limits?
Resource requests and limits are the primary mechanism by which Kubernetes manages compute resources at the container level. Requests are the amount of CPU and memory the scheduler reserves for a container: a node must have at least that much free allocatable capacity for the Pod to be scheduled there. Limits are the hard ceilings: a container exceeding its CPU limit is throttled, while a container exceeding its memory limit is terminated with reason OOMKilled and restarted.
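To confirm a memory-limit kill after the fact, the container's last terminated state records the reason (the pod name below is a placeholder):
# Inspect why the previous container instance was terminated
kubectl get pod api-7f9c -n production \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints OOMKilled if the container breached its memory limit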
The relationship between requests and limits determines a Pod's Quality of Service (QoS) class, which governs eviction order under node pressure. Guaranteed QoS (requests equal limits for CPU and memory in every container) makes the Pod the last to be evicted. Burstable QoS (at least one container sets a request or limit, but the Pod doesn't meet the Guaranteed criteria) is the middle tier. BestEffort QoS (no requests or limits on any container) is evicted first and gives the scheduler nothing to reserve.
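The class Kubernetes assigned to a running Pod is recorded in its status and can be read directly (pod and namespace names are placeholders):
# Print the QoS class assigned to a running Pod
kubectl get pod api-7f9c -n production -o jsonpath='{.status.qosClass}'
# Output is Guaranteed, Burstable, or BestEffort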
Setting requests too high wastes cluster capacity — reserved but unused CPU and memory can't be used by other Pods. Setting requests too low causes over-scheduling, node pressure, and OOMKill cascades. The Vertical Pod Autoscaler (VPA) analyzes historical usage and recommends (or automatically applies) right-sized requests, closing the gap between reserved and used resources.
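As a sketch, a minimal VPA manifest in recommendation-only mode might look like this (the Deployment name is a placeholder, and the VPA controller and its CRDs must already be installed in the cluster):
# VPA that recommends request sizes without applying them
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api            # placeholder workload name
  updatePolicy:
    updateMode: "Off"    # recommend only; "Auto" would apply changes
The resulting recommendations appear under the object's status and can be read with kubectl describe vpa api-vpa.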
Example
# A container spec with both requests and limits set
containers:
- name: api
  image: my-org/api:v3
  resources:
    requests:
      cpu: "250m"      # 0.25 cores guaranteed
      memory: "512Mi"  # 512 MiB guaranteed
    limits:
      cpu: "1000m"     # 1 core maximum
      memory: "1Gi"    # 1 GiB maximum (OOMKilled if exceeded)

# Check actual usage against requests, per container
kubectl top pods -n production --containers
Cost & Waste Implications
Kubernetes cloud costs are determined by node capacity, not Pod utilization: the scheduler reserves whatever Pods request, so if Pods request 4Gi of memory each but use only 500Mi, the cluster holds roughly 8x the node memory it actually needs. Studies show average cluster memory utilization around 20% — meaning 80% of provisioned memory is paid for but unused. Rightsizing requests to match p95 actual usage typically reduces cluster node count by 30–50%.
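A rough way to eyeball that gap by hand, before reaching for dedicated tooling (the namespace is a placeholder):
# Declared requests per Pod...
kubectl get pods -n production -o custom-columns=\
'NAME:.metadata.name,CPU_REQ:.spec.containers[*].resources.requests.cpu,MEM_REQ:.spec.containers[*].resources.requests.memory'
# ...versus live usage reported by the metrics server
kubectl top pods -n production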
How KorPro Helps
KorPro analyzes actual CPU and memory utilization from cluster metrics and compares it against declared resource requests, surfacing over-provisioned workloads with estimated monthly savings from rightsizing.
Related Terms
Pod
Core Concepts
The smallest deployable unit in Kubernetes — one or more containers that share a network namespace and storage volumes.
Vertical Pod Autoscaler (VPA)
Scaling
A controller that recommends or automatically adjusts CPU and memory resource requests for Pods based on observed usage.
Horizontal Pod Autoscaler (HPA)
Scaling
A controller that automatically scales the replica count of a Deployment or StatefulSet based on observed metrics.
Kubernetes Cost Optimization
FinOps
The practice of reducing Kubernetes infrastructure spend while maintaining performance and reliability.
Stop Wasting Money on Orphaned Kubernetes Resources
KorPro connects to your clusters across GCP, AWS, and Azure — no agents, no installation — and surfaces every orphaned resource with its monthly cost estimate.