How to Monitor Pods in Kubernetes [kubectl top & describe]
Monitor Kubernetes Pods with kubectl top and describe. Includes a full OOMKilled diagnosis walkthrough with CPU/memory requests, limits, and fixes.
Why Pod Monitoring Matters in Production
When something goes wrong in a Kubernetes cluster, Pods are almost always where you start looking. A Pod that is consuming too much memory, starving for CPU, or stuck in a crash loop will affect the application it runs and potentially the stability of the entire node. Knowing how to quickly inspect Pod health and resource consumption is one of the most important skills for any engineer working with production Kubernetes.
This guide covers the practical tools and commands you need to monitor Pods effectively. It focuses on kubectl top and kubectl describe, explains how CPU and memory requests and limits work, and walks through a complete scenario of diagnosing an OOMKilled error from first symptom to root cause.
Using kubectl top to Check Resource Consumption
The kubectl top command shows real-time CPU and memory usage for Pods and nodes. It requires the Metrics Server to be installed in your cluster. Many managed Kubernetes services include it by default, but some, such as EKS, do not. If you are running a local cluster, you may need to install it separately.
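If kubectl top fails with an error like "Metrics API not available", the Metrics Server is probably missing. One common way to install it on a cluster you manage yourself is the project's published release manifest (verify the URL against the Metrics Server repository before applying):

```shell
# Install Metrics Server from its published release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm the metrics API has registered successfully
kubectl get apiservice v1beta1.metrics.k8s.io
```

Some local distributions (kind, minikube) need extra flags or an addon instead; check your distribution's documentation.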
To see resource usage for all Pods in the current namespace, run the following.
Code:
kubectl top pods
The output shows each Pod's current CPU usage in millicores and memory usage in mebibytes. This tells you what the Pod is actually consuming right now, not what it requested or what its limits are.
To see Pods across all namespaces, add the --all-namespaces flag.
Code:
kubectl top pods --all-namespaces
To see resource usage for a specific Pod, specify its name.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-xk2lm
To see usage broken down by container within a Pod, use the --containers flag. This is essential for multi-container Pods where you need to identify which container is consuming the most resources.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-xk2lm --containers
To check node-level resource usage, which helps you understand whether the problem is a single Pod or a node running out of capacity, use the following.
Code:
kubectl top nodes
The key thing to remember about kubectl top is that it shows actual usage at a point in time. It does not show historical data. For trends and alerting, you need a monitoring stack like Prometheus and Grafana. But for immediate triage during an incident, kubectl top is the fastest way to see what is happening.
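If you need rough trend data before a full monitoring stack is in place, you can sample kubectl top on a schedule and append timestamped rows to a file. A minimal sketch (the format_sample helper and the pod-usage.csv filename are illustrative, not standard tooling):

```shell
# Turn one sample of `kubectl top pods --no-headers` output into timestamped CSV rows
format_sample() {
  while read -r pod cpu mem; do
    printf '%s,%s,%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pod" "$cpu" "$mem"
  done
}

# Typical use (requires Metrics Server; Ctrl+C to stop):
#   while true; do
#     kubectl top pods --no-headers | format_sample >> pod-usage.csv
#     sleep 30
#   done
```

This is no substitute for Prometheus, but a few hours of samples is often enough to pick sensible requests and limits.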
Using kubectl describe to Inspect Pod State
While kubectl top shows what a Pod is consuming, kubectl describe shows what Kubernetes knows about the Pod: its configuration, its current state, recent events, and the reason for any failures.
To describe a specific Pod, run the following.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm
The output is long, but the sections that matter most during troubleshooting are these.
The Containers section shows each container's image, state (Waiting, Running, or Terminated), restart count, resource requests and limits, and the reason for the last termination. If a container was killed, this section tells you why.
The Conditions section shows whether the Pod has been scheduled, whether its containers are initialized, whether it is ready to receive traffic, and whether all containers are running. A Pod that is not Ready will not receive traffic from a Service.
The Events section at the bottom shows recent cluster events related to the Pod. This includes scheduling decisions, image pulls, container starts, health check failures, and kill events. Events are time-stamped and ordered, so you can reconstruct what happened and when.
When you are troubleshooting, kubectl describe is usually the second command you run after kubectl get pods shows you something is wrong.
Understanding CPU and Memory Requests vs Limits
Kubernetes uses two settings to manage how much CPU and memory a container can use: requests and limits. Understanding the difference between them is critical for diagnosing resource-related issues.
Requests
A request is the amount of CPU or memory that Kubernetes guarantees to a container. The scheduler uses requests to decide which node has enough capacity to run the Pod, comparing the sum of requests already placed on each node against the node's allocatable capacity, not against live usage. If a container requests 256 mebibytes of memory, Kubernetes will only place it on a node with at least 256 mebibytes of allocatable memory not yet claimed by other Pods' requests.
Requests do not cap usage. A container can use more than its request if the node has spare capacity. Requests are a floor, not a ceiling.
Limits
A limit is the maximum amount of CPU or memory a container is allowed to use. If a container tries to allocate memory beyond its limit, the Linux kernel's out-of-memory (OOM) killer terminates the container process. This is the OOMKilled error. If a container tries to exceed its CPU limit, it is throttled rather than killed. It still runs, but it gets less CPU time.
The Gap Between Requests and Limits
The relationship between requests and limits determines how your cluster behaves under pressure.
If requests are much lower than limits, the scheduler may place more Pods on a node than it can actually support when all of them are active. This is called overcommitment. It works well when workloads are bursty and rarely all peak at the same time. It fails badly when they do.
If requests equal limits, every Pod gets exactly what it asked for and no more. When this holds for CPU and memory on every container, the Pod lands in the Guaranteed QoS class, the most predictable configuration but also the one that reserves the most capacity.
If limits are not set at all, a container can consume all available memory on the node, potentially causing other Pods to be evicted or the node to become unstable.
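Requests and limits are set per container in the Pod spec. A minimal Deployment fragment with the values used in the example below (the my-app names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:1.0   # illustrative image
          resources:
            requests:
              cpu: 250m        # guaranteed; used by the scheduler
              memory: 256Mi
            limits:
              cpu: 500m        # throttled above this
              memory: 512Mi    # OOMKilled above this
```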
To see the requests and limits configured for a Pod, use kubectl describe.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm
Look for the Containers section. Under each container, you will see lines like the following.
Code:
Requests:
  cpu:     250m
  memory:  256Mi
Limits:
  cpu:     500m
  memory:  512Mi
This tells you the container is guaranteed 250 millicores of CPU (a quarter of one core) and 256 mebibytes of memory, and it is allowed to burst up to 500 millicores and 512 mebibytes.
Scenario: Diagnosing an OOMKilled Error
You receive an alert that your application is restarting repeatedly. Users are reporting intermittent errors. Here is how to diagnose the problem step by step.
Step 1: Check Pod Status
Start by listing the Pods in the affected namespace.
Code:
kubectl get pods -n production
You see output like the following.
Code:
NAME                         READY   STATUS             RESTARTS   AGE
my-app-pod-7f8b9c6d4-xk2lm   0/1     CrashLoopBackOff   5          12m
my-app-pod-7f8b9c6d4-ab3nm   1/1     Running            0          12m
my-app-pod-7f8b9c6d4-zz9qr   1/1     Running            3          12m
One Pod is in CrashLoopBackOff with 5 restarts. Another is Running but has restarted 3 times. This pattern suggests the containers are crashing and Kubernetes is restarting them, with increasing backoff delays.
Step 2: Describe the Failing Pod
Run kubectl describe on the Pod with the most restarts.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm -n production
Scroll to the Containers section and look at the Last State field. You see the following.
Code:
Last State:    Terminated
  Reason:      OOMKilled
  Exit Code:   137
Exit code 137 means the process was killed by a signal: 137 is 128 plus 9, and 9 is the signal number of SIGKILL. Combined with the OOMKilled reason, this confirms the container exceeded its memory limit and the kernel terminated it.
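You can reproduce the 137 exit code locally without Kubernetes, since any SIGKILLed process reports it. A quick shell sketch:

```shell
# Start a long-running process, kill it with SIGKILL, and inspect the exit code
sleep 60 &
pid=$!
kill -9 "$pid"      # SIGKILL, the same signal the OOM killer sends
wait "$pid"
echo "exit code: $?"   # prints: exit code: 137 (128 + 9)
```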
Step 3: Check the Memory Limit
In the same kubectl describe output, look at the resource configuration for the container.
Code:
Limits:
  memory:  256Mi
Requests:
  memory:  128Mi
The container has a memory limit of 256 mebibytes. Anything above that triggers an immediate kill.
Step 4: Check Actual Memory Usage
Look at the Pods that are still running to see how much memory the application actually needs.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-ab3nm -n production --containers
You see output like the following.
Code:
POD                          NAME     CPU(cores)   MEMORY(bytes)
my-app-pod-7f8b9c6d4-ab3nm   my-app   45m          241Mi
The running Pod is using 241 mebibytes of memory. That is 94 percent of the 256 mebibyte limit. The application is running very close to its ceiling. Any spike in traffic, a large request payload, or a temporary cache expansion will push it over the limit and trigger an OOMKill.
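The percentage can be checked with a quick awk calculation. A sketch that assumes both values are reported in Mi, as kubectl top and kubectl describe show them here (mem_pct_of_limit is an illustrative helper, not a kubectl feature):

```shell
# Percentage of the memory limit a container is currently using (both values in Mi)
mem_pct_of_limit() {
  # $1 = current usage (e.g. 241Mi), $2 = limit (e.g. 256Mi)
  awk -v use="${1%Mi}" -v lim="${2%Mi}" 'BEGIN { printf "%.0f%%\n", use / lim * 100 }'
}

mem_pct_of_limit 241Mi 256Mi   # prints: 94%
```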
Step 5: Check Events for Confirmation
Scroll to the Events section of the kubectl describe output. You will see entries like the following.
Code:
Events:
  Type     Reason      Age                From     Message
  ----     ------      ---                ----     -------
  Warning  OOMKilling  3m (x5 over 12m)   kubelet  Memory capped at 256Mi
  Normal   Pulled      2m (x6 over 12m)   kubelet  Container image already present
  Normal   Created     2m (x6 over 12m)   kubelet  Created container my-app
  Normal   Started     2m (x6 over 12m)   kubelet  Started container my-app
  Warning  BackOff     30s (x8 over 10m)  kubelet  Back-off restarting failed container
The OOMKilling warning confirms the diagnosis. The container has been killed five times in twelve minutes for exceeding its memory limit.
Step 6: Fix the Problem
There are two paths forward depending on the root cause.
If the application genuinely needs more memory, increase the memory limit. Update the Deployment spec to give the container more headroom. A reasonable starting point is to set the limit to at least 1.5 times the observed steady-state usage.
Code:
kubectl edit deployment my-app -n production
Change the memory limit from 256Mi to 512Mi and the request from 128Mi to 256Mi. Save and exit. Kubernetes will perform a rolling update with the new resource settings.
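The updated resources block in the container spec would look like this (the sizing follows the 1.5x-and-round-up guideline against the 241Mi steady state observed earlier):

```yaml
resources:
  requests:
    memory: 256Mi   # roughly the observed steady-state usage
  limits:
    memory: 512Mi   # ~2x steady state, leaving headroom for spikes
```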
Alternatively, apply the change declaratively by updating your manifest and running the following.
Code:
kubectl apply -f my-app-deployment.yaml -n production
If the application has a memory leak, increasing the limit only delays the problem. Check application logs for clues.
Code:
kubectl logs my-app-pod-7f8b9c6d4-xk2lm -n production --previous
The --previous flag shows logs from the last terminated container, which is essential since the current container may have just started and will not have useful output yet. Look for patterns like steadily increasing memory usage, unclosed connections, or growing caches that are never evicted.
Step 7: Verify the Fix
After applying the new resource settings, monitor the replacement Pods.
Code:
kubectl get pods -n production -w
The -w flag watches for changes in real time. Wait for the new Pods to reach Running status with zero restarts. Then check their memory usage.
Code:
kubectl top pods -n production
Confirm that memory usage is well below the new limit. Continue monitoring over the next hour to make sure the application is stable under normal traffic.
Building a Monitoring Baseline
Reacting to OOMKilled errors is necessary, but preventing them is better. Here are practices that reduce the frequency of resource-related incidents.
Set resource requests based on observed usage, not guesses. Use kubectl top regularly or deploy Prometheus with the kube-state-metrics exporter to collect historical usage data. Set requests to match the 95th percentile of observed usage and limits to handle reasonable spikes above that.
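If you have collected raw usage samples (say, one numeric value in Mi per line), the 95th percentile can be computed by the nearest-rank method with a short sort and awk pipeline. A sketch; once Prometheus is in place, its quantile functions are the better tool:

```shell
# Nearest-rank 95th percentile of numeric values read from stdin, one per line
p95() {
  sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR == 0) exit 1
      i = int(NR * 0.95)
      if (i < NR * 0.95) i++   # ceiling for nearest-rank
      if (i < 1) i = 1
      print a[i]
    }'
}

seq 1 100 | p95   # prints: 95
```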
Use the Vertical Pod Autoscaler in recommendation mode to get data-driven suggestions for request and limit values. It analyzes actual usage patterns and tells you what each container should be configured for.
Set up alerts for Pods that are consistently using more than 80 percent of their memory limit. This gives you time to adjust before an OOMKill happens.
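With Prometheus and kube-state-metrics collecting data, an alert expression along these lines flags containers above 80 percent of their memory limit. This is a sketch: metric and label names vary with exporter versions, so verify them against your own setup before alerting on it.

```promql
# Working-set memory as a fraction of the configured limit, per container
(
  container_memory_working_set_bytes{container!="", container!="POD"}
  / on(namespace, pod, container)
    kube_pod_container_resource_limits{resource="memory"}
) > 0.80
```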
Review resource configurations as part of every deployment. When application code changes, its resource profile may change too. A new feature that caches more data in memory or processes larger payloads will need updated limits.
How KorPro Helps With Pod Resource Management
Monitoring individual Pods with kubectl is effective for incident response, but it does not scale across dozens of clusters and thousands of Pods. KorPro provides continuous visibility into resource usage across your entire Kubernetes estate. It identifies Pods that are over-provisioned and wasting money, Pods that are under-provisioned and at risk of OOMKill, and orphaned resources that consume capacity without serving any purpose.
For engineers who are tired of manually running kubectl top across multiple clusters, KorPro automates the detection and surfaces actionable recommendations so you can fix problems before they become incidents.
Conclusion
Monitoring Pods in Kubernetes comes down to two core skills: knowing what your Pods are consuming right now with kubectl top, and understanding their full state and history with kubectl describe. When those two tools are combined with a solid understanding of how CPU and memory requests and limits work, you can diagnose most production issues quickly and confidently. The OOMKilled scenario covered in this guide is one of the most common problems you will encounter, and the diagnostic process applies to a wide range of resource-related failures. Build the habit of checking resource usage proactively, and you will spend less time reacting to incidents and more time building reliable systems.
Automate Pod Resource Visibility Across Clusters
Tired of running kubectl top manually across dozens of clusters? Get started with KorPro to get continuous visibility into over-provisioned and under-provisioned Pods, orphaned resources, and wasted spend — all from a single dashboard. Want to see it in action? Contact us for a demo.
Ready to Clean Up Your Clusters?
KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.
Related Articles
Extended Kubernetes Support: How Kor Pro Helps Teams Reduce Risk, Optimize Cost, and Modernize Safely
Extended Kubernetes support helps teams manage aging clusters safely. Learn how Kor Pro improves visibility into workloads, pods, ingress, and cost to reduce risk and plan modernization.
Kor: The Open-Source Kubernetes Cleanup Tool (and How KorPro Extends It)
Kor is an open-source CLI that finds unused Kubernetes resources in your cluster. Learn how to install and use Kor, what it detects, and how KorPro extends it to multi-cloud with cost analysis.
Kubernetes End of Life and Extended Support: What Happens When Your Version Expires [2026]
Kubernetes versions lose support faster than most teams realize. Learn the release cycle, what extended support means on EKS, GKE, and AKS, and how to plan upgrades before your cluster becomes a liability.
Written by
KorPro Team