How to Monitor Pods in Kubernetes [kubectl top & describe]
Monitor Kubernetes Pods with kubectl top and describe. Includes a full OOMKilled diagnosis walkthrough with CPU/memory requests, limits, and fixes.
Why Pod Monitoring Matters in Production
When something goes wrong in a Kubernetes cluster, Pods are almost always where you start looking. A Pod that is consuming too much memory, starving for CPU, or stuck in a crash loop will affect the application it runs and potentially the stability of the entire node. Knowing how to quickly inspect Pod health and resource consumption is one of the most important skills for any engineer working with production Kubernetes.
This guide covers the practical tools and commands you need to monitor Pods effectively. It focuses on kubectl top and kubectl describe, explains how CPU and memory requests and limits work, and walks through a complete scenario of diagnosing an OOMKilled error from first symptom to root cause.
Using kubectl top to Check Resource Consumption
The kubectl top command shows real-time CPU and memory usage for Pods and nodes. It requires the Metrics Server to be installed in your cluster. Many managed Kubernetes services include it by default, but some, such as EKS, do not. If you are running a local cluster, you may need to install it separately.
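If kubectl top fails with an error like "Metrics API not available", the Metrics Server is probably missing. One common way to install it on a cluster you manage yourself is the project's published release manifest (verify the URL against the Metrics Server repository before applying):

```shell
# Install Metrics Server from its published release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm the metrics API has registered successfully
kubectl get apiservice v1beta1.metrics.k8s.io
```

Some local distributions (kind, minikube) need extra flags or an addon instead; check your distribution's documentation.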
To see resource usage for all Pods in the current namespace, run the following.
Code:
kubectl top pods
The output shows each Pod's current CPU usage in millicores and memory usage in mebibytes. This tells you what the Pod is actually consuming right now, not what it requested or what its limits are.
To see Pods across all namespaces, add the --all-namespaces flag.
Code:
kubectl top pods --all-namespaces
To see resource usage for a specific Pod, specify its name.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-xk2lm
To see usage broken down by container within a Pod, use the --containers flag. This is essential for multi-container Pods where you need to identify which container is consuming the most resources.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-xk2lm --containers
To check node-level resource usage, which helps you understand whether the problem is a single Pod or a node running out of capacity, use the following.
Code:
kubectl top nodes
The key thing to remember about kubectl top is that it shows actual usage at a point in time. It does not show historical data. For trends and alerting, you need a monitoring stack like Prometheus and Grafana. But for immediate triage during an incident, kubectl top is the fastest way to see what is happening.
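If you need rough trend data before a full monitoring stack is in place, you can sample kubectl top on a schedule and append timestamped rows to a file. A minimal sketch (the format_sample helper and the pod-usage.csv filename are illustrative, not standard tooling):

```shell
# Turn one sample of `kubectl top pods --no-headers` output into timestamped CSV rows
format_sample() {
  while read -r pod cpu mem; do
    printf '%s,%s,%s,%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$pod" "$cpu" "$mem"
  done
}

# Typical use (requires Metrics Server; Ctrl+C to stop):
#   while true; do
#     kubectl top pods --no-headers | format_sample >> pod-usage.csv
#     sleep 30
#   done
```

This is no substitute for Prometheus, but a few hours of samples is often enough to pick sensible requests and limits.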
Using kubectl describe to Inspect Pod State
While kubectl top shows what a Pod is consuming, kubectl describe shows what Kubernetes knows about the Pod: its configuration, its current state, recent events, and the reason for any failures.
To describe a specific Pod, run the following.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm
The output is long, but the sections that matter most during troubleshooting are these.
The Containers section shows each container's image, state (Waiting, Running, or Terminated), restart count, resource requests and limits, and the reason for the last termination. If a container was killed, this section tells you why.
The Conditions section shows whether the Pod has been scheduled, whether its containers are initialized, whether it is ready to receive traffic, and whether all containers are running. A Pod that is not Ready will not receive traffic from a Service.
The Events section at the bottom shows recent cluster events related to the Pod. This includes scheduling decisions, image pulls, container starts, health check failures, and kill events. Events are time-stamped and ordered, so you can reconstruct what happened and when.
When you are troubleshooting, kubectl describe is usually the second command you run after kubectl get pods shows you something is wrong.
Understanding CPU and Memory Requests vs Limits
Kubernetes uses two settings to manage how much CPU and memory a container can use: requests and limits. Understanding the difference between them is critical for diagnosing resource-related issues.
Requests
A request is the amount of CPU or memory that Kubernetes guarantees to a container. The scheduler uses requests to decide which node has enough capacity to run the Pod, comparing the sum of requests already placed on each node against the node's allocatable capacity, not against live usage. If a container requests 256 mebibytes of memory, Kubernetes will only place it on a node with at least 256 mebibytes of allocatable memory not yet claimed by other Pods' requests.
Requests do not cap usage. A container can use more than its request if the node has spare capacity. Requests are a floor, not a ceiling.
Limits
A limit is the maximum amount of CPU or memory a container is allowed to use. If a container tries to allocate memory beyond its limit, the Linux kernel's out-of-memory (OOM) killer terminates the container process. This is the OOMKilled error. If a container tries to exceed its CPU limit, it is throttled rather than killed. It still runs, but it gets less CPU time.
The Gap Between Requests and Limits
The relationship between requests and limits determines how your cluster behaves under pressure.
If requests are much lower than limits, the scheduler may place more Pods on a node than it can actually support when all of them are active. This is called overcommitment. It works well when workloads are bursty and rarely all peak at the same time. It fails badly when they do.
If requests equal limits, every Pod gets exactly what it asked for and no more. When this holds for CPU and memory on every container, the Pod lands in the Guaranteed QoS class, the most predictable configuration but also the one that reserves the most capacity.
If limits are not set at all, a container can consume all available memory on the node, potentially causing other Pods to be evicted or the node to become unstable.
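Requests and limits are set per container in the Pod spec. A minimal Deployment fragment with the values used in the example below (the my-app names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:1.0   # illustrative image
          resources:
            requests:
              cpu: 250m        # guaranteed; used by the scheduler
              memory: 256Mi
            limits:
              cpu: 500m        # throttled above this
              memory: 512Mi    # OOMKilled above this
```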
To see the requests and limits configured for a Pod, use kubectl describe.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm
Look for the Containers section. Under each container, you will see lines like the following.
Code:
Requests:
  cpu:     250m
  memory:  256Mi
Limits:
  cpu:     500m
  memory:  512Mi
This tells you the container is guaranteed 250 millicores of CPU (a quarter of one core) and 256 mebibytes of memory, and it is allowed to burst up to 500 millicores and 512 mebibytes.
Scenario: Diagnosing an OOMKilled Error
You receive an alert that your application is restarting repeatedly. Users are reporting intermittent errors. Here is how to diagnose the problem step by step.
Step 1: Check Pod Status
Start by listing the Pods in the affected namespace.
Code:
kubectl get pods -n production
You see output like the following.
Code:
NAME                         READY   STATUS             RESTARTS   AGE
my-app-pod-7f8b9c6d4-xk2lm   0/1     CrashLoopBackOff   5          12m
my-app-pod-7f8b9c6d4-ab3nm   1/1     Running            0          12m
my-app-pod-7f8b9c6d4-zz9qr   1/1     Running            3          12m
One Pod is in CrashLoopBackOff with 5 restarts. Another is Running but has restarted 3 times. This pattern suggests the containers are crashing and Kubernetes is restarting them, with increasing backoff delays.
Step 2: Describe the Failing Pod
Run kubectl describe on the Pod with the most restarts.
Code:
kubectl describe pod my-app-pod-7f8b9c6d4-xk2lm -n production
Scroll to the Containers section and look at the Last State field. You see the following.
Code:
Last State:    Terminated
  Reason:      OOMKilled
  Exit Code:   137
Exit code 137 means the process was killed by a signal: 137 is 128 plus 9, and 9 is the signal number of SIGKILL. Combined with the OOMKilled reason, this confirms the container exceeded its memory limit and the kernel terminated it.
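You can reproduce the 137 exit code locally without Kubernetes, since any SIGKILLed process reports it. A quick shell sketch:

```shell
# Start a long-running process, kill it with SIGKILL, and inspect the exit code
sleep 60 &
pid=$!
kill -9 "$pid"      # SIGKILL, the same signal the OOM killer sends
wait "$pid"
echo "exit code: $?"   # prints: exit code: 137 (128 + 9)
```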
Step 3: Check the Memory Limit
In the same kubectl describe output, look at the resource configuration for the container.
Code:
Limits:
  memory:  256Mi
Requests:
  memory:  128Mi
The container has a memory limit of 256 mebibytes. Anything above that triggers an immediate kill.
Step 4: Check Actual Memory Usage
Look at the Pods that are still running to see how much memory the application actually needs.
Code:
kubectl top pod my-app-pod-7f8b9c6d4-ab3nm -n production --containers
You see output like the following.
Code:
POD                          NAME     CPU(cores)   MEMORY(bytes)
my-app-pod-7f8b9c6d4-ab3nm   my-app   45m          241Mi
The running Pod is using 241 mebibytes of memory. That is 94 percent of the 256 mebibyte limit. The application is running very close to its ceiling. Any spike in traffic, a large request payload, or a temporary cache expansion will push it over the limit and trigger an OOMKill.
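The percentage can be checked with a quick awk calculation. A sketch that assumes both values are reported in Mi, as kubectl top and kubectl describe show them here (mem_pct_of_limit is an illustrative helper, not a kubectl feature):

```shell
# Percentage of the memory limit a container is currently using (both values in Mi)
mem_pct_of_limit() {
  # $1 = current usage (e.g. 241Mi), $2 = limit (e.g. 256Mi)
  awk -v use="${1%Mi}" -v lim="${2%Mi}" 'BEGIN { printf "%.0f%%\n", use / lim * 100 }'
}

mem_pct_of_limit 241Mi 256Mi   # prints: 94%
```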
Step 5: Check Events for Confirmation
Scroll to the Events section of the kubectl describe output. You will see entries like the following.
Code:
Events:
  Type     Reason      Age                From     Message
  ----     ------      ---                ----     -------
  Warning  OOMKilling  3m (x5 over 12m)   kubelet  Memory capped at 256Mi
  Normal   Pulled      2m (x6 over 12m)   kubelet  Container image already present
  Normal   Created     2m (x6 over 12m)   kubelet  Created container my-app
  Normal   Started     2m (x6 over 12m)   kubelet  Started container my-app
  Warning  BackOff     30s (x8 over 10m)  kubelet  Back-off restarting failed container
The OOMKilling warning confirms the diagnosis. The container has been killed five times in twelve minutes for exceeding its memory limit.
Step 6: Fix the Problem
There are two paths forward depending on the root cause.
If the application genuinely needs more memory, increase the memory limit. Update the Deployment spec to give the container more headroom. A reasonable starting point is to set the limit to at least 1.5 times the observed steady-state usage.
Code:
kubectl edit deployment my-app -n production
Change the memory limit from 256Mi to 512Mi and the request from 128Mi to 256Mi. Save and exit. Kubernetes will perform a rolling update with the new resource settings.
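The updated resources block in the container spec would look like this (the sizing follows the 1.5x-and-round-up guideline against the 241Mi steady state observed earlier):

```yaml
resources:
  requests:
    memory: 256Mi   # roughly the observed steady-state usage
  limits:
    memory: 512Mi   # ~2x steady state, leaving headroom for spikes
```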
Alternatively, apply the change declaratively by updating your manifest and running the following.
Code:
kubectl apply -f my-app-deployment.yaml -n production
If the application has a memory leak, increasing the limit only delays the problem. Check application logs for clues.
Code:
kubectl logs my-app-pod-7f8b9c6d4-xk2lm -n production --previous
The --previous flag shows logs from the last terminated container, which is essential since the current container may have just started and will not have useful output yet. Look for patterns like steadily increasing memory usage, unclosed connections, or growing caches that are never evicted.
Step 7: Verify the Fix
After applying the new resource settings, monitor the replacement Pods.
Code:
kubectl get pods -n production -w
The -w flag watches for changes in real time. Wait for the new Pods to reach Running status with zero restarts. Then check their memory usage.
Code:
kubectl top pods -n production
Confirm that memory usage is well below the new limit. Continue monitoring over the next hour to make sure the application is stable under normal traffic.
Building a Monitoring Baseline
Reacting to OOMKilled errors is necessary, but preventing them is better. Here are practices that reduce the frequency of resource-related incidents.
Set resource requests based on observed usage, not guesses. Use kubectl top regularly or deploy Prometheus with the kube-state-metrics exporter to collect historical usage data. Set requests to match the 95th percentile of observed usage and limits to handle reasonable spikes above that.
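If you have collected raw usage samples (say, one numeric value in Mi per line), the 95th percentile can be computed by the nearest-rank method with a short sort and awk pipeline. A sketch; once Prometheus is in place, its quantile functions are the better tool:

```shell
# Nearest-rank 95th percentile of numeric values read from stdin, one per line
p95() {
  sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR == 0) exit 1
      i = int(NR * 0.95)
      if (i < NR * 0.95) i++   # ceiling for nearest-rank
      if (i < 1) i = 1
      print a[i]
    }'
}

seq 1 100 | p95   # prints: 95
```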
Use the Vertical Pod Autoscaler in recommendation mode to get data-driven suggestions for request and limit values. It analyzes actual usage patterns and tells you what each container should be configured for.
Set up alerts for Pods that are consistently using more than 80 percent of their memory limit. This gives you time to adjust before an OOMKill happens.
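With Prometheus and kube-state-metrics collecting data, an alert expression along these lines flags containers above 80 percent of their memory limit. This is a sketch: metric and label names vary with exporter versions, so verify them against your own setup before alerting on it.

```promql
# Working-set memory as a fraction of the configured limit, per container
(
  container_memory_working_set_bytes{container!="", container!="POD"}
  / on(namespace, pod, container)
    kube_pod_container_resource_limits{resource="memory"}
) > 0.80
```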
Review resource configurations as part of every deployment. When application code changes, its resource profile may change too. A new feature that caches more data in memory or processes larger payloads will need updated limits.
How KorPro Helps With Pod Resource Management
Monitoring individual Pods with kubectl is effective for incident response, but it does not scale across dozens of clusters and thousands of Pods. KorPro provides continuous visibility into resource usage across your entire Kubernetes estate. It identifies Pods that are over-provisioned and wasting money, Pods that are under-provisioned and at risk of OOMKill, and orphaned resources that consume capacity without serving any purpose.
For engineers who are tired of manually running kubectl top across multiple clusters, KorPro automates the detection and surfaces actionable recommendations so you can fix problems before they become incidents.
Conclusion
Monitoring Pods in Kubernetes comes down to two core skills: knowing what your Pods are consuming right now with kubectl top, and understanding their full state and history with kubectl describe. When those two tools are combined with a solid understanding of how CPU and memory requests and limits work, you can diagnose most production issues quickly and confidently. The OOMKilled scenario covered in this guide is one of the most common problems you will encounter, and the diagnostic process applies to a wide range of resource-related failures. Build the habit of checking resource usage proactively, and you will spend less time reacting to incidents and more time building reliable systems.
Automate Pod Resource Visibility Across Clusters
Tired of running kubectl top manually across dozens of clusters? Get started with KorPro to get continuous visibility into over-provisioned and under-provisioned Pods, orphaned resources, and wasted spend — all from a single dashboard. Want to see it in action? Contact us for a demo.
Ready to Clean Up Your Clusters?
KorPro automatically detects unused resources, orphaned secrets, and wasted spend across all your Kubernetes clusters. Start optimizing in minutes.
Related Articles
Extended Kubernetes Support: How Kor Pro Helps Teams Reduce Risk, Optimize Cost, and Modernize Safely
Extended Kubernetes support helps teams manage aging clusters safely. Learn how Kor Pro improves visibility into workloads, pods, ingress, and cost to reduce risk and plan modernization.
Kor: The Open-Source Kubernetes Cleanup Tool (and How KorPro Extends It)
Kor is an open-source CLI that finds unused Kubernetes resources in your cluster. Learn how to install and use Kor, what it detects, and how KorPro extends it to multi-cloud with cost analysis.
Kubernetes End of Life and Extended Support: What Happens When Your Version Expires [2026]
Kubernetes versions lose support faster than most teams realize. Learn the release cycle, what extended support means on EKS, GKE, and AKS, and how to plan upgrades before your cluster becomes a liability.
Written by
KorPro Team