Troubleshooting Guide

Diagnosing CrashLoopBackOff in Kubernetes

CrashLoopBackOff means a container repeatedly starts, crashes, and is throttled by Kubernetes’ backoff. This guide provides a safe diagnosis workflow and common fixes.

What CrashLoopBackOff actually means

Kubernetes restarts containers according to the pod’s restartPolicy (Always for Deployments). When a container keeps exiting, the kubelet delays each subsequent restart with an exponentially increasing backoff (capped at five minutes by default), and the pod reports CrashLoopBackOff while it waits for the next attempt.
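
Watching the pod makes the cycle visible (a quick check; <pod-name> and <namespace> are the same placeholders used in the steps below):

# STATUS alternates between Running/Error and CrashLoopBackOff while RESTARTS climbs
kubectl get pod <pod-name> -n <namespace> -w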

Fast triage checklist (5 minutes)

  1. Identify the failing container and last exit code.
  2. Check recent logs (including previous instance logs).
  3. Inspect pod events (image pull errors, probes, OOM kills).
  4. Validate config/secrets mounts and environment variables.
  5. Confirm readiness/liveness probe behavior.
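
Two commands cover most of this first pass (the steps below expand on each item):

# Pods sorted by restart count (highest last)
kubectl get pods -n <namespace> --sort-by='.status.containerStatuses[0].restartCount'

# Recent events in the namespace, newest last
kubectl get events -n <namespace> --sort-by='.lastTimestamp'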

Step 1 — Locate the failing pod and container

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

In describe output, look for:

  • Last State and Exit Code
  • Reason (for example, OOMKilled, Error)
  • Events at the bottom (probe failures, mount errors, image pulls)
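
To pull just the last exit code and reason without scanning the full output, a jsonpath query works (a sketch, assuming the first container is the one crashing):

kubectl get pod <pod-name> -n <namespace> \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}{" "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'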

Step 2 — Read logs (including the previous crash)

If the container restarts quickly, the current logs may be empty or cut off mid-startup. Use --previous to view output from the instance that actually crashed.

# Current container logs
kubectl logs <pod-name> -n <namespace> -c <container-name>

# Logs from the previous crashed instance
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous
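
When the crash is fast, limiting and timestamping the output makes it easier to line up against pod events (optional flags, shown as a sketch):

# Last 100 lines of the previous instance, with timestamps
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous --tail=100 --timestamps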

Step 3 — Confirm whether probes are causing restarts

Misconfigured probes are a common cause of restart loops, especially when:

  • The app needs more startup time than initialDelaySeconds provides
  • The readiness endpoint is too heavy and times out
  • Liveness probes check a dependency (DB/cache) that may be temporarily down

Check events for probe failures:

kubectl describe pod <pod-name> -n <namespace> | sed -n '/Events/,$p'
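
If events point to probes, tuning usually means giving the app more room before liveness kicks in. A minimal sketch of the relevant container-spec fields (the paths, port, and timings are placeholders to adjust per app):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30   # give the app time to start before liveness checks begin
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3       # tolerate transient failures before restarting
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2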

Step 4 — Common root causes and fixes

A) Application exits immediately (bad args/config)

Symptoms: exit code 1 or 2, logs show “unknown flag”, “missing env var”, or “cannot parse config”.

Fix:

  • Validate env vars and secrets exist and are spelled correctly
  • Confirm config file path and mount permissions
  • Compare container args/command to a known-good release
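
A couple of commands help confirm what the container actually receives (a sketch; <secret-name> and <configmap-name> are placeholders for the objects the pod references):

# Env vars and args as rendered into the pod spec
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[0].env}{"\n"}{.spec.containers[0].args}{"\n"}'

# Confirm the referenced objects exist and contain the expected keys
kubectl describe secret <secret-name> -n <namespace>
kubectl describe configmap <configmap-name> -n <namespace>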

B) OOMKilled (out of memory)

Symptoms: Reason: OOMKilled in pod status; restarts increase under load.

Fix:

  • Increase memory limits/requests appropriately
  • Investigate memory leaks (heap dumps, profiling)
  • Lower concurrency temporarily to stabilize
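
Raising the allocation is a container-spec change along these lines (a minimal sketch; the values are placeholders to size per workload):

resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"   # OOMKilled means the container exceeded this limit

If metrics-server is installed, kubectl top pod <pod-name> -n <namespace> --containers shows current usage to size these values against.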

C) Image or runtime errors

Symptoms: events show ImagePullBackOff or ErrImagePull, or the binary can’t execute (for example, “exec format error” from an architecture mismatch).

Fix:

  • Verify image tag exists and registry credentials are valid
  • Confirm image architecture matches nodes (amd64/arm64)
  • Pin to a known-good tag (avoid floating latest)
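
Two checks narrow this down quickly (a sketch):

# Exact image reference the pod is trying to pull
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].image}{"\n"}'

# Node architectures in the cluster (compare against the image's build)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'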

D) Dependency not ready (DB/cache)

Symptoms: logs show connection refused/timeouts; the app exits rather than retrying.

Fix:

  • Add retry/backoff logic in the application
  • Increase startup probe window (use startupProbe when appropriate)
  • Separate readiness checks from dependency health when possible
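
A startupProbe gives slow or dependency-bound startup its own window so liveness doesn’t fire during initialization. A minimal sketch (path, port, and timings are placeholders):

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # allows up to ~150s of startup before liveness takes over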

Safe remediation patterns

  • Don’t delete blindly: capture describe output and logs first.
  • Roll back quickly: if the crash correlates with a recent rollout, revert to the last stable version (see the command sketch after this list).
  • Change one thing: adjust probes or config in small steps to avoid hiding the real cause.
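
For the rollback case, the standard sequence looks like this (a sketch, assuming the workload is a Deployment):

# Review recent revisions, revert to the last stable one, and watch the rollout
kubectl rollout history deployment/<deployment-name> -n <namespace>
kubectl rollout undo deployment/<deployment-name> -n <namespace>
kubectl rollout status deployment/<deployment-name> -n <namespace>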

When to escalate

Escalate to engineering with concrete evidence:

  • Exit code + reason (OOMKilled, Error)
  • Relevant log excerpt (with timestamps)
  • Recent deployment changes (image tag, config map, secret, flags)
  • Probe failures and thresholds
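
Capturing that evidence into files keeps the handoff simple (a sketch; the output filenames are arbitrary):

kubectl describe pod <pod-name> -n <namespace> > pod-describe.txt
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous --timestamps > crash-logs.txt
kubectl get events -n <namespace> --sort-by='.lastTimestamp' > events.txt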
