Kubernetes Error: ImagePullBackOff — Cannot Pull Image
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 2m (x4 over 5m) kubelet Pulling image "myapp:v2.3.1"
Warning Failed 2m (x4 over 5m) kubelet Failed to pull image "myapp:v2.3.1": rpc error: code = Unknown desc = Error response from daemon: manifest for myapp:v2.3.1 not found: manifest unknown
Warning Failed 2m (x4 over 5m) kubelet Error: ErrImagePull
Normal BackOff 30s (x10 over 5m) kubelet Back-off pulling image "myapp:v2.3.1"
Warning Failed 30s (x10 over 5m) kubelet Error: ImagePullBackOff
ImagePullBackOff is the kubelet telling you it tried to pull your image, failed, retried, failed again, and is now waiting before the next attempt. It’s not actually the error — it’s the back-off state. The real error is one of manifest unknown, unauthorized, no matching manifest, toomanyrequests, or a network failure, and you find it in kubectl describe pod’s Events list.
Five causes account for nearly every occurrence in production: a typo in the image reference, expired registry credentials, an architecture mismatch, a Docker Hub rate limit, or an egress problem. Each has a single, mechanical fix once you know which one applies, so spend 30 seconds reading Events before you change anything.
Why this happens
- Wrong image name or tag. Typo in the image reference, tag that doesn't exist (someone deleted `:latest`, or you referenced a tag from another repo), or the registry returned a manifest-unknown response. `kubectl describe pod` shows the exact registry error.
- Private registry auth missing or wrong. The image is in a private registry (ECR, GCR, GHCR, Docker Hub private repos) and the pod has no `imagePullSecrets` or the secret's docker-config is wrong/expired. ECR tokens expire every 12 hours unless you're using IRSA/Workload Identity.
- Architecture mismatch (arm64 vs amd64). Image was built for `linux/amd64` but the node is `linux/arm64` (Graviton, an Apple Silicon-based dev cluster), or vice versa. The registry serves the manifest, but nothing in it matches the node's platform, so the pull fails with 'no matching manifest for linux/arm64 in the manifest list entries.' Common with single-arch images and modern ARM nodes.
- Registry rate-limit (Docker Hub anonymous). Docker Hub limits anonymous pulls to 100/6h per IP. Busy clusters share an egress IP and get 429s as `toomanyrequests: You have reached your pull rate limit`. Fix is to authenticate (free tier gets 200/6h, paid more) or mirror images to your own registry.
- Network egress blocked or DNS broken. Pod's node can't reach the registry — VPC egress denied, NAT gateway saturated, or CoreDNS can't resolve the registry hostname. Less common than the above but produces an ImagePullBackOff with a TCP/DNS error in the Events.
How to fix it
Fixes are ordered by likelihood. Start with the first one that matches your context.
1. Read kubectl describe pod's Events section
The first command on every ImagePullBackOff. The Events list shows the exact reason from the kubelet, which is one of: manifest unknown, unauthorized, no matching manifest, rate limit, or network error. Each maps to a different fix.
# Find the failing pod:
kubectl get pods -A | grep -E 'ImagePullBackOff|ErrImagePull'
# Read the events:
kubectl describe pod <pod-name> -n <namespace> | grep -A 30 Events:
# Or filter events directly:
kubectl get events -n <namespace> --sort-by=.lastTimestamp \
| grep <pod-name>
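The Events message maps to a cause almost mechanically. As a minimal sketch, a shell helper that does the mapping (the patterns are the registry error strings discussed above; the function name and category labels are illustrative — feed it the Message column from `kubectl describe`):

```shell
#!/bin/sh
# Classify a kubelet image-pull error message into one of the five causes.
# Patterns are the registry/kubelet error strings quoted in this article.
classify_pull_error() {
  msg="$1"
  case "$msg" in
    *"manifest unknown"*|*"not found"*)                      echo "wrong image name or tag" ;;
    *unauthorized*|*"authentication required"*)              echo "registry auth missing or wrong" ;;
    *"no matching manifest"*)                                echo "architecture mismatch" ;;
    *toomanyrequests*|*"rate limit"*)                        echo "registry rate limit" ;;
    *"i/o timeout"*|*"no such host"*|*"connection refused"*) echo "network egress or DNS" ;;
    *)                                                       echo "unknown: read the full Events output" ;;
  esac
}

# Example: the message from the Events output above
classify_pull_error 'manifest for myapp:v2.3.1 not found: manifest unknown'
# -> wrong image name or tag
```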
2. Verify the image is pullable outside Kubernetes
Reproduce the pull from your laptop or a test node. If `docker pull` (or `crane`/`skopeo`) fails the same way, the issue is the image, the tag, or registry auth — not Kubernetes. If it succeeds outside but fails inside, the issue is the cluster's auth or egress.
# Try the pull as the cluster would see it:
docker pull myregistry.example/team/app:v2.3.1
# Inspect manifest without pulling layers:
crane manifest myregistry.example/team/app:v2.3.1
# Check what platforms a multi-arch manifest supports:
docker buildx imagetools inspect myregistry.example/team/app:v2.3.1
3. Add or fix imagePullSecrets for private registries
For private registries, the pod (via its ServiceAccount) needs a docker-registry secret. ECR, GCR, ACR all have native cloud-IAM alternatives (IRSA on EKS, Workload Identity on GKE) — prefer those over long-lived secrets.
# 1. Create the secret (one-time, repeat per namespace):
kubectl create secret docker-registry regcred \
  --docker-server=myregistry.example \
  --docker-username=<user> \
  --docker-password=<pat-or-password> \
  -n production
# 2. Reference it in the pod spec or service account:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
imagePullSecrets:
  - name: regcred
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: app-sa
      containers:
        - name: app
          image: myregistry.example/team/app:v2.3.1
4. Build multi-arch images for mixed-architecture clusters
If your cluster has Graviton/ARM nodes, your images need an `arm64` variant in their manifest list. `docker buildx build --platform linux/amd64,linux/arm64 --push` produces one. Mixed-arch is increasingly common — assume it.
# One-time setup:
docker buildx create --use --name multiarch
docker buildx inspect --bootstrap
# Build + push amd64 and arm64 in one manifest list:
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t myregistry.example/team/app:v2.3.1 \
--push .
5. Authenticate to Docker Hub or move to a private mirror
For Docker Hub rate-limit errors, log in (raises limit to 200/6h), pay for a Pro account (no limit), or mirror frequently-pulled public images to your own ECR/GCR. Distroless and base images especially benefit from a registry mirror inside your VPC.
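If you mirror images, point each node's container runtime at the mirror so pulls never leave your network. With containerd 1.5+, that is a per-registry `hosts.toml`; a sketch (the mirror hostname is a placeholder, and containerd's registry `config_path` must be set to `/etc/containerd/certs.d` for these files to be read):

```toml
# /etc/containerd/certs.d/docker.io/hosts.toml on each node
# mirror.internal.example is a placeholder for your own registry mirror
server = "https://registry-1.docker.io"

[host."https://mirror.internal.example"]
  capabilities = ["pull", "resolve"]
```

With this in place, pulls for `docker.io` images are tried against the mirror first and fall back to Docker Hub only if the mirror fails.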
Detection and monitoring in production
Track ImagePullBackOff at the cluster level: kube-state-metrics exposes `kube_pod_container_status_waiting_reason{reason="ImagePullBackOff"}` for Prometheus. Alert when any pod is in this state for more than 5 minutes; for production deployments, ImagePullBackOff after a deploy means the rollout will time out and traffic stays on the old version. Also alert on ECR token expiry (use a CronJob to refresh the pull secret, or migrate to IRSA).
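As a sketch, a Prometheus alerting rule for this (assumes Prometheus scrapes kube-state-metrics; the group, alert, and severity names are illustrative — the metric that carries the waiting reason is `kube_pod_container_status_waiting_reason`):

```yaml
groups:
  - name: image-pulls
    rules:
      - alert: PodImagePullBackOff
        # One series exists per waiting container and reason;
        # also match ErrImagePull to catch pods before back-off starts.
        expr: |
          sum by (namespace, pod) (
            kube_pod_container_status_waiting_reason{reason=~"ImagePullBackOff|ErrImagePull"}
          ) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.namespace }}/{{ $labels.pod }} cannot pull its image"
```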
Related errors
- CrashLoopBackOff (Kubernetes): Your container starts, exits with a non-zero code (or is OOMKilled), the kubelet restarts it, it exits again, and the cycle repeats. After several quick failures the kubelet enters CrashLoopBackOff, an exponential delay between restart attempts so you don't melt the node.
- ModuleNotFoundError (Python): The Python interpreter walked `sys.path` and couldn't find the module you imported. Most common cause: you installed the package in a different environment (different venv, different Python version, system pip vs project pip) than the one running your code.
- MODULE_NOT_FOUND (Next.js): The Next.js build (webpack/Turbopack) tried to resolve an import path and couldn't find it. Either the package isn't installed, the relative path is wrong, a TypeScript path alias isn't mirrored in `tsconfig.json` and `next.config.js`, or the file's case differs between disk and import (Linux is case-sensitive, macOS isn't).
- ERR_REQUIRE_ESM (Node.js): Your CommonJS file used `require('some-package')` but the package is now ESM-only (its `package.json` has `"type": "module"` or `"exports"` only points to `.mjs`). Node's CJS loader can't synchronously load an ESM module.
- ECONNREFUSED (Postgres): Your application tried to open a TCP connection to Postgres and the OS rejected it: Postgres isn't listening on the host:port you specified, or a firewall blocked the connection.
Frequently asked questions
- What's the difference between ErrImagePull and ImagePullBackOff?
- How do I see the actual error message from the registry?
- My ECR pull worked yesterday and fails today.
- Why does the same image work on one node but ImagePullBackOff on another?
- Is `imagePullPolicy: Always` causing my problem?
- Can I retry a failed image pull manually?
- My image is in Docker Hub and public. Why am I getting unauthorized?
- How do I make sure imagePullSecrets are picked up?
When to escalate to Kubernetes support
Escalate to your cluster admin or cloud team if (a) ImagePullBackOff is happening only from one node pool (network/egress issue), (b) the registry returns 5xx errors persistently (registry-side outage), or (c) your IRSA/Workload Identity is configured but pods still get unauthorized (IAM trust policy mismatch). For the registry itself: open a ticket with the registry vendor including the failing image reference and a packet capture if possible.