
Postgres Error: 53300 — Too Many Connections

stderr text
FATAL:  sorry, too many clients already
SQLSTATE: 53300

Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/app/node_modules/pg/lib/client.js:132:73)
Postgres returns FATAL with SQLSTATE 53300 and closes the connection immediately — your client sees a connection-reset error.

SQLSTATE 53300 — “too many clients already” — is Postgres’s hard limit on concurrent backend processes. Each connection is a real OS process holding ~10MB of RAM plus catalogue caches; the limit exists to keep the host from OOM-killing under load. The fix is almost never to raise the cap — it’s to put a pooler between your app and Postgres so you can have thousands of cheap clients backed by a small fixed pool of expensive backends.

A useful rule of thumb: a single Postgres instance can comfortably serve ~25-50 active backends regardless of max_connections. Beyond that, query throughput drops because of context-switching and shared-buffer contention. PgBouncer in transaction-pool mode lets you map 2,000 application clients onto 25 active backends and keeps each backend usefully busy.

Why this happens

  • Connection pools sized too aggressively per app instance. A common bug: pool size of 20 × 10 app pods × 3 services = 600 connections, against a Postgres `max_connections = 100`. Each app instance individually looks fine; in aggregate they exhaust the server (the query after this list shows which applications hold the slots).
  • Long-running transactions or idle-in-transaction sessions. Connections held open by long transactions or idle-in-transaction states (often caused by application bugs that forget to commit or roll back) hold their slots indefinitely. `pg_stat_activity` shows them with `state = 'idle in transaction'`.
  • New deploy spawning fresh pool while old pods still active. Rolling deploys briefly double the connection footprint — old pods plus new pods. If your steady-state usage is already near the cap, deploys push you over and trigger 53300 for ~30 seconds during rollout.
  • Lambda or serverless without a connection pooler. Lambda concurrency × connections-per-execution can spike to thousands of connections in seconds. Postgres has no way to absorb that. RDS Proxy or PgBouncer in front is mandatory for serverless workloads.
  • Background workers leaking connections. Cron jobs, workers, and migration scripts that don't return connections to the pool (or use `psql` in a tight loop) leak connections until they hit the cap. Often surfaces only after weeks of uptime.
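
To see which of these applies to you, group `pg_stat_activity` by application and state; a minimal diagnostic sketch using only standard `pg_stat_activity` columns:

slots.sql sql
-- Connection count per application, user, and state, largest consumers first
SELECT application_name, usename, state, count(*) AS connections
FROM pg_stat_activity
GROUP BY application_name, usename, state
ORDER BY connections DESC;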

How to fix it

Fixes are ordered by likelihood. Start with the first one that matches your context.

1. Put PgBouncer or RDS Proxy in front of Postgres

A pooler lets your app open thousands of cheap pseudo-connections backed by tens of real Postgres backends. PgBouncer's `transaction` pool mode is the right default for most web workloads — connections are reused per transaction.

pgbouncer.ini text
[databases]
appdb = host=db.internal port=5432 dbname=appdb

[pgbouncer]
; return the server connection to the pool at the end of each transaction
pool_mode = transaction
; application-side connections PgBouncer will accept
max_client_conn = 2000
; real Postgres backends per user/database pair
default_pool_size = 25
; extra backends allowed once clients have waited reserve_pool_timeout seconds
reserve_pool_size = 5
reserve_pool_timeout = 3
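
Applications then connect to PgBouncer's listen port (6432 by default) instead of Postgres; nothing else in the connection string has to change. To confirm clients are actually being multiplexed, connect psql to that port against the virtual `pgbouncer` database (assuming your user is listed in `admin_users`) and inspect the pools: `cl_active`/`cl_waiting` count client connections, `sv_active`/`sv_idle` count real backends.

pgbouncer-console.sql sql
SHOW POOLS;
SHOW CLIENTS;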

2. Lower per-app pool size and tune idle timeouts

Most app pools default to 10-20 connections. For a transactional web app, 5-10 is usually plenty if queries are fast. Set `idleTimeoutMillis` so idle connections close instead of hoarding slots.

pool.js javascript
import { Pool } from 'pg';

export const pool = new Pool({
  host: process.env.PGHOST,
  database: process.env.PGDATABASE,
  user: process.env.PGUSER,
  password: process.env.PGPASSWORD,
  max: 8,                       // per-instance hard cap
  idleTimeoutMillis: 10_000,    // close idle after 10s
  connectionTimeoutMillis: 5_000,
});

3. Find and kill idle-in-transaction sessions

Idle-in-transaction sessions are the silent killer. Set `idle_in_transaction_session_timeout` so Postgres terminates them automatically, and audit `pg_stat_activity` to find offending application paths.

hunt.sql sql
-- Sessions in idle-in-transaction for over 5 minutes
SELECT pid, usename, application_name, state, query, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '5 minutes'
ORDER BY idle_for DESC;

-- Terminate one
SELECT pg_terminate_backend(<pid>);
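
To have Postgres enforce this automatically, set `idle_in_transaction_session_timeout` at the database or server level; a minimal sketch, assuming the database is named `appdb` as in the PgBouncer config above:

timeout.sql sql
-- Terminate any session idle in a transaction for more than 5 minutes
ALTER DATABASE appdb SET idle_in_transaction_session_timeout = '5min';

-- Or set it server-wide (on managed services use the parameter group instead)
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();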

4. Raise max_connections only after measuring memory

Each connection costs ~10MB. Raising `max_connections` from 100 to 500 needs ~4GB of headroom on top of `shared_buffers` and `work_mem`. On a small instance, you'll OOM the host before relieving pressure. Add a pooler instead.
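
Before changing anything, check what the server is configured for and how much of the cap you actually use; everything below is a built-in setting or view:

headroom.sql sql
-- Current cap, memory-relevant settings, and backends in use right now
SELECT current_setting('max_connections')       AS max_connections,
       current_setting('shared_buffers')        AS shared_buffers,
       current_setting('work_mem')              AS work_mem,
       (SELECT count(*) FROM pg_stat_activity)  AS backends_in_use;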

5. For Lambda, mandate RDS Proxy

RDS Proxy (or PgBouncer on EC2) sits between Lambda and RDS, multiplexing thousands of Lambda concurrent executions onto a small pool. Without it, a traffic spike will hit `max_connections` instantly.

Detection and monitoring in production

Alarm when total backends (summing `pg_stat_database.numbackends` across databases) exceed 80% of `max_connections`. Track the idle-in-transaction count separately; sustained non-zero values indicate a code bug. CloudWatch (RDS) and Datadog both expose these metrics out of the box. Page on saturation, not on individual 53300 events: one event during a deploy is normal, sustained 53300 is critical.
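
If your monitoring agent can run SQL, the saturation metric is a single query; a sketch of the 80% alarm above, summing client backends across databases:

saturation.sql sql
-- Fraction of max_connections currently used by client backends
SELECT sum(numbackends)::float
       / current_setting('max_connections')::int AS connection_saturation
FROM pg_stat_database;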

Frequently asked questions

What's the difference between 53300 and ECONNREFUSED?
53300 means Postgres is up and reachable but refusing a new connection due to capacity. ECONNREFUSED means nothing is listening on the host:port (Postgres down, wrong host/port, firewall). 53300 is a Postgres protocol-level rejection; ECONNREFUSED is a TCP-level rejection.
Should I just raise max_connections?
Almost never as the first response. Each backend costs ~10MB plus a process; raising the cap without adding RAM causes OOM kills that are worse than 53300. Add a pooler (PgBouncer/RDS Proxy) first — it gives you 10-100x the effective connection capacity without changing Postgres at all.
Why does my Lambda function get 53300 occasionally?
Lambda containers cache connections, but cold starts open new ones, and concurrent invocations open one each. A spike from 0 to 200 concurrent Lambdas opens 200 connections in seconds. RDS Proxy (or PgBouncer on EC2) absorbs the burst by multiplexing them onto a stable pool.
How do I see how many connections are currently in use?
`SELECT count(*) FROM pg_stat_activity;` shows total. Group by `state` for breakdown (`active`, `idle`, `idle in transaction`). The `numbackends` field on `pg_stat_database` is the same number, queryable per database.
Why are idle-in-transaction sessions so dangerous?
They consume a connection slot, hold any locks the transaction acquired, and prevent VACUUM from cleaning up dead tuples (causing bloat). A single idle-in-transaction session left open for hours can block every other writer on the locks it holds and hold back vacuum's cleanup horizon for the whole database.
What pool_mode should I use in PgBouncer?
`transaction` for most web workloads — connection reused per transaction, no session state preserved. `session` if you need session-pinned features like LISTEN/NOTIFY, prepared statements, or temp tables. `statement` is rare and breaks transactions, only for read-only fan-out.
Does superuser bypass max_connections?
Postgres reserves `superuser_reserved_connections` slots (default 3) for superusers — meaning the cap for non-superusers is `max_connections - superuser_reserved_connections`. You can always log in as superuser to terminate runaway sessions, even when the app is fully locked out.
How do I make connection leaks visible during development?
Set the application pool max to 2-3 in dev. Code that leaks connections will fail fast under any concurrency, surfacing the bug in dev rather than after weeks of production uptime. Add an integration test that runs N+1 sequential transactions against a pool of N connections — leaks turn into hangs immediately.

When to escalate to Postgres support

Escalate to your DBA or cloud provider if (a) `pg_stat_activity` shows only a handful of sessions but clients still get 53300, which points at a connection-counting or replica-routing problem; (b) RDS or Cloud SQL keeps hitting the cap despite a correctly sized pooler, often a runaway session worth checking in Performance Insights; or (c) deploys consistently push the connection count over the cap and the pooler isn't smoothing the rollout.

Read more: /guide/database-connection-debugging/