
Proxies rarely break on traffic volume; they usually break on how many connections you open at once. This guide explains “concurrency” caps, how vendors enforce them, and how to stay under the ceiling. For how concurrency relates to thread, port, and traffic caps, see proxy usage limits.
What “concurrent connections” means
A concurrent connection is an active TCP/TLS session that hasn’t been closed yet. Limits count open sockets, not just requests per second.
In practice, each browser tab, headless page, or script thread can open several connections at once due to parallel resource loading, HTTP/2 streams, and background beacons.
Details:
- One tab can hold multiple connections to the proxy, and each connection can carry multiple requests if keep-alive is on.
- Modern stacks create extra traffic you might not expect: telemetry, ad/analytics calls, preconnect, push, and font/CDN pulls.
- UDP/QUIC isn’t common for proxies; most caps apply to TCP/TLS over HTTP/HTTPS or SOCKS.
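A minimal sketch of the point above, assuming Python with the requests library and placeholder proxy credentials: four threads each fetch in parallel without a shared session, so the proxy sees roughly four concurrent connections even though the total traffic is tiny.
import concurrent.futures
import requests

PROXY_URL = "http://USER:PASS@HOST:PORT"  # placeholder proxy address
proxies = {"http": PROXY_URL, "https": PROXY_URL}

def fetch(path):
    # Without a shared Session, each request opens its own TCP/TLS socket.
    return requests.get(f"https://example.com/{path}", proxies=proxies, timeout=20).status_code

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    # Four requests in flight at once -> roughly four concurrent connections.
    print(list(pool.map(fetch, ["a", "b", "c", "d"])))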
Where providers set the cap
Vendors cap concurrency either per IP (e.g., 10–100 connections at once) or per package (one pooled ceiling shared across all IPs you own).
Per-package counting lets you burst on a subset of IPs while staying within the overall plan limit.
Details:
- Per-IP caps are typical for small static plans.
- Per-package caps are common for shared pools where you run many threads across many IPs at once.
- Some “unlimited” plans still throttle new connections per second (CPS), which feels like random timeouts when you ramp too fast.
What actually counts as “one connection”
A connection is a TCP/TLS session from your client to the proxy gateway that stays open until closed.
HTTP keep-alive reuses the same connection for many requests, saving concurrency budget.
Counts toward the cap:
- Each open TCP/TLS socket to the proxy.
- WebSocket tunnels and CONNECT tunnels kept open.
- Each parallel HTTP/1.1 keep-alive connection.
Often confusing:
- Threads ≠ connections. One thread can reuse a single keep-alive connection for many requests, and a single thread can also hold several sockets open at once (see the sketch after this list).
- HTTP/2 multiplexes within one TCP/TLS connection, which can reduce the number of sockets while increasing throughput.
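A minimal sketch of the threads-vs-connections point, assuming Python with requests and a placeholder proxy: one thread issues ten sequential requests over a single Session, which typically reuses one keep-alive socket, so the whole loop counts as roughly one concurrent connection.
import requests

PROXY_URL = "http://USER:PASS@HOST:PORT"  # placeholder proxy address
proxies = {"http": PROXY_URL, "https": PROXY_URL}

session = requests.Session()  # pools and reuses connections by default
for i in range(10):
    # Sequential requests on one Session ride the same keep-alive socket.
    r = session.get("https://example.com/item", proxies=proxies, timeout=20)
    print(i, r.status_code)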
How limits are enforced (and how it looks when you hit them)
When you exceed the cap, providers usually refuse new sockets or throttle connection establishment.
Symptoms appear as connection timeouts, 502/504 gateway errors, or sudden latency spikes while old sockets still work.
Common enforcement patterns:
- Hard refuse: new connections are dropped or reset until you fall under the cap.
- Slow mode: connection setup slows dramatically for 30–60 minutes after a spike.
- Silent throttle: no dashboard alert; you just see more timeouts and increased TTFB.
- Burst penalties: short overages trigger longer cool-downs.
How to estimate your needed concurrency
Start from simultaneous sockets, not just RPS.
Multiply your planned parallel workers by the average number of sockets each worker holds under load.
Quick planner:
- Measure one worker: with keep-alive enabled, how many sockets stay open at steady-state?
- Multiply by planned parallelism.
- Add 20–30% headroom for retries, DNS changes, and target-side slowness.
- If using a package-level cap, ensure your peak sum across all apps fits under the ceiling.
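The planner's arithmetic as a tiny sketch; the numbers are illustrative assumptions, not recommendations.
workers = 20              # planned parallel workers
sockets_per_worker = 3    # measured at steady state with keep-alive on
headroom = 1.25           # 25% buffer for retries, DNS changes, slow targets

needed_concurrency = int(workers * sockets_per_worker * headroom)
print(needed_concurrency)  # 75 -> the cap your plan needs to cover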
When per-IP vs per-package makes sense
Per-IP caps suit stable, long-lived sessions per address. Package caps suit bursty workloads spread across many IPs.
If your workload spikes unevenly, package counting is usually more forgiving.
Rule of thumb:
- Per-IP cap: long sticky sessions, native apps, or few workers per IP.
- Per-package cap: scrapers/automation with many short tasks and rotating IPs.
Keep-alive and connection reuse: your best friend
Turning on keep-alive drastically reduces the number of sockets.
Aim for long-lived connections, reusing each one for sequential requests or multiplexing with HTTP/2, to cut the concurrent socket count per worker.
Practical tips:
- Enable keep-alive with a higher idle timeout (30–120 s).
- Prefer HTTP/2 if your client and proxy support it.
- Batch small requests in sequence on the same connection.
- Reuse sessions in headless browsers instead of opening fresh pages for trivial actions.
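A sketch of the HTTP/2 tip, assuming the third-party httpx client installed with its http2 extra (pip install httpx[http2]); recent httpx versions take proxy= while older ones used proxies=, and the proxy address is a placeholder.
import httpx

limits = httpx.Limits(max_connections=2, max_keepalive_connections=2)
with httpx.Client(http2=True,
                  proxy="http://USER:PASS@HOST:PORT",
                  limits=limits,
                  timeout=20) as client:
    # Many requests multiplexed over one or two TCP/TLS sockets.
    for path in ["a", "b", "c"]:
        r = client.get(f"https://example.com/{path}")
        print(path, r.http_version, r.status_code)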
Connection ramp-up and CPS limits
Even if total concurrency is under the cap, opening many sockets quickly can hit connection-per-second throttles.
Stagger ramp-up and pre-warm a small pool of keep-alive connections per worker.
Mitigation:
- Add jitter to worker start times (e.g., ±200–500 ms).
- Cap per-worker max sockets (e.g., 2–4).
- Pre-connect at startup, then reuse.
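One way to sketch this mitigation list, assuming Python with requests and placeholder proxy/target addresses: jittered worker starts, a two-socket pool per worker, and a pre-warm request before the real workload.
import random
import threading
import time
import requests
from requests.adapters import HTTPAdapter

PROXY_URL = "http://USER:PASS@HOST:PORT"  # placeholder
proxies = {"http": PROXY_URL, "https": PROXY_URL}

def worker(worker_id):
    time.sleep(worker_id * random.uniform(0.2, 0.5))   # staggered start, no 0->N spike
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=2, pool_maxsize=2)  # cap sockets per worker
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    session.get("https://example.com/", proxies=proxies, timeout=20)  # pre-warm, then reuse
    # ... run the real workload on the same session ...

threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()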
Detecting you are at the ceiling
Look for failed dials and rising TTFB during spikes, not just overall failure rate.
If retries succeed after a short delay, you are likely hitting a throttle rather than target-side bans.
Signals to log:
- Dial errors vs HTTP status codes (separate clearly).
- Connection pool exhaustion in your HTTP client.
- New connection latency histogram (p50/p90/p99).
- Proxy dashboard alerts, if your vendor provides them.
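A sketch of logging dial failures separately from target-side status codes, assuming Python with requests and placeholder addresses.
import time
import requests

PROXY_URL = "http://USER:PASS@HOST:PORT"  # placeholder
proxies = {"http": PROXY_URL, "https": PROXY_URL}
session = requests.Session()

def fetch(url):
    start = time.monotonic()
    try:
        r = session.get(url, proxies=proxies, timeout=(5, 20))  # (connect, read) timeouts
        # Target-side outcome: status code plus a time-to-first-byte approximation.
        print(f"http_status={r.status_code} ttfb~{r.elapsed.total_seconds():.3f}s")
    except (requests.exceptions.ProxyError,
            requests.exceptions.ConnectTimeout,
            requests.exceptions.ConnectionError) as exc:
        # Dial-level failure: often a concurrency/CPS ceiling, not a target ban.
        print(f"dial_error={type(exc).__name__} after={time.monotonic() - start:.3f}s")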
Browsers, headless, and emulators hit caps faster
Real browsers open more connections than simple scripts.
Extensions, analytics, push, and prefetch inflate concurrency even with only a few tabs.
Advice:
- Disable non-essential extensions and prefetch.
- Consolidate work into fewer pages or frames.
- Use headless modes with strict connection limits.
Capacity planning playbook (step by step)
Start small, measure sockets, then scale with headroom.
You want stable success rates at the concurrency you actually pay for.
- Baseline: run a single worker for 3–5 minutes with keep-alive on; record average open sockets (see the sketch after this list).
- Parallelism: add workers gradually until you hit your target throughput or connection errors appear.
- Ceiling test: push 20% above intended load for 1–2 minutes; watch dial failures and slow mode triggers.
- Choose the right plan: if you consistently need more sockets than a per-IP cap allows, move to a package-level model. For pricing mechanics and trade-offs, see proxy pricing models.
- Operationalize: enforce max sockets per process and stagger job starts; monitor dial errors separately from HTTP status codes.
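A sketch of the baseline step, assuming the third-party psutil package is installed (on Linux, ss -tp from a shell works too): sample how many established TCP sockets the worker process holds while it runs.
import time
import psutil

proc = psutil.Process()  # run this inside (or pointed at) your single worker
for _ in range(30):      # ~3 minutes at one sample every 6 seconds
    established = [c for c in proc.connections(kind="tcp")
                   if c.status == psutil.CONN_ESTABLISHED]
    print(f"open_tcp_sockets={len(established)}")
    time.sleep(6)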
Configuration examples
Short, realistic snippets to keep your socket count under control.
curl (serial with keep-alive)
Within a single invocation, curl reuses the proxy connection for multiple URLs by default; avoid --no-keepalive and cap parallelism outside curl rather than spawning one curl process per request.
# Good: both resources fetched over one reused connection
curl -x http://USER:PASS@HOST:PORT https://example.com/a https://example.com/b
Python (requests) with a small pool
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# One session, one small pool: at most 4 sockets per scheme, with retries.
session = requests.Session()
adapter = HTTPAdapter(pool_connections=4, pool_maxsize=4,
                      max_retries=Retry(total=2, backoff_factor=0.2))
session.mount("http://", adapter)
session.mount("https://", adapter)

proxies = {"http": "http://USER:PASS@HOST:PORT",
           "https": "http://USER:PASS@HOST:PORT"}

for url in urls:  # urls: your iterable of target URLs
    r = session.get(url, proxies=proxies, timeout=20)
    handle(r)     # handle(): your own response processing
Node.js (global agents) with keep-alive
const http = require("http");
const https = require("https");

// Enable connection reuse on the default (global) agents and cap sockets.
// Note: Node 19+ already ships the global agents with keep-alive enabled;
// on older versions, creating new http.Agent({ keepAlive: true }) instances
// and passing them per request is the more reliable route.
http.globalAgent.keepAlive = true;
https.globalAgent.keepAlive = true;
http.globalAgent.maxSockets = 4;
https.globalAgent.maxSockets = 4;
// Use a single agent across requests; reuse connections.
Playwright/Chromium (reduce browser fan-out)
- Launch with switches that cut speculative and background traffic where safe (e.g., --disable-background-networking).
- Turn off prefetch and unused extensions.
- Reuse contexts instead of new browsers per task.
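A sketch of these tips using Playwright's Python API (assuming pip install playwright plus installed browsers); the blocked-host list, proxy credentials, and URLs are placeholder assumptions.
from playwright.sync_api import sync_playwright

BLOCKED = ("googletagmanager", "google-analytics", "doubleclick")  # example hosts

with sync_playwright() as p:
    browser = p.chromium.launch(proxy={"server": "http://HOST:PORT",
                                       "username": "USER", "password": "PASS"})
    context = browser.new_context()  # one context reused across tasks
    # Abort requests to analytics/ad hosts so they never open proxy connections.
    context.route("**/*", lambda route: route.abort()
                  if any(host in route.request.url for host in BLOCKED)
                  else route.continue_())
    page = context.new_page()
    for url in ["https://example.com/a", "https://example.com/b"]:
        page.goto(url)  # same page and context -> fewer parallel sockets
    browser.close()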
Troubleshooting checklist
Short, targeted checks that resolve most concurrency issues.
Work from client settings outward before blaming the provider.
- Keep-alive enabled, idle timeout ≥ 30 s.
- Max sockets per process capped (2–6 is a good starting point).
- Staggered start/jitter enabled; no instant 0→N spikes.
- Separate dial errors from HTTP 4xx/5xx in logs.
- If per-IP capped: spread workers across more IPs.
- If package capped: redistribute load across time or upgrade the package.
FAQs
How is concurrency different from requests per second (RPS)?
Concurrency counts open sockets at a moment; RPS counts completed requests per second. With keep-alive, one connection can serve many requests, so you can have high RPS with low concurrency.
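As a rough illustration: at 50 completed requests per second with a 0.4 s average response time, Little's Law puts about 50 × 0.4 = 20 requests in flight at any moment, so roughly 20 keep-alive sockets cover that load, and HTTP/2 multiplexing can cover it with fewer.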
Why do things work fine, then suddenly slow down?
You likely crossed a cap and the provider applied slow mode or CPS throttling. Reuse connections, lower ramp-up speed, or reduce worker count to recover.
Is HTTP/2 always better for concurrency?
Often yes, because it multiplexes many streams over one TCP/TLS connection. But if either side downgrades or resets, clients may open parallel sockets; still set low max sockets to be safe.
Do rotating proxies change the math?
Rotation doesn’t change the socket limit itself, but frequent IP swaps can trigger extra dials. Keep rotation periods reasonable and reuse connections between rotations when possible.
What metrics should I watch?
Track open sockets, new connection latency, dial errors, and success rate separately from target-side HTTP codes. Alert on spikes in dial failures and p99 TTFB.