How Nginx Works — DevDunia

The C10K Crisis: Why Apache Collapsed at 10,000 Connections

The year is 2002. The web is growing fast, and Apache powers most of it. But a problem is becoming hard to ignore: what happens when a single server needs to handle 10,000 simultaneous connections? Dan Kegel had named this the C10K problem in 1999, and Apache's thread-per-connection architecture made it very expensive to solve.

Apache: Thread-Per-Connection

10,000 CONNECTIONS = ~10,000 OS threads spawned (diagram: each square = 100 threads)

8 MB stack × 10,000 threads = 80 GB of virtual address space reserved just for stacks

Context switch overhead: catastrophic

Each new connection spawns (or borrows) a thread
Thread sits blocked waiting for I/O to complete
10,000 threads keep the scheduler context-switching constantly as threads block and wake
Memory exhaustion at a few thousand concurrent users
Load average spikes, latency climbs, server dies

Nginx: Event-Driven Workers

10,000 CONNECTIONS = 4 workers × 2,500 connections each (Worker 1 to Worker 4)

4 workers handle ALL 10,000

Non-blocking I/O via epoll / kqueue

Memory: a few hundred bytes of bookkeeping per idle connection

Each worker runs a single-threaded event loop
Worker never blocks — asks the OS to notify on I/O readiness
One worker on one core handles thousands of connections via epoll
No thread switch per connection, so CPU caches stay warm
Memory per idle connection: a few hundred bytes, vs up to 8 MB of reserved stack per Apache thread

THE MATH THAT KILLS APACHE

A default Linux thread reserves an 8 MB stack. 10,000 threads = 80 GB of virtual memory just for stacks, before you've stored a single byte of request data. Only the pages a thread actually touches become resident, so real RAM use is lower, but even with smaller stacks you're burning CPU on thousands of context switches per second: the OS scheduler can spend more time switching between threads than those threads spend doing actual work. Nginx sidesteps this by never blocking a worker on a single connection in the first place.

ONE THREAD PER CORE · ZERO BLOCKING · C10K SOLVED

Architecture: Master Process + Worker Processes

Nginx starts as a single master process that manages everything. It reads the config, binds ports, and spawns worker processes. The built-in default is a single worker; setting worker_processes auto (common in packaged configs) starts one per CPU core. Workers do all the actual request handling. The master never touches live traffic.

Master Process

Parent of all workers · reads nginx.conf · binds :80 :443 · spawns workers · handles signals

Worker 0 to Worker 3: one per CPU core (Core 0 to Core 3), each running an event loop with non-blocking I/O

Cache Manager: expires cached objects, evicts stale entries

Cache Loader: loads cache metadata from disk into shared memory

MASTER PROCESS RESPONSIBILITIES

Reads and validates nginx.conf
Binds privileged ports (:80, :443) — only master needs root
Forks worker processes (as many as worker_processes specifies)
Handles SIGHUP (reload config without downtime)
Monitors workers, respawns on crash
Manages binary upgrades with zero downtime

WORKER PROCESS RESPONSIBILITIES

Runs a tight event loop forever
Accepts connections from the shared listen socket
Reads requests, routes them, sends responses
Proxies to upstream app servers
Reads static files using sendfile()
Handles TLS termination in-process
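
The master/worker split above maps onto a handful of top-level directives. A minimal sketch, assuming a Linux host and a hypothetical unprivileged account named www-data (the account name and pid path are illustrative, not taken from this article):

```nginx
# Main (top-level) context of nginx.conf
user www-data;               # assumed account: workers run as this user, not root
worker_processes auto;       # one worker per detected CPU core (built-in default is 1)
worker_cpu_affinity auto;    # bind each worker to its own core (Linux and FreeBSD only)
pid /run/nginx.pid;          # assumed path: the master records its PID here for signals
```

Only the master keeps root privileges, which it needs to bind ports below 1024; the workers it forks switch to the account named by user.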

HOT RELOAD: ZERO DOWNTIME CONFIG CHANGE

Send SIGHUP to the master (or run nginx -s reload). The master parses and validates the new config; if it is invalid, the old workers keep running unchanged. Otherwise it forks new worker processes running the new config, then tells the old workers to stop accepting new connections and finish their existing ones gracefully. During the brief overlap, old and new workers run side by side, so no connections are dropped.
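
By default, old workers stay alive for as long as their existing connections remain open. One directive bounds that wait. The 30-second value below is arbitrary and only a sketch, not a recommendation:

```nginx
# Main context: cap how long old workers may linger after a reload or graceful stop.
# When the timeout expires, Nginx tries to close the connections that are still open.
worker_shutdown_timeout 30s;
```

Long-lived connections such as WebSockets or large downloads are the usual reason to set this, since they can otherwise keep an old worker around indefinitely.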

The Event Loop: How 1 Worker Handles Thousands of Connections

This is the core of Nginx's genius. A single worker process runs an infinite loop, asking the operating system: "which of my thousands of open connections are ready for I/O right now?" The OS answers (via epoll on Linux, kqueue on BSD) with a list of ready file descriptors. The worker processes only those — never waiting on anything.

NGINX WORKER EVENT LOOP

Phases: ACCEPT → READ → ROUTE → PROXY/SERVE → FILTERS → SEND, driven by epoll / kqueue

ACCEPT
Worker calls accept() to get a new connection fd. Registers it with epoll for read-readiness.

OS MULTIPLEXING
epoll_wait() returns N ready fds
Linux: epoll (cost scales with the number of ready fds, not the number watched)
BSD/Mac: kqueue
Old: select/poll (O(N) scan of every watched fd on each call)

WHY IT'S FAST

Worker never blocks on one slow connection. It sleeps only inside epoll_wait(), and the kernel wakes it when some fd is ready. No wasted cycles.

THREAD BLOCKING I/O (OLD WAY)

accept() → new thread spawned
read() → thread BLOCKS waiting for data
write() → thread BLOCKS waiting for socket
close() → thread returned to pool
Thread sits idle, blocked on I/O, for most of its life

EPOLL NON-BLOCKING I/O (NGINX)

epoll_wait() → returns ready fds
read() → returns immediately (data ready)
write() → returns immediately (socket ready)
epoll_wait() → next batch
Worker always doing useful work
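
The multiplexing method and per-worker limits described above are set in the events block. A sketch with illustrative values; Nginx normally picks the most efficient method for the platform on its own, so the use line is usually unnecessary:

```nginx
events {
    use epoll;                 # Linux only; omit to let Nginx choose (kqueue on BSD/macOS)
    worker_connections 4096;   # connections each worker's event loop may track at once
    multi_accept on;           # accept all pending connections per wakeup instead of one
}
```

The theoretical ceiling is worker_processes × worker_connections, and that count includes connections to upstream servers as well as to clients.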

HTTP Request Lifecycle in Nginx

Every HTTP request that arrives at Nginx passes through a fixed sequence of internal phases. Nginx's module system hooks into specific phases — this is how features like auth, rate limiting, gzip compression, and caching all plug in without touching the core.

TCP Connect: 3-way handshake
TLS Handshake: ClientHello, ServerHello
Parse Headers: method, URI, Host header
Server Block: match server_name
Location Match: prefix or regex match
Handler: proxy_pass or static file
Filters: gzip, headers, sub, ssi
Send Response: write() or sendfile()

Phase Deep Dive

SERVER BLOCK MATCHING

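Among the server {} blocks listening on the request's address and port, Nginx picks one by comparing the Host header against each server_name. Priority: exact name first, then the longest wildcard starting with an asterisk (*.example.com), then the longest wildcard ending with an asterisk (mail.*), then the first matching regex in config order. If nothing matches, it uses the default_server for that listen socket.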
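
The priority order can be sketched with one server block per rule. The domain names are placeholders, and the blocks are assumed to sit inside the http context:

```nginx
# Catch-all: used when no server_name on this port matches
server {
    listen 80 default_server;
    server_name _;
    return 444;                 # Nginx-specific code: close the connection, send nothing
}

# 1. Exact names are tried first
server {
    listen 80;
    server_name example.com www.example.com;
    return 200 "exact\n";
}

# 2. Then the longest wildcard starting with an asterisk
server {
    listen 80;
    server_name *.example.com;
    return 200 "leading wildcard\n";
}

# 3. Then the longest wildcard ending with an asterisk
server {
    listen 80;
    server_name mail.*;
    return 200 "trailing wildcard\n";
}

# 4. Finally regexes, in the order they appear in the config
server {
    listen 80;
    server_name ~^shop\d+\.example\.net$;
    return 200 "regex\n";
}
```

A request with Host: www.example.com lands in the exact block even though *.example.com also matches, because exact names are checked before any wildcard.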

LOCATION MATCHING ORDER

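= exact match: highest priority, search stops immediately
^~ prefix: if this is the longest matching prefix, regexes are skipped
~ case-sensitive regex: checked in config order, first match wins
~* case-insensitive regex: same ordering rule as ~
Longest prefix match (no modifier): used when no regex matches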
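
A small server block that exercises each modifier, as a sketch. The paths and port are made up for illustration, and each location just returns a label so the winning match is visible:

```nginx
server {
    listen 8080;

    location = / {                 # only the exact URI "/"
        return 200 "exact\n";
    }

    location ^~ /static/ {         # longest-prefix winner here skips all regex checks
        return 200 "prefix, regex skipped\n";
    }

    location ~ \.php$ {            # case-sensitive regex
        return 200 "case-sensitive regex\n";
    }

    location ~* \.(png|jpe?g)$ {   # case-insensitive regex
        return 200 "case-insensitive regex\n";
    }

    location / {                   # plain prefix: fallback when no regex matches
        return 200 "fallback prefix\n";
    }
}
```

With this config, /static/logo.png is answered by the ^~ block even though the image regex would also match, while /img/logo.PNG falls through to the case-insensitive regex.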

FILTER CHAIN (OUTPUT)

Filters run on the response before it's sent. They're chained — each filter reads from the previous one. Built-in filters: gzip (compress body), headers_filter (add/remove headers), sub (string substitution in body), ssi (server-side includes). Order matters: gzip must run after sub.
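
A sketch of a location that engages several of these filters at once. The upstream address and the rewritten hostnames are placeholders, and sub_filter is only available when Nginx was built with the sub module, which is not part of the default build:

```nginx
location / {
    proxy_pass http://127.0.0.1:8000;

    # sub filter: rewrite strings in the response body
    proxy_set_header Accept-Encoding "";   # ask upstream for an uncompressed body to rewrite
    sub_filter 'http://internal.example' 'https://www.example.com';
    sub_filter_once off;                   # replace every occurrence, not just the first

    # gzip filter: compresses the body after substitution has run
    gzip on;
    gzip_types text/css application/javascript;   # text/html is always included

    # headers filter: add a response header
    add_header X-Frame-Options SAMEORIGIN;
}
```

The order of the directives in the file does not matter; the filter chain order is fixed when Nginx is built.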

HANDLER TYPES

static: serves files directly from disk via root or alias
proxy_pass: forwards to upstream HTTP server
fastcgi_pass: speaks FastCGI to PHP-FPM etc.
uwsgi_pass: speaks the uwsgi protocol to a uWSGI server (typically hosting Python apps)
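
One location per handler type, as a sketch. All paths, ports, and the socket location are assumptions for illustration; fastcgi_params and uwsgi_params are the parameter files shipped with Nginx:

```nginx
server {
    listen 80;

    # static: /assets/app.css is read from /var/www/assets/app.css
    location /assets/ {
        root /var/www;
    }

    # proxy_pass: forward to an upstream HTTP server
    location /api/ {
        proxy_pass http://127.0.0.1:8000;
    }

    # fastcgi_pass: hand PHP scripts to PHP-FPM over a Unix socket
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php/php-fpm.sock;
    }

    # uwsgi_pass: speak the uwsgi protocol to a uWSGI server
    location /py/ {
        include uwsgi_params;
        uwsgi_pass 127.0.0.1:3031;
    }
}
```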

Reverse Proxy: Nginx as the Front Door

The most common production use of Nginx: sit in front of your app servers and forward requests to them. Clients never talk to your app directly. Nginx handles TLS, absorbs slow clients, adds/rewrites headers, and maintains a keepalive connection pool to upstreams so it's not opening a new TCP connection on every request.

REVERSE PROXY TOPOLOGY

Client (Browser / Mobile / API)
→ HTTPS :443, encrypted
→ Nginx: TLS termination, header rewrite, rate limiting, keepalive pool
→ HTTP :8000-8002, plaintext, internal
→ App Server 1 (:8000), App Server 2 (:8001), App Server 3 (:8002)

Header Manipulation

Headers Nginx ADDS (to upstream, when configured with proxy_set_header)

X-Forwarded-For: <client-ip> — real client IP since upstream sees Nginx's IP
X-Forwarded-Proto: https — tells app the original scheme was HTTPS
X-Real-IP: <client-ip> — simplified single-IP header
Host: example.com — preserves original Host header (needs proxy_set_header Host $host)
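
None of these headers is sent unless it is configured. A minimal sketch of the usual set, assuming an app server on 127.0.0.1:8000:

```nginx
location / {
    proxy_pass http://127.0.0.1:8000;

    proxy_set_header Host              $host;                       # original Host, not the upstream name
    proxy_set_header X-Real-IP         $remote_addr;                # address of the directly connected client
    proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;  # append client IP to any existing list
    proxy_set_header X-Forwarded-Proto $scheme;                     # "http" or "https" as seen by Nginx
}
```

The app should trust these headers only when the request really came through the proxy, since a client can send its own X-Forwarded-For.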

Keepalive Connection Pool

Without keepalive, Nginx opens a new TCP connection to the app server for every request. With keepalive 32 in the upstream block, each worker keeps up to 32 idle connections to that upstream open for reuse. It is a cache of idle connections, not a cap on total connections. Requests reuse them, so hot paths skip the TCP handshake.

upstream backend {
  server 127.0.0.1:8000;
  keepalive 32;
}
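
The upstream block alone is usually not enough. On versions where proxy_http_version defaults to 1.0 (the long-standing default), the proxying location also has to switch to HTTP/1.1 and clear the Connection header, roughly like this sketch:

```nginx
upstream backend {
    server 127.0.0.1:8000;
    keepalive 32;                        # idle connections cached per worker
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;          # upstream keep-alive needs HTTP/1.1
        proxy_set_header Connection "";  # drop the default "Connection: close"
    }
}
```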

Load Balancing: Three Algorithms

Nginx can distribute requests across multiple upstream servers. Three of its built-in algorithms are covered here: round robin, least connections, and IP hash. The examples assume three upstream servers: Server A (:8000), Server B (:8001), and Server C (:8002).

ROUND ROBIN

Cycles through servers 1, 2, 3, 1, 2, 3... Each gets an equal share. Simple and effective when servers are equally powerful. This is the default: just list the servers in the upstream block. Add weight=N to skew the distribution. It needs no knowledge of how busy each upstream is.

upstream app {
  server s1 weight=3;
  server s2 weight=1;
}

LEAST CONNECTIONS

Routes to the server with fewest active connections. Best when requests have wildly varying response times. Config: least_conn; in upstream block.

upstream app {
  least_conn;
  server s1;
  server s2;
}

IP HASH

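Hashes the client IP (the first three octets of an IPv4 address, or the whole IPv6 address), so the same client keeps landing on the same server while that server is available. Useful for session stickiness without a shared session store. Config: ip_hash; in upstream block.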

upstream app {
  ip_hash;
  server s1;
  server s2;
}
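
An upstream block does nothing until a location proxies to it by name. A sketch that wires one in, using the three local ports from the diagram. The max_fails, fail_timeout, and backup parameters are additions for illustration and are not covered in the text above:

```nginx
upstream app {
    least_conn;                                           # pick the server with fewest active connections
    server 127.0.0.1:8000 weight=2;                       # weights are still honored
    server 127.0.0.1:8001 max_fails=3 fail_timeout=30s;   # skip for 30s after 3 failed attempts
    server 127.0.0.1:8002 backup;                         # used only when the others are unavailable
}

server {
    listen 80;

    location / {
        proxy_pass http://app;    # the name refers to the upstream block above
    }
}
```

Swapping least_conn for ip_hash changes the algorithm, but the backup parameter then has to go, since ip_hash does not support it.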

SSL/TLS Termination: One Padlock to Rule Them All

Instead of every app server needing TLS certificates, keys, and the CPU cost of encryption, Nginx does it all in one place. The external world gets HTTPS. The internal network between Nginx and your app servers uses plain HTTP. This is called SSL termination — Nginx terminates the encrypted tunnel.

SSL/TLS TERMINATION FLOW

Client
→ HTTPS :443, TLS encrypted (TLS 1.3)
→ Nginx (SSL termination): holds private key, validates certs, session cache, OCSP stapling
→ HTTP :8000, plain, over the internal network (VPC / localhost)
→ App 1, App 2: no TLS needed

SESSION CACHE

TLS handshakes are expensive (extra round trips plus public-key cryptography). Nginx can keep a session cache in shared memory that all worker processes use. A returning client presents its session ID, Nginx finds the cached session, and the connection resumes without a full handshake. (Session tickets are a separate, stateless resumption mechanism.) Configure with ssl_session_cache shared:SSL:10m: that's 10 MB shared across all workers, holding roughly 40,000 sessions (about 4,000 per megabyte).

OCSP STAPLING

Normally a browser checks a certificate's revocation status by querying the CA's OCSP server — adding latency to every new TLS connection. With OCSP stapling (ssl_stapling on), Nginx pre-fetches the OCSP response from the CA and "staples" it to the TLS handshake. Client gets revocation proof for free, with zero extra round trips.
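
Both features are a few lines in the TLS server block. A sketch in which the certificate paths, hostname, resolver address, and backend port are all placeholders:

```nginx
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/tls/fullchain.pem;   # assumed path: cert plus intermediates
    ssl_certificate_key /etc/nginx/tls/privkey.pem;     # assumed path: private key
    ssl_protocols       TLSv1.2 TLSv1.3;

    ssl_session_cache   shared:SSL:10m;   # 10 MB cache shared by all workers
    ssl_session_timeout 1h;               # how long a cached session may be resumed

    ssl_stapling on;                      # staple the CA's OCSP response to the handshake
    resolver 127.0.0.53 valid=300s;       # assumed DNS server: needed to look up the OCSP responder

    location / {
        proxy_pass http://127.0.0.1:8000; # plain HTTP to the app on the internal side
    }
}
```

Stapling only works if the certificate's issuer actually runs an OCSP responder; where it does not, Nginx logs a warning and serves the handshake without a stapled response.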

WHY TERMINATE AT NGINX, NOT AT THE APP

Single cert management: one cert renewed in one place (or via certbot + nginx), not on every app server
CPU offload: Nginx's TLS library (typically OpenSSL) uses AES-NI hardware acceleration where the CPU supports it; your Python/Ruby app doesn't have to spend cycles on encryption
Protocol flexibility: Nginx handles HTTP/2 and HTTP/3 externally; your app stays simple HTTP/1.1 internally
Security boundary: app servers on the internal network never need a public IP or open port 443

Zero-Copy: The sendfile() System Call

When Nginx serves a static file, it uses the sendfile() system call instead of the traditional read-then-write approach. The difference: without sendfile, the file data crosses the kernel/user boundary twice. With sendfile, it stays entirely in the kernel — zero copies into user space. This is why Nginx can saturate a gigabit network interface serving static assets while using almost no CPU.

WITHOUT sendfile: 4 Copies

Copy 1: disk → kernel buffer (page cache)
Copy 2: kernel buffer → user-space buffer (Nginx), via read()
Copy 3: user-space buffer → kernel socket buffer, via write()
Copy 4: socket buffer → NIC, via DMA

Data crosses the kernel/user boundary twice · 4 copies · 2 system calls · CPU burns on memcpy
      

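WITH sendfile(): 2 Copies

Copy 1: disk → kernel buffer (page cache)
sendfile(): data stays in the kernel, never enters user space
Copy 2: kernel buffer → NIC, via DMA

2 copies · 1 system call · no user-space memcpy · up to 3x faster by the diagram's estimate. The 2-copy figure assumes the NIC supports scatter-gather DMA; without it the kernel makes one extra copy into the socket buffer.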

nginx.conf:
sendfile on;
tcp_nopush on;    # batch sendfile chunks
tcp_nodelay on;   # no Nagle delay

TCP_NOPUSH + TCP_NODELAY: THE COMBO

tcp_nopush (TCP_CORK on Linux, TCP_NOPUSH on FreeBSD) tells the kernel to hold back partial packets, so the response header and the start of the file go out together and the file is sent in full-sized packets: fewer packets, better throughput for large files. It only takes effect when sendfile is on. tcp_nodelay disables Nagle's algorithm, so the last partial segment is sent immediately instead of waiting. Nginx combines both: cork during the bulk transfer, uncork to flush the tail. Best of both worlds.

Nginx vs Apache: Key Differences

Nginx was built from scratch to fix specific problems with Apache's architecture. They're not just different implementations of the same design — they make fundamentally different trade-offs.

Attribute	Nginx	Apache
Concurrency model	Event-driven, non-blocking I/O. Fixed number of workers (typically one per CPU core).	`prefork`: one process per connection. `worker`: one thread per connection. `event`: worker threads for active requests, with a listener thread holding idle keep-alive connections (better but still heavier).
Memory usage at 10K connections	~100–200 MB total (events are cheap)	Rough, configuration-dependent estimates. `prefork`: ~10–80 GB (one process per conn). `worker`: ~8 GB. `event`: ~500 MB.
Config style	Declarative block-based. No per-directory runtime config. All config compiled once on load.	Supports `.htaccess` per-directory overrides — evaluated on every request. Flexible but slower.
Dynamic content	Always via external process (proxy_pass, fastcgi_pass). Nginx never runs PHP/Python natively.	Can run PHP/Perl directly in-process via `mod_php`, `mod_perl`. More tightly coupled but simpler setup.
Static file serving	Extremely fast via `sendfile()`. Commonly quoted figures put Nginx 2–4x ahead of Apache for static assets, though results depend on workload and tuning.	Good, but adds overhead from `.htaccess` checks, module chain, and process model.
Module loading	Modules compiled in at build time (or dynamic modules in Nginx 1.9.11+). No per-request module scanning.	Modules loaded as DSOs (`LoadModule`). Easy to add/remove without recompile.
Best for	High-concurrency reverse proxy, static file server, SSL termination, microservices front door, CDN edge.	Shared hosting, legacy PHP apps, complex per-directory rewrite rules, `.htaccess`-based configs.
Reverse proxy performance	Purpose-built. Upstream keepalive pool, buffering, and passive upstream health checks (`max_fails`, `fail_timeout`) are first-class features; active health checks require NGINX Plus.	`mod_proxy` works but is not the primary use case. More memory overhead per proxied connection.

EVENT LOOP WINS · ZERO COPIES · MILLIONS SERVED
