IPSec VPN Tunnels
A deep-dive for engineers who need to reason about IPSec at a design level, not just configure it.
There's a reason IPSec has survived decades of evolving network architecture: it's not elegant, but it's thorough. Understanding it properly means getting past the surface-level "it encrypts your traffic" explanation and into how the protocol suite actually negotiates keys, wraps packets, and protects them between the two peers.
What IPSec Actually Is
IPSec isn't a single protocol. It's a framework, a collection of RFCs that together define how to authenticate, encrypt, and integrity-check IP-layer traffic. The two workhorses are:
- Authentication Header (AH): Provides data integrity and origin authentication by computing a keyed integrity check value over the IP packet, including the header fields that do not change in transit. It does not encrypt. If your threat model requires confidentiality, AH alone won't cut it.
- Encapsulating Security Payload (ESP): The one most deployments actually use. ESP encrypts the payload and optionally provides integrity and origin authentication. In most modern configurations, you'll use ESP with both, either as encryption plus an HMAC or as a single AEAD cipher such as AES-GCM.
Each protocol operates in one of two modes:
- Transport Mode: Only the IP payload is protected. The original IP header remains intact. Used for host-to-host communication where both endpoints natively speak IPSec.
- Tunnel Mode: The entire original IP packet is encapsulated inside a new IP packet with a fresh header. This is what VPN gateways use. The inner packet, including its addresses, is hidden from intermediate routers; the outer header handles routing between the gateways.
Tunnel Mode Packet Structure:
┌─────────────────┬────────────┬─────────────────────┬─────────────┬─────┐
│ Outer IP Header │ ESP Header │ Inner IP Header +   │ ESP Trailer │ ICV │
│ (gateway IPs)   │ (SPI, seq) │ Original Payload    │             │     │
│                 │            │ ←────────── encrypted ──────────→ │     │
└─────────────────┴────────────┴─────────────────────┴─────────────┴─────┘
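To make the layout concrete, here is a minimal Python sketch of how the ESP fields around the inner packet are assembled in tunnel mode. It performs no encryption and omits the outer IP header; the function name `esp_tunnel_frame` and the 4-byte alignment default are assumptions for illustration. It also shows the ESP trailer (padding, pad length, next header) that is appended to the inner packet before encryption.

```python
import struct

NEXT_HEADER_IPV4 = 4  # IP protocol number meaning "the payload is a whole IPv4 packet"

def esp_tunnel_frame(spi: int, seq: int, inner_packet: bytes, block_size: int = 4) -> dict:
    """Lay out ESP tunnel-mode fields around an inner packet (no real crypto)."""
    # ESP header: 32-bit SPI followed by a 32-bit sequence number, network byte order.
    header = struct.pack("!II", spi, seq)
    # Pad so that payload + padding + the 2 fixed trailer bytes fill whole blocks.
    pad_len = (-(len(inner_packet) + 2)) % block_size
    padding = bytes(range(1, pad_len + 1))  # default padding pattern: 1, 2, 3, ...
    trailer = padding + struct.pack("!BB", pad_len, NEXT_HEADER_IPV4)
    return {
        "esp_header": header,                    # sent in the clear
        "to_encrypt": inner_packet + trailer,    # what the cipher would cover
    }

frame = esp_tunnel_frame(spi=0x1001, seq=1, inner_packet=bytes(20))
```

In a real implementation the `to_encrypt` bytes would be encrypted under the SA's key, and an integrity check value covering the ESP header and ciphertext would be appended.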
IKE: The Handshake That Makes It All Work
Before a single byte of application data is encrypted, two endpoints need to agree on cryptographic parameters and exchange keys. That negotiation is handled by IKE — Internet Key Exchange (most modern deployments use IKEv2, defined in RFC 7296).
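IKE operates in two phases. The phase names come from IKEv1; IKEv2 keeps the same split but names the exchanges IKE_SA_INIT, IKE_AUTH, and CREATE_CHILD_SA: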
Phase 1 — IKE SA (Security Association)
The goal here is to build a secure, authenticated channel between the two peers — a control plane tunnel. This involves:
- Algorithm negotiation: Both peers agree on an encryption algorithm (e.g., AES-256), an integrity algorithm (e.g., HMAC-SHA-256), a pseudorandom function, and a Diffie-Hellman group. In IKEv2 the authentication method is not part of this negotiation; each peer declares its own in IKE_AUTH.
- Diffie-Hellman exchange: Both sides contribute public values; the shared secret is computed independently on each end without it ever crossing the wire.
- Authentication: Using pre-shared keys (PSK) or digital certificates (RSA/ECDSA). Certificate-based auth is strongly preferred in any serious deployment.
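The algorithm negotiation step can be sketched as a responder walking the initiator's proposals in preference order and picking the first one it fully accepts. This is a simplified model: the dictionary keys and algorithm names below are hypothetical labels, whereas real IKEv2 proposals carry numeric transform IDs and may list several transforms per type.

```python
from typing import Optional

# Hypothetical representation: transform type -> algorithm name.
Proposal = dict[str, str]

def select_proposal(offered: list[Proposal],
                    acceptable: dict[str, set[str]]) -> Optional[Proposal]:
    """Return the first offered proposal whose every transform the responder accepts."""
    for proposal in offered:
        if all(value in acceptable.get(kind, set())
               for kind, value in proposal.items()):
            return proposal
    return None  # a real responder would reply with a NO_PROPOSAL_CHOSEN notification

offered = [
    {"encr": "3DES", "integ": "HMAC-SHA1", "dh": "2"},             # legacy fallback
    {"encr": "AES-256-GCM", "prf": "HMAC-SHA-256", "dh": "20"},
]
acceptable = {
    "encr": {"AES-256-GCM"},
    "prf": {"HMAC-SHA-256", "HMAC-SHA-384"},
    "dh": {"14", "20"},
}
chosen = select_proposal(offered, acceptable)
```

Here the legacy proposal is skipped because the responder's policy does not accept 3DES, so the second proposal is chosen.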
Phase 1 Flow (IKEv2 simplified):
Initiator Responder
│ │
│──── IKE_SA_INIT (proposals, DH) ──►│
│◄─── IKE_SA_INIT (chosen algo, DH) ─│
│ │
│──── IKE_AUTH (identity, cert) ────►│
│◄─── IKE_AUTH (identity, cert) ─────│
│ │
│ IKE SA Established │
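The Diffie-Hellman step in IKE_SA_INIT can be illustrated with a toy finite-field exchange. The prime and generator below are assumptions chosen for readability and are far too small for real use; actual IKE peers use standardized groups such as the 2048-bit MODP Group 14 or the ECDH groups.

```python
import secrets

# Toy parameters for illustration only, NOT a real IKE group.
P = 2**127 - 1  # a Mersenne prime
G = 3

def dh_keypair() -> tuple[int, int]:
    """Generate a private exponent and the public value sent to the peer."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public

def dh_shared(private: int, peer_public: int) -> int:
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, private, P)

i_priv, i_pub = dh_keypair()  # initiator: i_pub travels in IKE_SA_INIT
r_priv, r_pub = dh_keypair()  # responder: r_pub travels in the reply

# Both sides derive the same secret; only the public values crossed the wire.
assert dh_shared(i_priv, r_pub) == dh_shared(r_priv, i_pub)
```

IKE does not use this shared value directly as a key. It is fed, together with nonces from both peers, into a pseudorandom function to derive the keys for the IKE SA.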
Phase 2 — Child SA (IPSec SA)
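Once the IKE SA is up, it's used to negotiate one or more Child SAs, the actual data-plane tunnels. In IKEv2 the first Child SA is set up as part of the IKE_AUTH exchange; CREATE_CHILD_SA is used for additional Child SAs and for rekeying. The underlying IPSec SAs are unidirectional, so each negotiation produces a pair, one per direction. Parameters negotiated here include the ESP algorithm suite and traffic selectors (which source/destination IP ranges the tunnel covers).
Phase 2 Flow (additional Child SA or rekey):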
Initiator Responder
│ │
│──── CREATE_CHILD_SA (proposals) ──►│
│◄─── CREATE_CHILD_SA (chosen) ──────│
│ │
│ IPSec SA Established │
│ (data traffic now flows) │
Each SA is identified by a Security Parameter Index (SPI) — a 32-bit value in the ESP header that tells the receiving end which SA (and therefore which key) to use for decryption.
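The SPI lookup on the receiving side can be sketched as a dictionary keyed by SPI. The `InboundSA` class and `lookup_sa` function are hypothetical names for illustration; a real stack would also check the sequence number against an anti-replay window and verify integrity before decrypting.

```python
import struct
from dataclasses import dataclass

@dataclass
class InboundSA:
    spi: int
    key: bytes  # placeholder for the negotiated ESP key material

def lookup_sa(sad: dict[int, InboundSA], esp_packet: bytes) -> tuple[InboundSA, int]:
    """Read the SPI and sequence number from an ESP header and find the matching SA."""
    if len(esp_packet) < 8:
        raise ValueError("too short to hold an ESP header")
    spi, seq = struct.unpack("!II", esp_packet[:8])
    sa = sad.get(spi)
    if sa is None:
        raise KeyError(f"no SA for SPI {spi:#010x}")  # such packets are dropped
    return sa, seq

# A toy SA database with one inbound SA.
sad = {0x1001: InboundSA(spi=0x1001, key=b"k" * 32)}
packet = struct.pack("!II", 0x1001, 7) + b"ciphertext..."
sa, seq = lookup_sa(sad, packet)
```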
Phase 1 IKE SA
│
│ (long-lived control channel, ~8hr lifetime)
│
├──► Child SA #1 (e.g. office subnet ↔ DC subnet)
├──► Child SA #2 (e.g. dev subnet ↔ cloud VPC)
└──► Child SA #3 (e.g. management hosts only)
Site-to-Site vs. Remote Access: Architectural Differences
These two deployment patterns look similar on the surface but have meaningful architectural differences.
Site-to-Site VPN
- Both endpoints are fixed gateways (firewalls, routers).
- Traffic selectors are static — you define exactly which subnets route through the tunnel.
- IKE SA is typically long-lived; Child SAs are rekeyed periodically.
- Common in connecting branch offices to a hub, or on-premises networks to cloud VPCs.
Remote Access VPN
- One endpoint is a dynamic client (laptop, phone) with a changing IP.
- IKEv2 with EAP (Extensible Authentication Protocol) is the standard approach: the gateway proves its identity with a certificate, and the client authenticates with user-level credentials carried over EAP.
- Requires a VPN concentrator/gateway that can handle many concurrent SAs.
- Split tunneling decisions matter here; routing all traffic through the tunnel vs. only corporate-destined traffic has real security and performance trade-offs.
Encryption and Integrity: What to Actually Configure
| Parameter | Recommended | Avoid |
|---|---|---|
| Encryption | AES-256-GCM | DES, 3DES, RC4 |
| Integrity (if not GCM) | SHA-256 / SHA-384 | MD5, SHA-1 |
| DH Group | Group 14 (2048-bit) or higher; prefer Group 20/21 (ECDH) | Groups 1, 2, 5 |
| IKE Auth | RSA/ECDSA certificates | PSK in large deployments |
AES-GCM is worth calling out specifically. It's an AEAD (Authenticated Encryption with Associated Data) mode, meaning it handles both encryption and integrity in a single pass, which is why the integrity row above applies only to non-GCM ciphers. This is more efficient, and it removes the chance of pairing a strong cipher with a weak or missing integrity algorithm.
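The recommendations in the table can be turned into a small configuration check. This is a sketch under stated assumptions: the field names (`encryption`, `integrity`, `dh_group`) and the flat dictionary format are hypothetical, not any vendor's configuration schema, and the AEAD set lists only the cipher named in the table.

```python
# Values taken from the "Avoid" column of the table above.
WEAK = {
    "encryption": {"DES", "3DES", "RC4"},
    "integrity": {"MD5", "SHA-1"},
    "dh_group": {"1", "2", "5"},
}
AEAD = {"AES-256-GCM"}  # extend with other AEAD ciphers your policy allows

def audit_proposal(proposal: dict[str, str]) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    for field, weak_values in WEAK.items():
        value = proposal.get(field)
        if value in weak_values:
            findings.append(f"{field}: {value} is on the avoid list")
    # A non-AEAD cipher needs a separate integrity algorithm.
    if proposal.get("encryption") not in AEAD and not proposal.get("integrity"):
        findings.append("integrity: required when the cipher is not AEAD")
    return findings

good = audit_proposal({"encryption": "AES-256-GCM", "dh_group": "20"})
bad = audit_proposal({"encryption": "3DES", "integrity": "MD5", "dh_group": "2"})
```

With these inputs `good` is empty and `bad` contains one finding per weak field.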
Antipatterns Worth Knowing
Overly broad traffic selectors. Defining 0.0.0.0/0 on both sides sounds convenient, but it's a footgun. It sends all traffic through the tunnel regardless of destination, breaks split tunneling logic, and makes troubleshooting miserable. Define selectors as narrowly as your architecture allows.
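The effect of selector width is easy to see with Python's standard `ipaddress` module. In this sketch a selector is modeled as a (local, remote) pair of CIDR strings, and the `min_prefix` threshold of 8 is an arbitrary assumption to adapt to your own policy; real traffic selectors can also constrain protocol and port.

```python
import ipaddress

Selector = tuple[str, str]  # (local network, remote network) in CIDR notation

def selector_matches(selector: Selector, src: str, dst: str) -> bool:
    """True if a packet from src to dst would be sent into the tunnel."""
    local_net = ipaddress.ip_network(selector[0])
    remote_net = ipaddress.ip_network(selector[1])
    return (ipaddress.ip_address(src) in local_net
            and ipaddress.ip_address(dst) in remote_net)

def is_overly_broad(selector: Selector, min_prefix: int = 8) -> bool:
    """Flag selectors where either side is wider than the allowed prefix length."""
    return any(ipaddress.ip_network(net).prefixlen < min_prefix for net in selector)

narrow = ("10.1.0.0/16", "10.20.0.0/16")
wide = ("0.0.0.0/0", "0.0.0.0/0")

# Traffic to a public resolver stays out of the narrow tunnel but enters the wide one.
leaks_narrow = selector_matches(narrow, "10.1.2.3", "8.8.8.8")
leaks_wide = selector_matches(wide, "10.1.2.3", "8.8.8.8")
```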
Long SA lifetimes. Extending rekey intervals to reduce overhead is a common optimisation that backfires. Longer lifetimes mean a compromised key has a larger window of exposure and protects more data. Implementation defaults vary, so check what yours ships with; values around 8 hours for the IKE SA and 1 hour for Child SAs are a reasonable baseline unless you have a specific reason to deviate.
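A quick back-of-the-envelope calculation shows what a longer lifetime costs. The sketch below assumes a steady throughput, which real links rarely have, and the 300-second rekey margin is an illustrative value rather than a standard.

```python
def exposure_gib(throughput_mbps: float, lifetime_s: int) -> float:
    """Data protected by a single Child SA key, in GiB, at a steady throughput."""
    bytes_per_second = throughput_mbps * 1_000_000 / 8
    return bytes_per_second * lifetime_s / 2**30

def rekey_at(lifetime_s: int, margin_s: int = 300) -> int:
    """Seconds after SA creation to start rekeying, leaving a margin before expiry."""
    return max(lifetime_s - margin_s, 0)

one_hour = exposure_gib(100, 3600)        # 100 Mbit/s for 1 hour
eight_hours = exposure_gib(100, 8 * 3600) # same link, 8-hour Child SA lifetime
```

At 100 Mbit/s, a 1-hour Child SA key covers roughly 42 GiB of traffic, while an 8-hour lifetime puts roughly 335 GiB behind the same key.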
PSKs in production at scale. Pre-shared keys are fine for small, static setups. In any environment with many peers or any key rotation requirement, the operational cost of managing PSKs securely exceeds the setup cost of a proper PKI.
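No Dead Peer Detection (DPD). Without DPD, a gateway has no way to know if its peer has gone offline. Stale SAs accumulate, traffic silently blackholes, and on-call gets a late-night alert. Enable DPD (the liveness check, in IKEv2 terms) with an interval short enough to fail over promptly but long enough to ride out brief packet loss.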
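The logic behind DPD can be sketched as a small state tracker. The class name, the 30-second interval, and the budget of three missed probes are assumptions for illustration; in IKEv2 the probe itself is an empty INFORMATIONAL exchange, which this sketch leaves to the caller.

```python
from dataclasses import dataclass

@dataclass
class PeerLiveness:
    interval_s: float = 30.0  # assumed probe interval
    max_missed: int = 3       # assumed number of unanswered probes tolerated
    last_heard: float = 0.0
    missed: int = 0

    def on_traffic(self, now: float) -> None:
        """Any authenticated packet from the peer proves it is alive."""
        self.last_heard = now
        self.missed = 0

    def on_tick(self, now: float) -> str:
        """Call once per interval; returns 'alive', 'probe', or 'dead'."""
        if now - self.last_heard < self.interval_s:
            return "alive"
        self.missed += 1
        if self.missed > self.max_missed:
            return "dead"   # caller tears down the IKE SA and its Child SAs
        return "probe"      # caller sends a liveness probe to the peer

peer = PeerLiveness()
peer.on_traffic(now=0.0)
states = [peer.on_tick(now=t) for t in (10.0, 30.0, 60.0, 90.0, 120.0)]
```

The trade-off sits in the two parameters: a shorter interval or smaller budget detects failure sooner but risks tearing down a healthy tunnel during transient loss.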
Alternatives Worth Knowing
- WireGuard: Far simpler codebase (~4,000 lines, against the much larger IPSec and IKE implementations), faster handshakes, and excellent performance. The trade-off is a smaller feature surface: no EAP and limited interoperability with legacy infrastructure. Strong choice for greenfield deployments where you control both endpoints.
- SSL/TLS VPNs: Tunnel traffic over TLS, so they traverse NAT and restrictive firewalls more easily. Clientless variants work from any browser and suit remote access to web applications, but they are not a replacement for network-layer tunnels.
- ZTNA (Zero Trust Network Access): Not a VPN replacement per se, but a different model. Rather than granting network access and trusting the endpoint to reach only what it should, ZTNA brokers access at the application level with continuous policy evaluation. Increasingly the right answer for remote access to internal applications.
Resources
- RFC 7296 — IKEv2
- RFC 4301 — IP Security Architecture
- RFC 4303 — ESP
- IPSec VPN Design — Bollapragada, Khalid & Wainner
- VPNs Illustrated: Tunnels, VPNs, and IPsec — Jon C. Snader