By Ayush Sharma

Load Balancing Explained: What It Is, How It Works, and When You Need It

A complete guide to understanding load balancers. Learn the techniques, algorithms, and tools to distribute traffic across servers effectively. From DNS load balancing to Caddy and building your own.

Tags: Load Balancing, System Design, Backend, Infrastructure, DevOps

You launch your app. A hundred users show up. Everything works fine.

Then a thousand users arrive. Your server starts sweating. Response times go up. Some requests time out. Eventually, the whole thing crashes.

This is the scaling problem. And load balancing is one of the most fundamental solutions to it.

In this guide, we will cover everything you need to understand about load balancers. What they are, when you need them, how they work, the different techniques, and even how to set one up yourself.

[Figure: Basic Load Balancer Architecture]


What is a Load Balancer?

A load balancer is a system that sits between your users and your servers. Its job is simple: distribute incoming requests across multiple servers so that no single server gets overwhelmed.

Think of it like a restaurant host. When customers arrive, the host does not send everyone to the same table. They spread guests across different sections so that no single waiter gets buried with orders.

Without a load balancer, all traffic hits one server. That server has limits. CPU, memory, network bandwidth. Once you hit those limits, users wait. Then they leave.

With a load balancer, you can run multiple servers behind it. Traffic gets distributed. Each server handles a portion of the load. When one server gets busy, others pick up the slack.

The core idea: Instead of scaling up (buying a bigger server), you scale out (adding more servers). Load balancers make scaling out possible.


Why Do You Need Load Balancing?

Let us look at the specific problems load balancing solves.

High Availability

Your single server crashes at 3 AM. Your application is down until someone wakes up and fixes it.

With load balancing, you have multiple servers. If one fails, the load balancer detects it and routes traffic to the healthy ones. Users might not even notice anything happened.

Fault Tolerance

Hardware fails. Network connections drop. Software crashes. These things happen.

Load balancers perform health checks. They constantly ping your servers to verify they are responding correctly. When a server fails a health check, it gets removed from the rotation automatically. When it recovers, it gets added back.

Scalability

Black Friday arrives. Your traffic spikes 10x. With a single server, you are stuck.

With load balancing, you add more servers when traffic increases. The load balancer includes them automatically. When traffic drops, you remove the extra servers. You only pay for what you use.

Better Performance

Load balancers can route users to the server that will respond fastest. Some use geographic proximity. Others track which servers have the most available capacity.

The result: faster response times for your users.


When Should You Use a Load Balancer?

Not every application needs load balancing from day one. Here is how to think about it.

You probably need load balancing when:

  • Your application has more traffic than a single server can handle
  • Downtime is unacceptable for your business
  • You need to deploy updates without taking the application offline
  • You are running a production system that real users depend on
  • You expect traffic spikes (product launches, sales events, viral moments)

You probably do not need load balancing when:

  • You are building a prototype or side project
  • Your traffic is low and a single server handles it comfortably
  • You are fine with occasional downtime during deployments
  • Cost is a primary concern and high availability is not critical

Many developers add load balancing too early, creating complexity they do not need. Others wait too long and suffer outages when traffic grows.

A practical approach: start with a single server, monitor your traffic and server metrics, and add load balancing when you see signs of strain or when downtime becomes costly.


How Load Balancers Work

The basic flow is straightforward.

  1. User sends a request to your domain (example.com)
  2. DNS resolves to the load balancer's IP address
  3. Load balancer receives the request
  4. Load balancer picks a backend server based on its algorithm
  5. Request gets forwarded to that server
  6. Server processes the request and sends the response back
  7. Load balancer returns the response to the user

The user never knows there are multiple servers. They just see your domain.
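
Here is how small that core loop can be. A minimal sketch in Go using the standard library's httputil.ReverseProxy, with a single hypothetical backend at localhost:8080 standing in for the pool:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backend; a real load balancer picks from a pool.
	backend, err := url.Parse("http://localhost:8080")
	if err != nil {
		panic(err)
	}

	// ReverseProxy forwards the request (step 5) and streams the
	// backend's response back to the user (steps 6 and 7).
	proxy := httputil.NewSingleHostReverseProxy(backend)
	http.ListenAndServe(":80", proxy)
}
```

A real load balancer chooses from a pool of backends instead of a fixed one. That is what the algorithms covered later are for.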

[Figure: Request Flow Through Load Balancer]

Layer 4 vs Layer 7

Load balancers operate at different layers of the network stack.

Layer 4 (Transport Layer) load balancers work with TCP and UDP. They look at IP addresses and ports to make routing decisions. They are fast because they do not inspect the actual content of requests.

Layer 7 (Application Layer) load balancers understand HTTP and HTTPS. They can look at URLs, headers, cookies, and request content. This allows smarter routing decisions but adds some overhead.

| Feature | Layer 4 | Layer 7 |
| --- | --- | --- |
| Speed | Faster | Slightly slower |
| Flexibility | Basic routing | Content-aware routing |
| SSL termination | No | Yes |
| URL-based routing | No | Yes |
| Session stickiness | IP-based only | Cookie-based |
| Use case | High throughput | Web applications |

Most web applications use Layer 7 load balancers because the routing flexibility outweighs the small performance cost.
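
For example, a Layer 7 balancer can split traffic by URL path, sending API requests and web requests to different server pools. A sketch using Caddy's configuration (covered in detail later in this guide); the backend names are hypothetical:

```
example.com {
    handle /api/* {
        reverse_proxy api1:8080 api2:8080
    }
    handle {
        reverse_proxy web1:8080 web2:8080
    }
}
```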


Load Balancing Algorithms

The algorithm determines how the load balancer picks which server gets each request. Different algorithms suit different situations.

Round Robin

The simplest approach. Requests go to servers in order: Server 1, Server 2, Server 3, Server 1, Server 2, Server 3, and so on.

Pros: Simple to understand and implement. Works well when all servers have equal capacity and requests take similar time to process.

Cons: Does not account for server load. If one server is struggling, it still gets the same number of requests as others.

Best for: Homogeneous server environments where requests are roughly uniform.
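
The whole algorithm fits in a few lines. A minimal sketch in Go, with a hypothetical pool of three backends; the atomic counter keeps the rotation correct under concurrent requests:

```go
package balancer

import "sync/atomic"

// A hypothetical pool of equal-capacity backends.
var backends = []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}

var counter uint64

// nextBackend hands out servers in strict rotation: 1, 2, 3, 1, 2, 3, ...
func nextBackend() string {
	n := atomic.AddUint64(&counter, 1) - 1
	return backends[n%uint64(len(backends))]
}
```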

Weighted Round Robin

Like round robin, but servers have weights based on their capacity. A server with weight 3 gets three times as many requests as a server with weight 1.

Pros: Accounts for different server capabilities. You can add a powerful server alongside older ones.

Cons: Weights are static. Does not adapt to real-time conditions.

Best for: Mixed server environments with known capacity differences.

Least Connections

Sends each request to the server with the fewest active connections.

Pros: Adapts to real-time load. Servers processing slow requests naturally get fewer new requests.

Cons: Requires tracking connection state. Does not account for server capacity differences.

Best for: Applications where request processing time varies significantly.
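
A sketch of the selection step in Go, assuming a hypothetical backend type whose conns field is incremented when a request starts and decremented when it finishes:

```go
package balancer

// backend tracks in-flight requests. A production version would
// update and read conns atomically.
type backend struct {
	addr  string
	conns int64
}

// pickLeastConnections returns the backend with the fewest active
// connections right now.
func pickLeastConnections(pool []*backend) *backend {
	best := pool[0]
	for _, b := range pool[1:] {
		if b.conns < best.conns {
			best = b
		}
	}
	return best
}
```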

Weighted Least Connections

Combines least connections with server weights. Considers both current load and server capacity.

Best for: Production environments with mixed server capacities and variable request durations.

IP Hash

Uses the client's IP address to determine which server handles their request. Same IP always goes to the same server.

Pros: Provides session persistence without cookies. Useful when you cannot modify the application.

Cons: Can create uneven distribution. Many clients behind a shared NAT or corporate proxy present the same IP, so they all land on the same server.

Best for: Legacy applications that require session stickiness but cannot use cookies.
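
A sketch in Go using the standard library's FNV hash; the backend list is hypothetical, and its order must stay stable or clients get remapped:

```go
package balancer

import "hash/fnv"

// A hypothetical pool. Keep the order stable: the hash maps each
// client IP to an index in this slice.
var backends = []string{"10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"}

// pickByIPHash maps a client IP to a fixed backend, so the same
// address always lands on the same server.
func pickByIPHash(clientIP string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return backends[h.Sum32()%uint32(len(backends))]
}
```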

Least Response Time

Routes requests to the server with the lowest average response time and fewest active connections.

Pros: Optimizes for user experience. Naturally routes around slow servers.

Cons: Requires measuring response times, adding some overhead.

Best for: Performance-critical applications where response time matters.

[Figure: Load Balancing Algorithms Comparison]

Quick Algorithm Reference

| Algorithm | Best For | Drawback |
| --- | --- | --- |
| Round Robin | Equal servers, uniform requests | Ignores server load |
| Weighted Round Robin | Mixed server capacities | Static weights |
| Least Connections | Variable request times | No capacity awareness |
| Weighted Least Connections | Production environments | More complex |
| IP Hash | Session stickiness needs | Uneven distribution risk |
| Least Response Time | Performance priority | Measurement overhead |

DNS Load Balancing

Before we talk about dedicated load balancers, there is a simpler approach: using DNS itself.

How It Works

Normally, your domain points to a single IP address. With DNS load balancing, you configure multiple A records pointing to different server IPs.

```
example.com    A    192.168.1.1
example.com    A    192.168.1.2
example.com    A    192.168.1.3
```

When a user's browser asks "what is the IP for example.com?", the DNS server returns these IPs in rotating order. Different users get directed to different servers.

This is called Round Robin DNS.
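
You can watch the rotation happen. A small Go program using the standard resolver; run it a few times against a round robin DNS setup and the order of the IPs typically changes between runs:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// LookupHost returns every address the resolver hands back
	// for the name, not just the first one.
	ips, err := net.LookupHost("example.com")
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		fmt.Println(ip)
	}
}
```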

The Good

  • Simple: No extra infrastructure needed. Just DNS configuration.
  • Cheap: No load balancer to pay for.
  • Global distribution: Works well for directing users to geographically close data centers.

The Bad

  • No health checks: DNS has no idea if a server is down. It keeps sending traffic to dead servers until you manually update the records.
  • Caching issues: Browsers and ISPs cache DNS responses. Even after you remove a bad server, some users keep trying to connect to it for hours.
  • No session persistence: Each DNS query might return a different IP. Users could bounce between servers mid-session.
  • Limited algorithms: Mostly just round robin. No least connections or response time awareness.

When to Use DNS Load Balancing

DNS load balancing makes sense for:

  • Distributing traffic across geographic regions
  • Basic redundancy where some downtime is acceptable
  • Supplementing a dedicated load balancer (GSLB setups)

For production applications that need high availability, DNS load balancing alone is not enough. You need something that can detect failures and respond in real time.


Load Balancing Tools

Let us look at the tools people actually use.

Nginx

Originally a web server, Nginx has become one of the most popular load balancers. It handles both Layer 4 and Layer 7 load balancing.

Strengths:

  • Extremely fast and efficient
  • Can also serve static files and handle SSL termination
  • Huge community and documentation
  • Free and open source (Nginx Plus adds commercial features)

When to use: When you want a battle-tested solution that can also serve as your web server.

HAProxy

A dedicated load balancer that has been around since 2001. Known for reliability and performance.

Strengths:

  • Designed specifically for load balancing
  • Excellent performance under high load
  • Advanced health checks
  • Detailed statistics and monitoring
  • Completely free and open source

When to use: When you need a dedicated, high-performance load balancer. Especially good for TCP load balancing.

Caddy

A modern web server with automatic HTTPS and simple configuration. Its reverse proxy features make it an easy load balancer option.

Strengths:

  • Automatic HTTPS with Let's Encrypt
  • Simple configuration syntax
  • Built-in health checks
  • Good for small to medium deployments

When to use: When you want simplicity and automatic HTTPS without complex configuration.

Cloud Load Balancers

AWS ELB, Google Cloud Load Balancing, and Azure Load Balancer are managed services. You do not run any servers.

Strengths:

  • No infrastructure to manage
  • Automatic scaling
  • Integration with other cloud services
  • Pay-per-use pricing

When to use: When you are already in a cloud environment and want to minimize operational overhead.

Quick Comparison

| Feature | Nginx | HAProxy | Caddy | Cloud LB |
| --- | --- | --- | --- | --- |
| Ease of setup | Medium | Medium | Easy | Easiest |
| Performance | Excellent | Excellent | Good | Excellent |
| Auto HTTPS | No | No | Yes | Yes |
| Cost | Free | Free | Free | Pay-per-use |
| Web server too | Yes | No | Yes | No |
| Managed | No | No | No | Yes |

Setting Up a Load Balancer with Caddy

Caddy is the simplest way to get started. Here is a basic example.

Basic Configuration

Create a file called Caddyfile:

```
example.com {
    reverse_proxy backend1:8080 backend2:8080 backend3:8080
}
```

That is it. Caddy will:

  • Automatically get an SSL certificate
  • Distribute traffic across all three backends
  • Use random selection by default

Adding Round Robin

```
example.com {
    reverse_proxy backend1:8080 backend2:8080 backend3:8080 {
        lb_policy round_robin
    }
}
```

Health Checks

```
example.com {
    reverse_proxy backend1:8080 backend2:8080 backend3:8080 {
        lb_policy round_robin
        health_uri /health
        health_interval 10s
        health_timeout 2s
    }
}
```

Caddy will check /health on each backend every 10 seconds. If a server fails to respond within 2 seconds, it gets removed from rotation.

Available Policies

Caddy supports these load balancing policies:

  • random (default)
  • round_robin
  • least_conn
  • first (always use first available, good for failover)
  • ip_hash
  • uri_hash
  • header
  • cookie

Building Your Own Load Balancer

Understanding how load balancers work internally helps you make better decisions about using them. Here is the basic concept.

The Core Components

A simple load balancer needs:

  1. A server to accept incoming requests
  2. A list of backend servers
  3. An algorithm to pick which backend handles each request
  4. Health checks to detect failed backends
  5. Logic to forward requests and return responses

The Algorithm

Here is the conceptual flow:

```
incoming_request:
    backend = select_backend(algorithm, backends)
    if backend is healthy:
        response = forward_request(backend, request)
        return response
    else:
        try next backend
```

Health Checks

Run a background process that periodically:

  1. Sends a request to each backend's health endpoint
  2. Marks backends as healthy or unhealthy based on response
  3. Updates the available backend list
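
Putting the pieces together, here is a minimal sketch in Go: round-robin selection that skips unhealthy backends, a background health checker, and the standard library's reverse proxy for forwarding. The backend addresses and the /health path are hypothetical:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
	"time"
)

// backend pairs a reverse proxy with an atomically updated health flag.
type backend struct {
	url     *url.URL
	proxy   *httputil.ReverseProxy
	healthy atomic.Bool
}

var (
	backends []*backend
	counter  uint64
)

// next returns the next healthy backend in round-robin order,
// trying each backend at most once before giving up.
func next() *backend {
	for range backends {
		b := backends[atomic.AddUint64(&counter, 1)%uint64(len(backends))]
		if b.healthy.Load() {
			return b
		}
	}
	return nil
}

// healthCheck polls each backend's /health endpoint on an interval
// and marks it healthy or unhealthy based on the response.
func healthCheck() {
	client := &http.Client{Timeout: 2 * time.Second}
	for range time.Tick(10 * time.Second) {
		for _, b := range backends {
			resp, err := client.Get(b.url.String() + "/health")
			ok := err == nil && resp.StatusCode == http.StatusOK
			if resp != nil {
				resp.Body.Close()
			}
			b.healthy.Store(ok)
		}
	}
}

func main() {
	// Hypothetical backend addresses.
	for _, addr := range []string{"http://10.0.0.1:8080", "http://10.0.0.2:8080"} {
		u, _ := url.Parse(addr)
		b := &backend{url: u, proxy: httputil.NewSingleHostReverseProxy(u)}
		b.healthy.Store(true) // assume healthy until a check says otherwise
		backends = append(backends, b)
	}

	go healthCheck()

	// Accept every request, pick a backend, and forward.
	http.ListenAndServe(":80", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if b := next(); b != nil {
			b.proxy.ServeHTTP(w, r)
			return
		}
		http.Error(w, "no healthy backends", http.StatusServiceUnavailable)
	}))
}
```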

What Makes It Hard

Building a toy load balancer is straightforward. Building a production-ready one is not.

Connection handling: You need to manage thousands of concurrent connections efficiently. This requires understanding of non-blocking I/O, event loops, and connection pooling.

State management: Sticky sessions require tracking which client goes to which server. This state must be fast to look up and update.

Failure handling: What happens if a backend fails mid-request? You need to retry on a different server without the user noticing.

Performance: Every millisecond the load balancer adds to each request multiplies across all your traffic. The implementation must be highly optimized.

For learning, building a simple load balancer is excellent. For production, use proven tools like Nginx, HAProxy, or Caddy.


Load Balancing for Scalable Systems

When you design for scale, load balancing becomes part of a larger architecture.

Horizontal Scaling

Load balancers enable horizontal scaling. Instead of upgrading to a more powerful server (vertical scaling), you add more servers of the same size.

Vertical scaling has limits. There is only so much CPU and RAM you can put in one machine. It is also risky. One server means one point of failure.

Horizontal scaling with load balancing has no such hard ceiling. Need more capacity? Add more servers. One fails? Others continue.

Stateless Applications

For load balancing to work smoothly, your application should be stateless. Any server should be able to handle any request.

This means:

  • No storing session data in server memory
  • No writing files to local disk that other servers need
  • Using external storage (databases, Redis, S3) for shared state

If your application must have sticky sessions, use IP hash or cookie-based routing. But stateless is cleaner.

Database Considerations

Your application servers might scale horizontally, but what about your database?

Common patterns:

  • Read replicas: Write to one database, read from multiple replicas. Load balance read queries across replicas.
  • Connection pooling: Use a tool like PgBouncer to manage database connections efficiently.
  • Caching: Put Redis or Memcached in front of your database to reduce load.

Multi-Region Deployment

For global applications, you might have servers in multiple regions. You can use:

  • DNS-based routing: Direct users to the nearest region
  • Global load balancing (GSLB): More sophisticated than DNS, with health checks and failover
  • CDN: Cache static content at edge locations worldwide

[Figure: Scalable Architecture with Load Balancing]


Best Practices

Always Use Health Checks

Configure your load balancer to actively check backend health. Do not wait for user requests to discover a dead server.

Create a dedicated health endpoint that verifies your application can actually serve requests. A simple /health that returns 200 is better than nothing, but checking database connectivity is even better.
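
A sketch of that deeper check in Go, assuming the application already holds an open *sql.DB handle:

```go
package health

import (
	"database/sql"
	"net/http"
)

// Handler returns 200 only when the app can actually reach its
// database, not merely when the process is alive.
func Handler(db *sql.DB) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "database unreachable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```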

Plan for Load Balancer Failure

The load balancer itself can fail. For high availability:

  • Run multiple load balancer instances
  • Use an active-passive or active-active setup
  • Put a virtual IP (VIP) in front that can float between instances

Enable Logging

Log requests at the load balancer level. This gives you visibility into:

  • Which backends are receiving traffic
  • Response times from each backend
  • Error rates and failed requests

Start Simple

Begin with round robin or least connections. Only add complexity (weighted algorithms, sticky sessions) when you have a specific need.

Test Failure Scenarios

Before you need it in production, test what happens when:

  • A backend server dies
  • The load balancer fails over
  • You add or remove backends

Key Takeaways

  1. Load balancers distribute traffic across multiple servers to handle more load and prevent single points of failure.

  2. DNS load balancing is simple but lacks health checks. It works for basic redundancy and geographic distribution.

  3. Layer 7 load balancers understand HTTP and enable smarter routing based on URLs, headers, and cookies.

  4. Start with simple algorithms like round robin or least connections. Add complexity only when needed.

  5. Health checks are essential. Always configure your load balancer to detect and route around failed backends.

  6. Caddy, Nginx, and HAProxy are solid open-source options. Cloud load balancers reduce operational burden.

  7. Stateless applications scale better. Move session state to external storage.

  8. The load balancer can fail too. Plan for redundancy in your load balancing layer.


Further Reading

...
