HAProxy 1.4: HTTP Load Balancing

I’ve posted a few articles on load balancing with F5 BIG-IP hardware appliances. However, there are also a few alternatives available, some even free! HAProxy is a popular load balancing application with a robust collection of features.

HAProxy is “The Reliable, High Performance TCP/HTTP Load Balancer”, taken right from the title of their web page. It has many different uses; for this article I am going to focus on its HTTP load balancing functionality. Our scenario is as follows:


I have a domain called mysite.example.com. It is a simple web environment serving HTML/PHP content, hosted by a Web Farm consisting of 4 Apache web servers.

Summary of scenario:

  • http://mysite.example.com
  • Web Farm of 4 Apache web servers.
  • index.html will be the main content file served by the Web Farm.
  • healthcheck.html will be used by HAProxy for monitoring server status.

NOTICE: Content synchronization will not be covered in this article. It is assumed that code is deployed consistently across all servers in a Web Farm.

Quick Terms: (for this article)

  • Node = network device
  • Load balancer = HAProxy server
  • Pool = group of related nodes, usually servers
  • Pool Members = a single node from a pool
  • Virtual Machine(VM), guest, virtualbox etc = all refer to the same thing, a virtual instance of a hardware/software device.
  • Farm, or Web Farm = a number of servers grouped together based on a common service, in this case a web service called Apache.

Architecture:

For this proof-of-concept test I used VirtualBox to spin up the 5 servers I needed to get this job done. 5? I thought you said 4! I did say 4, but we also need a server to run HAProxy. Yes, you could technically get away with running HAProxy alongside Apache on one of the web servers in one of the web farms, but I like to keep things simple. Also, since these are virtual machines, I can create as many as I want.
I also kept each virtual machine on the same 10.0.0.0/24 private network to keep things simple, with the exception of the HAProxy server. The HAProxy server had two network interface cards (NICs): one tied into the 10.0.0.0/24 network mentioned above, and another tied into the 192.168.56.0/24 network. The idea here is to show the flexibility of a load balancing environment: the load balancing service is not restricted to listening on a single network, or on the same network as the pool members.

NOTICE: If you are unfamiliar with VirtualBox, try VMware or QEMU.

Installation:

  1. Setting up our Web Farms and IP’ing them:

    After loading Ubuntu or your flavour of Linux onto each Virtual Machine, install Apache2 (I recommend doing this all on one virtual machine, and then cloning it out to the rest). Now IP each server. My setup was as follows:

    Hostname | IP
    web1farm1 | 10.0.0.10
    web2farm1 | 10.0.0.20
    web1farm2 | 10.0.0.110
    web2farm2 | 10.0.0.120

    NOTICE: Make sure they all have the same netmask of 255.255.255.0 !

  2. Simple HTML Content for Testing:

    For this test I kept the default DocumentRoot of /var/www, but I changed the HTML content of /var/www/index.html to this:

    <h3>Web Server #1</h3>

    Also create a healthcheck.html file in /var/www/. I’ll explain why later.

    HAProxy -- Response OK

    NOTICE: This healthcheck.html file doesn’t contain any HTML tags in it.
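
The two files from this step can also be written from the command line. The following is a minimal sketch, not the method used in the original setup: make_test_content is a hypothetical helper name, and giving each server its own number in the heading is an assumption made so that you can later tell which server answered.

```shell
#!/bin/sh
# Hypothetical helper: write the two test files into a document root.
#   $1 = document root (for example /var/www)
#   $2 = server number to show in the heading (an assumption for illustration)
make_test_content() {
    docroot=$1
    number=$2
    # Page that identifies which server answered the request.
    printf '<h3>Web Server #%s</h3>\n' "$number" > "$docroot/index.html"
    # Plain-text health check body, deliberately without HTML tags.
    printf 'HAProxy -- Response OK\n' > "$docroot/healthcheck.html"
}

# Example, run on the second web server (needs write access to /var/www):
#   make_test_content /var/www 2
```

Keeping the document root as a parameter makes it easy to try the helper against a scratch directory before touching /var/www.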

  3. On the HAProxy server, install the HAProxy package with apt-get.
    sudo apt-get install haproxy

    This installs HAProxy along with its default configuration file, located at /etc/haproxy/haproxy.cfg. Make a backup copy of it before editing it.

    sudo cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak

Quick test before Configuration Step

  1. Test Each Web Server

    Browse to:
    http://10.0.0.10/index.html and http://10.0.0.10/healthcheck.html
    http://10.0.0.20/index.html and http://10.0.0.20/healthcheck.html
    http://10.0.0.110/index.html and http://10.0.0.110/healthcheck.html
    http://10.0.0.120/index.html and http://10.0.0.120/healthcheck.html
    Ensure each web page is rendering correctly.
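
If you would rather not click through eight URLs, the same check can be scripted. This is a rough sketch that assumes curl is installed on the machine you test from and that it can reach the 10.0.0.0/24 network; the function names are made up for this example.

```shell
#!/bin/sh
# Back-end addresses from the table in the Installation section.
SERVERS="10.0.0.10 10.0.0.20 10.0.0.110 10.0.0.120"

# Print the two test URLs for every back-end server, one per line.
list_test_urls() {
    for ip in $SERVERS; do
        echo "http://$ip/index.html"
        echo "http://$ip/healthcheck.html"
    done
}

# Fetch each URL and print its HTTP status code next to it.
# curl reports 000 when it cannot connect within the 5 second limit.
check_web_servers() {
    list_test_urls | while read -r url; do
        code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
        echo "$code $url"
    done
}

# Example:
#   check_web_servers
```

Every line should start with 200 before you move on to configuring HAProxy.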

  2. Check HAProxy server and make sure no other application is using socket port 80
     netstat -anl
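
The full netstat listing can be long, so it may help to narrow it down to TCP listeners on port 80. A small sketch, assuming the Linux net-tools column layout (local address in column 4, state in column 6); port80_listeners is a hypothetical name:

```shell
#!/bin/sh
# Read `netstat -tln` output on stdin and keep only sockets that are
# listening on port 80. The /:80$/ pattern avoids matching ports like 8080.
port80_listeners() {
    awk '$6 == "LISTEN" && $4 ~ /:80$/'
}

# Example:
#   netstat -tln | port80_listeners
```

No output means nothing is listening on port 80 and HAProxy is free to bind to it.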

Configuring HAProxy

  1. Edit the Default Config File

    Edit the default config file for HAProxy and remove almost everything except the following:

     sudo vi /etc/haproxy/haproxy.cfg
    # this config needs haproxy-1.4 or newer (it uses http-check expect)
    global
            maxconn 72000
            daemon
    
    defaults
            timeout connect 4000ms
            timeout client  60000ms
            timeout server  30000ms
    
    frontend webVIP_80
            mode http
            bind    192.168.56.101:80
            #which backend
            default_backend webfarm1
    
    backend webfarm1
            mode http
            balance roundrobin
            option httpchk GET /healthcheck.html HTTP/1.1\r\nHost:\ mysite.example.com
            http-check expect string Response\ OK
            server webserver1 10.0.0.10:80 weight 5 check slowstart 5000ms
            server webserver2 10.0.0.20:80 weight 5 check slowstart 5000ms
            server webserver3 10.0.0.110:80 weight 5 check slowstart 5000ms
            server webserver4 10.0.0.120:80 weight 5 check slowstart 5000ms
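
Before starting HAProxy with a freshly edited file, it can be worth letting it parse the configuration first. HAProxy's -c flag runs it in check mode: it reads the file, reports any errors, and exits without starting the proxy. The wrapper below is only a sketch; check_haproxy_config is a made-up name, and depending on file permissions you may need to run it with sudo.

```shell
#!/bin/sh
# Hypothetical wrapper around HAProxy's check mode (-c).
#   $1 = config file to check (defaults to the standard location)
check_haproxy_config() {
    cfg=${1:-/etc/haproxy/haproxy.cfg}
    if ! command -v haproxy >/dev/null 2>&1; then
        echo "haproxy is not installed" >&2
        return 127
    fi
    # Parse the file and exit; a non-zero status means the config has errors.
    haproxy -c -f "$cfg"
}

# Example:
#   check_haproxy_config && echo "config parses cleanly"
```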
    
  2. What is this nonsense?!
    • global = This block sets parameters that apply to the whole HAProxy process. Here, maxconn and daemon tell HAProxy to allow a maximum of 72000 concurrent connections and to run as a daemon.
    • defaults = This block sets parameters that are assumed as defaults for each instance. I’ve only set a few timeout settings in here, because I will explicitly set parameters in each instance’s configuration block.
    • frontend = This is the main listening setting. You specify an arbitrary name, bind, and default_backend. The bind parameter is the IP:Port (socket) HAProxy should listen on for incoming requests, and default_backend tells it which backend to load balance to. Also, mode is set to http because we will be doing HTTP URI routing later on.
    • backend = We have one of these here, webfarm1, holding all four web servers. The backend block is where you identify each server member in the load balancing pool. You also specify which load balancing method to use, the health monitor to use, and any weight parameters. For our example we are using a balance method of roundrobin, a health monitor of GET /healthcheck.html (the response body must contain the string Response OK), and an equal weight of 5 for each server.

    NOTICE: The frontend bind setting is set to a 192.168.56.x address, while our backend servers are set to 10.0.0.x addresses. The HAProxy server must have a tie into each network in order for it to reach both networks, hence the two NICs we set up earlier.

  3. Start HAProxy and Test:
    sudo haproxy -d -f /etc/haproxy/haproxy.cfg

    NOTICE: -d enables debug mode, which keeps HAProxy in the foreground even though the config says daemon, and -f tells it where the haproxy.cfg is.

    Browse to http://192.168.56.101
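
A browser may cache the page, so repeated requests from the command line are a handier way to watch the round robin at work. This is a sketch under two assumptions: curl is available on a machine that can reach the 192.168.56.0/24 network, and each web server's index.html shows a different server number. The function names are invented for this example.

```shell
#!/bin/sh
# Front-end address from the bind line in haproxy.cfg.
VIP_URL="http://192.168.56.101/"

# Count identical lines on stdin, here one line per response body.
tally_responses() {
    sort | uniq -c
}

# Request the front-end address N times (default 8) and tally the bodies.
sample_vip() {
    n=${1:-8}
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s -m 5 "$VIP_URL"
        i=$((i + 1))
    done | tally_responses
}

# Example:
#   sample_vip 8
```

With roundrobin and equal weights, the counts per server should come out roughly even once all four servers have passed their health checks.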

BAM! Load Balancing across 4 web servers!!!!
