TL;DR Load balancing across multiple instances of a Flask application using gunicorn and HAProxy improves responsiveness, scalability, and fault tolerance by distributing incoming requests evenly across available instances.
Scalable Web Development with Flask: Load Balancing Multiple App Instances
As your Flask application grows in popularity, it's essential to ensure that it can handle increasing traffic without compromising performance or user experience. One effective way to achieve this is by implementing load balancing across multiple app instances. In this article, we'll delve into the world of Flask load balancing, exploring its benefits and showcasing a practical approach to implementing it.
Why Load Balancing Matters
Load balancing ensures that incoming requests are distributed evenly across available instances, preventing any single instance from becoming a bottleneck. This leads to improved responsiveness, increased scalability, and reduced downtime. By distributing the workload, you can also:
- Reduce server overload
- Improve resource utilization
- Enhance fault tolerance
Setting Up Multiple App Instances
Before diving into load balancing, let's set up multiple instances of our Flask application using a simple approach with gunicorn. We'll create two separate instances: one for development and one for production.
Creating the Flask Application
First, we need to create a basic Flask application structure. Create a new file named app.py and add the following code:
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello, World!"

@app.route("/healthcheck")
def healthcheck():
    # Used later by HAProxy to verify this instance is alive
    return "OK"

if __name__ == "__main__":
    app.run(debug=True)
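It helps to know what gunicorn actually runs: the Flask `app` object is a WSGI callable, and gunicorn can serve any such callable. As a minimal stdlib-only sketch of that contract (the name `hello_app` is ours, for illustration):

```python
# A minimal WSGI application -- the same interface that Flask's
# `app` object exposes and that gunicorn serves.
def hello_app(environ, start_response):
    body = b"Hello, World!"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    # WSGI: report the status and headers via the callback,
    # then return an iterable of body bytes.
    start_response("200 OK", headers)
    return [body]

# gunicorn could serve this directly with: gunicorn module:hello_app
```

This is why the later commands all end in `app:app` -- the module name, then the WSGI callable inside it.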
Configuring Gunicorn for Multiple Instances
Now that we have our Flask application up and running, let's create two separate configurations using gunicorn:
Instance 1 (Development):
Create a new file named dev.conf and add the following code. Note that gunicorn configuration files are plain Python, whatever their extension:

worker_class = "gevent"  # asynchronous workers; requires the gevent package
workers = 5
threads = 10  # only used by the gthread worker class; ignored by gevent
bind = "127.0.0.1:5000"
Instance 2 (Production):
Similarly, create another file named prod.conf with the following configuration. This instance binds to a second local port, leaving port 80 free for the load balancer:

worker_class = "sync"
workers = 5
bind = "127.0.0.1:5001"
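How many workers should you configure? The gunicorn documentation suggests (2 × CPU cores) + 1 as a starting point. A quick sketch of that rule of thumb (the helper name is ours):

```python
import multiprocessing

def suggested_workers(cores: int) -> int:
    """Gunicorn's rule-of-thumb worker count: (2 x cores) + 1."""
    return 2 * cores + 1

if __name__ == "__main__":
    cores = multiprocessing.cpu_count()
    print(f"{cores} cores -> {suggested_workers(cores)} workers")
```

Treat this as a baseline to tune under load, not a hard rule -- I/O-bound apps often benefit from more workers or async worker classes.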
Running Multiple Instances
With our configurations in place, let's run each instance separately using the following commands:
Instance 1 (Development):
gunicorn -c dev.conf app:app
Instance 2 (Production):
gunicorn -c prod.conf app:app
Implementing Load Balancing with HAProxy
HAProxy is an excellent load balancer that can distribute incoming requests across multiple instances. We'll configure HAProxy to use the two gunicorn instances we created earlier.
Installing and Configuring HAProxy
First, install HAProxy using your package manager:

sudo apt-get install haproxy
The package ships a default configuration at /etc/haproxy/haproxy.cfg; replace its contents with the following:
global
    daemon
    maxconn 256
    log /dev/log local0
    user haproxy
    group haproxy

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5s
    timeout client 50s
    timeout server 50s

frontend http
    bind *:80
    default_backend app_instances

backend app_instances
    balance roundrobin
    mode http
    option httpchk GET /healthcheck
    server dev_instance 127.0.0.1:5000 check
    server prod_instance 127.0.0.1:5001 check

Note that both backend servers point at the local gunicorn ports (5000 and 5001); HAProxy alone listens on port 80.
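The roundrobin balance algorithm simply rotates through the server list, one request at a time. A toy Python sketch of the idea (the names `backends` and `next_backend` are ours, not part of HAProxy):

```python
import itertools

# Backend addresses mirroring the haproxy.cfg above
backends = ["127.0.0.1:5000", "127.0.0.1:5001"]

# Round-robin: cycle endlessly through the server list
rotation = itertools.cycle(backends)

def next_backend() -> str:
    """Return the server that should receive the next request."""
    return next(rotation)
```

Each call hands back the next server in turn, so two servers alternate request for request. HAProxy's real implementation also honors server weights and skips servers whose health check is failing.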
Starting HAProxy
Finally, validate the configuration and restart the HAProxy service:

haproxy -c -f /etc/haproxy/haproxy.cfg
service haproxy restart
In this article, we explored how to implement load balancing across multiple instances of a Flask application using gunicorn and HAProxy. By distributing incoming requests evenly across available instances, you can improve your application's responsiveness, scalability, and fault tolerance.
Putting it all Together
To recap, here's the entire configuration:

- Create a basic Flask application structure (app.py)
- Set up multiple app instances using gunicorn configurations (e.g., dev.conf, prod.conf)
- Install and configure HAProxy for load balancing (haproxy.cfg)
By following this guide, you can ensure that your Flask application remains scalable and responsive even under high traffic conditions.
