System Design, Part 3: Scaling a Single Server

  • Server scaling is needed to handle an increasing number of requests and simultaneous connections, that is, to handle increasing load and traffic.
  • There are two ways to scale:
    • Vertical scaling, or scale up
    • Horizontal scaling, or scale out

Vertical Scaling

  • Refers to adding more resources (CPU, RAM, storage, etc.) to the server, i.e., making the server more powerful.
  • Good when traffic is low to moderate, since it is simple and easy to maintain.
  • Adding more power to a single server is limited, as we can't add an infinite amount of resources to one machine.
  • Vertical scaling still has a single point of failure: if the server goes down, the entire application goes down.

Horizontal Scaling

  • Refers to adding more servers to the overall system architecture. These servers are redundant, meaning each server has a similar configuration and each runs the same version of the application. This collection of servers is called the resource pool, or server pool.
  • Good for handling large applications that have huge load and traffic.
[Diagram: a client sends requests to, and receives responses from, a resource pool/server pool consisting of Server 1, Server 2, ..., Server n.]


  • Usually, multiple servers are redundant: one server is replicated to the others, so all of them share the same configuration and run the same application version.

The above image is incomplete. How does a client know which server to connect to? And suppose clients connect to Server 1, and that server reaches its load limit after accepting many connections and requests: how do we redirect new connections to another server? And how do we determine which server that will be?

The solution is that clients don't connect to the servers directly; all connections go through something called a Load Balancer.
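To make this concrete, here is a minimal sketch of the idea in Python. The server names and the `route_request` function are hypothetical placeholders, and round-robin is just one possible distribution strategy: the client hands its request to the balancer, and the balancer picks the next server in the pool.

```python
import itertools

# Hypothetical pool of redundant servers (placeholder names).
server_pool = ["server-1", "server-2", "server-3"]

# Round-robin: cycle through the pool so each new request
# goes to the next server in order.
_next_server = itertools.cycle(server_pool)

def route_request(request):
    """Pick a server for this request; the client never chooses one itself."""
    return next(_next_server)

# Five incoming requests are spread evenly across the pool.
for i in range(5):
    print(f"request {i} -> {route_request(i)}")
```

The client only ever talks to the balancer, so servers can be added to or removed from `server_pool` without clients noticing.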

[Diagram: the client's requests and responses go through a Load Balancer, which forwards each connection to an appropriate server (Server 1, Server 2, ..., Server n) in the resource pool/server pool; the target server is determined by the load balancer.]


A Load Balancer is itself nothing but a server, one whose job is to efficiently distribute incoming connection requests across the resource pool.

The Load Balancer receives an incoming request and forwards it to an appropriate server. That server is chosen by a particular algorithm, which can vary depending on the design of the load balancer or of the whole system. The Load Balancer also keeps track of how occupied each server is and redirects new connections to less-loaded servers.
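The "keeps track of how occupied each server is" part can be sketched as a least-connections strategy. This is an illustrative toy, not a real load balancer implementation: the server names and the `pick_server`/`release` functions are assumptions for the example.

```python
# Track the number of active connections per server (placeholder names).
active = {"server-1": 0, "server-2": 0, "server-3": 0}

def pick_server():
    """Send the new connection to the server with the fewest open connections."""
    server = min(active, key=active.get)
    active[server] += 1
    return server

def release(server):
    """Called when a client disconnects, freeing capacity on that server."""
    active[server] -= 1

# Three clients connect: each lands on a different server.
a, b, c = pick_server(), pick_server(), pick_server()

# When the client on `a` disconnects, the next new connection
# is routed back to `a`, since it is now the least occupied.
release(a)
```

Unlike round-robin, this strategy adapts to uneven request durations: a server stuck with long-lived connections automatically receives fewer new ones.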


© progshala.in