System Design Concepts for Product Managers


Hola Product folks 👋!

Let’s dive into system design for Product Managers without wasting any time! 🎉

System Design Types

  1. High Level System Design: It is the process of defining the overall architecture of a system, including its components, interfaces, and data flows. It is a high-level overview of the system, without getting into the details of how each component works.
  2. Low Level System Design: It is the process of designing the individual components of a system, including their internal structure, algorithms, and data structures. It is a detailed design of each component, including how it will be implemented in code.

High-level system design:

  • What are the different components of YouTube? (e.g., web servers, video encoding servers, database servers, caching servers, etc.)
  • How do these components interact with each other? (e.g., how do users upload videos? how are videos streamed to users? how are videos recommended to users?)
  • What are the overall design goals of YouTube? (e.g., scalability, reliability, performance, etc.)

Low-level system design:

  • How is the YouTube video encoding algorithm designed?
  • How is the YouTube database designed to store and retrieve video data efficiently?
  • How is the YouTube caching system designed to reduce network bandwidth usage?

What does a system look like?

  • The client is a computer that is used to interact with the system.
  • The server is a computer that hosts the system and provides services to clients.
  • The application logic layer is responsible for processing requests from clients and generating responses.
  • The database is used to store data that is used by the system.


  • The client and server communicate with each other over a network.
  • The client sends requests to the server, and the server sends responses back to the client.
  • The application logic layer on the server is responsible for processing the requests from the client and generating the responses.
  • The application logic layer may also need to access the database to retrieve or store data.

This type of architecture is commonly used for web applications, email systems, and other distributed systems. It is a scalable and reliable architecture that can be used to build large and complex systems.
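To make the request and response flow concrete, here is a minimal sketch in Python. It is only an illustration: the `Database` and `Server` classes, the request fields, and the status codes are all made up for this example, and the "network" is just a function call.

```python
class Database:
    """In-memory stand-in for the database layer (hypothetical)."""

    def __init__(self):
        self._videos = {}

    def save(self, video_id, title):
        self._videos[video_id] = title

    def find(self, video_id):
        return self._videos.get(video_id)


class Server:
    """Hosts the application logic and talks to the database."""

    def __init__(self, database):
        self.database = database

    def handle(self, request):
        # Application logic: inspect the request and build a response.
        if request["action"] == "upload":
            self.database.save(request["video_id"], request["title"])
            return {"status": 201, "body": "created"}
        if request["action"] == "watch":
            title = self.database.find(request["video_id"])
            if title is None:
                return {"status": 404, "body": "not found"}
            return {"status": 200, "body": title}
        return {"status": 400, "body": "unknown action"}


# The client side: send requests and read the responses.
server = Server(Database())
upload = server.handle({"action": "upload", "video_id": "v1", "title": "Intro to caching"})
watch = server.handle({"action": "watch", "video_id": "v1"})
missing = server.handle({"action": "watch", "video_id": "v2"})
```

Here the client never touches the database directly. Every request goes through the application logic, which decides whether to read or write data.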

Source Credit: HelloPM

Optimizing the Database for a Large Number of Records

When to use a read replica database?

Read replicas are commonly used in applications that have a high read-to-write ratio. This means that the application is performing more read operations than write operations. For example, a web application that serves video content typically has a high read-to-write ratio.

Read replicas can also be used in applications that need to be highly available. For example, a financial trading application needs to be highly available to ensure that traders can continue to trade even if the primary database fails.

What is a read replica database?

A read replica database is a copy of a primary database that is used to serve read-only queries. Read replicas are created from the primary database and are kept synchronized with it through replication. Replication is often asynchronous, so a replica can lag slightly behind the primary and may briefly return stale data.

Read replicas are used to improve the performance and scalability of database-driven applications. By serving read-only queries from read replicas, the load on the primary database is reduced. This can improve the performance of the application and make it more scalable.

Read replicas can also be used to improve the availability of database-driven applications. If the primary database fails, a read replica can be promoted to take over as the new primary so that the application remains available.
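A rough sketch of the idea in Python: writes go to the primary, and reads are spread across the replicas. The `ReplicatedDatabase` class and its method names are invented for this example, and replication is modeled as an explicit `replicate()` step. In many real setups replication runs in the background, so a read that arrives right after a write can likewise see older data.

```python
import itertools


class ReplicatedDatabase:
    """Toy model of one primary plus several read replicas (hypothetical)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Rotate through the replicas so reads are spread across them.
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary only.
        self.primary[key] = value

    def replicate(self):
        # Stand-in for the replication stream: copy primary state to replicas.
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key):
        # Reads never touch the primary in this sketch.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("video:1", "Cat video")
stale = db.read("video:1")   # the replica has not caught up yet
db.replicate()
fresh = db.read("video:1")   # now the replica has the new row
```

The design choice to notice is that the primary only handles writes here, which is where the load reduction comes from.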

Example of using read replicas in YouTube


YouTube uses read replicas to improve the performance and scalability of its video streaming service.

YouTube has a large number of users who watch millions of videos every day.

This means that YouTube’s database needs to be able to handle a large number of read-only queries.

YouTube uses read replicas to serve read-only queries, such as queries to retrieve video metadata and user recommendations.

This reduces the load on the primary database and improves the performance of the video streaming service.

Optimizing the system with read replica database

The diagram shows a simple client-server architecture with an application logic layer, a main database, and a read replica database.

Source Credit: HelloPM

Enhancement with Cache

Source Credit: HelloPM

What is Cache?

When you cook dinner, you need to have the ingredients on hand before you can start cooking. If you had to go to the store to buy the ingredients every time you wanted to cook, it would be very time-consuming.

Photo by Nathália Rosa on Unsplash

Instead, you store the ingredients in your refrigerator so that you have them ready to use when you need them. This saves you a lot of time and makes it much easier to cook dinner.

Photo by Erik Mclean on Unsplash

Caching works in a similar way.

When a computer program needs to access data, it is faster to retrieve the data from cache than from the main database. This is because cache is a small, fast memory that stores the most recently accessed data.

If the computer program needs to access data that is not in cache, it will have to retrieve the data from the main database.

This is slower because the main database is typically a much larger and slower storage device.

Caching can significantly improve the performance of computer programs by reducing the number of times that the program needs to access the main database. However, it is important to note that cache is limited in size. It is not possible to cache all of the data that a computer program might need to access.

Therefore, it is important to carefully consider which data should be cached. The data that is cached should be the data that is most likely to be accessed by the computer program.
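Here is a small sketch of that idea in Python, assuming a "least recently used" (LRU) rule for deciding what to throw away when the cache is full. LRU is just one common policy, not the only one, and the names `LRUCache`, `slow_db_lookup`, and `fetch` are made up for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Small cache that evicts the least recently used item when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        # Mark the key as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            # Drop the entry that was used longest ago.
            self._items.popitem(last=False)


DATABASE = {"a": "Apple", "b": "Banana", "c": "Cherry"}  # pretend this is slow
stats = {"db_reads": 0}
cache = LRUCache(capacity=2)


def slow_db_lookup(key):
    stats["db_reads"] += 1
    return DATABASE[key]


def fetch(key):
    # Check the cache first; fall back to the database on a miss.
    value = cache.get(key)
    if value is None:
        value = slow_db_lookup(key)
        cache.put(key, value)
    return value


for key in ["a", "a", "b", "c", "a"]:
    fetch(key)
```

The second request for "a" is served from the cache. Because the cache only holds two items, "a" is evicted once "b" and "c" arrive, so the last request for "a" has to go back to the database.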

Source: GFG

Types of Cache

There are mainly four types of Cache:

  1. Application Server Cache
  2. Distributed Cache
  3. Global Cache
  4. Content Delivery Network (CDN)

Let’s discuss them one-by-one!

Once upon a time in the digital kingdom, there were four caching knights, each with a unique role.

They were all working to make the kingdom’s internet experience faster and smoother. Let’s meet these cache knights and explore their adventures:

Application Server Cache 🛡️ — Sir Speedy

Sir Speedy worked for the renowned Ecomerlin Company, famous for its magical shopping website.

The kingdom’s customers loved shopping there, but sometimes the website got too crowded, slowing down like a tired snail. Sir Speedy came to the rescue!

👑 King User: “Oh, my favorite Ecomerlin website is so slow today!”

Sir Speedy, with his trusty shield, was there to cache the frequently used data and store it close to the website.

He ensured that when users requested a page, he could quickly fetch it from his cache, making the website lightning fast.

Distributed Cache 🌐 — Lady Lightning

Source: GFG

Meanwhile, Lady Lightning worked at the Giant Guild Company, where gamers fought fearsome online dragons in their famous game, “Medieval Legends.” Gamers from all over the kingdom wanted to play, but the game’s servers were located far away. Lady Lightning was the savior!

🎮 Sir Gamer: “I’m ready to slay the dragon! Why is there a delay?”

Lady Lightning used her network of cache servers, spread all around the kingdom. These caches were like her magical beacons. They stored game data closer to players, reducing the time it took to load the dragon-slaying quests. Gamers were delighted!

Global Cache 🌍 — Sir Swift

Source: GFG

Sir Swift was the knight in shining armor at the WorldWideWidgets Company, where they manufactured widgets that everyone wanted. The company’s website served customers all over the globe, but there was a problem.

🌎 Lady Customer: “I need my widgets, and I need them fast!”

Sir Swift had a special mission: to store data globally, making it available to all customers near and far. He was like a global postman, ensuring that when you ordered widgets, they would arrive quickly, no matter where you were in the world.

Content Delivery Network (CDN) 📦 — Sir Streaming

Source: Hostinger

Our final hero, Sir Streaming, worked for the StreamKingdom Company, famous for streaming magical shows and epic tournaments. People tuned in from all corners of the kingdom, and Sir Streaming had to ensure smooth streaming.

📺 Lord Viewer: “I can’t miss the Royal Tournament! Why is my stream buffering?”

Sir Streaming was the knight of the Content Delivery Network (CDN). He worked with CDNs like FastFlick, which were like courier services for videos. These CDNs had caches in different locations, storing video content close to viewers, so when you watched a stream, it was as smooth as butter.

So there you have it! Our caching knights — Sir Speedy, Lady Lightning, Sir Swift, and Sir Streaming — working together to make the digital kingdom a better place, reducing waiting times, and ensuring that everyone had a magical online experience. 🏰🌟

Why is it needed?

Multiple Servers and Load Balancers

Source: HelloPM

Why Multiple Servers?

A system needs multiple servers when it needs to handle a large number of requests or when it needs to be highly available.

Benefits of using multiple servers:

  • Improved performance: By distributing the load across multiple servers, the system can handle more requests and respond to requests more quickly.
  • Increased scalability: As the system needs to handle more traffic, additional servers can be added to the system to scale up.
  • Improved reliability: If a server fails, the other servers can continue to handle requests. This makes the system more available.

What is a load balancer?

A load balancer is a device or software component that distributes incoming traffic across multiple servers. Load balancers can be used to improve the performance, scalability, and reliability of systems that use multiple servers.

Why do multiple servers need load balancers?

Load balancers are needed for multiple servers because they can help to:

  • Distribute traffic evenly: Load balancers can distribute traffic evenly across multiple servers, which helps to improve performance and reliability.
  • Direct traffic to healthy servers: Load balancers can monitor the health of servers and direct traffic to healthy servers. If a server fails, the load balancer can direct traffic to the other servers.
  • Provide a single point of entry: Load balancers can provide a single point of entry for clients, which makes it easier to manage and scale the system.


Consider a setup with three servers behind a load balancer. The load balancer distributes traffic evenly across the three servers.

If one of the servers fails, the load balancer will direct traffic to the other two servers.

This system is more scalable, reliable, and performant than a system with a single server.
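A minimal sketch of this behavior in Python, assuming a simple round-robin strategy (take the servers in turn, skipping any that are down). Real load balancers offer other strategies too. The `LoadBalancer` class and the server names are hypothetical.

```python
class LoadBalancer:
    """Round-robin load balancer that skips servers marked as unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._position = 0

    def mark_down(self, server):
        # In a real system a health check would trigger this.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once, continuing from the last position.
        for _ in range(len(self.servers)):
            server = self.servers[self._position]
            self._position = (self._position + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["server-1", "server-2", "server-3"])
first_round = [lb.next_server() for _ in range(3)]

lb.mark_down("server-2")  # simulate a failure
after_failure = [lb.next_server() for _ in range(4)]
```

Before the failure, each server gets one request in turn. After `server-2` is marked down, requests alternate between the two remaining servers, and clients still talk to a single entry point.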

Separating the Database and Implementation of Sharding


Why separate the database?

There are several reasons why separating the database from the application logic is important:

  • Performance: Databases are optimized for storing and retrieving data, while application servers are optimized for running business logic. Giving each its own machine means they do not compete for the same CPU, memory, and disk, which can improve the performance of the system overall.
  • Scalability: Once the database is separate, the database and the application servers can be scaled independently. You can add more application servers when traffic grows without touching the database, and scale the database only when the data itself demands it.
  • Reliability: A crash or bad deployment on an application server does not take the data down with it. The database can also have its own backup and failover setup, which makes the system easier to keep reliable.
  • Security: By separating the database from the application logic, it is easier to secure the data. The database can be placed behind stricter network rules and protected with encryption, access control, and other security measures.

What is Database sharding?

Database sharding is a technique for distributing a single dataset across multiple database servers. This allows for larger datasets to be split into smaller chunks and stored in multiple data nodes, increasing the total storage capacity of the system.

Why does a system need database sharding?

Consider an example with two shards: Shard 1 and Shard 2.

Each shard contains a subset of the data, and the data is distributed across the shards based on a shard key.

The shard key is a column in the database that is used to determine which shard a particular row of data belongs to.

In this example, the shard key is the customer_id column.

This means that all rows of data with the same customer_id will be stored on the same shard.

When a client application needs to access data, it first needs to determine which shard the data is stored on. This is done by using the shard key. Once the shard has been determined, the client application can send its request to the appropriate shard server.

The shard server will then process the request and return the results to the client application.
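The routing step can be sketched in a few lines of Python. This example assumes numeric customer IDs and picks the shard with a simple remainder (`customer_id % shard_count`). That rule is an assumption made for illustration; real systems often use hashing or key ranges instead. The `ShardedCustomerStore` class is hypothetical.

```python
class ShardedCustomerStore:
    """Toy sharded store that routes rows by customer_id (the shard key)."""

    def __init__(self, shard_count):
        # Each shard is a separate dictionary standing in for a shard server.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, customer_id):
        # Assumed routing rule: remainder of the shard key.
        return customer_id % len(self.shards)

    def insert(self, customer_id, order):
        shard = self.shards[self.shard_for(customer_id)]
        shard.setdefault(customer_id, []).append(order)

    def orders_for(self, customer_id):
        # Only one shard is consulted for a given customer.
        shard = self.shards[self.shard_for(customer_id)]
        return shard.get(customer_id, [])


store = ShardedCustomerStore(shard_count=2)
store.insert(101, "keyboard")
store.insert(101, "mouse")
store.insert(202, "monitor")
```

Both orders for customer 101 land on the same shard, and looking them up touches only that shard. One thing to keep in mind with a remainder-based rule is that changing the number of shards changes where most keys belong, which is part of the complexity discussed in the drawbacks.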

Benefits of database sharding:

  • Scalability: Database sharding allows a system to scale horizontally by adding more shard servers. This means that a system can handle more traffic by simply adding more servers.
  • Performance: Database sharding can improve performance by distributing both the read and the write load across multiple servers. This is especially beneficial for write-heavy systems, because read replicas alone only spread out reads.
  • Reliability: Database sharding can improve reliability by making the system less susceptible to full outages. If one shard server fails, the other shard servers can continue to operate, so only the data on the failed shard is affected.

Drawbacks of database sharding:

  • Complexity: Database sharding adds complexity to a system. This is because the system needs to be able to determine which shard a particular row of data belongs to, and it needs to be able to route requests to the appropriate shard server.
  • Cost: Database sharding can increase the cost of hosting a database. This is because more servers are needed to store and process the data.

Implementing Sharding Application Logic:

Sharding application logic is a technique for distributing the application logic of a system across multiple servers. This can be done for a variety of reasons, such as to improve performance, scalability, or reliability.

There are a number of different ways to shard application logic. One common approach is to shard the application logic by function. For example, one server might handle all user authentication requests, while another server handles all product catalog requests.

Another approach to sharding application logic is to shard by geography. For example, one server might handle all requests from users in North America, while another server handles all requests from users in Europe.

Sharding application logic can be a complex task, but it can be a very effective way to improve the performance, scalability, and reliability of a system.
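Both approaches boil down to a routing rule. Here is a minimal sketch in Python; the URL prefixes, region codes, and server names are all made up for illustration.

```python
# Hypothetical routing tables.
FUNCTION_ROUTES = {
    "/auth": "auth-server",
    "/catalog": "catalog-server",
}
REGION_ROUTES = {
    "NA": "na-server",
    "EU": "eu-server",
}
DEFAULT_SERVER = "general-server"


def route_by_function(path):
    """Pick a server based on what the request is about."""
    for prefix, server in FUNCTION_ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return server
    return DEFAULT_SERVER


def route_by_region(region):
    """Pick a server based on where the user is."""
    return REGION_ROUTES.get(region, DEFAULT_SERVER)


login_target = route_by_function("/auth/login")
catalog_target = route_by_function("/catalog")
europe_target = route_by_region("EU")
```

In this sketch, anything that does not match a known function or region falls back to a general-purpose server, so unknown requests are still served somewhere.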

So, this is the ending section of the blog.

Hope you guys enjoyed the blog and learnt a lot of new things.
