Sharding is the process of storing data records across multiple machines, and it is MongoDB's approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be able to store the data or to provide acceptable read and write throughput. Sharding solves this problem through horizontal scaling: with sharding, we add more machines to support data growth.
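To make the idea of horizontal scaling concrete, here is a minimal sketch of spreading records across several machines by hashing a key. It is an illustration only and not MongoDB's actual partitioning algorithm: the helper names `shard_for` and `distribute`, the `user_id` field, and the modulo scheme are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(records, key_field, num_shards):
    """Group records by the shard index their key hashes to."""
    shards = defaultdict(list)
    for record in records:
        index = shard_for(str(record[key_field]), num_shards)
        shards[index].append(record)
    return shards


# 1000 sample records, first spread over 3 machines, then over 6.
records = [{"user_id": i, "name": f"user{i}"} for i in range(1000)]
three = distribute(records, "user_id", 3)
six = distribute(records, "user_id", 6)

for label, layout in (("3 shards", three), ("6 shards", six)):
    sizes = [len(layout[i]) for i in sorted(layout)]
    print(label, sizes)
```

Doubling the number of machines roughly halves the number of records each one has to hold, which is the effect sharding relies on.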

Why Sharding?

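  1. Vertical scaling is too expensive.
  2. The local disk of a single machine is not big enough.
  3. When the active dataset is big, memory cannot be large enough to hold it.
  4. A single replica set is limited to 12 nodes (the limit in older MongoDB releases; since MongoDB 3.0 a replica set can have up to 50 members, of which at most 7 can vote).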

Sharding Concept Description

The following diagram shows sharding in MongoDB using a sharded cluster.

The diagram has three main components:

Query Routers − Query routers are mongos instances that act as the entry point and interface to MongoDB for client applications. Applications connect to the query routers instead of connecting directly to the underlying shards and replica sets. A mongos routes each query to the appropriate shards, gathers the results, and returns them to the application.
A mongos holds no persistent state and is typically hosted on the same instance as the application server. It also uses few system resources.

Shards − Shards store the data. They provide data consistency and high availability. In a production environment, each shard is a separate replica set.

Config Servers − Config servers store the cluster's metadata, which includes the mapping of the cluster's data set to the shards. The query router uses this metadata to target operations to specific shards. In older MongoDB releases, production sharded clusters had exactly 3 config servers; since MongoDB 3.4 the config servers must be deployed as a replica set.
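The way the three components work together can be sketched with a small in-memory model. This is a simplified illustration under stated assumptions, not MongoDB's implementation: the classes `ConfigMetadata` and `Router`, the shard names, the `user_id` shard key, and the range boundaries are all hypothetical. The sketch shows a router that stores no data itself, consults metadata to target one shard when the key is known, and falls back to asking every shard otherwise.

```python
import bisect


class ConfigMetadata:
    """Hypothetical stand-in for config server metadata: key ranges -> shard."""

    def __init__(self, boundaries, shard_names):
        # boundaries [100, 200] define three ranges:
        # below 100, 100 up to (not including) 200, and 200 or above.
        assert len(shard_names) == len(boundaries) + 1
        self.boundaries = boundaries
        self.shard_names = shard_names

    def shard_for(self, key):
        """Return the name of the shard that owns the given key."""
        return self.shard_names[bisect.bisect_right(self.boundaries, key)]


class Router:
    """Hypothetical stand-in for a query router: holds no data of its own."""

    def __init__(self, config, shards):
        self.config = config
        self.shards = shards  # shard name -> list of documents

    def insert(self, doc, key_field="user_id"):
        self.shards[self.config.shard_for(doc[key_field])].append(doc)

    def find_by_key(self, key, key_field="user_id"):
        """Targeted query: only the shard that owns the key is contacted."""
        target = self.config.shard_for(key)
        docs = [d for d in self.shards[target] if d[key_field] == key]
        return docs, [target]

    def find_where(self, predicate):
        """Query without the key: every shard is asked and results are merged."""
        results = []
        for docs in self.shards.values():
            results.extend(d for d in docs if predicate(d))
        return results, list(self.shards)


config = ConfigMetadata([100, 200], ["shardA", "shardB", "shardC"])
shards = {name: [] for name in config.shard_names}
router = Router(config, shards)

for uid in (5, 150, 250, 100):
    router.insert({"user_id": uid, "active": uid % 2 == 0})

docs, contacted = router.find_by_key(150)
print(docs, contacted)

active_docs, contacted_all = router.find_where(lambda d: d["active"])
print(active_docs, contacted_all)
```

In this model, a lookup by the key touches a single shard, while a query on any other field has to visit all of them, which is why the choice of key matters for how much work each query causes.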