The process of storing data records across different machines is called sharding and it is approach of MongoDB for meeting the demands in growth of data. As the data size increases, one single hardware can not be able to store the data and also can not provide reasonable write and Read speed throughout. Sharding is the process which solves the problem with scaling horizontally. With sharding, we can add more machines to support data growth.
Why Sharding?
- Because Vertical scaling is too expensive.
- Also because size of Local disk is not big enough.
- When active dataset is big, Memory can’t be large enough
- And also Single set of replica has limit as 12 nodes.
Sharding Concept Description
The below diagram will show us the Sharding in MongoDB using sharded cluster.
Following are three main components of the above diagram…
Query Routers −The query routers, which is also known as mongos process, acts as the entry point as well as interface to MongoDB. instead of connecting to the underlying shards and replica sets, Applications connect to query routers. And mongos executes queries, gathers results, and passes them to the application.
It don’t hold any determined state and it is typically hosted in the same instance as the application server. Also it is particularly low on system resources.
Shards − Shard is used to collect data and it come up with data uniformity and high accessibility . In production environment, every shard is a separate duplicate set in production environment.
Config Servers − it store’s the metadata of cluster and this data contains cluster’s data mapping set to the shards. And the query router uses these data or we can say metadata to target operating for particular shards. Also sharded clusters have exactly 3 config servers In production environment.