diff --git a/source/administration/sharded-clusters.txt b/source/administration/sharded-clusters.txt
new file mode 100644
index 00000000000..f8d0bf7045a
--- /dev/null
+++ b/source/administration/sharded-clusters.txt
@@ -0,0 +1,261 @@
+.. index:: sharded clusters
+.. _sharding-sharded-cluster:
+
+================
+Sharded Clusters
+================
+
+.. default-domain:: mongodb
+
+Sharding occurs within a :term:`sharded cluster`. A sharded cluster
+consists of the following components. For specific configurations, see
+:ref:`sharding-architecture`:
+
+- :program:`mongod` instances that serve as the
+  :term:`shards ` that hold your database collections.
+
+- :program:`mongod` instances that serve as config servers to hold
+  metadata about the cluster. The metadata maps :term:`chunks
+  ` to shards. See :ref:`sharding-config-server`.
+
+- :program:`mongos` instances that route reads and
+  writes to the shards.
+
+.. index:: sharding; config servers
+.. index:: config servers
+.. _sharding-config-server:
+
+Config Servers
+--------------
+
+Config servers maintain the shard metadata in a config database. The
+:term:`config database` stores the relationship between :term:`chunks
+` and where they reside within a :term:`sharded cluster`. Without
+a config database, the :program:`mongos` instances would be unable to
+route queries or write operations within the cluster.
+
+Config servers *do not* run as replica sets. Instead, a :term:`cluster
+` operates with a group of *three* config servers that use a
+two-phase commit process to ensure immediate consistency and
+reliability.
+
+For testing purposes, you may deploy a cluster with a single
+config server, but this is not recommended for production.
+
+.. warning::
+
+   If your cluster has a single config server, this
+   :program:`mongod` is a single point of failure. If the instance is
+   inaccessible, the cluster is not accessible. If you cannot recover
+   the data on a config server, the cluster will be inoperable.
+
+   **Always** use three config servers for production deployments.
+
+The actual load on config servers is small because each
+:program:`mongos` instance maintains a cached copy of the configuration
+database. MongoDB only writes data to the config servers to:
+
+- create splits in existing chunks, which happens as data in
+  existing chunks exceeds the maximum chunk size.
+
+- migrate a chunk between shards.
+
+Additionally, all config servers must be available on initial setup
+of a sharded cluster, because each :program:`mongos` instance must be
+able to write to the ``config.version`` collection.
+
+If one or two config server instances become unavailable, the
+cluster's metadata becomes *read only*. You can still read and write
+data on the shards, but no chunk migrations or splits will
+occur until all three servers are accessible. Note that config
+server data is only read in the following situations:
+
+- A new :program:`mongos` starts for the first time, or an existing
+  :program:`mongos` restarts.
+
+- After a chunk migration, the :program:`mongos` instances update
+  themselves with the new cluster metadata.
+
+If all three config servers are inaccessible, you can continue to use
+the cluster as long as you do not restart the :program:`mongos`
+instances until the config servers are accessible again. If you
+restart the :program:`mongos` instances while no config servers are
+accessible, the :program:`mongos` instances cannot direct
+queries or write operations to the cluster.
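+
+To see the metadata that the config servers maintain, you can query
+the :term:`config database` through any :program:`mongos`. The
+following is a minimal sketch; the ``records.people`` namespace is a
+placeholder for one of your own sharded collections:
+
+.. code-block:: javascript
+
+   // switch to the config database that mongos caches
+   use config
+
+   // list the shards registered in the cluster
+   db.shards.find()
+
+   // show a few of the chunk-to-shard mappings for one collection
+   db.chunks.find( { ns: "records.people" } ).limit(5)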
+
+Because the configuration data is small relative to the amount of data
+stored in a cluster, the amount of activity is relatively low, and 100%
+uptime is not required for a functioning sharded cluster. As a result,
+backing up the config servers is not difficult. Backups of config
+servers are nevertheless critical, as clusters become totally inoperable
+when you lose all configuration instances and data. Take precautions to
+ensure that the config servers remain available and intact.
+
+.. note::
+
+   Configuration servers store metadata for a single sharded cluster.
+   You must have a separate configuration server or servers for each
+   cluster you administer.
+
+.. index:: mongos
+.. _sharding-mongos:
+.. _sharding-read-operations:
+
+Mongos Instances
+----------------
+
+The :program:`mongos` provides a single unified interface to a sharded
+cluster for applications using MongoDB. Except for the selection of a
+:term:`shard key`, application developers and administrators need not
+consider any of the :ref:`internal details of sharding `.
+
+:program:`mongos` caches data from the :ref:`config server
+` and uses this data to route operations from
+applications and clients to the :program:`mongod` instances.
+:program:`mongos` instances have no *persistent* state and consume
+minimal system resources.
+
+The most common practice is to run :program:`mongos` instances on the
+same systems as your application servers, but you can maintain
+:program:`mongos` instances on the shards or on other dedicated
+resources.
+
+.. note::
+
+   .. versionchanged:: 2.1
+
+   Some aggregation operations using the :dbcommand:`aggregate`
+   command (i.e. :method:`db.collection.aggregate()`) will cause
+   :program:`mongos` instances to require more CPU resources than in
+   previous versions. This modified performance profile may dictate
+   alternate architecture decisions if you use the :term:`aggregation
+   framework` extensively in a sharded environment.
+
+.. _sharding-query-routing:
+
+Mongos Routing
+~~~~~~~~~~~~~~
+
+:program:`mongos` uses information from :ref:`config servers
+` to route operations to the cluster as
+efficiently as possible. In general, operations in a sharded
+environment are either:
+
+1. Targeted at a single shard or a limited group of shards based on
+   the shard key.
+
+2. Broadcast to all shards in the cluster that hold documents in a
+   collection.
+
+You should design your operations to be as targeted as possible.
+Operations have the following targeting characteristics:
+
+- Query operations broadcast to all shards [#namespace-exception]_
+  **unless** the :program:`mongos` can determine which shard or shards
+  store this data.
+
+  For queries that include the shard key, :program:`mongos` can target
+  the query at a specific shard or set of shards if the portion
+  of the shard key included in the query is a *prefix* of the shard
+  key. For example, if the shard key is:
+
+  .. code-block:: javascript
+
+     { a: 1, b: 1, c: 1 }
+
+  The :program:`mongos` *can* route queries that include the full
+  shard key or either of the following shard key prefixes to a
+  specific shard or set of shards:
+
+  .. code-block:: javascript
+
+     { a: 1 }
+     { a: 1, b: 1 }
+
+  Depending on the distribution of data in the cluster and the
+  selectivity of the query, :program:`mongos` may still have to
+  contact multiple shards [#possible-all]_ to fulfill these queries.
+
+- All :method:`insert() ` operations target
+  one shard.
+
+- All single :method:`update() ` operations
+  target one shard. 
This includes :term:`upsert` operations. + +- The :program:`mongos` broadcasts multi-update operations to every + shard. + +- The :program:`mongos` broadcasts :method:`remove() + ` operations to every shard unless the + operation specifies the shard key in full. + +While some operations must broadcast to all shards, you can improve +performance by using as many targeted operations as possible by +ensuring that your operations include the shard key. + +.. [#namespace-exception] If a shard does not store chunks from a + given collection, queries for documents in that collection are not + broadcast to that shard. + +.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could + route a query that included ``{ a: 1, c: 1 }`` fields at a specific + subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos` + cannot route any of the following queries to specific shards + in the cluster: + + .. code-block:: javascript + + { b: 1 } + { c: 1 } + { b: 1, c: 1 } + +.. [#possible-all] :program:`mongos` will route some queries, even + some that include the shard key, to all shards, if needed. + +Sharded Query Response Process +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To route a query to a :term:`cluster `, +:program:`mongos` uses the following process: + +#. Determine the list of :term:`shards ` that must receive the query. + + In some cases, when the :term:`shard key` or a prefix of the shard + key is a part of the query, the :program:`mongos` can route the + query to a subset of the shards. Otherwise, the :program:`mongos` + must direct the query to *all* shards that hold documents for that + collection. + + .. example:: + + Given the following shard key: + + .. code-block:: javascript + + { zipcode: 1, u_id: 1, c_date: 1 } + + Depending on the distribution of chunks in the cluster, the + :program:`mongos` may be able to target the query at a subset of + shards, if the query contains the following fields: + + .. code-block:: javascript + + { zipcode: 1 } + { zipcode: 1, u_id: 1 } + { zipcode: 1, u_id: 1, c_date: 1 } + +#. Establish a cursor on all targeted shards. + + When the first batch of results returns from the cursors: + + a. For query with sorted results (i.e. using + :method:`cursor.sort()`) the :program:`mongos` performs a merge + sort of all queries. + + b. For a query with unsorted results, the :program:`mongos` returns + a result cursor that "round robins" results from all cursors on + the shards. + + .. versionchanged:: 2.0.5 + Before 2.0.5, the :program:`mongos` exhausted each cursor, + one by one. diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index a717bf7425e..b0e4a84a8e1 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -9,15 +9,10 @@ Sharded Cluster Architectures .. default-domain:: mongodb This document describes the organization and design of :term:`sharded -cluster` deployments. For documentation of common administrative tasks -related to sharded clusters, see :doc:`/administration/sharding`. For -complete documentation of sharded clusters see the :doc:`/sharding` -section of this manual. +cluster` deployments. -.. seealso:: :ref:`sharding-requirements`. - -Deploying a Test Cluster ------------------------- +Test Cluster +------------ .. warning:: Use this architecture for testing and development only. @@ -25,24 +20,23 @@ You can deploy a very minimal cluster for testing and development. 
These *non-production* clusters have the following components: -- 1 :ref:`config server `. +- One :ref:`config server `. - At least one :program:`mongod` instance (either :term:`replica sets ` or as a standalone node.) -- 1 :program:`mongos` instance. +- One :program:`mongos` instance. .. _sharding-production-architecture: -Deploying a Production Cluster ------------------------------- +Production Cluster +------------------ -When deploying a production cluster, you must ensure that the -data is redundant and that your systems are highly available. To that -end, a production-level cluster must have the following -components: +In a production cluster, you must ensure that data is redundant and that +your systems are highly available. To that end, a production-level +cluster must have the following components: -- 3 :ref:`config servers `, each residing on a +- Three :ref:`config servers `, each residing on a discrete system. .. note:: @@ -52,21 +46,18 @@ components: multiple shards, you will need to have a group of config servers for each cluster. -- 2 or more :term:`replica sets `, for the :term:`shards +- Two or more :term:`replica sets `, for the :term:`shards `. .. see:: For more information on replica sets see :doc:`/administration/replication-architectures` and :doc:`/replication`. -- :program:`mongos` instances. Typically, you will deploy a single +- Two or more :program:`mongos` instances. Typically, you will deploy a single :program:`mongos` instance on each application server. Alternatively, you may deploy several `mongos` nodes and let your application connect to these via a load balancer. -.. seealso:: :ref:`sharding-procedure-add-shard` and - :ref:`sharding-procedure-remove-shard`. - Sharded and Non-Sharded Data ---------------------------- @@ -77,12 +68,10 @@ deployments some databases and collections will use sharding, while other databases and collections will only reside on a single database instance or replica set (i.e. a :term:`shard`.) -.. note:: - - Regardless of the data architecture of your :term:`sharded cluster`, - ensure that all queries and operations use the :term:`mongos` - router to access the data cluster. Use the :program:`mongos` even - for operations that do not impact the sharded data. +Regardless of the data architecture of your :term:`sharded cluster`, +ensure that all queries and operations use the :term:`mongos` router to +access the data cluster. Use the :program:`mongos` even for operations +that do not impact the sharded data. Every database has a "primary" [#overloaded-primary-term]_ shard that holds all un-sharded collections in that database. All collections @@ -119,7 +108,7 @@ High Availability and MongoDB A :ref:`production ` :term:`cluster` has no single point of failure. This section introduces the -availability concerns for MongoDB deployments, and highlights +availability concerns for MongoDB deployments and highlights potential failure scenarios and available resolutions: - Application servers or :program:`mongos` instances become unavailable. diff --git a/source/administration/sharding-config-server.txt b/source/administration/sharding-config-server.txt new file mode 100644 index 00000000000..ea1509f2301 --- /dev/null +++ b/source/administration/sharding-config-server.txt @@ -0,0 +1,224 @@ +.. index:: config servers; operations +.. _sharding-procedure-config-server: + +========================= +Manage the Config Servers +========================= + +.. 
default-domain:: mongodb + +:ref:`Config servers ` store all cluster metadata, most importantly, +the mapping from :term:`chunks ` to :term:`shards `. +This section provides an overview of the basic +procedures to migrate, replace, and maintain these servers. + +This page includes the following: + +- :ref:`sharding-config-server-deploy-three` + +- :ref:`sharding-process-config-server-migrate-same-hostname` + +- :ref:`sharding-process-config-server-migrate-different-hostname` + +- :ref:`sharding-config-server-replace` + +- :ref:`sharding-config-server-backup` + +.. _sharding-config-server-deploy-three: + +Deploy Three Config Servers for Production Deployments +------------------------------------------------------ + +For redundancy, all production :term:`sharded clusters ` +should deploy three config servers processes on three different +machines. + +Do not use only a single config server for production deployments. +Only use a single config server deployments for testing. You should +upgrade to three config servers immediately if you are shifting to +production. The following process shows how to convert a test +deployment with only one config server to production deployment with +three config servers. + +#. Shut down all existing MongoDB processes. This includes: + + - all :program:`mongod` instances or :term:`replica sets ` + that provide your shards. + + - the :program:`mongod` instance that provides your existing config + database. + + - all :program:`mongos` instances in your cluster. + +#. Copy the entire :setting:`dbpath` file system tree from the + existing config server to the two machines that will provide the + additional config servers. These commands, issued on the system + with the existing :ref:`config-database`, ``mongo-config0.example.net`` may + resemble the following: + + .. code-block:: sh + + rsync -az /data/configdb mongo-config1.example.net:/data/configdb + rsync -az /data/configdb mongo-config2.example.net:/data/configdb + +#. Start all three config servers, using the same invocation that you + used for the single config server. + + .. code-block:: sh + + mongod --configsvr + +#. Restart all shard :program:`mongod` and :program:`mongos` processes. + +.. _sharding-process-config-server-migrate-same-hostname: + +Migrate Config Servers with the Same Hostname +--------------------------------------------- + +Use this process when you need to migrate a config server to a new +system but the new system will be accessible using the same host +name. + +#. Shut down the config server that you're moving. + + This will render all config data for your cluster :ref:`read only + `. + +#. Change the DNS entry that points to the system that provided the old + config server, so that the *same* hostname points to the new + system. + + How you do this depends on how you organize your DNS and + hostname resolution services. + +#. Move the entire :setting:`dbpath` file system tree from the old + config server to the new config server. This command, issued on the + old config server system, may resemble the following: + + .. code-block:: sh + + rsync -az /data/configdb mongo-config0.example.net:/data/configdb + +#. Start the config instance on the new system. The default invocation + is: + + .. code-block:: sh + + mongod --configsvr + +When you start the third config server, your cluster will become +writable and it will be able to create new splits and migrate chunks +as needed. + +.. 
_sharding-process-config-server-migrate-different-hostname:
+
+Migrate Config Servers with Different Hostnames
+-----------------------------------------------
+
+Use this process when you need to migrate a :ref:`config-database` to a new
+server and it *will not* be accessible via the same hostname. If
+possible, avoid changing the hostname so that you can use the
+:ref:`previous procedure `.
+
+#. Shut down the :ref:`config server ` you're moving.
+
+   This will render all config data for your cluster "read only."
+
+#. Copy the entire :setting:`dbpath` file system tree from the old
+   config server to the new config server. This command, issued on the
+   old config server system, may resemble the following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Start the config instance on the new system. The default invocation
+   is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+#. Shut down all existing MongoDB processes. This includes:
+
+   - all :program:`mongod` instances or :term:`replica sets `
+     that provide your shards.
+
+   - the :program:`mongod` instances that provide your existing
+     :ref:`config databases `.
+
+   - all :program:`mongos` instances in your cluster.
+
+#. Restart all :program:`mongod` processes that provide the shard
+   servers.
+
+#. Update the :option:`--configdb ` parameter (or
+   :setting:`configdb`) for all :program:`mongos` instances and
+   restart all :program:`mongos` instances.
+
+.. _sharding-config-server-replace:
+
+Replace a Config Server
+-----------------------
+
+Use this procedure only if you need to replace one of your config
+servers after it becomes inoperable (e.g. hardware failure.) This
+process assumes that the hostname of the instance will not change. If
+you must change the hostname of the instance, use the process for
+:ref:`migrating a config server to a different hostname
+`.
+
+#. Provision a new system with the same hostname as the previous
+   host.
+
+   You will have to ensure that the new system has the same IP address
+   and hostname as the system it's replacing *or* you will need to
+   modify the DNS records and wait for them to propagate.
+
+#. Shut down *one* (and only one) of the existing config servers. Copy
+   this host's entire :setting:`dbpath` file system tree from the current
+   system to the system that will provide the new config server. This
+   command, issued on the system with the data files, may resemble the
+   following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Restart the config server process that you used in the previous
+   step to copy the data files to the new config server instance.
+
+#. Start the new config server instance. The default invocation is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+.. note::
+
+   In the course of this procedure, *never* remove a config server from
+   the :setting:`configdb` parameter on any of the :program:`mongos`
+   instances. If you need to change the name of a config server,
+   always make sure that all :program:`mongos` instances have three
+   config servers specified in the :setting:`configdb` setting at all
+   times.
+
+.. _sharding-config-server-backup:
+
+Backup Cluster Metadata
+-----------------------
+
+Because the cluster remains operational [#read-only]_ without one
+of the config database's :program:`mongod` instances, creating a backup
+of the cluster metadata from the config database is straightforward:
+
+#. Shut down one of the :term:`config databases `.
+
+#. Create a full copy of the data files (i.e. the path specified by
+   the :setting:`dbpath` option for the config instance.)
+
+#. Restart the original configuration server.
+
+.. seealso:: :doc:`backups`.
+
+.. 
[#read-only] While one of the three config servers is unavailable, + the cluster cannot split any chunks nor can it migrate chunks + between shards. Your application will be able to write data to the + cluster. The :ref:`sharding-config-server` section of the + documentation provides more information on this topic. diff --git a/source/administration/sharding-troubleshooting.txt b/source/administration/sharding-troubleshooting.txt new file mode 100644 index 00000000000..99505e99a2e --- /dev/null +++ b/source/administration/sharding-troubleshooting.txt @@ -0,0 +1,152 @@ +.. index:: troubleshooting; sharding +.. index:: sharding; troubleshooting +.. _sharding-troubleshooting: + +================================ +Troubleshooting Sharded Clusters +================================ + +.. default-domain:: mongodb + +The two most important factors in maintaining a successful sharded cluster are: + +- :ref:`choosing an appropriate shard key ` and + +- :ref:`sufficient capacity to support current and future operations + `. + +You can prevent most issues encountered with sharding by ensuring that +you choose the best possible :term:`shard key` for your deployment and +ensure that you are always adding additional capacity to your cluster +well before the current resources become saturated. Continue reading +for specific issues you may encounter in a production environment. + +.. _sharding-troubleshooting-not-splitting: + +All Data Remains on One Shard +----------------------------- + +Your cluster must have sufficient data for sharding to make +sense. Sharding works by migrating chunks between the shards until +each shard has roughly the same number of chunks. + +The default chunk size is 64 megabytes. MongoDB will not begin +migrations until the imbalance of chunks in the cluster exceeds the +:ref:`migration threshold `. While the +default chunk size is configurable with the :setting:`chunkSize` +setting, these behaviors help prevent unnecessary chunk migrations, +which can degrade the performance of your cluster as a whole. + +If you have just deployed a sharded cluster, make sure that you have +enough data to make sharding effective. If you do not have sufficient +data to create more than eight 64 megabyte chunks, then all data will +remain on one shard. Either lower the :ref:`chunk size +` setting, or add more data to the cluster. + +As a related problem, the system will split chunks only on +inserts or updates, which means that if you configure sharding and do not +continue to issue insert and update operations, the database will not +create any chunks. You can either wait until your application inserts +data *or* :ref:`split chunks manually `. + +Finally, if your shard key has a low :ref:`cardinality +`, MongoDB may not be able to create +sufficient splits among the data. + +One Shard Receives Too Much Traffic +----------------------------------- + +In some situations, a single shard or a subset of the cluster will +receive a disproportionate portion of the traffic and workload. In +almost all cases this is the result of a shard key that does not +effectively allow :ref:`write scaling `. + +It's also possible that you have "hot chunks." In this case, you may +be able to solve the problem by splitting and then migrating parts of +these chunks. + +In the worst case, you may have to consider re-sharding your data +and :ref:`choosing a different shard key ` +to correct this pattern. 
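+
+If the problem is a "hot chunk," you may be able to relieve it by
+splitting the chunk and migrating one of the resulting chunks to
+another shard, as described above. The following is a minimal sketch
+run from a :program:`mongo` shell connected to a :program:`mongos`;
+the ``records.people`` namespace, the ``zipcode`` shard key, and the
+``shard0001`` shard name are placeholders for values from your own
+cluster:
+
+.. code-block:: javascript
+
+   // split the chunk that contains the hot shard key value
+   sh.splitFind( "records.people", { "zipcode": 63109 } )
+
+   // move one of the resulting chunks to a less loaded shard;
+   // use listShards to find the shard names in your cluster
+   db.adminCommand( { moveChunk: "records.people",
+                      find: { "zipcode": 63109 },
+                      to: "shard0001" } )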
+ +The Cluster Does Not Balance +---------------------------- + +If you have just deployed your sharded cluster, you may want to +consider the :ref:`troubleshooting suggestions for a new cluster where +data remains on a single shard `. + +If the cluster was initially balanced, but later developed an uneven +distribution of data, consider the following possible causes: + +- You have deleted or removed a significant amount of data from the + cluster. If you have added additional data, it may have a + different distribution with regards to its shard key. + +- Your :term:`shard key` has low :ref:`cardinality ` + and MongoDB cannot split the chunks any further. + +- Your data set is growing faster than the balancer can distribute + data around the cluster. This is uncommon and + typically is the result of: + + - a :ref:`balancing window ` that + is too short, given the rate of data growth. + + - an uneven distribution of :ref:`write operations + ` that requires more data + migration. You may have to choose a different shard key to resolve + this issue. + + - poor network connectivity between shards, which may lead to chunk + migrations that take too long to complete. Investigate your + network configuration and interconnections between shards. + +Migrations Render Cluster Unusable +---------------------------------- + +If migrations impact your cluster or application's performance, +consider the following options, depending on the nature of the impact: + +#. If migrations only interrupt your clusters sporadically, you can + limit the :ref:`balancing window + ` to prevent balancing activity + during peak hours. Ensure that there is enough time remaining to + keep the data from becoming out of balance again. + +#. If the balancer is always migrating chunks to the detriment of + overall cluster performance: + + - You may want to attempt :ref:`decreasing the chunk size ` + to limit the size of the migration. + + - Your cluster may be over capacity, and you may want to attempt to + :ref:`add one or two shards ` to + the cluster to distribute load. + +It's also possible that your shard key causes your +application to direct all writes to a single shard. This kind of +activity pattern can require the balancer to migrate most data soon after writing +it. Consider redeploying your cluster with a shard key that provides +better :ref:`write scaling `. + +Disable Balancing During Backups +-------------------------------- + +If MongoDB migrates a :term:`chunk` during a :doc:`backup +`, you can end with an inconsistent snapshot +of your :term:`sharded cluster`. Never run a backup while the balancer is +active. To ensure that the balancer is inactive during your backup +operation: + +- Set the :ref:`balancing window ` + so that the balancer is inactive during the backup. Ensure that the + backup can complete while you have the balancer disabled. + +- :ref:`manually disable the balancer ` + for the duration of the backup procedure. + +Confirm that the balancer is not active using the +:method:`sh.getBalancerState()` method before starting a backup +operation. When the backup procedure is complete you can reactivate +the balancer process. diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index d218684fac8..216badc4e9b 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -1,1398 +1,30 @@ -.. index:: administration; sharding -.. 
_sharding-administration: - -============================== -Sharded Cluster Administration -============================== - -.. default-domain:: mongodb - -This document describes common administrative tasks for sharded -clusters. For complete documentation of sharded clusters see the -:doc:`/sharding` section of this manual. - -.. contents:: Sharding Procedures: - :backlinks: none - :local: - -.. _sharding-procedure-setup: - -Set up a Sharded Cluster ------------------------- - -Before deploying a cluster, see :ref:`sharding-requirements`. - -For testing purposes, you can run all the required shard :program:`mongod` processes on a -single server. For production, use the configurations described in -:doc:`/administration/replication-architectures`. - -.. include:: /includes/warning-sharding-hostnames.rst - -If you have an existing replica set, you can use the -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster` -tutorial as a guide. If you're deploying a cluster from scratch, see -the :doc:`/tutorial/deploy-shard-cluster` tutorial for more detail or -use the following procedure as a quick starting point: - -1. Create data directories for each of the three (3) config server instances. - -#. Start the three config server instances. For example, to start a - config server instance running on TCP port ``27019`` with the data - stored in ``/data/configdb``, type the following: - - .. code-block:: sh - - mongod --configsvr --dbpath /data/configdb --port 27019 - - For additional command options, see :doc:`/reference/mongod` - and :doc:`/reference/configuration-options`. - - .. include:: /includes/note-config-server-startup.rst - -#. Start a :program:`mongos` instance. For example, to start a - :program:`mongos` that connects to config server instance running on the following hosts: - - - ``mongoc0.example.net`` - - ``mongoc1.example.net`` - - ``mongoc2.example.net`` - - You would issue the following command: - - .. code-block:: sh - - mongos --configdb mongoc0.example.net:27019,mongoc1.example.net:27019,mongoc2.example.net:27019 - -#. Connect to one of the :program:`mongos` instances. For example, if - a :program:`mongos` is accessible at ``mongos0.example.net`` on - port ``27017``, issue the following command: - - .. code-block:: sh - - mongo mongos0.example.net - -#. Add shards to the cluster. - - .. note:: In production deployments, all shards should be replica sets. - - To deploy a replica set, see the - :doc:`/tutorial/deploy-replica-set` tutorial. - - From the :program:`mongo` shell connected - to the :program:`mongos` instance, call the :method:`sh.addShard()` - method for each shard to add to the cluster. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - If ``mongodb0.example.net:27027`` is a member of a replica - set, call the :method:`sh.addShard()` method with an argument that - resembles the following: - - .. code-block:: javascript - - sh.addShard( "/mongodb0.example.net:27027" ) - - Replace, ```` with the name of the replica set, and - MongoDB will discover all other members of the replica set. - Repeat this step for each new shard in your cluster. - - .. optional:: - - You can specify a name for the shard and a maximum size. See - :dbcommand:`addShard`. - - .. note:: - - .. versionchanged:: 2.0.3 - - Before version 2.0.3, you must specify the shard in the following - form: - - .. 
code-block:: sh - - replicaSetName/,, - - For example, if the name of the replica set is ``repl0``, then - your :method:`sh.addShard()` command would be: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -#. Enable sharding for each database you want to shard. - While sharding operates on a per-collection basis, you must enable - sharding for each database that holds collections you want to shard. - This step is a meta-data change and will not redistribute your data. - - MongoDB enables sharding on a per-database basis. This is only a - meta-data change and will not redistribute your data. To enable - sharding for a given database, use the :dbcommand:`enableSharding` - command or the :method:`sh.enableSharding()` shell helper. - - .. code-block:: javascript - - db.runCommand( { enableSharding: } ) - - Or: - - .. code-block:: javascript - - sh.enableSharding() - - .. note:: - - MongoDB creates databases automatically upon their first use. - - Once you enable sharding for a database, MongoDB assigns a - :term:`primary shard` for that database, where MongoDB stores all data - before sharding begins. - -.. _sharding-administration-shard-collection: - -#. Enable sharding on a per-collection basis. - - Finally, you must explicitly specify collections to shard. The - collections must belong to a database for which you have enabled - sharding. When you shard a collection, you also choose the shard - key. To shard a collection, run the :dbcommand:`shardCollection` - command or the :method:`sh.shardCollection()` shell helper. - - .. code-block:: javascript - - db.runCommand( { shardCollection: ".", key: { : 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection(".", ) - - For example: - - .. code-block:: javascript - - db.runCommand( { shardCollection: "myapp.users", key: { username: 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection("myapp.users", { username: 1 }) - - The choice of shard key is incredibly important: it affects - everything about the cluster from the efficiency of your queries to - the distribution of data. Furthermore, you cannot change a - collection's shard key after setting it. - - See the :ref:`Shard Key Overview ` and the - more in depth documentation of :ref:`Shard Key Qualities - ` to help you select better shard - keys. - - If you do not specify a shard key, MongoDB will shard the - collection using the ``_id`` field. - -Cluster Management ------------------- - -This section outlines procedures for adding and remove shards, as well -as general monitoring and maintenance of a :term:`sharded cluster`. - -.. _sharding-procedure-add-shard: - -Add a Shard to a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~ - -To add a shard to an *existing* sharded cluster, use the following -procedure: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. First, you need to tell the cluster where to find the individual - shards. You can do this using the :dbcommand:`addShard` command or - the :method:`sh.addShard()` helper: - - .. code-block:: javascript - - sh.addShard( ":" ) - - Replace ```` and ```` with the hostname and TCP - port number of where the shard is accessible. - Alternately specify a :term:`replica set` name and at least one - hostname which is a member of the replica set. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - .. 
note:: In production deployments, all shards should be replica sets. - - Repeat for each shard in your cluster. - - .. optional:: - - You may specify a "name" as an argument to the - :dbcommand:`addShard` command, as follows: - - .. code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. - - .. versionchanged:: 2.0.3 - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - "myapp1", then your :method:`sh.addShard()` command might resemble: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -.. note:: - - It may take some time for :term:`chunks ` to migrate to the - new shard. - - For an introduction to balancing, see :ref:`sharding-balancing`. For - lower level information on balancing, see :ref:`sharding-balancing-internals`. - -.. _sharding-procedure-remove-shard: - -Remove a Shard from a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To remove a :term:`shard` from a :term:`sharded cluster`, you must: - -- Migrate :term:`chunks ` to another shard or database. - -- Ensure that this shard is not the :term:`primary shard` for any databases in - the cluster. If it is, move the "primary" status for these databases - to other shards. - -- Finally, remove the shard from the cluster's configuration. - -.. note:: - - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. - -The procedure to remove a shard is as follows: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Determine the name of the shard you will be removing. - - You must specify the name of the shard. You may have specified this - shard name when you first ran the :dbcommand:`addShard` command. If not, - you can find out the name of the shard by running the - :dbcommand:`listShards` or :dbcommand:`printShardingStatus` - commands or the :method:`sh.status()` shell helper. - - The following examples will remove a shard named ``mongodb0`` from the cluster. - -#. Begin removing chunks from the shard. - - Start by running the :dbcommand:`removeShard` command. This will - start "draining" or migrating chunks from the shard you're removing - to another shard in the cluster. - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - This operation will return the following response immediately: - - .. code-block:: javascript - - { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } - - Depending on your network capacity and the amount of data in the - shard, this operation can take anywhere from a few minutes to several - days to complete. - -#. View progress of the migration. - - You can run the :dbcommand:`removeShard` command again at any stage of the - process to view the progress of the migration, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - The output should look something like this: - - .. 
code-block:: javascript - - { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } - - In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a - counter displays the remaining number of chunks that MongoDB must - migrate to other shards and the number of MongoDB databases that have - "primary" status on this shard. - - Continue checking the status of the :dbcommand:`removeShard` command - until the remaining number of chunks to transfer is 0. - -#. Move any databases to other shards in the cluster as needed. - - This is only necessary when removing a shard that is also the - :term:`primary shard` for one or more databases. - - Issue the following command at the :program:`mongo` shell: - - .. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - - This command will migrate all remaining non-sharded data in the - database named ``myapp`` to the shard named ``mongodb1``. - - .. warning:: - - Do not run the :dbcommand:`movePrimary` command until you have *finished* - draining the shard. - - The command will not return until MongoDB completes moving all - data. The response from this command will resemble the following: - - .. code-block:: javascript - - { "primary" : "mongodb1", "ok" : 1 } - -#. Run :dbcommand:`removeShard` again to clean up all metadata - information and finalize the shard removal, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - When successful, this command will return a document like this: - - .. code-block:: javascript - - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } - -Once the value of the ``stage`` field is "completed," you may safely -stop the processes comprising the ``mongodb0`` shard. - -List Databases with Sharding Enabled -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To list the databases that have sharding enabled, query the -``databases`` collection in the :ref:`config-database`. -A database has sharding enabled if the value of the ``partitioned`` -field is ``true``. Connect to a :program:`mongos` instance with a -:program:`mongo` shell, and run the following operation to get a full -list of databases with sharding enabled: - -.. code-block:: javascript - - use config - db.databases.find( { "partitioned": true } ) - -.. example:: You can use the following sequence of commands when to - return a list of all databases in the cluster: - - .. code-block:: javascript - - use config - db.databases.find() - - If this returns the following result set: - - .. code-block:: javascript - - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } - { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } - - Then sharding is only enabled for the ``animals`` database. - -List Shards -~~~~~~~~~~~ - -To list the current set of configured shards, use the :dbcommand:`listShards` -command, as follows: - -.. code-block:: javascript - - use admin - db.runCommand( { listShards : 1 } ) - -View Cluster Details -~~~~~~~~~~~~~~~~~~~~ - -To view cluster details, issue :method:`db.printShardingStatus()` or -:method:`sh.status()`. Both methods return the same output. - -.. example:: In the following example output from :method:`sh.status()` - - - ``sharding version`` displays the version number of the shard - metadata. 
- - - ``shards`` displays a list of the :program:`mongod` instances - used as shards in the cluster. - - - ``databases`` displays all databases in the cluster, - including database that do not have sharding enabled. - - - The ``chunks`` information for the ``foo`` database displays how - many chunks are on each shard and displays the range of each chunk. - - .. code-block:: javascript - - --- Sharding Status --- - sharding version: { "_id" : 1, "version" : 3 } - shards: - { "_id" : "shard0000", "host" : "m0.example.net:30001" } - { "_id" : "shard0001", "host" : "m3.example2.net:50000" } - databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } - foo.big chunks: - shard0001 1 - shard0000 6 - { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo - { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo - { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo - { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) - { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - foo.large chunks: - shard0001 1 - shard0000 5 - { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) - { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo - { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) - { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } - -Chunk Management ----------------- - -This section describes various operations on :term:`chunks ` in -:term:`sharded clusters `. MongoDB automates most -chunk management operations. However, these chunk management -operations are accessible to administrators for use in some -situations, typically surrounding initial setup, deployment, and data -ingestion. - -.. _sharding-procedure-create-split: - -Split Chunks -~~~~~~~~~~~~ - -Normally, MongoDB splits a :term:`chunk` following inserts when a -chunk exceeds the :ref:`chunk size `. The -:term:`balancer` may migrate recently split chunks to a new shard -immediately if :program:`mongos` predicts future insertions will -benefit from the move. - -MongoDB treats all chunks the same, whether split manually or -automatically by the system. - -.. warning:: - - You cannot merge or combine chunks once you have split them. - -You may want to split chunks manually if: - -- you have a large amount of data in your cluster and very few - :term:`chunks `, - as is the case after deploying a cluster using existing data. - -- you expect to add a large amount of data that would - initially reside in a single chunk or shard. - -.. example:: - - You plan to insert a large amount of data with :term:`shard key` - values between ``300`` and ``400``, *but* all values of your shard - keys are between ``250`` and ``500`` are in a single chunk. - -Use :method:`sh.status()` to determine the current chunks ranges across -the cluster. 
- -To split chunks manually, use the :dbcommand:`split` command with -operators: ``middle`` and ``find``. The equivalent shell helpers are -:method:`sh.splitAt()` or :method:`sh.splitFind()`. - -.. example:: - - The following command will split the chunk that contains - the value of ``63109`` for the ``zipcode`` field in the ``people`` - collection of the ``records`` database: - - .. code-block:: javascript - - sh.splitFind( "records.people", { "zipcode": 63109 } ) - -:method:`sh.splitFind()` will split the chunk that contains the -*first* document returned that matches this query into two equally -sized chunks. You must specify the full namespace -(i.e. "``.``") of the sharded collection to -:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need -not contain the shard key, though it almost always makes sense to -query for the shard key in this case, and including the shard key will -expedite the operation. - -Use :method:`sh.splitAt()` to split a chunk in two using the queried -document as the partition point: - -.. code-block:: javascript - - sh.splitAt( "records.people", { "zipcode": 63109 } ) - -However, the location of the document that this query finds with -respect to the other documents in the chunk does not affect how the -chunk splits. - -.. _sharding-administration-pre-splitting: -.. _sharding-administration-create-chunks: - -Create Chunks (Pre-Splitting) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In most situations a :term:`sharded cluster` will create and distribute -chunks automatically without user intervention. However, in a limited -number of use profiles, MongoDB cannot create enough chunks or -distribute data fast enough to support required throughput. Consider -the following scenarios: - -- you must partition an existing data collection that resides on a - single shard. - -- you must ingest a large volume of data into a cluster that - isn't balanced, or where the ingestion of data will lead to an - imbalance of data. - - This can arise in an initial data loading, or in a case where you - must insert a large volume of data into a single chunk, as is the - case when you must insert at the beginning or end of the chunk - range, as is the case for monotonically increasing or decreasing - shard keys. - -Preemptively splitting chunks increases cluster throughput for these -operations, by reducing the overhead of migrating chunks that hold -data during the write operation. MongoDB only creates splits after an -insert operation, and can only migrate a single chunk at a time. Chunk -migrations are resource intensive and further complicated by large -write volume to the migrating chunk. - -.. warning:: - - You can only pre-split an empty collection. When you enable - sharding for a collection that contains data MongoDB automatically - creates splits. Subsequent attempts to create splits manually, can - lead to unpredictable chunk ranges and sizes as well as inefficient - or ineffective balancing behavior. - -To create and migrate chunks manually, use the following procedure: - -#. Split empty chunks in your collection by manually performing - :dbcommand:`split` command on chunks. - - .. example:: - - To create chunks for documents in the ``myapp.users`` - collection, using the ``email`` field as the :term:`shard key`, - use the following operation in the :program:`mongo` shell: - - .. 
code-block:: javascript - - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); - } - } - - This assumes a collection size of 100 million documents. - -#. Migrate chunks manually using the :dbcommand:`moveChunk` command: - - .. example:: - - To migrate all of the manually created user profiles evenly, - putting each prefix chunk on the next shard from the other, run - the following commands in the mongo shell: - - .. code-block:: javascript - - var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) - } - } - - You can also let the balancer automatically distribute the new - chunks. For an introduction to balancing, see - :ref:`sharding-balancing`. For lower level information on balancing, - see :ref:`sharding-balancing-internals`. - -.. _sharding-balancing-modify-chunk-size: - -Modify Chunk Size -~~~~~~~~~~~~~~~~~ - -When you initialize a sharded cluster, the default chunk size is 64 -megabytes. This default chunk size works well for most deployments. However, if you -notice that automatic migrations are incurring a level of I/O that -your hardware cannot handle, you may want to reduce the chunk -size. For the automatic splits and migrations, a small chunk size -leads to more rapid and frequent migrations. - -To modify the chunk size, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following :method:`save() ` - operation: - - .. code-block:: javascript - - db.settings.save( { _id:"chunksize", value: } ) - - Where the value of ```` reflects the new chunk size in - megabytes. Here, you're essentially writing a document whose values - store the global chunk size configuration value. - -.. note:: - - The :setting:`chunkSize` and :option:`--chunkSize ` - options, passed at runtime to the :program:`mongos` **do not** - affect the chunk size after you have initialized the cluster. - - To eliminate confusion you should *always* set chunk size using the - above procedure and never use the runtime options. - -Modifying the chunk size has several limitations: - -- Automatic splitting only occurs when inserting :term:`documents - ` or updating existing documents. - -- If you lower the chunk size it may take time for all chunks to split to - the new size. - -- Splits cannot be "undone." - -If you increase the chunk size, existing chunks must grow through -insertion or updates until they reach the new size. - -.. _sharding-balancing-manual-migration: - -Migrate Chunks -~~~~~~~~~~~~~~ - -In most circumstances, you should let the automatic balancer -migrate :term:`chunks ` between :term:`shards `. -However, you may want to migrate chunks manually in a few cases: - -- If you create chunks by :term:`pre-splitting` the data in your - collection, you will have to migrate chunks manually to distribute - chunks evenly across the shards. Use pre-splitting in limited - situations, to support bulk data ingestion. 
- -- If the balancer in an active cluster cannot distribute chunks within - the balancing window, then you will have to migrate chunks manually. - -For more information on how chunks move between shards, see -:ref:`sharding-balancing-internals`, in particular the section -:ref:`sharding-chunk-migration`. - -To migrate chunks, use the :dbcommand:`moveChunk` command. - -.. note:: - - To return a list of shards, use the :dbcommand:`listShards` - command. - - Specify shard names using the :dbcommand:`addShard` command - using the ``name`` argument. If you do not specify a name in the - :dbcommand:`addShard` command, MongoDB will assign a name - automatically. - -The following example assumes that the field ``username`` is the -:term:`shard key` for a collection named ``users`` in the ``myapp`` -database, and that the value ``smith`` exists within the :term:`chunk` -you want to migrate. - -To move this chunk, you would issue the following command from a :program:`mongo` -shell connected to any :program:`mongos` instance. - -.. code-block:: javascript - - db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) - -This command moves the chunk that includes the shard key value "smith" to the -:term:`shard` named ``mongodb-shard3.example.net``. The command will -block until the migration is complete. - -See :ref:`sharding-administration-create-chunks` for an introduction -to pre-splitting. - -.. versionadded:: 2.2 - :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` - parameter. When set to ``true``, MongoDB ensures that - :term:`secondary` members have replicated operations before allowing - new chunk migrations. - -.. warning:: - - The :dbcommand:`moveChunk` command may produce the following error - message: - - .. code-block:: none - - The collection's metadata lock is already taken. - - These errors occur when clients have too many open :term:`cursors - ` that access the chunk you are migrating. You can either - wait until the cursors complete their operation or close the - cursors manually. - - .. todo:: insert link to killing a cursor. - -.. index:: bulk insert -.. _sharding-bulk-inserts: - -Strategies for Bulk Inserts in Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. todo:: Consider moving to the administrative guide as it's of an - applied nature, or create an applications document for sharding - -.. todo:: link the words "bulk insert" to the bulk insert topic when - it's published - - -Large bulk insert operations including initial data ingestion or -routine data import, can have a significant impact on a :term:`sharded -cluster`. Consider the following strategies and possibilities for -bulk insert operations: - -- If the collection does not have data, then there is only one - :term:`chunk`, which must reside on a single shard. MongoDB must - receive data, create splits, and distribute chunks to the available - shards. To avoid this performance cost, you can pre-split the - collection, as described in :ref:`sharding-administration-pre-splitting`. - -- You can parallelize import processes by sending insert operations to - more than one :program:`mongos` instance. If the collection is - empty, pre-split first, as described in - :ref:`sharding-administration-pre-splitting`. - -- If your shard key increases monotonically during an insert then all - the inserts will go to the last chunk in the collection, which will - always end up on a single shard. 
Therefore, the insert capacity of the - cluster will never exceed the insert capacity of a single shard. - - If your insert volume is never larger than what a single shard can - process, then there is no problem; however, if the insert volume - exceeds that range, and you cannot avoid a monotonically - increasing shard key, then consider the following modifications to - your application: - - - Reverse all the bits of the shard key to preserve the information - while avoiding the correlation of insertion order and increasing - sequence of values. - - - Swap the first and last 16-bit words to "shuffle" the inserts. - - .. example:: The following example, in C++, swaps the leading and - trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` - generated so that they are no longer monotonically increasing. - - .. code-block:: cpp - - using namespace mongo; - OID make_an_id() { - OID x = OID::gen(); - const unsigned char *p = x.getData(); - swap( (unsigned short&) p[0], (unsigned short&) p[10] ); - return x; - } - - void foo() { - // create an object - BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); - // now we might insert o into a sharded collection... - } - - For information on choosing a shard key, see :ref:`sharding-shard-key` - and see :ref:`Shard Key Internals ` (in - particular, :ref:`sharding-internals-operations-and-reliability` and - :ref:`sharding-internals-choose-shard-key`). - -.. index:: balancing; operations -.. _sharding-balancing-operations: - -Balancer Operations -------------------- - -This section describes provides common administrative procedures related -to balancing. For an introduction to balancing, see -:ref:`sharding-balancing`. For lower level information on balancing, see -:ref:`sharding-balancing-internals`. - -.. _sharding-balancing-check-lock: - -Check the Balancer Lock -~~~~~~~~~~~~~~~~~~~~~~~ - -To see if the balancer process is active in your :term:`cluster -`, do the following: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use the following query to return the balancer lock: - - .. code-block:: javascript - - db.locks.find( { _id : "balancer" } ).pretty() - -When this command returns, you will see output like the following: - -.. code-block:: javascript - - { "_id" : "balancer", - "process" : "mongos0.example.net:1292810611:1804289383", - "state" : 2, - "ts" : ObjectId("4d0f872630c42d1978be8a2e"), - "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", - "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", - "why" : "doing balance round" } - - -This output confirms that: - -- The balancer originates from the :program:`mongos` running on the - system with the hostname ``mongos0.example.net``. - -- The value in the ``state`` field indicates that a :program:`mongos` - has the lock. For version 2.0 and later, the value of an active lock - is ``2``; for earlier versions the value is ``1``. - -.. optional:: - - You can also use the following shell helper, which returns a - boolean to report if the balancer is active: - - .. code-block:: javascript - - sh.getBalancerState() - -.. 
_sharding-schedule-balancing-window: - -Schedule the Balancing Window -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, particularly when your data set grows slowly and a -migration can impact performance, it's useful to be able to ensure -that the balancer is active only at certain times. Use the following -procedure to specify a window during which the :term:`balancer` will -be able to migrate chunks: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use an operation modeled on the following example :method:`update() - ` operation to modify the balancer's - window: - - .. code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "", stop : "" } } }, true ) - - Replace ```` and ```` with time values using - two digit hour and minute values (e.g ``HH:MM``) that describe the - beginning and end boundaries of the balancing window. - These times will be evaluated relative to the time zone of each individual - :program:`mongos` instance in the sharded cluster. - For instance, running the following - will force the balancer to run between 11PM and 6AM local time only: - - .. code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true ) - -.. note:: - - The balancer window must be sufficient to *complete* the migration - of all data inserted during the day. - - As data insert rates can change based on activity and usage - patterns, it is important to ensure that the balancing window you - select will be sufficient to support the needs of your deployment. - -Remove a Balancing Window Schedule -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have :ref:`set the balancing window -` and wish to remove the schedule -so that the balancer is always running, issue the following sequence -of operations: - -.. code-block:: javascript - - use config - db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true }) - -.. _sharding-balancing-disable-temporally: - -Disable the Balancer -~~~~~~~~~~~~~~~~~~~~ - -By default the balancer may run at any time and only moves chunks as -needed. To disable the balancer for a short period of time and prevent -all migration, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue *one* of the following operations to disable the balancer: - - .. code-block:: javascript - - sh.stopBalancer() - -#. Later, issue *one* the following operations to enable the balancer: - - .. code-block:: javascript - - sh.startBalancer() - -.. note:: - - If a migration is in progress, the system will complete - the in-progress migration. After disabling, you can use the - following operation in the :program:`mongo` shell to determine if - there are no migrations in progress: - - .. code-block:: javascript - - use config - while( db.locks.findOne({_id: "balancer"}).state ) { - print("waiting..."); sleep(1000); - } - - -The above process and the :method:`sh.setBalancerState()`, -:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide -wrappers on the following process, which may be useful if you need to -run this operation from a driver that does not have helper functions: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. 
Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following update to disable the balancer: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ); - -#. To enable the balancer again, alter the value of "stopped" as follows: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); - -.. index:: config servers; operations -.. _sharding-procedure-config-server: - -Config Server Maintenance -------------------------- - -Config servers store all cluster metadata, most importantly, -the mapping from :term:`chunks ` to :term:`shards `. -This section provides an overview of the basic -procedures to migrate, replace, and maintain these servers. - -.. seealso:: :ref:`sharding-config-server` - -Deploy Three Config Servers for Production Deployments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -For redundancy, all production :term:`sharded clusters ` -should deploy three config servers processes on three different -machines. - -Do not use only a single config server for production deployments. -Only use a single config server deployments for testing. You should -upgrade to three config servers immediately if you are shifting to -production. The following process shows how to convert a test -deployment with only one config server to production deployment with -three config servers. - -#. Shut down all existing MongoDB processes. This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instance that provides your existing config - database. - - - all :program:`mongos` instances in your cluster. - -#. Copy the entire :setting:`dbpath` file system tree from the - existing config server to the two machines that will provide the - additional config servers. These commands, issued on the system - with the existing :ref:`config-database`, ``mongo-config0.example.net`` may - resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config1.example.net:/data/configdb - rsync -az /data/configdb mongo-config2.example.net:/data/configdb - -#. Start all three config servers, using the same invocation that you - used for the single config server. - - .. code-block:: sh - - mongod --configsvr - -#. Restart all shard :program:`mongod` and :program:`mongos` processes. - -.. _sharding-process-config-server-migrate-same-hostname: - -Migrate Config Servers with the Same Hostname -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a config server to a new -system but the new system will be accessible using the same host -name. - -#. Shut down the config server that you're moving. - - This will render all config data for your cluster :ref:`read only - `. - -#. Change the DNS entry that points to the system that provided the old - config server, so that the *same* hostname points to the new - system. - - How you do this depends on how you organize your DNS and - hostname resolution services. - -#. Move the entire :setting:`dbpath` file system tree from the old - config server to the new config server. This command, issued on the - old config server system, may resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config0.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. 
code-block:: sh - - mongod --configsvr - -When you start the third config server, your cluster will become -writable and it will be able to create new splits and migrate chunks -as needed. - -.. _sharding-process-config-server-migrate-different-hostname: - -Migrate Config Servers with Different Hostnames -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a :ref:`config-database` to a new -server and it *will not* be accessible via the same hostname. If -possible, avoid changing the hostname so that you can use the -:ref:`previous procedure `. - -#. Shut down the :ref:`config server ` you're moving. - - This will render all config data for your cluster "read only:" - - .. code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. code-block:: sh - - mongod --configsvr - -#. Shut down all existing MongoDB processes. This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instances that provide your existing - :ref:`config databases `. - - - all :program:`mongos` instances in your cluster. - -#. Restart all :program:`mongod` processes that provide the shard - servers. - -#. Update the :option:`--configdb ` parameter (or - :setting:`configdb`) for all :program:`mongos` instances and - restart all :program:`mongos` instances. - -Replace a Config Server -~~~~~~~~~~~~~~~~~~~~~~~ - -Use this procedure only if you need to replace one of your config -servers after it becomes inoperable (e.g. hardware failure.) This -process assumes that the hostname of the instance will not change. If -you must change the hostname of the instance, use the process for -:ref:`migrating a config server to a different hostname -`. - -#. Provision a new system, with the same hostname as the previous - host. - - You will have to ensure that the new system has the same IP address - and hostname as the system it's replacing *or* you will need to - modify the DNS records and wait for them to propagate. - -#. Shut down *one* (and only one) of the existing config servers. Copy - all this host's :setting:`dbpath` file system tree from the current system - to the system that will provide the new config server. This - command, issued on the system with the data files, may resemble the - following: - - .. code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Restart the config server process that you used in the previous - step to copy the data files to the new config server instance. - -#. Start the new config server instance. The default invocation is: - - .. code-block:: sh - - mongod --configsvr - -.. note:: - - In the course of this procedure *never* remove a config server from - the :setting:`configdb` parameter on any of the :program:`mongos` - instances. If you need to change the name of a config server, - always make sure that all :program:`mongos` instances have three - config servers specified in the :setting:`configdb` setting at all - times. - -Backup Cluster Metadata -~~~~~~~~~~~~~~~~~~~~~~~ - -The cluster will remain operational [#read-only]_ without one -of the config database's :program:`mongod` instances, creating a backup -of the cluster metadata from the config database is straight forward: - -#. Shut down one of the :term:`config databases `. - -#. Create a full copy of the data files (i.e. 
the path specified by - the :setting:`dbpath` option for the config instance.) - -#. Restart the original configuration server. - -.. seealso:: :doc:`backups`. - -.. [#read-only] While one of the three config servers is unavailable, - the cluster cannot split any chunks nor can it migrate chunks - between shards. Your application will be able to write data to the - cluster. The :ref:`sharding-config-server` section of the - documentation provides more information on this topic. - -.. index:: troubleshooting; sharding -.. index:: sharding; troubleshooting -.. _sharding-troubleshooting: - -Troubleshooting ---------------- - -The two most important factors in maintaining a successful sharded cluster are: - -- :ref:`choosing an appropriate shard key ` and - -- :ref:`sufficient capacity to support current and future operations - `. - -You can prevent most issues encountered with sharding by ensuring that -you choose the best possible :term:`shard key` for your deployment and -ensure that you are always adding additional capacity to your cluster -well before the current resources become saturated. Continue reading -for specific issues you may encounter in a production environment. - -.. _sharding-troubleshooting-not-splitting: - -All Data Remains on One Shard -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Your cluster must have sufficient data for sharding to make -sense. Sharding works by migrating chunks between the shards until -each shard has roughly the same number of chunks. - -The default chunk size is 64 megabytes. MongoDB will not begin -migrations until the imbalance of chunks in the cluster exceeds the -:ref:`migration threshold `. While the -default chunk size is configurable with the :setting:`chunkSize` -setting, these behaviors help prevent unnecessary chunk migrations, -which can degrade the performance of your cluster as a whole. - -If you have just deployed a sharded cluster, make sure that you have -enough data to make sharding effective. If you do not have sufficient -data to create more than eight 64 megabyte chunks, then all data will -remain on one shard. Either lower the :ref:`chunk size -` setting, or add more data to the cluster. - -As a related problem, the system will split chunks only on -inserts or updates, which means that if you configure sharding and do not -continue to issue insert and update operations, the database will not -create any chunks. You can either wait until your application inserts -data *or* :ref:`split chunks manually `. - -Finally, if your shard key has a low :ref:`cardinality -`, MongoDB may not be able to create -sufficient splits among the data. - -One Shard Receives Too Much Traffic -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, a single shard or a subset of the cluster will -receive a disproportionate portion of the traffic and workload. In -almost all cases this is the result of a shard key that does not -effectively allow :ref:`write scaling `. - -It's also possible that you have "hot chunks." In this case, you may -be able to solve the problem by splitting and then migrating parts of -these chunks. - -In the worst case, you may have to consider re-sharding your data -and :ref:`choosing a different shard key ` -to correct this pattern. - -The Cluster Does Not Balance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have just deployed your sharded cluster, you may want to -consider the :ref:`troubleshooting suggestions for a new cluster where -data remains on a single shard `. 
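Before working through the possible causes below, it can help to confirm
how chunks are actually distributed across the shards. The following
:program:`mongo` shell operations are a minimal sketch: they assume you
are connected to a :program:`mongos` instance and reuse the hypothetical
``myapp.users`` namespace from the chunk migration example above.

.. code-block:: javascript

   // Print a per-shard summary of the cluster, including how many chunks
   // of each sharded collection reside on each shard.
   db.printShardingStatus()

   // Count the chunks for a single collection directly from the config
   // database; substitute your own namespace.
   use config
   db.chunks.find( { ns : "myapp.users" } ).count()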
- -If the cluster was initially balanced, but later developed an uneven -distribution of data, consider the following possible causes: - -- You have deleted or removed a significant amount of data from the - cluster. If you have added additional data, it may have a - different distribution with regards to its shard key. - -- Your :term:`shard key` has low :ref:`cardinality ` - and MongoDB cannot split the chunks any further. - -- Your data set is growing faster than the balancer can distribute - data around the cluster. This is uncommon and - typically is the result of: - - - a :ref:`balancing window ` that - is too short, given the rate of data growth. - - - an uneven distribution of :ref:`write operations - ` that requires more data - migration. You may have to choose a different shard key to resolve - this issue. - - - poor network connectivity between shards, which may lead to chunk - migrations that take too long to complete. Investigate your - network configuration and interconnections between shards. - -Migrations Render Cluster Unusable -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If migrations impact your cluster or application's performance, -consider the following options, depending on the nature of the impact: - -#. If migrations only interrupt your clusters sporadically, you can - limit the :ref:`balancing window - ` to prevent balancing activity - during peak hours. Ensure that there is enough time remaining to - keep the data from becoming out of balance again. - -#. If the balancer is always migrating chunks to the detriment of - overall cluster performance: - - - You may want to attempt :ref:`decreasing the chunk size ` - to limit the size of the migration. - - - Your cluster may be over capacity, and you may want to attempt to - :ref:`add one or two shards ` to - the cluster to distribute load. - -It's also possible that your shard key causes your -application to direct all writes to a single shard. This kind of -activity pattern can require the balancer to migrate most data soon after writing -it. Consider redeploying your cluster with a shard key that provides -better :ref:`write scaling `. - -Disable Balancing During Backups -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If MongoDB migrates a :term:`chunk` during a :doc:`backup -`, you can end with an inconsistent snapshot -of your :term:`sharded cluster`. Never run a backup while the balancer is -active. To ensure that the balancer is inactive during your backup -operation: - -- Set the :ref:`balancing window ` - so that the balancer is inactive during the backup. Ensure that the - backup can complete while you have the balancer disabled. - -- :ref:`manually disable the balancer ` - for the duration of the backup procedure. - -Confirm that the balancer is not active using the -:method:`sh.getBalancerState()` method before starting a backup -operation. When the backup procedure is complete you can reactivate -the balancer process. +======================= +Sharding Administration +======================= + +This page lists administrative documentation for sharding. + +.. 
toctree:: + :maxdepth: 2 + + core/sharding + administration/sharded-clusters + core/sharding-requirements + administration/sharding-architectures + core/sharding-security + tutorial/deploy-shard-cluster + tutorial/manage-shards + tutorial/add-shards-to-shard-cluster + tutorial/remove-shards-from-cluster + tutorial/manage-chunks + tutorial/manage-balancer + administration/sharding-config-server + tutorial/enforce-unique-keys-for-sharded-collections + tutorial/convert-replica-set-to-replicated-shard-cluster + tutorial/backup-small-sharded-cluster-with-mongodump + tutorial/backup-sharded-cluster-with-filesystem-snapshots + tutorial/backup-sharded-cluster-with-database-dumps + tutorial/restore-single-shard + tutorial/restore-sharded-cluster + tutorial/schedule-backup-window-for-sharded-clusters + administration/sharding-troubleshooting diff --git a/source/core/sharding-requirements.txt b/source/core/sharding-requirements.txt new file mode 100644 index 00000000000..b36bb67aec1 --- /dev/null +++ b/source/core/sharding-requirements.txt @@ -0,0 +1,162 @@ +.. index:: sharding; requirements +.. _sharding-requirements: + +============================ +Sharded Cluster Requirements +============================ + +.. default-domain:: mongodb + +This page includes the following: + +- :ref:`sharding-requirements-when-to-use-sharding` + +- :ref:`sharding-requirements-infrastructure` + +- :ref:`sharding-requirements-data` + +- :ref:`sharding-localhost` + +.. _sharding-requirements-when-to-use-sharding: + +When to Use Sharding +-------------------- + +While sharding is a powerful and compelling feature, it comes with +significant :ref:`sharding-requirements-infrastructure` +and some limited complexity costs. As a result, use +sharding only as necessary, and when indicated by actual operational +requirements. Consider the following overview of indications it may be +time to consider sharding. + +You should consider deploying a :term:`sharded cluster`, if: + +- your data set approaches or exceeds the storage capacity of a single + node in your system. + +- the size of your system's active :term:`working set` *will soon* + exceed the capacity of the *maximum* amount of RAM for your system. + +- your system has a large amount of write activity, a single + MongoDB instance cannot write data fast enough to meet demand, and + all other approaches have not reduced contention. + +If these attributes are not present in your system, sharding will only +add additional complexity to your system without providing much +benefit. When designing your data model, if you will eventually need a +sharded cluster, consider which collections you will want to shard and +the corresponding shard keys. + +.. _sharding-capacity-planning: + +.. warning:: + + It takes time and resources to deploy sharding, and if your system + has *already* reached or exceeded its capacity, you will have a + difficult time deploying sharding without impacting your + application. + + As a result, if you think you will need to partition your database + in the future, **do not** wait until your system is overcapacity to + enable sharding. + +.. _sharding-requirements-infrastructure: + +Infrastructure Requirements +--------------------------- + +A :term:`sharded cluster` has the following components: + +- Three :term:`config servers `. + + These special :program:`mongod` instances store the metadata for the + cluster. The :program:`mongos` instances cache this data and use it + to determine which :term:`shard` is responsible for which + :term:`chunk`. 
+
+  For development and testing purposes you may deploy a cluster with a single
+  configuration server process, but always use exactly three config
+  servers for redundancy and safety in production.
+
+- Two or more shards. Each shard consists of one or more :program:`mongod`
+  instances that store the data for the shard.
+
+  These "normal" :program:`mongod` instances hold all of the
+  actual data for the cluster.
+
+  Typically each shard is a :term:`replica set`. Each
+  replica set consists of multiple :program:`mongod` instances. The members
+  of the replica set provide redundancy and high availability for the data in each shard.
+
+  .. warning::
+
+     MongoDB enables data :term:`partitioning `, or
+     sharding, on a *per collection* basis. You *must* access all data
+     in a sharded cluster via the :program:`mongos` instances as described below.
+     If you connect directly to a :program:`mongod` in a sharded cluster
+     you will see its fraction of the cluster's data. The data on any
+     given shard may be somewhat random: MongoDB provides no guarantee
+     that any two contiguous chunks will reside on a single shard.
+
+- One or more :program:`mongos` instances.
+
+  These instances direct queries from the application layer to the
+  shards that hold the data. The :program:`mongos` instances have no
+  persistent state or data files and only cache metadata in RAM from
+  the config servers.
+
+  .. note::
+
+     In most situations :program:`mongos` instances use minimal
+     resources, and you can run them on your application servers
+     without impacting application performance. However, if you use
+     the :term:`aggregation framework` some processing may occur on
+     the :program:`mongos` instances, causing that :program:`mongos`
+     to require more system resources.
+
+.. _sharding-requirements-data:
+
+Data Requirements
+-----------------
+
+Your cluster must manage a significant quantity of data for sharding
+to have an effect on your collection. The default :term:`chunk` size
+is 64 megabytes, and the :ref:`balancer
+` will not begin moving data until the imbalance
+of chunks in the cluster exceeds the :ref:`migration threshold
+`.
+
+Practically, this means that unless your cluster has many hundreds of
+megabytes of data, chunks will remain on a single shard.
+
+While there are some exceptional situations where you may need to
+shard a small collection of data, most of the time the complexity
+and overhead added by sharding a small collection is not worthwhile
+unless you need additional concurrency or capacity. If you
+have a small data set, usually a properly configured
+single MongoDB instance or replica set will be more than sufficient
+for your persistence layer needs.
+
+:term:`Chunk ` size is :option:`user configurable `.
+However, the default value of 64 megabytes is ideal
+for most deployments. See the :ref:`sharding-chunk-size` section in the
+:doc:`sharding-internals` document for more information.
+
+.. index:: sharding; localhost
+.. _sharding-localhost:
+
+Restrictions on the Use of "localhost"
+--------------------------------------
+
+Because all components of a :term:`sharded cluster` must communicate
+with each other over the network, there are special restrictions
+regarding the use of localhost addresses:
+
+If you use either "localhost" or "``127.0.0.1``" as the host
+identifier, then you must use "localhost" or "``127.0.0.1``" for *all*
+host settings for any MongoDB instances in the cluster.
This applies +to both the ``host`` argument to :dbcommand:`addShard` and the value +to the :option:`mongos --configdb` run time option. If you mix +localhost addresses with remote host address, MongoDB will produce +errors. diff --git a/source/core/sharding-security.txt b/source/core/sharding-security.txt new file mode 100644 index 00000000000..74d8b1f09bd --- /dev/null +++ b/source/core/sharding-security.txt @@ -0,0 +1,63 @@ +.. index:: sharding; security +.. _sharding-security: + +======================== +Sharded Cluster Security +======================== + +.. default-domain:: mongodb + +.. todo:: migrate this content to /administration/security.txt when that + document exists. See DOCS-79 for tracking this document. + +.. note:: + + You should always run all :program:`mongod` components in trusted + networking environments that control access to the cluster using + network rules and restrictions to ensure that only known traffic + reaches your :program:`mongod` and :program:`mongos` instances. + +.. warning:: Limitations + + .. versionchanged:: 2.2 + Read only authentication is fully supported in shard + clusters. Previously, in version 2.0, sharded clusters would not + enforce read-only limitations. + + .. versionchanged:: 2.0 + Sharded clusters support authentication. Previously, in version + 1.8, sharded clusters will not support authentication and access + control. You must run your sharded systems in trusted + environments. + +To control access to a sharded cluster, you must set the +:setting:`keyFile` option on all components of the sharded cluster. Use +the :option:`--keyFile ` run-time option or the +:setting:`keyFile` configuration option for all :program:`mongos`, +configuration instances, and shard :program:`mongod` instances. + +There are two classes of security credentials in a sharded cluster: +credentials for "admin" users (i.e. for the :term:`admin database`) and +credentials for all other databases. These credentials reside in +different locations within the cluster and have different roles: + +#. Admin database credentials reside on the config servers, to receive + admin access to the cluster you *must* authenticate a session while + connected to a :program:`mongos` instance using the + :term:`admin database`. + +#. Other database credentials reside on the *primary* shard for the + database. + +This means that you *can* authenticate to these users and databases +while connected directly to the primary shard for a database. However, +for clarity and consistency all interactions between the client and +the database should use a :program:`mongos` instance. + +.. note:: + + Individual shards can store administrative credentials to their + instance, which only permit access to a single shard. MongoDB + stores these credentials in the shards' :term:`admin databases ` and these + credentials are *completely* distinct from the cluster-wide + administrative credentials. diff --git a/source/core/sharding.txt b/source/core/sharding.txt index a06e4cba27d..27730096b10 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -1,229 +1,59 @@ .. index:: fundamentals; sharding .. _sharding-fundamentals: -===================== -Sharding Fundamentals -===================== - -.. default-domain:: mongodb - -This document provides an overview of the fundamental concepts and -operations of sharding with MongoDB. For a list of all sharding -documentation see :doc:`/sharding`. 
- -MongoDB's sharding system allows users to :term:`partition` a -:term:`collection` within a database to distribute the collection's documents -across a number of :program:`mongod` instances or :term:`shards `. -Sharding increases write capacity, provides the ability to -support larger working sets, and raises the limits of total data size beyond -the physical resources of a single node. - +================= Sharding Overview ------------------ - -Features -~~~~~~~~ - -With sharding MongoDB automatically distributes data among a -collection of :program:`mongod` instances. Sharding, as implemented in -MongoDB has the following features: - -.. glossary:: - - Range-based Data Partitioning - MongoDB distributes documents among :term:`shards ` based - on the value of the :ref:`shard key `. Each - :term:`chunk` represents a block of :term:`documents ` - with values that fall within a specific range. When chunks grow - beyond the :ref:`chunk size `, MongoDB - divides the chunks into smaller chunks (i.e. :term:`splitting - `) based on the shard key. - - Automatic Data Volume Distribution - The sharding system automatically balances data across the - cluster without intervention from the application - layer. Effective automatic sharding depends on a well chosen - :ref:`shard key `, but requires no - additional complexity, modifications, or intervention from - developers. - - Transparent Query Routing - Sharding is completely transparent to the application layer, - because all connections to a cluster go through - :program:`mongos`. Sharding in MongoDB requires some - :ref:`basic initial configuration `, - but ongoing function is entirely transparent to the application. - - Horizontal Capacity - Sharding increases capacity in two ways: - - #. Effective partitioning of data can provide additional write - capacity by distributing the write load over a number of - :program:`mongod` instances. - - #. Given a shard key with sufficient :ref:`cardinality - `, partitioning data allows - users to increase the potential amount of data to manage - with MongoDB and expand the :term:`working set`. - -A typical :term:`sharded cluster` consists of: - -- 3 config servers that store metadata. The metadata maps :term:`chunks - ` to shards. - -- More than one :term:`replica sets ` that hold - data. These are the :term:`shards `. - -- A number of lightweight routing processes, called :doc:`mongos - ` instances. The :program:`mongos` process routes - operations to the correct shard based the cluster configuration. - -When to Use Sharding -~~~~~~~~~~~~~~~~~~~~ - -While sharding is a powerful and compelling feature, it comes with -significant :ref:`sharding-requirements-infrastructure` -and some limited complexity costs. As a result, use -sharding only as necessary, and when indicated by actual operational -requirements. Consider the following overview of indications it may be -time to consider sharding. - -You should consider deploying a :term:`sharded cluster`, if: - -- your data set approaches or exceeds the storage capacity of a single - node in your system. - -- the size of your system's active :term:`working set` *will soon* - exceed the capacity of the *maximum* amount of RAM for your system. - -- your system has a large amount of write activity, a single - MongoDB instance cannot write data fast enough to meet demand, and - all other approaches have not reduced contention. 
- -If these attributes are not present in your system, sharding will only -add additional complexity to your system without providing much -benefit. When designing your data model, if you will eventually need a -sharded cluster, consider which collections you will want to shard and -the corresponding shard keys. - -.. _sharding-capacity-planning: - -.. warning:: - - It takes time and resources to deploy sharding, and if your system - has *already* reached or exceeded its capacity, you will have a - difficult time deploying sharding without impacting your - application. - - As a result, if you think you will need to partition your database - in the future, **do not** wait until your system is overcapacity to - enable sharding. - -.. index:: sharding; requirements -.. _sharding-requirements: - -Sharding Requirements ---------------------- - -.. _sharding-requirements-infrastructure: - -Infrastructure Requirements -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A :term:`sharded cluster` has the following components: - -- Three :term:`config servers `. - - These special :program:`mongod` instances store the metadata for the - cluster. The :program:`mongos` instances cache this data and use it - to determine which :term:`shard` is responsible for which - :term:`chunk`. - - For development and testing purposes you may deploy a cluster with a single - configuration server process, but always use exactly three config - servers for redundancy and safety in production. - -- Two or more shards. Each shard consists of one or more :program:`mongod` - instances that store the data for the shard. - - These "normal" :program:`mongod` instances hold all of the - actual data for the cluster. - - Typically each shard is a :term:`replica sets `. Each - replica set consists of multiple :program:`mongod` instances. The members - of the replica set provide redundancy and high available for the data in each shard. +================= - .. warning:: - - MongoDB enables data :term:`partitioning `, or - sharding, on a *per collection* basis. You *must* access all data - in a sharded cluster via the :program:`mongos` instances as below. - If you connect directly to a :program:`mongod` in a sharded cluster - you will see its fraction of the cluster's data. The data on any - given shard may be somewhat random: MongoDB provides no guarantee - that any two contiguous chunks will reside on a single shard. - -- One or more :program:`mongos` instances. - - These instance direct queries from the application layer to the - shards that hold the data. The :program:`mongos` instances have no - persistent state or data files and only cache metadata in RAM from - the config servers. - - .. note:: - - In most situations :program:`mongos` instances use minimal - resources, and you can run them on your application servers - without impacting application performance. However, if you use - the :term:`aggregation framework` some processing may occur on - the :program:`mongos` instances, causing that :program:`mongos` - to require more system resources. - -Data Requirements -~~~~~~~~~~~~~~~~~ - -Your cluster must manage a significant quantity of data for sharding -to have an effect on your collection. The default :term:`chunk` size -is 64 megabytes, [#chunk-size]_ and the :ref:`balancer -` will not begin moving data until the imbalance -of chunks in the cluster exceeds the :ref:`migration threshold -`. - -Practically, this means that unless your cluster has many hundreds of -megabytes of data, chunks will remain on a single shard. 
- -While there are some exceptional situations where you may need to -shard a small collection of data, most of the time the additional -complexity added by sharding the small collection is not worth the additional -complexity and overhead unless -you need additional concurrency or capacity for some reason. If you -have a small data set, usually a properly configured -single MongoDB instance or replica set will be more than sufficient -for your persistence layer needs. - -.. [#chunk-size] :term:`chunk` size is :option:`user configurable - `. However, the default value is of 64 - megabytes is ideal for most deployments. See the - :ref:`sharding-chunk-size` section in the - :doc:`sharding-internals` document for more information. - -.. index:: sharding; localhost -.. _sharding-localhost: - -Sharding and "localhost" Addresses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Because all components of a :term:`sharded cluster` must communicate -with each other over the network, there are special restrictions -regarding the use of localhost addresses: +.. default-domain:: mongodb -If you use either "localhost" or "``127.0.0.1``" as the host -identifier, then you must use "localhost" or "``127.0.0.1``" for *all* -host settings for any MongoDB instances in the cluster. This applies -to both the ``host`` argument to :dbcommand:`addShard` and the value -to the :option:`mongos --configdb` run time option. If you mix -localhost addresses with remote host address, MongoDB will produce -errors. +Sharding is MongoDB’s approach to scaling out. Sharding partitions +database collections and stores the different portions of a collection +on different machines. When your collections become too large for +existing storage, you need only add a new machine and MongoDB +automatically distributes data to the new server. + +Sharding automatically balances the data and load across machines. By +splitting a collection across multiple machines, sharding increases +write capacity, provides the ability to support larger working sets, and +raises the limits of total data size beyond the physical resources of a +single node. + +How Sharding Works +------------------ + +Sharding is enabled on a per-database basis. Once a database is enabled +for sharding, you then choose which collections to shard. + +MongoDB distributes documents among :term:`shards ` based on the +value of the :ref:`shard key `. Each :term:`chunk` +represents a block of :term:`documents ` with values that fall +within a specific range. When chunks grow beyond the :ref:`chunk size +`, MongoDB divides the chunks into smaller chunks +(i.e. :term:`splitting `) based on the shard key. + +The sharding system automatically balances data across the cluster +without intervention from the application layer. Effective automatic +sharding depends on a well chosen :ref:`shard key `, +but requires no additional complexity, modifications, or intervention +from developers. + +Sharding is completely transparent to the application layer, because all +connections to a cluster go through :program:`mongos`. Sharding in +MongoDB requires some :ref:`basic initial configuration +`, but ongoing function is entirely +transparent to the application. + +Sharding increases capacity in two ways: + +#. Effective partitioning of data can provide additional write + capacity by distributing the write load over a number of + :program:`mongod` instances. + +#. 
Given a shard key with sufficient :ref:`cardinality + `, partitioning data allows users to + increase the potential amount of data to manage with MongoDB and + expand the :term:`working set`. .. index:: shard key single: sharding; shard key @@ -234,15 +64,13 @@ errors. Shard Keys ---------- -.. todo:: link this section to - -"Shard keys" refer to the :term:`field` that exists in every -:term:`document` in a collection that MongoDB uses to distribute -documents among the :term:`shards `. Shard keys, like +MongoDB distributes documents among :term:`shards ` based on +:term:`shard keys `. A shard key must be :term:`field` that +exists in every :term:`document` in the collection. Shard keys, like :term:`indexes `, can be either a single field, or may be a compound key, consisting of multiple fields. -Remember, MongoDB's sharding is range-based: each :term:`chunk` holds +MongoDB's sharding is range-based. Each :term:`chunk` holds documents having specific range of values for the "shard key". Thus, choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. @@ -278,252 +106,6 @@ the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal. -.. index:: sharding; config servers -.. index:: config servers -.. _sharding-config-server: - -Config Servers --------------- - -Config servers maintain the shard metadata in a config -database. The :term:`config database ` stores -the relationship between :term:`chunks ` and where they reside -within a :term:`sharded cluster`. Without a config database, the -:program:`mongos` instances would be unable to route queries or write -operations within the cluster. - -Config servers *do not* run as replica sets. Instead, a :term:`cluster -` operates with a group of *three* config servers that use a -two-phase commit process that ensures immediate consistency and -reliability. - -For testing purposes you may deploy a cluster with a single -config server, but this is not recommended for production. - -.. warning:: - - If your cluster has a single config server, this - :program:`mongod` is a single point of failure. If the instance is - inaccessible the cluster is not accessible. If you cannot recover - the data on a config server, the cluster will be inoperable. - - **Always** use three config servers for production deployments. - -The actual load on configuration servers is small because each -:program:`mongos` instances maintains a cached copy of the configuration -database. MongoDB only writes data to the config server to: - -- create splits in existing chunks, which happens as data in - existing chunks exceeds the maximum chunk size. - -- migrate a chunk between shards. - -Additionally, all config servers must be available on initial setup -of a sharded cluster, each :program:`mongos` instance must be able -to write to the ``config.version`` collection. - -If one or two configuration instances become unavailable, the -cluster's metadata becomes *read only*. It is still possible to read -and write data from the shards, but no chunk migrations or splits will -occur until all three servers are accessible. At the same time, config -server data is only read in the following situations: - -- A new :program:`mongos` starts for the first time, or an existing - :program:`mongos` restarts. 
- -- After a chunk migration, the :program:`mongos` instances update - themselves with the new cluster metadata. - -If all three config servers are inaccessible, you can continue to use -the cluster as long as you don't restart the :program:`mongos` -instances until after config servers are accessible again. If you -restart the :program:`mongos` instances and there are no accessible -config servers, the :program:`mongos` would be unable to direct -queries or write operations to the cluster. - -Because the configuration data is small relative to the amount of data -stored in a cluster, the amount of activity is relatively low, and 100% -up time is not required for a functioning sharded cluster. As a result, -backing up the config servers is not difficult. Backups of config -servers are critical as clusters become totally inoperable when -you lose all configuration instances and data. Precautions to ensure -that the config servers remain available and intact are critical. - -.. note:: - - Configuration servers store metadata for a single sharded cluster. - You must have a separate configuration server or servers for each - cluster you administer. - -.. index:: mongos -.. _sharding-mongos: -.. _sharding-read-operations: - -:program:`mongos` and Querying ------------------------------- - -.. seealso:: :doc:`/reference/mongos` and the :program:`mongos`\-only - settings: :setting:`test` and :setting:`chunkSize`. - -Operations -~~~~~~~~~~ - -The :program:`mongos` provides a single unified interface to a sharded -cluster for applications using MongoDB. Except for the selection of a -:term:`shard key`, application developers and administrators need not -consider any of the :doc:`internal details of sharding `. - -:program:`mongos` caches data from the :ref:`config server -`, and uses this to route operations from -applications and clients to the :program:`mongod` instances. -:program:`mongos` have no *persistent* state and consume -minimal system resources. - -The most common practice is to run :program:`mongos` instances on the -same systems as your application servers, but you can maintain -:program:`mongos` instances on the shards or on other dedicated -resources. - -.. note:: - - .. versionchanged:: 2.1 - - Some aggregation operations using the :dbcommand:`aggregate` - command (i.e. :method:`db.collection.aggregate()`,) will cause - :program:`mongos` instances to require more CPU resources than in - previous versions. This modified performance profile may dictate - alternate architecture decisions if you use the :term:`aggregation - framework` extensively in a sharded environment. - -.. _sharding-query-routing: - -Routing -~~~~~~~ - -:program:`mongos` uses information from :ref:`config servers -` to route operations to the cluster as -efficiently as possible. In general, operations in a sharded -environment are either: - -1. Targeted at a single shard or a limited group of shards based on - the shard key. - -2. Broadcast to all shards in the cluster that hold documents in a - collection. - -When possible you should design your operations to be as targeted as -possible. Operations have the following targeting characteristics: - -- Query operations broadcast to all shards [#namespace-exception]_ - **unless** the :program:`mongos` can determine which shard or shard - stores this data. - - For queries that include the shard key, :program:`mongos` can target - the query at a specific shard or set of shards, if the portion - of the shard key included in the query is a *prefix* of the shard - key. 
For example, if the shard key is: - - .. code-block:: javascript - - { a: 1, b: 1, c: 1 } - - The :program:`mongos` *can* route queries that include the full - shard key or either of the following shard key prefixes at a - specific shard or set of shards: - - .. code-block:: javascript - - { a: 1 } - { a: 1, b: 1 } - - Depending on the distribution of data in the cluster and the - selectivity of the query, :program:`mongos` may still have to - contact multiple shards [#possible-all]_ to fulfill these queries. - -- All :method:`insert() ` operations target to - one shard. - -- All single :method:`update() ` operations - target to one shard. This includes :term:`upsert` operations. - -- The :program:`mongos` broadcasts multi-update operations to every - shard. - -- The :program:`mongos` broadcasts :method:`remove() - ` operations to every shard unless the - operation specifies the shard key in full. - -While some operations must broadcast to all shards, you can improve -performance by using as many targeted operations as possible by -ensuring that your operations include the shard key. - -.. [#namespace-exception] If a shard does not store chunks from a - given collection, queries for documents in that collection are not - broadcast to that shard. - -.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could - route a query that included ``{ a: 1, c: 1 }`` fields at a specific - subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos` - cannot route any of the following queries to specific shards - in the cluster: - - .. code-block:: javascript - - { b: 1 } - { c: 1 } - { b: 1, c: 1 } - -.. [#possible-all] :program:`mongos` will route some queries, even - some that include the shard key, to all shards, if needed. - -Sharded Query Response Process -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To route a query to a :term:`cluster `, -:program:`mongos` uses the following process: - -#. Determine the list of :term:`shards ` that must receive the query. - - In some cases, when the :term:`shard key` or a prefix of the shard - key is a part of the query, the :program:`mongos` can route the - query to a subset of the shards. Otherwise, the :program:`mongos` - must direct the query to *all* shards that hold documents for that - collection. - - .. example:: - - Given the following shard key: - - .. code-block:: javascript - - { zipcode: 1, u_id: 1, c_date: 1 } - - Depending on the distribution of chunks in the cluster, the - :program:`mongos` may be able to target the query at a subset of - shards, if the query contains the following fields: - - .. code-block:: javascript - - { zipcode: 1 } - { zipcode: 1, u_id: 1 } - { zipcode: 1, u_id: 1, c_date: 1 } - -#. Establish a cursor on all targeted shards. - - When the first batch of results returns from the cursors: - - a. For query with sorted results (i.e. using - :method:`cursor.sort()`) the :program:`mongos` performs a merge - sort of all queries. - - b. For a query with unsorted results, the :program:`mongos` returns - a result cursor that "round robins" results from all cursors on - the shards. - - .. versionchanged:: 2.0.5 - Before 2.0.5, the :program:`mongos` exhausted each cursor, - one by one. - .. index:: balancing .. _sharding-balancing: @@ -559,64 +141,3 @@ balancing process from impacting production traffic. is entirely transparent to the user and application layer. This documentation is only included for your edification and possible troubleshooting purposes. - -.. index:: sharding; security -.. 
_sharding-security: - -Security Considerations for Sharded Clusters --------------------------------------------- - -.. todo:: migrate this content to /administration/security.txt when that - document exists. See DOCS-79 for tracking this document. - -.. note:: - - You should always run all :program:`mongod` components in trusted - networking environments that control access to the cluster using - network rules and restrictions to ensure that only known traffic - reaches your :program:`mongod` and :program:`mongos` instances. - -.. warning:: Limitations - - .. versionchanged:: 2.2 - Read only authentication is fully supported in shard - clusters. Previously, in version 2.0, sharded clusters would not - enforce read-only limitations. - - .. versionchanged:: 2.0 - Sharded clusters support authentication. Previously, in version - 1.8, sharded clusters will not support authentication and access - control. You must run your sharded systems in trusted - environments. - -To control access to a sharded cluster, you must set the -:setting:`keyFile` option on all components of the sharded cluster. Use -the :option:`--keyFile ` run-time option or the -:setting:`keyFile` configuration option for all :program:`mongos`, -configuration instances, and shard :program:`mongod` instances. - -There are two classes of security credentials in a sharded cluster: -credentials for "admin" users (i.e. for the :term:`admin database`) and -credentials for all other databases. These credentials reside in -different locations within the cluster and have different roles: - -#. Admin database credentials reside on the config servers, to receive - admin access to the cluster you *must* authenticate a session while - connected to a :program:`mongos` instance using the - :term:`admin database`. - -#. Other database credentials reside on the *primary* shard for the - database. - -This means that you *can* authenticate to these users and databases -while connected directly to the primary shard for a database. However, -for clarity and consistency all interactions between the client and -the database should use a :program:`mongos` instance. - -.. note:: - - Individual shards can store administrative credentials to their - instance, which only permit access to a single shard. MongoDB - stores these credentials in the shards' :term:`admin databases ` and these - credentials are *completely* distinct from the cluster-wide - administrative credentials. diff --git a/source/faq/sharding.txt b/source/faq/sharding.txt index 6592cea0370..16b4d7a2e9b 100644 --- a/source/faq/sharding.txt +++ b/source/faq/sharding.txt @@ -44,8 +44,7 @@ Can I change the shard key after sharding a collection? No. There is no automatic support in MongoDB for changing a shard key -after :ref:`sharding a collection -`. This reality underscores +after sharding a collection. This reality underscores the important of choosing a good :ref:`shard key `. If you *must* change a shard key after sharding a collection, the best option is to: diff --git a/source/reference/commands.txt b/source/reference/commands.txt index 1e1617f175a..dbfd092a486 100644 --- a/source/reference/commands.txt +++ b/source/reference/commands.txt @@ -77,35 +77,6 @@ includes the relevant :program:`mongo` shell helpers. See User Commands ------------- -.. _sharding-commands: - -Sharding Commands -~~~~~~~~~~~~~~~~~ - -.. seealso:: :doc:`/sharding` for more information about MongoDB's - sharding functionality. - -.. include:: command/addShard.txt - :start-after: mongodb - -.. 
include:: command/listShards.txt - :start-after: mongodb - -.. include:: command/enableSharding.txt - :start-after: mongodb - -.. include:: command/shardCollection.txt - :start-after: mongodb - -.. include:: command/shardingState.txt - :start-after: mongodb - -.. include:: command/removeShard.txt - :start-after: mongodb - -.. include:: command/printShardingStatus.txt - :start-after: mongodb - Aggregation Commands ~~~~~~~~~~~~~~~~~~~~ diff --git a/source/reference/sharding-commands.txt b/source/reference/sharding-commands.txt new file mode 100644 index 00000000000..68e3a984322 --- /dev/null +++ b/source/reference/sharding-commands.txt @@ -0,0 +1,33 @@ +.. _sharding-commands: + +================= +Sharding Commands +================= + +.. default-domain:: mongodb + +The following database commands support :term:`sharded clusters `. + +.. seealso:: :doc:`/sharding` for more information about MongoDB's + sharding functionality. + +.. include:: command/addShard.txt + :start-after: mongodb + +.. include:: command/listShards.txt + :start-after: mongodb + +.. include:: command/enableSharding.txt + :start-after: mongodb + +.. include:: command/shardCollection.txt + :start-after: mongodb + +.. include:: command/shardingState.txt + :start-after: mongodb + +.. include:: command/removeShard.txt + :start-after: mongodb + +.. include:: command/printShardingStatus.txt + :start-after: mongodb diff --git a/source/sharding.txt b/source/sharding.txt index e4c58e87a62..f12feadfd5a 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -4,56 +4,47 @@ Sharding .. _sharding-background: -Sharding distributes a -single logical database system across a cluster of machines. Sharding -uses range-based portioning to distribute :term:`documents ` -based on a specific :term:`shard key`. +Sharding distributes a single logical database system across a cluster +of machines. Sharding uses range-based portioning to distribute +:term:`documents ` based on a specific :term:`shard key`. -This page lists the documents, tutorials, and reference pages that -describe sharding. - -For an overview, see :doc:`/core/sharding`. To configure, maintain, and -troubleshoot sharded clusters, see :doc:`/administration/sharding`. -For deployment architectures, see :doc:`/administration/sharding-architectures`. -For details on the internal operations of sharding, see :doc:`/core/sharding-internals`. -For procedures for performing certain sharding tasks, see the -:ref:`Tutorials ` list. - -Documentation -------------- - -The following is the outline of the main documentation: +Sharding Concepts +----------------- .. toctree:: - :maxdepth: 2 + :maxdepth: 1 core/sharding - administration/sharding + administration/sharded-clusters + core/sharding-requirements administration/sharding-architectures - core/sharding-internals + core/sharding-security -.. index:: tutorials; sharding -.. _sharding-tutorials: +Set Up Sharded Clusters +----------------------- -Tutorials ---------- +.. toctree:: + :maxdepth: 1 -The following tutorials describe specific sharding procedures: + tutorial/deploy-shard-cluster -Deploying Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Manage Sharded Clusters +----------------------- .. 
toctree:: :maxdepth: 1 - tutorial/deploy-shard-cluster + tutorial/manage-shards tutorial/add-shards-to-shard-cluster tutorial/remove-shards-from-cluster + tutorial/manage-chunks + tutorial/manage-balancer + administration/sharding-config-server tutorial/enforce-unique-keys-for-sharded-collections tutorial/convert-replica-set-to-replicated-shard-cluster -Backups for Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Backup Sharded Clusters +----------------------- .. toctree:: :maxdepth: 1 @@ -65,13 +56,23 @@ Backups for Sharded Clusters tutorial/restore-sharded-cluster tutorial/schedule-backup-window-for-sharded-clusters -.. _sharding-reference: +FAQs, Advanced Concepts, and Troubleshooting +-------------------------------------------- + +.. toctree:: + :maxdepth: 1 + + faq/sharding + core/sharding-internals + administration/sharding-troubleshooting Reference --------- -- :ref:`sharding-commands` -- :doc:`/faq/sharding` +.. toctree:: + :maxdepth: 1 + + reference/sharding-commands .. STUB tutorial/replace-one-configuration-server-in-a-shard-cluster .. STUB tutorial/replace-all-configuration-servers-in-a-shard-cluster diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 335a5e48371..e7eca5e9670 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -1,194 +1,226 @@ +.. _sharding-procedure-setup: + ======================== -Deploy a Sharded Cluster +Set Up a Sharded Cluster ======================== .. default-domain:: mongodb -This document describes how to deploy a :term:`sharded cluster` for a -standalone :program:`mongod` instance. To deploy a cluster for an -existing replica set, see -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster`. +The topics on this page are an ordered sequence of the tasks you perform +to set up a sharded cluster. To set up a sharded cluster, follow the +ordered sequence of tasks: + +- :ref:`sharding-setup-start-cfgsrvr` + +- :ref:`sharding-setup-start-mongos` + +- :ref:`sharding-setup-add-shards` + +- :ref:`sharding-setup-enable-sharding` -Procedure ---------- +- :ref:`sharding-setup-shard-collection` -Before deploying a sharded cluster, see the requirements listed in -:ref:`Requirements for Sharded Clusters `. +Before setting up a cluster, see the following: + +- :ref:`sharding-requirements`. + +- :doc:`/administration/replication-architectures` .. include:: /includes/warning-sharding-hostnames.rst +.. _sharding-setup-start-cfgsrvr: + Start the Config Server Database Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------------------ -The config server database processes are small -:program:`mongod` instances that store the cluster's metadata. You must -have exactly *three* instances in production deployments. Each stores a complete copy -of the cluster's metadata. These instances should run on different servers -to assure good uptime and data safety. +The config server processes are :program:`mongod` instances that store +the cluster's metadata. You designate a :program:`mongod` as a config +server using the :option:`--configsvr ` option. Each +config server stores a complete copy of the cluster's metadata. -Since config database :program:`mongod` instances receive relatively -little traffic and demand only a small portion of system resources, you -can run the instances on systems that run other cluster components. 
+In production deployments, you must deploy exactly three config server +instances, each running on different servers to assure good uptime and +data safety. In test environments, you can run all three instances on a +single server. -By default a :program:`mongod` :option:`--configsvr ` process stores its data files -in the `/data/configdb` directory. You can specify a different -location using the :setting:`dbpath` run-time option. The config :program:`mongod` instance -is accessible via port ``27019``. In addition to :setting:`configsvr`, -use other :program:`mongod` -:doc:`runtime options ` as needed. +Config server instances receive relatively little traffic and demand +only a small portion of system resources. Therefore, you can run an +instance on a system that runs other cluster components. -To create a data directory for each config server, issue a command -similar to the following for each: +1. Create data directories for each of the three config server + instances. By default, a config server stores its data files in the + `/data/configdb` directory. You can choose a different location. To + create a data directory, issue a command similar to the following: -.. code-block:: sh + .. code-block:: sh - mkdir /data/db/config + mkdir /data/db/config -To start each config server, issue a command similar to the following -for each: +#. Start the three config server instances. Start each by issuing a + command using the following syntax: -.. code-block:: sh + .. code-block:: sh + + mongod --configsvr --dbpath <path> --port <port> + + The default port for config servers is ``27019``. You can specify a + different port. The following example starts a config server using + the default port and default data directory: + + .. code-block:: sh - mongod --configsvr --dbpath --port + mongod --configsvr --dbpath /data/configdb --port 27019 + + For additional command options, see :doc:`/reference/mongod` or + :doc:`/reference/configuration-options`. + + .. include:: /includes/note-config-server-startup.rst + +.. _sharding-setup-start-mongos: Start the ``mongos`` Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------ + +The :program:`mongos` instances are lightweight and do not require data +directories. You can run a :program:`mongos` instance on a system that +runs other cluster components, such as on an application server or a +server running a :program:`mongod` process. By default, a +:program:`mongos` instance runs on port ``27017``. -The :program:`mongos` instance routes queries and operations -to the appropriate shards and interacts with the config server instances. -All client operations targeting a cluster go through :program:`mongos` -instances. +When you start the :program:`mongos` instance, specify the hostnames of +the three config servers, either in the configuration file or as command +line parameters. For operational flexibility, use DNS names for the +config servers rather than explicit IP addresses. If you're not using +resolvable hostnames, you cannot change the config server names or IP +addresses without restarting *every* :program:`mongos` and +:program:`mongod` instance. -:program:`mongos` instances are lightweight and do not require data directories. -A cluster typically -has several instances. For example, you might run one :program:`mongos` -instance on each of your application servers, or you might run a :program:`mongos` instance -on each of the servers running a :program:`mongod` process.
+To start a :program:`mongos` instance, issue a command using the following syntax: -You must the specify resolvable hostnames [#names]_ for the *3* config servers -when starting the :program:`mongos` instance. You specify the hostnames either in the -configuration file or as command line parameters. +.. code-block:: sh -The :program:`mongos` instance runs on the default MongoDB TCP port: -``27017``. + mongos --configdb <config server hostnames> -To start :program:`mongos` instance running on the -``mongos0.example.net`` host, that connects to the config server -instances running on the following hosts: +For example, to start a :program:`mongos` that connects to config server +instances running on the following hosts and on the default ports: -- ``mongoc0.example.net`` -- ``mongoc1.example.net`` -- ``mongoc2.example.net`` +- ``cfg0.example.net`` +- ``cfg1.example.net`` +- ``cfg2.example.net`` You would issue the following command: .. code-block:: sh - mongos --configdb mongoc0.example.net,mongoc1.example.net,mongoc2.example.net + mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019 -.. [#names] Use DNS names for the config servers rather than explicit - IP addresses for operational flexibility. If you're not using resolvable - hostname, - you cannot change the config server names or IP addresses - without a restarting *every* :program:`mongos` and - :program:`mongod` instance. +.. _sharding-setup-add-shards: Add Shards to the Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------- -You must deploy at least one :term:`shard` or one :term:`replica set` to -begin. In a production cluster, each shard is a replica set. You -may add additional shards to a running cluster later. For instructions -on deploying replica sets, see :doc:`/tutorial/deploy-replica-set`. +A :term:`shard` can be a standalone :program:`mongod` or a +:term:`replica set`. In a production environment, each shard +should be a replica set. -This procedure assumes you have two active and initiated replica sets -and describes how to add the first two shards to the cluster. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -First, connect to one of the :program:`mongos` instances. For example, -if a :program:`mongos` is accessible at ``mongos0.example.net`` on -port ``27017``, issue the following command: + .. code-block:: sh -.. code-block:: sh + mongo --host <hostname> --port <port> - mongo mongos0.example.net + For example, if a :program:`mongos` is accessible at + ``mongos0.example.net`` on port ``27017``, issue the following + command: -Then, from a :program:`mongo` shell connected to the :program:`mongos` -instance, call the :method:`sh.addShard()` method for each shard that -you want to add to the cluster: + .. code-block:: sh -.. code-block:: javascript + mongo --host mongos0.example.net --port 27017 - sh.addShard( "s0/sfo30.example.net" ) - sh.addShard( "s1/sfo40.example.net" ) +#. Add shards to the cluster using the :method:`sh.addShard()` method + issued from the :program:`mongo` shell. Issue :method:`sh.addShard()` + separately for each shard in the cluster. If the shard is a replica + set, specify the name of the set and one member of the set. + :program:`mongos` will discover the names of other members of the + set. In production deployments, all shards should be replica sets. -If the host you are adding is a member of a replica set, you -*must* specify the name of the replica set. 
:program:`mongos` -will discover the names of other members of the replica set based on -the name and the hostname you provide. + The following example adds a shard for a standalone :program:`mongod`. + To optionally specify a name for the shard or a maximum size, see + :dbcommand:`addShard`: -These operations add two shards, provided by: + .. code-block:: javascript -- the replica set named ``s0``, that includes the - ``sfo30.example.net`` host. + sh.addShard( "mongodb0.example.net:27027" ) -- the replica set name and ``s1``, that includes the - ``sfo40.example.net`` host. + The following example adds a shard for a replica set + named ``rs1``: -.. admonition:: All shards should be replica sets + .. code-block:: javascript - .. versionchanged:: 2.0.3 + sh.addShard( "rs1/mongodb0.example.net:27027" ) - After version 2.0.3, you may use the above form to add replica - sets to a cluster. The cluster will automatically discover - the other members of the replica set and note their names - accordingly. + .. note:: - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - ``sh0``, and the replica set were to have three members, then your :method:`sh.addShard` command might resemble: + .. versionchanged:: 2.0.3 - .. code-block:: javascript + For MongoDB versions prior to 2.0.3, if you add a replica set as a + shard, you must specify all members of the replica set. For + example, if the name of the replica set is ``repl0``, then your + :method:`sh.addShard()` command would be: + + .. code-block:: javascript + + sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + +.. _sharding-setup-enable-sharding: - sh.addShard( "sh0/sfo30.example.net,sfo31.example.net,sfo32.example.net" ) +Enable Sharding for a Database +------------------------------ -The :method:`sh.addShard()` helper in the :program:`mongo` shell is a wrapper for -the :dbcommand:`addShard` :term:`database command`. +Before you can shard a collection, you must enable sharding for the +collection's database. Enabling sharding for a database does not +redistribute data but make it possible to shard the collections in that +database. -Enable Sharding for Databases -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Once you enable sharding for a database, MongoDB assigns a +:term:`primary shard` for that database where MongoDB stores all data +before sharding begins. -While sharding operates on a per-collection basis, you must enable -sharding for each database that holds collections you want to shard. A -single cluster may have many databases, with each database housing -collections. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -Use the following operation in a :program:`mongo` shell session -connected to a :program:`mongos` instance in your cluster: + .. code-block:: sh + + mongo --host --port + +#. Issue the :method:`sh.enableSharding()` method, specifying the name + of the database for which to enable sharding. Use the following syntax: + + .. code-block:: javascript + + sh.enableSharding() + +Optionally, you can enable sharding for a database using the +:dbcommand:`enableSharding` command, which uses the following syntax: .. 
code-block:: javascript - sh.enableSharding("records") + db.runCommand( { enableSharding: } ) -Where ``records`` is the name of the database that holds the collection -you want to shard. :method:`sh.enableSharding()` is a wrapper -around the :dbcommand:`enableSharding` :term:`database command`. You -can enable sharding for multiple databases in the cluster. +.. _sharding-setup-shard-collection: Enable Sharding for Collections -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------- -You can enable sharding on a per-collection basis. Because -MongoDB uses "range based sharding," you must specify the :term:`shard -key` MongoDB uses to distribute your documents among the -shards. For more information, see the -:ref:`overview of shard keys `. +You enable sharding on a per-collection basis. When you do, you +specify the :term:`shard key` MongoDB uses to distribute your documents +among the shards. For more information, see the +:ref:`sharding-shard-key`. To enable sharding for a collection, use the -:method:`sh.shardCollection()` helper in the :program:`mongo` shell. -The helper provides a wrapper around the :dbcommand:`shardCollection` -:term:`database command` and has the following prototype form: +:method:`sh.shardCollection()` method in the :program:`mongo` shell. +The command uses the following syntax: .. code-block:: javascript @@ -200,49 +232,48 @@ of your database, which consists of the name of your database, a dot represents your shard key, which you specify in the same form as you would an :method:`index ` key pattern. -Consider the following example invocations of -:method:`sh.shardCollection()`: +.. example:: The following sequence of commands shards four collections: -.. code-block:: javascript + .. code-block:: javascript - sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) - sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) - sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) - sh.shardCollection("events.alerts", { "hashed_id": 1 } ) + sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) + sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) + sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) + sh.shardCollection("events.alerts", { "hashed_id": 1 } ) -In order, these operations shard: + In order, these operations shard: -#. The ``people`` collection in the ``records`` database using the shard key - ``{ "zipcode": 1, "name": 1 }``. + #. The ``people`` collection in the ``records`` database using the + shard key ``{ "zipcode": 1, "name": 1 }``. - This shard key distributes documents by the value of the - ``zipcode`` field. If a number of documents have the same value for - this field, then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``name`` - field. + This shard key distributes documents by the value of the + ``zipcode`` field. If a number of documents have the same value + for this field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``name`` + field. -#. The ``addresses`` collection in the ``people`` database using the shard key - ``{ "state": 1, "_id": 1 }``. + #. The ``addresses`` collection in the ``people`` database using the + shard key ``{ "state": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``state`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. 
+ This shard key distributes documents by the value of the ``state`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``chairs`` collection in the ``assets`` database using the shard key - ``{ "type": 1, "_id": 1 }``. + #. The ``chairs`` collection in the ``assets`` database using the shard key + ``{ "type": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``type`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. + This shard key distributes documents by the value of the ``type`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``alerts`` collection in the ``events`` database using the shard key - ``{ "hashed_id": 1 }``. + #. The ``alerts`` collection in the ``events`` database using the shard key + ``{ "hashed_id": 1 }``. - This shard key distributes documents by the value of the - ``hashed_id`` field. Presumably this is a computed value that - holds the hash of some value in your documents and is able to - evenly distribute documents throughout your cluster. + This shard key distributes documents by the value of the + ``hashed_id`` field. Presumably this is a computed value that + holds the hash of some value in your documents and is able to + evenly distribute documents throughout your cluster. diff --git a/source/tutorial/manage-balancer.txt b/source/tutorial/manage-balancer.txt new file mode 100644 index 00000000000..99e0f52bc5e --- /dev/null +++ b/source/tutorial/manage-balancer.txt @@ -0,0 +1,204 @@ +.. index:: balancing; operations +.. _sharding-balancing-operations: + +=================== +Manage the Balancer +=================== + +.. default-domain:: mongodb + +This page describes provides common administrative procedures related +to balancing. For an introduction to balancing, see +:ref:`sharding-balancing`. For lower level information on balancing, see +:ref:`sharding-balancing-internals`. + +This page includes the following: + +- :ref:`sharding-balancing-check-lock` + +- :ref:`sharding-schedule-balancing-window` + +- :ref:`sharding-balancing-remove-window` + +- :ref:`sharding-balancing-disable-temporally` + +.. _sharding-balancing-check-lock: + +Check the Balancer Lock +----------------------- + +To see if the balancer process is active in your :term:`cluster +`, do the following: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Use the following query to return the balancer lock: + + .. code-block:: javascript + + db.locks.find( { _id : "balancer" } ).pretty() + +When this command returns, you will see output like the following: + +.. code-block:: javascript + + { "_id" : "balancer", + "process" : "mongos0.example.net:1292810611:1804289383", + "state" : 2, + "ts" : ObjectId("4d0f872630c42d1978be8a2e"), + "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", + "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", + "why" : "doing balance round" } + +This output confirms that: + +- The balancer originates from the :program:`mongos` running on the + system with the hostname ``mongos0.example.net``. 
+ +- The value in the ``state`` field indicates that a :program:`mongos` + has the lock. For version 2.0 and later, the value of an active lock + is ``2``; for earlier versions the value is ``1``. + +.. optional:: + + You can also use the following shell helper, which returns a + boolean to report if the balancer is active: + + .. code-block:: javascript + + sh.getBalancerState() + +.. _sharding-schedule-balancing-window: + +Schedule the Balancing Window +----------------------------- + +In some situations, particularly when your data set grows slowly and a +migration can impact performance, it's useful to ensure +that the balancer is active only at certain times. Use the following +procedure to specify a window during which the :term:`balancer` will +be able to migrate chunks: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Use an operation modeled on the following example :method:`update() + <db.collection.update>` operation to modify the balancer's + window: + + .. code-block:: javascript + + db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } }, true ) + + Replace ``<start-time>`` and ``<stop-time>`` with time values using + two-digit hour and minute values (e.g. ``HH:MM``) that describe the + beginning and end boundaries of the balancing window. + These times will be evaluated relative to the time zone of each individual + :program:`mongos` instance in the sharded cluster. + For instance, running the following + will force the balancer to run between 11PM and 6AM local time only: + + .. code-block:: javascript + + db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "06:00" } } }, true ) + +.. note:: + + The balancer window must be sufficient to *complete* the migration + of all data inserted during the day. + + As data insert rates can change based on activity and usage + patterns, it is important to ensure that the balancing window you + select will be sufficient to support the needs of your deployment. + +.. _sharding-balancing-remove-window: + +Remove a Balancing Window Schedule +---------------------------------- + +If you have :ref:`set the balancing window +<sharding-schedule-balancing-window>` and wish to remove the schedule +so that the balancer is always running, issue the following sequence +of operations: + +.. code-block:: javascript + + use config + db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } }) + +.. _sharding-balancing-disable-temporally: + +Disable the Balancer +-------------------- + +By default the balancer may run at any time and only moves chunks as +needed. To disable the balancer for a short period of time and prevent +all migrations, use the following procedure: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following operation to disable the balancer: + + .. code-block:: javascript + + sh.stopBalancer() + +#. Later, issue the following operation to enable the balancer: + + .. code-block:: javascript + + sh.startBalancer() + +.. note:: + + If a migration is in progress, the system will complete + the in-progress migration. After disabling, you can use the + following operation in the :program:`mongo` shell to determine if + there are no migrations in progress: + + .. 
code-block:: javascript + + use config + while( db.locks.findOne({_id: "balancer"}).state ) { + print("waiting..."); sleep(1000); + } + +The above process and the :method:`sh.setBalancerState()`, +:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide +wrappers on the following process, which may be useful if you need to +run this operation from a driver that does not have helper functions: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Issue the following update to disable the balancer: + + .. code-block:: javascript + + db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ); + +#. To enable the balancer again, alter the value of "stopped" as follows: + + .. code-block:: javascript + + db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); diff --git a/source/tutorial/manage-chunks.txt b/source/tutorial/manage-chunks.txt new file mode 100644 index 00000000000..ad4d775384c --- /dev/null +++ b/source/tutorial/manage-chunks.txt @@ -0,0 +1,383 @@ +.. _sharding-manage-chunks: + +================================== +Manage Chunks in a Sharded Cluster +================================== + +.. default-domain:: mongodb + +This page describes various operations on :term:`chunks ` in +:term:`sharded clusters `. MongoDB automates most +chunk management operations. However, these chunk management +operations are accessible to administrators for use in some +situations, typically surrounding initial setup, deployment, and data +ingestion. + +This page includes the following: + +- :ref:`sharding-administration-create-chunks` + +- :ref:`sharding-balancing-modify-chunk-size` + +- :ref:`sharding-balancing-manual-migration` + +- :ref:`sharding-bulk-inserts` + +- :ref:`sharding-procedure-create-split` + +.. _sharding-procedure-create-split: + +Split Chunks +------------ + +Normally, MongoDB splits a :term:`chunk` following inserts when a +chunk exceeds the :ref:`chunk size `. The +:term:`balancer` may migrate recently split chunks to a new shard +immediately if :program:`mongos` predicts future insertions will +benefit from the move. + +MongoDB treats all chunks the same, whether split manually or +automatically by the system. + +.. warning:: + + You cannot merge or combine chunks once you have split them. + +You may want to split chunks manually if: + +- you have a large amount of data in your cluster and very few + :term:`chunks `, + as is the case after deploying a cluster using existing data. + +- you expect to add a large amount of data that would + initially reside in a single chunk or shard. + +.. example:: + + You plan to insert a large amount of data with :term:`shard key` + values between ``300`` and ``400``, *but* all values of your shard + keys are between ``250`` and ``500`` are in a single chunk. + +Use :method:`sh.status()` to determine the current chunks ranges across +the cluster. + +To split chunks manually, use the :dbcommand:`split` command with +operators: ``middle`` and ``find``. The equivalent shell helpers are +:method:`sh.splitAt()` or :method:`sh.splitFind()`. + +.. example:: + + The following command will split the chunk that contains + the value of ``63109`` for the ``zipcode`` field in the ``people`` + collection of the ``records`` database: + + .. 
code-block:: javascript + + sh.splitFind( "records.people", { "zipcode": 63109 } ) + +:method:`sh.splitFind()` will split the chunk that contains the +*first* document returned that matches this query into two equally +sized chunks. You must specify the full namespace +(i.e. "``.``") of the sharded collection to +:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need +not contain the shard key, though it almost always makes sense to +query for the shard key in this case, and including the shard key will +expedite the operation. + +Use :method:`sh.splitAt()` to split a chunk in two using the queried +document as the partition point: + +.. code-block:: javascript + + sh.splitAt( "records.people", { "zipcode": 63109 } ) + +However, the location of the document that this query finds with +respect to the other documents in the chunk does not affect how the +chunk splits. + +.. _sharding-administration-pre-splitting: +.. _sharding-administration-create-chunks: + +Create Chunks (Pre-Splitting) +----------------------------- + +In most situations a :term:`sharded cluster` will create and distribute +chunks automatically without user intervention. However, in a limited +number of use profiles, MongoDB cannot create enough chunks or +distribute data fast enough to support required throughput. Consider +the following scenarios: + +- you must partition an existing data collection that resides on a + single shard. + +- you must ingest a large volume of data into a cluster that + isn't balanced, or where the ingestion of data will lead to an + imbalance of data. + + This can arise in an initial data loading, or in a case where you + must insert a large volume of data into a single chunk, as is the + case when you must insert at the beginning or end of the chunk + range, as is the case for monotonically increasing or decreasing + shard keys. + +Preemptively splitting chunks increases cluster throughput for these +operations, by reducing the overhead of migrating chunks that hold +data during the write operation. MongoDB only creates splits after an +insert operation, and can only migrate a single chunk at a time. Chunk +migrations are resource intensive and further complicated by large +write volume to the migrating chunk. + +.. warning:: + + You can only pre-split an empty collection. When you enable + sharding for a collection that contains data MongoDB automatically + creates splits. Subsequent attempts to create splits manually, can + lead to unpredictable chunk ranges and sizes as well as inefficient + or ineffective balancing behavior. + +To create and migrate chunks manually, use the following procedure: + +#. Split empty chunks in your collection by manually performing + :dbcommand:`split` command on chunks. + + .. example:: + + To create chunks for documents in the ``myapp.users`` + collection, using the ``email`` field as the :term:`shard key`, + use the following operation in the :program:`mongo` shell: + + .. code-block:: javascript + + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); + } + } + + This assumes a collection size of 100 million documents. + +#. Migrate chunks manually using the :dbcommand:`moveChunk` command: + + .. example:: + + To migrate all of the manually created user profiles evenly, + putting each prefix chunk on the next shard from the other, run + the following commands in the mongo shell: + + .. 
code-block:: javascript + + var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) + } + } + + You can also let the balancer automatically distribute the new + chunks. For an introduction to balancing, see + :ref:`sharding-balancing`. For lower level information on balancing, + see :ref:`sharding-balancing-internals`. + +.. _sharding-balancing-modify-chunk-size: + +Modify Chunk Size +----------------- + +When you initialize a sharded cluster, the default chunk size is 64 +megabytes. This default chunk size works well for most deployments. However, if you +notice that automatic migrations are incurring a level of I/O that +your hardware cannot handle, you may want to reduce the chunk +size. For the automatic splits and migrations, a small chunk size +leads to more rapid and frequent migrations. + +To modify the chunk size, use the following procedure: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Issue the following :method:`save() ` + operation: + + .. code-block:: javascript + + db.settings.save( { _id:"chunksize", value: } ) + + Where the value of ```` reflects the new chunk size in + megabytes. Here, you're essentially writing a document whose values + store the global chunk size configuration value. + +.. note:: + + The :setting:`chunkSize` and :option:`--chunkSize ` + options, passed at runtime to the :program:`mongos` **do not** + affect the chunk size after you have initialized the cluster. + + To eliminate confusion you should *always* set chunk size using the + above procedure and never use the runtime options. + +Modifying the chunk size has several limitations: + +- Automatic splitting only occurs when inserting :term:`documents + ` or updating existing documents. + +- If you lower the chunk size it may take time for all chunks to split to + the new size. + +- Splits cannot be "undone." + +If you increase the chunk size, existing chunks must grow through +insertion or updates until they reach the new size. + +.. _sharding-balancing-manual-migration: + +Migrate Chunks +-------------- + +In most circumstances, you should let the automatic balancer +migrate :term:`chunks ` between :term:`shards `. +However, you may want to migrate chunks manually in a few cases: + +- If you create chunks by :term:`pre-splitting` the data in your + collection, you will have to migrate chunks manually to distribute + chunks evenly across the shards. Use pre-splitting in limited + situations, to support bulk data ingestion. + +- If the balancer in an active cluster cannot distribute chunks within + the balancing window, then you will have to migrate chunks manually. + +For more information on how chunks move between shards, see +:ref:`sharding-balancing-internals`, in particular the section +:ref:`sharding-chunk-migration`. + +To migrate chunks, use the :dbcommand:`moveChunk` command. + +.. note:: + + To return a list of shards, use the :dbcommand:`listShards` + command. + + Specify shard names using the :dbcommand:`addShard` command + using the ``name`` argument. 
If you do not specify a name in the + :dbcommand:`addShard` command, MongoDB will assign a name + automatically. + +The following example assumes that the field ``username`` is the +:term:`shard key` for a collection named ``users`` in the ``myapp`` +database, and that the value ``smith`` exists within the :term:`chunk` +you want to migrate. + +To move this chunk, you would issue the following command from a :program:`mongo` +shell connected to any :program:`mongos` instance. + +.. code-block:: javascript + + db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) + +This command moves the chunk that includes the shard key value "smith" to the +:term:`shard` named ``mongodb-shard3.example.net``. The command will +block until the migration is complete. + +See :ref:`sharding-administration-create-chunks` for an introduction +to pre-splitting. + +.. versionadded:: 2.2 + :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` + parameter. When set to ``true``, MongoDB ensures that + :term:`secondary` members have replicated operations before allowing + new chunk migrations. + +.. warning:: + + The :dbcommand:`moveChunk` command may produce the following error + message: + + .. code-block:: none + + The collection's metadata lock is already taken. + + These errors occur when clients have too many open :term:`cursors + ` that access the chunk you are migrating. You can either + wait until the cursors complete their operation or close the + cursors manually. + + .. todo:: insert link to killing a cursor. + +.. index:: bulk insert +.. _sharding-bulk-inserts: + +Strategies for Bulk Inserts in Sharded Clusters +----------------------------------------------- + +.. todo:: Consider moving to the administrative guide as it's of an + applied nature, or create an applications document for sharding + +.. todo:: link the words "bulk insert" to the bulk insert topic when + it's published + + +Large bulk insert operations including initial data ingestion or +routine data import, can have a significant impact on a :term:`sharded +cluster`. Consider the following strategies and possibilities for +bulk insert operations: + +- If the collection does not have data, then there is only one + :term:`chunk`, which must reside on a single shard. MongoDB must + receive data, create splits, and distribute chunks to the available + shards. To avoid this performance cost, you can pre-split the + collection, as described in :ref:`sharding-administration-pre-splitting`. + +- You can parallelize import processes by sending insert operations to + more than one :program:`mongos` instance. If the collection is + empty, pre-split first, as described in + :ref:`sharding-administration-pre-splitting`. + +- If your shard key increases monotonically during an insert then all + the inserts will go to the last chunk in the collection, which will + always end up on a single shard. Therefore, the insert capacity of the + cluster will never exceed the insert capacity of a single shard. + + If your insert volume is never larger than what a single shard can + process, then there is no problem; however, if the insert volume + exceeds that range, and you cannot avoid a monotonically + increasing shard key, then consider the following modifications to + your application: + + - Reverse all the bits of the shard key to preserve the information + while avoiding the correlation of insertion order and increasing + sequence of values. 
+ + - Swap the first and last 16-bit words to "shuffle" the inserts. + + .. example:: The following example, in C++, swaps the leading and + trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` + generated so that they are no longer monotonically increasing. + + .. code-block:: cpp + + #include <algorithm> // for std::swap + #include "mongo/client/dbclient.h" // legacy C++ driver header that provides OID and BSON + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + std::swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + + For information on choosing a shard key, see :ref:`sharding-shard-key` + and see :ref:`Shard Key Internals ` (in + particular, :ref:`sharding-internals-operations-and-reliability` and + :ref:`sharding-internals-choose-shard-key`). + diff --git a/source/tutorial/manage-shards.txt b/source/tutorial/manage-shards.txt new file mode 100644 index 00000000000..07b5f3db6f4 --- /dev/null +++ b/source/tutorial/manage-shards.txt @@ -0,0 +1,308 @@ +.. _sharding-manage-shards: + +============= +Manage Shards +============= + +.. default-domain:: mongodb + +This page includes the following: + +- :ref:`sharding-procedure-add-shard` + +- :ref:`sharding-procedure-remove-shard` + +- :ref:`sharding-procedure-list-databases` + +- :ref:`sharding-procedure-list-shards` + +- :ref:`sharding-procedure-view-clusters` + +.. _sharding-procedure-add-shard: + +Add a Shard to a Cluster +------------------------ + +To add a shard to an *existing* sharded cluster, use the following +procedure: + +#. Connect to a :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. First, you need to tell the cluster where to find the individual + shards. You can do this using the :dbcommand:`addShard` command or + the :method:`sh.addShard()` helper: + + .. code-block:: javascript + + sh.addShard( "<hostname>:<port>" ) + + Replace ``<hostname>`` and ``<port>`` with the hostname and TCP + port number where the shard is accessible. + Alternately, specify a :term:`replica set` name and at least one + hostname that is a member of the replica set. + + For example: + + .. code-block:: javascript + + sh.addShard( "mongodb0.example.net:27027" ) + + .. note:: In production deployments, all shards should be replica sets. + + Repeat for each shard in your cluster. + + .. optional:: + + You may specify a "name" as an argument to the + :dbcommand:`addShard` command, as follows: + + .. code-block:: javascript + + db.runCommand( { addShard: "mongodb0.example.net:27027", name: "mongodb0" } ) + + You cannot specify a name for a shard using the + :method:`sh.addShard()` helper in the :program:`mongo` shell. If + you use the helper or do not specify a shard name, then MongoDB + will assign a name upon creation. + + .. versionchanged:: 2.0.3 + Before version 2.0.3, you must specify the shard in the + following form: the replica set name, followed by a forward + slash, followed by a comma-separated list of seeds for the + replica set. For example, if the name of the replica set is + "repl0", then your :method:`sh.addShard()` command might resemble: + + .. code-block:: javascript + + sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + +.. note:: + + It may take some time for :term:`chunks ` to migrate to the + new shard. + + For an introduction to balancing, see :ref:`sharding-balancing`. For + lower level information on balancing, see :ref:`sharding-balancing-internals`. + +.. 
_sharding-procedure-remove-shard: + +Remove a Shard from a Cluster +----------------------------- + +To remove a :term:`shard` from a :term:`sharded cluster`, you must: + +- Migrate :term:`chunks ` to another shard or database. + +- Ensure that this shard is not the :term:`primary shard` for any databases in + the cluster. If it is, move the "primary" status for these databases + to other shards. + +- Finally, remove the shard from the cluster's configuration. + +.. note:: + + To successfully migrate data from a shard, the :term:`balancer` + process **must** be active. + +The procedure to remove a shard is as follows: + +#. Connect to a :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Determine the name of the shard you will be removing. + + You must specify the name of the shard. You may have specified this + shard name when you first ran the :dbcommand:`addShard` command. If not, + you can find out the name of the shard by running the + :dbcommand:`listShards` or :dbcommand:`printShardingStatus` + commands or the :method:`sh.status()` shell helper. + + The following examples will remove a shard named ``mongodb0`` from the cluster. + +#. Begin removing chunks from the shard. + + Start by running the :dbcommand:`removeShard` command. This will + start "draining" or migrating chunks from the shard you're removing + to another shard in the cluster. + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + This operation will return the following response immediately: + + .. code-block:: javascript + + { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } + + Depending on your network capacity and the amount of data in the + shard, this operation can take anywhere from a few minutes to several + days to complete. + +#. View progress of the migration. + + You can run the :dbcommand:`removeShard` command again at any stage of the + process to view the progress of the migration, as follows: + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + The output should look something like this: + + .. code-block:: javascript + + { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } + + In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a + counter displays the remaining number of chunks that MongoDB must + migrate to other shards and the number of MongoDB databases that have + "primary" status on this shard. + + Continue checking the status of the :dbcommand:`removeShard` command + until the remaining number of chunks to transfer is 0. + +#. Move any databases to other shards in the cluster as needed. + + This is only necessary when removing a shard that is also the + :term:`primary shard` for one or more databases. + + Issue the following command at the :program:`mongo` shell: + + .. code-block:: javascript + + db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) + + This command will migrate all remaining non-sharded data in the + database named ``myapp`` to the shard named ``mongodb1``. + + .. warning:: + + Do not run the :dbcommand:`movePrimary` command until you have *finished* + draining the shard. + + The command will not return until MongoDB completes moving all + data. The response from this command will resemble the following: + + .. code-block:: javascript + + { "primary" : "mongodb1", "ok" : 1 } + +#. Run :dbcommand:`removeShard` again to clean up all metadata + information and finalize the shard removal, as follows: + + .. 
code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + When successful, this command will return a document like this: + + .. code-block:: javascript + + { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } + +Once the value of the ``stage`` field is "completed," you may safely +stop the processes comprising the ``mongodb0`` shard. + + +.. _sharding-procedure-list-databases: + +List Databases with Sharding Enabled +------------------------------------ + +To list the databases that have sharding enabled, query the +``databases`` collection in the :ref:`config-database`. +A database has sharding enabled if the value of the ``partitioned`` +field is ``true``. Connect to a :program:`mongos` instance with a +:program:`mongo` shell, and run the following operation to get a full +list of databases with sharding enabled: + +.. code-block:: javascript + + use config + db.databases.find( { "partitioned": true } ) + +.. example:: You can use the following sequence of commands to + return a list of all databases in the cluster: + + .. code-block:: javascript + + use config + db.databases.find() + + If this returns the following result set: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } + { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } + + Then sharding is enabled only for the ``animals`` database. + +.. _sharding-procedure-list-shards: + +List Shards +----------- + +To list the current set of configured shards, use the :dbcommand:`listShards` +command, as follows: + +.. code-block:: javascript + + use admin + db.runCommand( { listShards : 1 } ) + +.. _sharding-procedure-view-clusters: + +View Cluster Details +-------------------- + +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. + +.. example:: In the following example output from :method:`sh.status()`: + + - ``sharding version`` displays the version number of the shard + metadata. + + - ``shards`` displays a list of the :program:`mongod` instances + used as shards in the cluster. + + - ``databases`` displays all databases in the cluster, + including databases that do not have sharding enabled. + + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + + .. 
code-block:: javascript + + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "m0.example.net:30001" } + { "_id" : "shard0001", "host" : "m3.example2.net:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" }
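The chunk counts in this output come from the :ref:`config-database`, which you can also query directly. The following is a small sketch, not part of the original example: it tallies the chunks of the ``foo.big`` collection per shard, which should agree with the totals that :method:`sh.status()` prints above.

.. code-block:: javascript

   use config
   // count the chunks of the foo.big collection that live on each shard
   db.shards.find().forEach( function ( shard ) {
       var n = db.chunks.count( { ns : "foo.big", shard : shard._id } );
       print( shard._id + ": " + n + " chunks" );
   } );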