From 7c66df6da2d7dbbf192c8b4720f85ac13116a8bc Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 14:21:20 -0500 Subject: [PATCH 01/15] DOCS-987 break up shard page into concepts, tutorials, refs --- source/sharding.txt | 79 ++++++++++++++++++++------------------------- 1 file changed, 35 insertions(+), 44 deletions(-) diff --git a/source/sharding.txt b/source/sharding.txt index e4c58e87a62..a3a97d91369 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -9,69 +9,60 @@ single logical database system across a cluster of machines. Sharding uses range-based portioning to distribute :term:`documents ` based on a specific :term:`shard key`. -This page lists the documents, tutorials, and reference pages that -describe sharding. +Concepts +-------- -For an overview, see :doc:`/core/sharding`. To configure, maintain, and -troubleshoot sharded clusters, see :doc:`/administration/sharding`. -For deployment architectures, see :doc:`/administration/sharding-architectures`. -For details on the internal operations of sharding, see :doc:`/core/sharding-internals`. -For procedures for performing certain sharding tasks, see the -:ref:`Tutorials ` list. +Fundamentals +~~~~~~~~~~~~ -Documentation -------------- +:doc:`/core/sharding` -The following is the outline of the main documentation: +:doc:`/administration/sharding-architectures` -.. toctree:: - :maxdepth: 2 +:doc:`/faq/sharding` - core/sharding - administration/sharding - administration/sharding-architectures - core/sharding-internals +Internals +~~~~~~~~~ -.. index:: tutorials; sharding -.. _sharding-tutorials: +:doc:`/core/sharding-internals` -Tutorials ---------- +Procedures and Tutorials +------------------------ + +Create Sharded Clusters +~~~~~~~~~~~~~~~~~~~~~~~ + +:doc:`/administration/sharding` + +:doc:`/tutorial/deploy-shard-cluster` + +:doc:`/tutorial/add-shards-to-shard-cluster` + +:doc:`/tutorial/remove-shards-from-cluster` + +:doc:`/tutorial/enforce-unique-keys-for-sharded-collections` -The following tutorials describe specific sharding procedures: +:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster` -Deploying Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~ +Backup Sharded Clusters +~~~~~~~~~~~~~~~~~~~~~~~ -.. toctree:: - :maxdepth: 1 +:doc:`/tutorial/backup-small-sharded-cluster-with-mongodump` - tutorial/deploy-shard-cluster - tutorial/add-shards-to-shard-cluster - tutorial/remove-shards-from-cluster - tutorial/enforce-unique-keys-for-sharded-collections - tutorial/convert-replica-set-to-replicated-shard-cluster +:doc:`/tutorial/backup-sharded-cluster-with-filesystem-snapshots` -Backups for Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +:doc:`/tutorial/backup-sharded-cluster-with-database-dumps` -.. toctree:: - :maxdepth: 1 +:doc:`/tutorial/restore-single-shard` - tutorial/backup-small-sharded-cluster-with-mongodump - tutorial/backup-sharded-cluster-with-filesystem-snapshots - tutorial/backup-sharded-cluster-with-database-dumps - tutorial/restore-single-shard - tutorial/restore-sharded-cluster - tutorial/schedule-backup-window-for-sharded-clusters +:doc:`/tutorial/restore-sharded-cluster` -.. _sharding-reference: +:doc:`/tutorial/schedule-backup-window-for-sharded-clusters` Reference --------- -- :ref:`sharding-commands` -- :doc:`/faq/sharding` +:ref:`sharding-commands` .. STUB tutorial/replace-one-configuration-server-in-a-shard-cluster .. 
STUB tutorial/replace-all-configuration-servers-in-a-shard-cluster From f3ce5360110bfcc50dad4427a7119c5ba53403e0 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 16:38:27 -0500 Subject: [PATCH 02/15] DOCS-987 break out sharding-requirements into own page --- .../administration/sharding-architectures.txt | 5 +- source/core/sharding-requirements.txt | 162 +++++++++++++ source/core/sharding.txt | 218 +++--------------- 3 files changed, 200 insertions(+), 185 deletions(-) create mode 100644 source/core/sharding-requirements.txt diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index a717bf7425e..d327380806a 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -9,10 +9,7 @@ Sharded Cluster Architectures .. default-domain:: mongodb This document describes the organization and design of :term:`sharded -cluster` deployments. For documentation of common administrative tasks -related to sharded clusters, see :doc:`/administration/sharding`. For -complete documentation of sharded clusters see the :doc:`/sharding` -section of this manual. +cluster` deployments. .. seealso:: :ref:`sharding-requirements`. diff --git a/source/core/sharding-requirements.txt b/source/core/sharding-requirements.txt new file mode 100644 index 00000000000..b36bb67aec1 --- /dev/null +++ b/source/core/sharding-requirements.txt @@ -0,0 +1,162 @@ +.. index:: sharding; requirements +.. _sharding-requirements: + +============================ +Sharded Cluster Requirements +============================ + +.. default-domain:: mongodb + +This page includes the following: + +- :ref:`sharding-requirements-when-to-use-sharding` + +- :ref:`sharding-requirements-infrastructure` + +- :ref:`sharding-requirements-data` + +- :ref:`sharding-localhost` + +.. _sharding-requirements-when-to-use-sharding: + +When to Use Sharding +-------------------- + +While sharding is a powerful and compelling feature, it comes with +significant :ref:`sharding-requirements-infrastructure` +and some limited complexity costs. As a result, use +sharding only as necessary, and when indicated by actual operational +requirements. Consider the following overview of indications it may be +time to consider sharding. + +You should consider deploying a :term:`sharded cluster`, if: + +- your data set approaches or exceeds the storage capacity of a single + node in your system. + +- the size of your system's active :term:`working set` *will soon* + exceed the capacity of the *maximum* amount of RAM for your system. + +- your system has a large amount of write activity, a single + MongoDB instance cannot write data fast enough to meet demand, and + all other approaches have not reduced contention. + +If these attributes are not present in your system, sharding will only +add additional complexity to your system without providing much +benefit. When designing your data model, if you will eventually need a +sharded cluster, consider which collections you will want to shard and +the corresponding shard keys. + +.. _sharding-capacity-planning: + +.. warning:: + + It takes time and resources to deploy sharding, and if your system + has *already* reached or exceeded its capacity, you will have a + difficult time deploying sharding without impacting your + application. + + As a result, if you think you will need to partition your database + in the future, **do not** wait until your system is overcapacity to + enable sharding. + +.. 
_sharding-requirements-infrastructure:
+
+Infrastructure Requirements
+---------------------------
+
+A :term:`sharded cluster` has the following components:
+
+- Three :term:`config servers `.
+
+  These special :program:`mongod` instances store the metadata for the
+  cluster. The :program:`mongos` instances cache this data and use it
+  to determine which :term:`shard` is responsible for which
+  :term:`chunk`.
+
+  For development and testing purposes, you may deploy a cluster with a single
+  configuration server process, but always use exactly three config
+  servers for redundancy and safety in production.
+
+- Two or more shards. Each shard consists of one or more :program:`mongod`
+  instances that store the data for the shard.
+
+  These "normal" :program:`mongod` instances hold all of the
+  actual data for the cluster.
+
+  Typically each shard is a :term:`replica set `. Each
+  replica set consists of multiple :program:`mongod` instances. The members
+  of the replica set provide redundancy and high availability for the data in each shard.
+
+  .. warning::
+
+     MongoDB enables data :term:`partitioning `, or
+     sharding, on a *per collection* basis. You *must* access all data
+     in a sharded cluster via the :program:`mongos` instances.
+     If you connect directly to a :program:`mongod` in a sharded cluster,
+     you will see only its fraction of the cluster's data. The data on any
+     given shard may be somewhat random: MongoDB provides no guarantee
+     that any two contiguous chunks will reside on a single shard.
+
+- One or more :program:`mongos` instances.
+
+  These instances direct queries from the application layer to the
+  shards that hold the data. The :program:`mongos` instances have no
+  persistent state or data files and only cache metadata in RAM from
+  the config servers.
+
+  .. note::
+
+     In most situations :program:`mongos` instances use minimal
+     resources, and you can run them on your application servers
+     without impacting application performance. However, if you use
+     the :term:`aggregation framework`, some processing may occur on
+     the :program:`mongos` instances, causing that :program:`mongos`
+     to require more system resources.
+
+.. _sharding-requirements-data:
+
+Data Requirements
+-----------------
+
+Your cluster must manage a significant quantity of data for sharding
+to have an effect on your collection. The default :term:`chunk` size
+is 64 megabytes, and the :ref:`balancer
+` will not begin moving data until the imbalance
+of chunks in the cluster exceeds the :ref:`migration threshold
+`.
+
+Practically, this means that unless your cluster has many hundreds of
+megabytes of data, chunks will remain on a single shard.
+
+While there are some exceptional situations where you may need to
+shard a small collection of data, most of the time sharding a small
+collection is not worth the additional complexity and overhead unless
+you need additional concurrency or capacity. If you
+have a small data set, a properly configured
+single MongoDB instance or replica set is usually more than sufficient
+for your persistence layer needs.
+
+:term:`Chunk ` size is :option:`user configurable `.
+However, the default value of 64 megabytes is ideal
+for most deployments. See the :ref:`sharding-chunk-size` section in the
+:doc:`sharding-internals` document for more information.
+
+.. index:: sharding; localhost
+.. 
_sharding-localhost: + +Restrictions on the Use of "localhost" +-------------------------------------- + +Because all components of a :term:`sharded cluster` must communicate +with each other over the network, there are special restrictions +regarding the use of localhost addresses: + +If you use either "localhost" or "``127.0.0.1``" as the host +identifier, then you must use "localhost" or "``127.0.0.1``" for *all* +host settings for any MongoDB instances in the cluster. This applies +to both the ``host`` argument to :dbcommand:`addShard` and the value +to the :option:`mongos --configdb` run time option. If you mix +localhost addresses with remote host address, MongoDB will produce +errors. diff --git a/source/core/sharding.txt b/source/core/sharding.txt index a06e4cba27d..35a71f8088b 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -1,28 +1,45 @@ .. index:: fundamentals; sharding .. _sharding-fundamentals: -===================== -Sharding Fundamentals -===================== +================= +Sharding Overview +================= .. default-domain:: mongodb -This document provides an overview of the fundamental concepts and -operations of sharding with MongoDB. For a list of all sharding -documentation see :doc:`/sharding`. +Sharding is MongoDB’s approach to scaling out. Sharding partitions a +database collection and stores the different portions on different +machines. Sharding automatically balances data and load across machines. +By splitting a collection up across multiple machines, MongoDB increases +write capacity, provides the ability to support larger working sets, and +raises the limits of total data size beyond the physical resources of a +single node. -MongoDB's sharding system allows users to :term:`partition` a -:term:`collection` within a database to distribute the collection's documents -across a number of :program:`mongod` instances or :term:`shards `. -Sharding increases write capacity, provides the ability to -support larger working sets, and raises the limits of total data size beyond -the physical resources of a single node. +When your collections become too large for existing storage, you need +only add a new machine and MongoDB automatically distributes data to the +new server. -Sharding Overview ------------------ +Sharded Cluster +--------------- + +Sharding occurs within a :term:`sharded cluster`. A typical sharded cluster +consists of the following: + +- Two or more :program:`mongod` instances that serve as the + :term:`shards ` that hold your database collections. Usually + each shard comprises a :term:`replica set`, not a lone + :program:`mongod` instance. + +- Three :program:`mongod` instances that serve as config servers to hold + metadata about the cluster. The metadata maps :term:`chunks + ` to shards. -Features -~~~~~~~~ +- Two or more :program:`mongos` instances that route the reads and + writes to the shards. A :program:`mongos` instance is a lightweight + routing process. + +Sharding Features +----------------- With sharding MongoDB automatically distributes data among a collection of :program:`mongod` instances. Sharding, as implemented in @@ -66,165 +83,6 @@ MongoDB has the following features: users to increase the potential amount of data to manage with MongoDB and expand the :term:`working set`. -A typical :term:`sharded cluster` consists of: - -- 3 config servers that store metadata. The metadata maps :term:`chunks - ` to shards. - -- More than one :term:`replica sets ` that hold - data. These are the :term:`shards `. 
- -- A number of lightweight routing processes, called :doc:`mongos - ` instances. The :program:`mongos` process routes - operations to the correct shard based the cluster configuration. - -When to Use Sharding -~~~~~~~~~~~~~~~~~~~~ - -While sharding is a powerful and compelling feature, it comes with -significant :ref:`sharding-requirements-infrastructure` -and some limited complexity costs. As a result, use -sharding only as necessary, and when indicated by actual operational -requirements. Consider the following overview of indications it may be -time to consider sharding. - -You should consider deploying a :term:`sharded cluster`, if: - -- your data set approaches or exceeds the storage capacity of a single - node in your system. - -- the size of your system's active :term:`working set` *will soon* - exceed the capacity of the *maximum* amount of RAM for your system. - -- your system has a large amount of write activity, a single - MongoDB instance cannot write data fast enough to meet demand, and - all other approaches have not reduced contention. - -If these attributes are not present in your system, sharding will only -add additional complexity to your system without providing much -benefit. When designing your data model, if you will eventually need a -sharded cluster, consider which collections you will want to shard and -the corresponding shard keys. - -.. _sharding-capacity-planning: - -.. warning:: - - It takes time and resources to deploy sharding, and if your system - has *already* reached or exceeded its capacity, you will have a - difficult time deploying sharding without impacting your - application. - - As a result, if you think you will need to partition your database - in the future, **do not** wait until your system is overcapacity to - enable sharding. - -.. index:: sharding; requirements -.. _sharding-requirements: - -Sharding Requirements ---------------------- - -.. _sharding-requirements-infrastructure: - -Infrastructure Requirements -~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -A :term:`sharded cluster` has the following components: - -- Three :term:`config servers `. - - These special :program:`mongod` instances store the metadata for the - cluster. The :program:`mongos` instances cache this data and use it - to determine which :term:`shard` is responsible for which - :term:`chunk`. - - For development and testing purposes you may deploy a cluster with a single - configuration server process, but always use exactly three config - servers for redundancy and safety in production. - -- Two or more shards. Each shard consists of one or more :program:`mongod` - instances that store the data for the shard. - - These "normal" :program:`mongod` instances hold all of the - actual data for the cluster. - - Typically each shard is a :term:`replica sets `. Each - replica set consists of multiple :program:`mongod` instances. The members - of the replica set provide redundancy and high available for the data in each shard. - - .. warning:: - - MongoDB enables data :term:`partitioning `, or - sharding, on a *per collection* basis. You *must* access all data - in a sharded cluster via the :program:`mongos` instances as below. - If you connect directly to a :program:`mongod` in a sharded cluster - you will see its fraction of the cluster's data. The data on any - given shard may be somewhat random: MongoDB provides no guarantee - that any two contiguous chunks will reside on a single shard. - -- One or more :program:`mongos` instances. 
- - These instance direct queries from the application layer to the - shards that hold the data. The :program:`mongos` instances have no - persistent state or data files and only cache metadata in RAM from - the config servers. - - .. note:: - - In most situations :program:`mongos` instances use minimal - resources, and you can run them on your application servers - without impacting application performance. However, if you use - the :term:`aggregation framework` some processing may occur on - the :program:`mongos` instances, causing that :program:`mongos` - to require more system resources. - -Data Requirements -~~~~~~~~~~~~~~~~~ - -Your cluster must manage a significant quantity of data for sharding -to have an effect on your collection. The default :term:`chunk` size -is 64 megabytes, [#chunk-size]_ and the :ref:`balancer -` will not begin moving data until the imbalance -of chunks in the cluster exceeds the :ref:`migration threshold -`. - -Practically, this means that unless your cluster has many hundreds of -megabytes of data, chunks will remain on a single shard. - -While there are some exceptional situations where you may need to -shard a small collection of data, most of the time the additional -complexity added by sharding the small collection is not worth the additional -complexity and overhead unless -you need additional concurrency or capacity for some reason. If you -have a small data set, usually a properly configured -single MongoDB instance or replica set will be more than sufficient -for your persistence layer needs. - -.. [#chunk-size] :term:`chunk` size is :option:`user configurable - `. However, the default value is of 64 - megabytes is ideal for most deployments. See the - :ref:`sharding-chunk-size` section in the - :doc:`sharding-internals` document for more information. - -.. index:: sharding; localhost -.. _sharding-localhost: - -Sharding and "localhost" Addresses -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Because all components of a :term:`sharded cluster` must communicate -with each other over the network, there are special restrictions -regarding the use of localhost addresses: - -If you use either "localhost" or "``127.0.0.1``" as the host -identifier, then you must use "localhost" or "``127.0.0.1``" for *all* -host settings for any MongoDB instances in the cluster. This applies -to both the ``host`` argument to :dbcommand:`addShard` and the value -to the :option:`mongos --configdb` run time option. If you mix -localhost addresses with remote host address, MongoDB will produce -errors. - .. index:: shard key single: sharding; shard key @@ -234,15 +92,13 @@ errors. Shard Keys ---------- -.. todo:: link this section to - -"Shard keys" refer to the :term:`field` that exists in every -:term:`document` in a collection that MongoDB uses to distribute -documents among the :term:`shards `. Shard keys, like +MongoDB distributes documents among :term:`shards ` based on +:term:`shard keys `. A shard key must be :term:`field` that +exists in every :term:`document` in the collection. Shard keys, like :term:`indexes `, can be either a single field, or may be a compound key, consisting of multiple fields. -Remember, MongoDB's sharding is range-based: each :term:`chunk` holds +MongoDB's sharding is range-based. Each :term:`chunk` holds documents having specific range of values for the "shard key". Thus, choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. 
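For illustration, declaring a shard key from the :program:`mongo` shell is a
single operation once sharding is enabled for the database; the database,
collection, and compound key below are illustrative only:

.. code-block:: javascript

   // enable sharding for the database, then shard the collection on a
   // compound key (MongoDB creates the supporting index only if the
   // collection is empty; otherwise the index must already exist)
   sh.enableSharding( "records" )
   sh.shardCollection( "records.people", { zipcode: 1, u_id: 1 } )
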
From 614ed023fad45a5669c79ad54c8070beeca9f443 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 17:04:10 -0500 Subject: [PATCH 03/15] DOCS-987 change link format on shard page, create shard command ref --- source/reference/commands.txt | 29 ------------- source/reference/sharding-commands.txt | 33 +++++++++++++++ source/sharding.txt | 56 +++++++++++++------------- 3 files changed, 60 insertions(+), 58 deletions(-) create mode 100644 source/reference/sharding-commands.txt diff --git a/source/reference/commands.txt b/source/reference/commands.txt index 1e1617f175a..dbfd092a486 100644 --- a/source/reference/commands.txt +++ b/source/reference/commands.txt @@ -77,35 +77,6 @@ includes the relevant :program:`mongo` shell helpers. See User Commands ------------- -.. _sharding-commands: - -Sharding Commands -~~~~~~~~~~~~~~~~~ - -.. seealso:: :doc:`/sharding` for more information about MongoDB's - sharding functionality. - -.. include:: command/addShard.txt - :start-after: mongodb - -.. include:: command/listShards.txt - :start-after: mongodb - -.. include:: command/enableSharding.txt - :start-after: mongodb - -.. include:: command/shardCollection.txt - :start-after: mongodb - -.. include:: command/shardingState.txt - :start-after: mongodb - -.. include:: command/removeShard.txt - :start-after: mongodb - -.. include:: command/printShardingStatus.txt - :start-after: mongodb - Aggregation Commands ~~~~~~~~~~~~~~~~~~~~ diff --git a/source/reference/sharding-commands.txt b/source/reference/sharding-commands.txt new file mode 100644 index 00000000000..68e3a984322 --- /dev/null +++ b/source/reference/sharding-commands.txt @@ -0,0 +1,33 @@ +.. _sharding-commands: + +================= +Sharding Commands +================= + +.. default-domain:: mongodb + +The following database commands support :term:`sharded clusters `. + +.. seealso:: :doc:`/sharding` for more information about MongoDB's + sharding functionality. + +.. include:: command/addShard.txt + :start-after: mongodb + +.. include:: command/listShards.txt + :start-after: mongodb + +.. include:: command/enableSharding.txt + :start-after: mongodb + +.. include:: command/shardCollection.txt + :start-after: mongodb + +.. include:: command/shardingState.txt + :start-after: mongodb + +.. include:: command/removeShard.txt + :start-after: mongodb + +.. include:: command/printShardingStatus.txt + :start-after: mongodb diff --git a/source/sharding.txt b/source/sharding.txt index a3a97d91369..29087351ad7 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -12,19 +12,14 @@ based on a specific :term:`shard key`. Concepts -------- -Fundamentals -~~~~~~~~~~~~ +.. toctree:: + :maxdepth: 1 -:doc:`/core/sharding` - -:doc:`/administration/sharding-architectures` - -:doc:`/faq/sharding` - -Internals -~~~~~~~~~ - -:doc:`/core/sharding-internals` + core/sharding + core/sharding-requirements + administration/sharding-architectures + faq/sharding + core/sharding-internals Procedures and Tutorials ------------------------ @@ -32,37 +27,40 @@ Procedures and Tutorials Create Sharded Clusters ~~~~~~~~~~~~~~~~~~~~~~~ -:doc:`/administration/sharding` - -:doc:`/tutorial/deploy-shard-cluster` +.. 
toctree:: + :maxdepth: 1 -:doc:`/tutorial/add-shards-to-shard-cluster` - -:doc:`/tutorial/remove-shards-from-cluster` - -:doc:`/tutorial/enforce-unique-keys-for-sharded-collections` - -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster` + administration/sharding + tutorial/deploy-shard-cluster + tutorial/add-shards-to-shard-cluster + tutorial/remove-shards-from-cluster + tutorial/enforce-unique-keys-for-sharded-collections + tutorial/convert-replica-set-to-replicated-shard-cluster Backup Sharded Clusters ~~~~~~~~~~~~~~~~~~~~~~~ -:doc:`/tutorial/backup-small-sharded-cluster-with-mongodump` +.. toctree:: + :maxdepth: 1 -:doc:`/tutorial/backup-sharded-cluster-with-filesystem-snapshots` + tutorial/backup-small-sharded-cluster-with-mongodump + tutorial/backup-sharded-cluster-with-filesystem-snapshots + tutorial/backup-sharded-cluster-with-database-dumps + tutorial/restore-single-shard + tutorial/restore-sharded-cluster + tutorial/schedule-backup-window-for-sharded-clusters -:doc:`/tutorial/backup-sharded-cluster-with-database-dumps` -:doc:`/tutorial/restore-single-shard` -:doc:`/tutorial/restore-sharded-cluster` -:doc:`/tutorial/schedule-backup-window-for-sharded-clusters` Reference --------- -:ref:`sharding-commands` +.. toctree:: + :maxdepth: 1 + + reference/sharding-commands .. STUB tutorial/replace-one-configuration-server-in-a-shard-cluster .. STUB tutorial/replace-all-configuration-servers-in-a-shard-cluster From 59e2a8f3bd62200da5272346a58088bb1afacac4 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 21:14:01 -0500 Subject: [PATCH 04/15] DOCS-987 many changes --- source/administration/sharded-clusters.txt | 261 ++++++ .../administration/sharding-architectures.txt | 42 +- source/administration/sharding.txt | 840 ------------------ source/core/sharding-security.txt | 63 ++ source/core/sharding.txt | 421 +-------- source/faq/sharding.txt | 3 +- source/sharding.txt | 29 +- source/tutorial/deploy-shard-cluster.txt | 349 ++++---- source/tutorial/manage-chunks.txt | 383 ++++++++ source/tutorial/manage-shards.txt | 308 +++++++ 10 files changed, 1278 insertions(+), 1421 deletions(-) create mode 100644 source/administration/sharded-clusters.txt create mode 100644 source/core/sharding-security.txt create mode 100644 source/tutorial/manage-chunks.txt create mode 100644 source/tutorial/manage-shards.txt diff --git a/source/administration/sharded-clusters.txt b/source/administration/sharded-clusters.txt new file mode 100644 index 00000000000..f8d0bf7045a --- /dev/null +++ b/source/administration/sharded-clusters.txt @@ -0,0 +1,261 @@ +.. index:: sharded clusters +.. _sharding-sharded-cluster: + +================ +Sharded Clusters +================ + +.. default-domain:: mongodb + +Sharding occurs within a :term:`sharded cluster`. A sharded cluster +consists of the following. For specific configurations, see +:ref:`sharding-architecture`: + +- :program:`mongod` instances that serve as the + :term:`shards ` that hold your database collections. + +- :program:`mongod` instances that serve as config servers to hold + metadata about the cluster. The metadata maps :term:`chunks + ` to shards. See :ref:`sharding-config-server`. + +- :program:`mongos` instances that route the reads and + writes to the shards. + +.. index:: sharding; config servers +.. index:: config servers +.. _sharding-config-server: + +Config Servers +-------------- + +Config servers maintain the shard metadata in a config database. 
The +:term:`config database` stores the relationship between :term:`chunks +` and where they reside within a :term:`sharded cluster`. Without +a config database, the :program:`mongos` instances would be unable to +route queries or write operations within the cluster. + +Config servers *do not* run as replica sets. Instead, a :term:`cluster +` operates with a group of *three* config servers that use a +two-phase commit process that ensures immediate consistency and +reliability. + +For testing purposes you may deploy a cluster with a single +config server, but this is not recommended for production. + +.. warning:: + + If your cluster has a single config server, this + :program:`mongod` is a single point of failure. If the instance is + inaccessible the cluster is not accessible. If you cannot recover + the data on a config server, the cluster will be inoperable. + + **Always** use three config servers for production deployments. + +The actual load on configuration servers is small because each +:program:`mongos` instances maintains a cached copy of the configuration +database. MongoDB only writes data to the config server to: + +- create splits in existing chunks, which happens as data in + existing chunks exceeds the maximum chunk size. + +- migrate a chunk between shards. + +Additionally, all config servers must be available on initial setup +of a sharded cluster, each :program:`mongos` instance must be able +to write to the ``config.version`` collection. + +If one or two configuration instances become unavailable, the +cluster's metadata becomes *read only*. It is still possible to read +and write data from the shards, but no chunk migrations or splits will +occur until all three servers are accessible. At the same time, config +server data is only read in the following situations: + +- A new :program:`mongos` starts for the first time, or an existing + :program:`mongos` restarts. + +- After a chunk migration, the :program:`mongos` instances update + themselves with the new cluster metadata. + +If all three config servers are inaccessible, you can continue to use +the cluster as long as you don't restart the :program:`mongos` +instances until after config servers are accessible again. If you +restart the :program:`mongos` instances and there are no accessible +config servers, the :program:`mongos` would be unable to direct +queries or write operations to the cluster. + +Because the configuration data is small relative to the amount of data +stored in a cluster, the amount of activity is relatively low, and 100% +up time is not required for a functioning sharded cluster. As a result, +backing up the config servers is not difficult. Backups of config +servers are critical as clusters become totally inoperable when +you lose all configuration instances and data. Precautions to ensure +that the config servers remain available and intact are critical. + +.. note:: + + Configuration servers store metadata for a single sharded cluster. + You must have a separate configuration server or servers for each + cluster you administer. + +.. index:: mongos +.. _sharding-mongos: +.. _sharding-read-operations: + +Mongos Instances +---------------- + +The :program:`mongos` provides a single unified interface to a sharded +cluster for applications using MongoDB. Except for the selection of a +:term:`shard key`, application developers and administrators need not +consider any of the :ref:`internal details of sharding `. 
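+
+As a quick illustration (hostnames are hypothetical), you can confirm from
+the :program:`mongo` shell that a connection points at a :program:`mongos`
+rather than at an individual shard:
+
+.. code-block:: javascript
+
+   // connected to a mongos (e.g. mongos0.example.net), the isMaster
+   // command reports "msg" : "isdbgrid"; sh.status() then summarizes
+   // the shards, databases, and chunk distribution for the cluster
+   db.runCommand( { isMaster: 1 } )
+   sh.status()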
+ +:program:`mongos` caches data from the :ref:`config server +`, and uses this to route operations from +applications and clients to the :program:`mongod` instances. +:program:`mongos` have no *persistent* state and consume +minimal system resources. + +The most common practice is to run :program:`mongos` instances on the +same systems as your application servers, but you can maintain +:program:`mongos` instances on the shards or on other dedicated +resources. + +.. note:: + + .. versionchanged:: 2.1 + + Some aggregation operations using the :dbcommand:`aggregate` + command (i.e. :method:`db.collection.aggregate()`,) will cause + :program:`mongos` instances to require more CPU resources than in + previous versions. This modified performance profile may dictate + alternate architecture decisions if you use the :term:`aggregation + framework` extensively in a sharded environment. + +.. _sharding-query-routing: + +Mongos Routing +~~~~~~~~~~~~~~ + +:program:`mongos` uses information from :ref:`config servers +` to route operations to the cluster as +efficiently as possible. In general, operations in a sharded +environment are either: + +1. Targeted at a single shard or a limited group of shards based on + the shard key. + +2. Broadcast to all shards in the cluster that hold documents in a + collection. + +When possible you should design your operations to be as targeted as +possible. Operations have the following targeting characteristics: + +- Query operations broadcast to all shards [#namespace-exception]_ + **unless** the :program:`mongos` can determine which shard or shard + stores this data. + + For queries that include the shard key, :program:`mongos` can target + the query at a specific shard or set of shards, if the portion + of the shard key included in the query is a *prefix* of the shard + key. For example, if the shard key is: + + .. code-block:: javascript + + { a: 1, b: 1, c: 1 } + + The :program:`mongos` *can* route queries that include the full + shard key or either of the following shard key prefixes at a + specific shard or set of shards: + + .. code-block:: javascript + + { a: 1 } + { a: 1, b: 1 } + + Depending on the distribution of data in the cluster and the + selectivity of the query, :program:`mongos` may still have to + contact multiple shards [#possible-all]_ to fulfill these queries. + +- All :method:`insert() ` operations target to + one shard. + +- All single :method:`update() ` operations + target to one shard. This includes :term:`upsert` operations. + +- The :program:`mongos` broadcasts multi-update operations to every + shard. + +- The :program:`mongos` broadcasts :method:`remove() + ` operations to every shard unless the + operation specifies the shard key in full. + +While some operations must broadcast to all shards, you can improve +performance by using as many targeted operations as possible by +ensuring that your operations include the shard key. + +.. [#namespace-exception] If a shard does not store chunks from a + given collection, queries for documents in that collection are not + broadcast to that shard. + +.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could + route a query that included ``{ a: 1, c: 1 }`` fields at a specific + subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos` + cannot route any of the following queries to specific shards + in the cluster: + + .. code-block:: javascript + + { b: 1 } + { c: 1 } + { b: 1, c: 1 } + +.. 
[#possible-all] :program:`mongos` will route some queries, even + some that include the shard key, to all shards, if needed. + +Sharded Query Response Process +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +To route a query to a :term:`cluster `, +:program:`mongos` uses the following process: + +#. Determine the list of :term:`shards ` that must receive the query. + + In some cases, when the :term:`shard key` or a prefix of the shard + key is a part of the query, the :program:`mongos` can route the + query to a subset of the shards. Otherwise, the :program:`mongos` + must direct the query to *all* shards that hold documents for that + collection. + + .. example:: + + Given the following shard key: + + .. code-block:: javascript + + { zipcode: 1, u_id: 1, c_date: 1 } + + Depending on the distribution of chunks in the cluster, the + :program:`mongos` may be able to target the query at a subset of + shards, if the query contains the following fields: + + .. code-block:: javascript + + { zipcode: 1 } + { zipcode: 1, u_id: 1 } + { zipcode: 1, u_id: 1, c_date: 1 } + +#. Establish a cursor on all targeted shards. + + When the first batch of results returns from the cursors: + + a. For query with sorted results (i.e. using + :method:`cursor.sort()`) the :program:`mongos` performs a merge + sort of all queries. + + b. For a query with unsorted results, the :program:`mongos` returns + a result cursor that "round robins" results from all cursors on + the shards. + + .. versionchanged:: 2.0.5 + Before 2.0.5, the :program:`mongos` exhausted each cursor, + one by one. diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index d327380806a..b0e4a84a8e1 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -11,10 +11,8 @@ Sharded Cluster Architectures This document describes the organization and design of :term:`sharded cluster` deployments. -.. seealso:: :ref:`sharding-requirements`. - -Deploying a Test Cluster ------------------------- +Test Cluster +------------ .. warning:: Use this architecture for testing and development only. @@ -22,24 +20,23 @@ You can deploy a very minimal cluster for testing and development. These *non-production* clusters have the following components: -- 1 :ref:`config server `. +- One :ref:`config server `. - At least one :program:`mongod` instance (either :term:`replica sets ` or as a standalone node.) -- 1 :program:`mongos` instance. +- One :program:`mongos` instance. .. _sharding-production-architecture: -Deploying a Production Cluster ------------------------------- +Production Cluster +------------------ -When deploying a production cluster, you must ensure that the -data is redundant and that your systems are highly available. To that -end, a production-level cluster must have the following -components: +In a production cluster, you must ensure that data is redundant and that +your systems are highly available. To that end, a production-level +cluster must have the following components: -- 3 :ref:`config servers `, each residing on a +- Three :ref:`config servers `, each residing on a discrete system. .. note:: @@ -49,21 +46,18 @@ components: multiple shards, you will need to have a group of config servers for each cluster. -- 2 or more :term:`replica sets `, for the :term:`shards +- Two or more :term:`replica sets `, for the :term:`shards `. .. 
see:: For more information on replica sets see :doc:`/administration/replication-architectures` and :doc:`/replication`. -- :program:`mongos` instances. Typically, you will deploy a single +- Two or more :program:`mongos` instances. Typically, you will deploy a single :program:`mongos` instance on each application server. Alternatively, you may deploy several `mongos` nodes and let your application connect to these via a load balancer. -.. seealso:: :ref:`sharding-procedure-add-shard` and - :ref:`sharding-procedure-remove-shard`. - Sharded and Non-Sharded Data ---------------------------- @@ -74,12 +68,10 @@ deployments some databases and collections will use sharding, while other databases and collections will only reside on a single database instance or replica set (i.e. a :term:`shard`.) -.. note:: - - Regardless of the data architecture of your :term:`sharded cluster`, - ensure that all queries and operations use the :term:`mongos` - router to access the data cluster. Use the :program:`mongos` even - for operations that do not impact the sharded data. +Regardless of the data architecture of your :term:`sharded cluster`, +ensure that all queries and operations use the :term:`mongos` router to +access the data cluster. Use the :program:`mongos` even for operations +that do not impact the sharded data. Every database has a "primary" [#overloaded-primary-term]_ shard that holds all un-sharded collections in that database. All collections @@ -116,7 +108,7 @@ High Availability and MongoDB A :ref:`production ` :term:`cluster` has no single point of failure. This section introduces the -availability concerns for MongoDB deployments, and highlights +availability concerns for MongoDB deployments and highlights potential failure scenarios and available resolutions: - Application servers or :program:`mongos` instances become unavailable. diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index d218684fac8..849a2688acd 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -7,847 +7,7 @@ Sharded Cluster Administration .. default-domain:: mongodb -This document describes common administrative tasks for sharded -clusters. For complete documentation of sharded clusters see the -:doc:`/sharding` section of this manual. -.. contents:: Sharding Procedures: - :backlinks: none - :local: - -.. _sharding-procedure-setup: - -Set up a Sharded Cluster ------------------------- - -Before deploying a cluster, see :ref:`sharding-requirements`. - -For testing purposes, you can run all the required shard :program:`mongod` processes on a -single server. For production, use the configurations described in -:doc:`/administration/replication-architectures`. - -.. include:: /includes/warning-sharding-hostnames.rst - -If you have an existing replica set, you can use the -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster` -tutorial as a guide. If you're deploying a cluster from scratch, see -the :doc:`/tutorial/deploy-shard-cluster` tutorial for more detail or -use the following procedure as a quick starting point: - -1. Create data directories for each of the three (3) config server instances. - -#. Start the three config server instances. For example, to start a - config server instance running on TCP port ``27019`` with the data - stored in ``/data/configdb``, type the following: - - .. 
code-block:: sh - - mongod --configsvr --dbpath /data/configdb --port 27019 - - For additional command options, see :doc:`/reference/mongod` - and :doc:`/reference/configuration-options`. - - .. include:: /includes/note-config-server-startup.rst - -#. Start a :program:`mongos` instance. For example, to start a - :program:`mongos` that connects to config server instance running on the following hosts: - - - ``mongoc0.example.net`` - - ``mongoc1.example.net`` - - ``mongoc2.example.net`` - - You would issue the following command: - - .. code-block:: sh - - mongos --configdb mongoc0.example.net:27019,mongoc1.example.net:27019,mongoc2.example.net:27019 - -#. Connect to one of the :program:`mongos` instances. For example, if - a :program:`mongos` is accessible at ``mongos0.example.net`` on - port ``27017``, issue the following command: - - .. code-block:: sh - - mongo mongos0.example.net - -#. Add shards to the cluster. - - .. note:: In production deployments, all shards should be replica sets. - - To deploy a replica set, see the - :doc:`/tutorial/deploy-replica-set` tutorial. - - From the :program:`mongo` shell connected - to the :program:`mongos` instance, call the :method:`sh.addShard()` - method for each shard to add to the cluster. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - If ``mongodb0.example.net:27027`` is a member of a replica - set, call the :method:`sh.addShard()` method with an argument that - resembles the following: - - .. code-block:: javascript - - sh.addShard( "/mongodb0.example.net:27027" ) - - Replace, ```` with the name of the replica set, and - MongoDB will discover all other members of the replica set. - Repeat this step for each new shard in your cluster. - - .. optional:: - - You can specify a name for the shard and a maximum size. See - :dbcommand:`addShard`. - - .. note:: - - .. versionchanged:: 2.0.3 - - Before version 2.0.3, you must specify the shard in the following - form: - - .. code-block:: sh - - replicaSetName/,, - - For example, if the name of the replica set is ``repl0``, then - your :method:`sh.addShard()` command would be: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -#. Enable sharding for each database you want to shard. - While sharding operates on a per-collection basis, you must enable - sharding for each database that holds collections you want to shard. - This step is a meta-data change and will not redistribute your data. - - MongoDB enables sharding on a per-database basis. This is only a - meta-data change and will not redistribute your data. To enable - sharding for a given database, use the :dbcommand:`enableSharding` - command or the :method:`sh.enableSharding()` shell helper. - - .. code-block:: javascript - - db.runCommand( { enableSharding: } ) - - Or: - - .. code-block:: javascript - - sh.enableSharding() - - .. note:: - - MongoDB creates databases automatically upon their first use. - - Once you enable sharding for a database, MongoDB assigns a - :term:`primary shard` for that database, where MongoDB stores all data - before sharding begins. - -.. _sharding-administration-shard-collection: - -#. Enable sharding on a per-collection basis. - - Finally, you must explicitly specify collections to shard. The - collections must belong to a database for which you have enabled - sharding. When you shard a collection, you also choose the shard - key. 
To shard a collection, run the :dbcommand:`shardCollection` - command or the :method:`sh.shardCollection()` shell helper. - - .. code-block:: javascript - - db.runCommand( { shardCollection: ".", key: { : 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection(".", ) - - For example: - - .. code-block:: javascript - - db.runCommand( { shardCollection: "myapp.users", key: { username: 1 } } ) - - Or: - - .. code-block:: javascript - - sh.shardCollection("myapp.users", { username: 1 }) - - The choice of shard key is incredibly important: it affects - everything about the cluster from the efficiency of your queries to - the distribution of data. Furthermore, you cannot change a - collection's shard key after setting it. - - See the :ref:`Shard Key Overview ` and the - more in depth documentation of :ref:`Shard Key Qualities - ` to help you select better shard - keys. - - If you do not specify a shard key, MongoDB will shard the - collection using the ``_id`` field. - -Cluster Management ------------------- - -This section outlines procedures for adding and remove shards, as well -as general monitoring and maintenance of a :term:`sharded cluster`. - -.. _sharding-procedure-add-shard: - -Add a Shard to a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~ - -To add a shard to an *existing* sharded cluster, use the following -procedure: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. First, you need to tell the cluster where to find the individual - shards. You can do this using the :dbcommand:`addShard` command or - the :method:`sh.addShard()` helper: - - .. code-block:: javascript - - sh.addShard( ":" ) - - Replace ```` and ```` with the hostname and TCP - port number of where the shard is accessible. - Alternately specify a :term:`replica set` name and at least one - hostname which is a member of the replica set. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - .. note:: In production deployments, all shards should be replica sets. - - Repeat for each shard in your cluster. - - .. optional:: - - You may specify a "name" as an argument to the - :dbcommand:`addShard` command, as follows: - - .. code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. - - .. versionchanged:: 2.0.3 - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - "myapp1", then your :method:`sh.addShard()` command might resemble: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -.. note:: - - It may take some time for :term:`chunks ` to migrate to the - new shard. - - For an introduction to balancing, see :ref:`sharding-balancing`. For - lower level information on balancing, see :ref:`sharding-balancing-internals`. - -.. _sharding-procedure-remove-shard: - -Remove a Shard from a Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To remove a :term:`shard` from a :term:`sharded cluster`, you must: - -- Migrate :term:`chunks ` to another shard or database. 
- -- Ensure that this shard is not the :term:`primary shard` for any databases in - the cluster. If it is, move the "primary" status for these databases - to other shards. - -- Finally, remove the shard from the cluster's configuration. - -.. note:: - - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. - -The procedure to remove a shard is as follows: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Determine the name of the shard you will be removing. - - You must specify the name of the shard. You may have specified this - shard name when you first ran the :dbcommand:`addShard` command. If not, - you can find out the name of the shard by running the - :dbcommand:`listShards` or :dbcommand:`printShardingStatus` - commands or the :method:`sh.status()` shell helper. - - The following examples will remove a shard named ``mongodb0`` from the cluster. - -#. Begin removing chunks from the shard. - - Start by running the :dbcommand:`removeShard` command. This will - start "draining" or migrating chunks from the shard you're removing - to another shard in the cluster. - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - This operation will return the following response immediately: - - .. code-block:: javascript - - { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } - - Depending on your network capacity and the amount of data in the - shard, this operation can take anywhere from a few minutes to several - days to complete. - -#. View progress of the migration. - - You can run the :dbcommand:`removeShard` command again at any stage of the - process to view the progress of the migration, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - The output should look something like this: - - .. code-block:: javascript - - { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } - - In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a - counter displays the remaining number of chunks that MongoDB must - migrate to other shards and the number of MongoDB databases that have - "primary" status on this shard. - - Continue checking the status of the :dbcommand:`removeShard` command - until the remaining number of chunks to transfer is 0. - -#. Move any databases to other shards in the cluster as needed. - - This is only necessary when removing a shard that is also the - :term:`primary shard` for one or more databases. - - Issue the following command at the :program:`mongo` shell: - - .. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - - This command will migrate all remaining non-sharded data in the - database named ``myapp`` to the shard named ``mongodb1``. - - .. warning:: - - Do not run the :dbcommand:`movePrimary` command until you have *finished* - draining the shard. - - The command will not return until MongoDB completes moving all - data. The response from this command will resemble the following: - - .. code-block:: javascript - - { "primary" : "mongodb1", "ok" : 1 } - -#. Run :dbcommand:`removeShard` again to clean up all metadata - information and finalize the shard removal, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - When successful, this command will return a document like this: - - .. 
code-block:: javascript - - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } - -Once the value of the ``stage`` field is "completed," you may safely -stop the processes comprising the ``mongodb0`` shard. - -List Databases with Sharding Enabled -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To list the databases that have sharding enabled, query the -``databases`` collection in the :ref:`config-database`. -A database has sharding enabled if the value of the ``partitioned`` -field is ``true``. Connect to a :program:`mongos` instance with a -:program:`mongo` shell, and run the following operation to get a full -list of databases with sharding enabled: - -.. code-block:: javascript - - use config - db.databases.find( { "partitioned": true } ) - -.. example:: You can use the following sequence of commands when to - return a list of all databases in the cluster: - - .. code-block:: javascript - - use config - db.databases.find() - - If this returns the following result set: - - .. code-block:: javascript - - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } - { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } - - Then sharding is only enabled for the ``animals`` database. - -List Shards -~~~~~~~~~~~ - -To list the current set of configured shards, use the :dbcommand:`listShards` -command, as follows: - -.. code-block:: javascript - - use admin - db.runCommand( { listShards : 1 } ) - -View Cluster Details -~~~~~~~~~~~~~~~~~~~~ - -To view cluster details, issue :method:`db.printShardingStatus()` or -:method:`sh.status()`. Both methods return the same output. - -.. example:: In the following example output from :method:`sh.status()` - - - ``sharding version`` displays the version number of the shard - metadata. - - - ``shards`` displays a list of the :program:`mongod` instances - used as shards in the cluster. - - - ``databases`` displays all databases in the cluster, - including database that do not have sharding enabled. - - - The ``chunks`` information for the ``foo`` database displays how - many chunks are on each shard and displays the range of each chunk. - - .. 
code-block:: javascript - - --- Sharding Status --- - sharding version: { "_id" : 1, "version" : 3 } - shards: - { "_id" : "shard0000", "host" : "m0.example.net:30001" } - { "_id" : "shard0001", "host" : "m3.example2.net:50000" } - databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } - foo.big chunks: - shard0001 1 - shard0000 6 - { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo - { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo - { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo - { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) - { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - foo.large chunks: - shard0001 1 - shard0000 5 - { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) - { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo - { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) - { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } - -Chunk Management ----------------- - -This section describes various operations on :term:`chunks ` in -:term:`sharded clusters `. MongoDB automates most -chunk management operations. However, these chunk management -operations are accessible to administrators for use in some -situations, typically surrounding initial setup, deployment, and data -ingestion. - -.. _sharding-procedure-create-split: - -Split Chunks -~~~~~~~~~~~~ - -Normally, MongoDB splits a :term:`chunk` following inserts when a -chunk exceeds the :ref:`chunk size `. The -:term:`balancer` may migrate recently split chunks to a new shard -immediately if :program:`mongos` predicts future insertions will -benefit from the move. - -MongoDB treats all chunks the same, whether split manually or -automatically by the system. - -.. warning:: - - You cannot merge or combine chunks once you have split them. - -You may want to split chunks manually if: - -- you have a large amount of data in your cluster and very few - :term:`chunks `, - as is the case after deploying a cluster using existing data. - -- you expect to add a large amount of data that would - initially reside in a single chunk or shard. - -.. example:: - - You plan to insert a large amount of data with :term:`shard key` - values between ``300`` and ``400``, *but* all values of your shard - keys are between ``250`` and ``500`` are in a single chunk. - -Use :method:`sh.status()` to determine the current chunks ranges across -the cluster. - -To split chunks manually, use the :dbcommand:`split` command with -operators: ``middle`` and ``find``. The equivalent shell helpers are -:method:`sh.splitAt()` or :method:`sh.splitFind()`. - -.. example:: - - The following command will split the chunk that contains - the value of ``63109`` for the ``zipcode`` field in the ``people`` - collection of the ``records`` database: - - .. 
code-block:: javascript - - sh.splitFind( "records.people", { "zipcode": 63109 } ) - -:method:`sh.splitFind()` will split the chunk that contains the -*first* document returned that matches this query into two equally -sized chunks. You must specify the full namespace -(i.e. "``.``") of the sharded collection to -:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need -not contain the shard key, though it almost always makes sense to -query for the shard key in this case, and including the shard key will -expedite the operation. - -Use :method:`sh.splitAt()` to split a chunk in two using the queried -document as the partition point: - -.. code-block:: javascript - - sh.splitAt( "records.people", { "zipcode": 63109 } ) - -However, the location of the document that this query finds with -respect to the other documents in the chunk does not affect how the -chunk splits. - -.. _sharding-administration-pre-splitting: -.. _sharding-administration-create-chunks: - -Create Chunks (Pre-Splitting) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In most situations a :term:`sharded cluster` will create and distribute -chunks automatically without user intervention. However, in a limited -number of use profiles, MongoDB cannot create enough chunks or -distribute data fast enough to support required throughput. Consider -the following scenarios: - -- you must partition an existing data collection that resides on a - single shard. - -- you must ingest a large volume of data into a cluster that - isn't balanced, or where the ingestion of data will lead to an - imbalance of data. - - This can arise in an initial data loading, or in a case where you - must insert a large volume of data into a single chunk, as is the - case when you must insert at the beginning or end of the chunk - range, as is the case for monotonically increasing or decreasing - shard keys. - -Preemptively splitting chunks increases cluster throughput for these -operations, by reducing the overhead of migrating chunks that hold -data during the write operation. MongoDB only creates splits after an -insert operation, and can only migrate a single chunk at a time. Chunk -migrations are resource intensive and further complicated by large -write volume to the migrating chunk. - -.. warning:: - - You can only pre-split an empty collection. When you enable - sharding for a collection that contains data MongoDB automatically - creates splits. Subsequent attempts to create splits manually, can - lead to unpredictable chunk ranges and sizes as well as inefficient - or ineffective balancing behavior. - -To create and migrate chunks manually, use the following procedure: - -#. Split empty chunks in your collection by manually performing - :dbcommand:`split` command on chunks. - - .. example:: - - To create chunks for documents in the ``myapp.users`` - collection, using the ``email`` field as the :term:`shard key`, - use the following operation in the :program:`mongo` shell: - - .. code-block:: javascript - - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); - } - } - - This assumes a collection size of 100 million documents. - -#. Migrate chunks manually using the :dbcommand:`moveChunk` command: - - .. example:: - - To migrate all of the manually created user profiles evenly, - putting each prefix chunk on the next shard from the other, run - the following commands in the mongo shell: - - .. 
code-block:: javascript - - var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; - for ( var x=97; x<97+26; x++ ){ - for( var y=97; y<97+26; y+=6 ) { - var prefix = String.fromCharCode(x) + String.fromCharCode(y); - db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) - } - } - - You can also let the balancer automatically distribute the new - chunks. For an introduction to balancing, see - :ref:`sharding-balancing`. For lower level information on balancing, - see :ref:`sharding-balancing-internals`. - -.. _sharding-balancing-modify-chunk-size: - -Modify Chunk Size -~~~~~~~~~~~~~~~~~ - -When you initialize a sharded cluster, the default chunk size is 64 -megabytes. This default chunk size works well for most deployments. However, if you -notice that automatic migrations are incurring a level of I/O that -your hardware cannot handle, you may want to reduce the chunk -size. For the automatic splits and migrations, a small chunk size -leads to more rapid and frequent migrations. - -To modify the chunk size, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following :method:`save() ` - operation: - - .. code-block:: javascript - - db.settings.save( { _id:"chunksize", value: } ) - - Where the value of ```` reflects the new chunk size in - megabytes. Here, you're essentially writing a document whose values - store the global chunk size configuration value. - -.. note:: - - The :setting:`chunkSize` and :option:`--chunkSize ` - options, passed at runtime to the :program:`mongos` **do not** - affect the chunk size after you have initialized the cluster. - - To eliminate confusion you should *always* set chunk size using the - above procedure and never use the runtime options. - -Modifying the chunk size has several limitations: - -- Automatic splitting only occurs when inserting :term:`documents - ` or updating existing documents. - -- If you lower the chunk size it may take time for all chunks to split to - the new size. - -- Splits cannot be "undone." - -If you increase the chunk size, existing chunks must grow through -insertion or updates until they reach the new size. - -.. _sharding-balancing-manual-migration: - -Migrate Chunks -~~~~~~~~~~~~~~ - -In most circumstances, you should let the automatic balancer -migrate :term:`chunks ` between :term:`shards `. -However, you may want to migrate chunks manually in a few cases: - -- If you create chunks by :term:`pre-splitting` the data in your - collection, you will have to migrate chunks manually to distribute - chunks evenly across the shards. Use pre-splitting in limited - situations, to support bulk data ingestion. - -- If the balancer in an active cluster cannot distribute chunks within - the balancing window, then you will have to migrate chunks manually. - -For more information on how chunks move between shards, see -:ref:`sharding-balancing-internals`, in particular the section -:ref:`sharding-chunk-migration`. - -To migrate chunks, use the :dbcommand:`moveChunk` command. - -.. note:: - - To return a list of shards, use the :dbcommand:`listShards` - command. - - Specify shard names using the :dbcommand:`addShard` command - using the ``name`` argument. 
If you do not specify a name in the - :dbcommand:`addShard` command, MongoDB will assign a name - automatically. - -The following example assumes that the field ``username`` is the -:term:`shard key` for a collection named ``users`` in the ``myapp`` -database, and that the value ``smith`` exists within the :term:`chunk` -you want to migrate. - -To move this chunk, you would issue the following command from a :program:`mongo` -shell connected to any :program:`mongos` instance. - -.. code-block:: javascript - - db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) - -This command moves the chunk that includes the shard key value "smith" to the -:term:`shard` named ``mongodb-shard3.example.net``. The command will -block until the migration is complete. - -See :ref:`sharding-administration-create-chunks` for an introduction -to pre-splitting. - -.. versionadded:: 2.2 - :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` - parameter. When set to ``true``, MongoDB ensures that - :term:`secondary` members have replicated operations before allowing - new chunk migrations. - -.. warning:: - - The :dbcommand:`moveChunk` command may produce the following error - message: - - .. code-block:: none - - The collection's metadata lock is already taken. - - These errors occur when clients have too many open :term:`cursors - ` that access the chunk you are migrating. You can either - wait until the cursors complete their operation or close the - cursors manually. - - .. todo:: insert link to killing a cursor. - -.. index:: bulk insert -.. _sharding-bulk-inserts: - -Strategies for Bulk Inserts in Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -.. todo:: Consider moving to the administrative guide as it's of an - applied nature, or create an applications document for sharding - -.. todo:: link the words "bulk insert" to the bulk insert topic when - it's published - - -Large bulk insert operations including initial data ingestion or -routine data import, can have a significant impact on a :term:`sharded -cluster`. Consider the following strategies and possibilities for -bulk insert operations: - -- If the collection does not have data, then there is only one - :term:`chunk`, which must reside on a single shard. MongoDB must - receive data, create splits, and distribute chunks to the available - shards. To avoid this performance cost, you can pre-split the - collection, as described in :ref:`sharding-administration-pre-splitting`. - -- You can parallelize import processes by sending insert operations to - more than one :program:`mongos` instance. If the collection is - empty, pre-split first, as described in - :ref:`sharding-administration-pre-splitting`. - -- If your shard key increases monotonically during an insert then all - the inserts will go to the last chunk in the collection, which will - always end up on a single shard. Therefore, the insert capacity of the - cluster will never exceed the insert capacity of a single shard. - - If your insert volume is never larger than what a single shard can - process, then there is no problem; however, if the insert volume - exceeds that range, and you cannot avoid a monotonically - increasing shard key, then consider the following modifications to - your application: - - - Reverse all the bits of the shard key to preserve the information - while avoiding the correlation of insertion order and increasing - sequence of values. 
- - - Swap the first and last 16-bit words to "shuffle" the inserts. - - .. example:: The following example, in C++, swaps the leading and - trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` - generated so that they are no longer monotonically increasing. - - .. code-block:: cpp - - using namespace mongo; - OID make_an_id() { - OID x = OID::gen(); - const unsigned char *p = x.getData(); - swap( (unsigned short&) p[0], (unsigned short&) p[10] ); - return x; - } - - void foo() { - // create an object - BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); - // now we might insert o into a sharded collection... - } - - For information on choosing a shard key, see :ref:`sharding-shard-key` - and see :ref:`Shard Key Internals ` (in - particular, :ref:`sharding-internals-operations-and-reliability` and - :ref:`sharding-internals-choose-shard-key`). .. index:: balancing; operations .. _sharding-balancing-operations: diff --git a/source/core/sharding-security.txt b/source/core/sharding-security.txt new file mode 100644 index 00000000000..c2eca455b68 --- /dev/null +++ b/source/core/sharding-security.txt @@ -0,0 +1,63 @@ +.. index:: sharding; security +.. _sharding-security: + +============================================ +Security Considerations for Sharded Clusters +============================================ + +.. default-domain:: mongodb + +.. todo:: migrate this content to /administration/security.txt when that + document exists. See DOCS-79 for tracking this document. + +.. note:: + + You should always run all :program:`mongod` components in trusted + networking environments that control access to the cluster using + network rules and restrictions to ensure that only known traffic + reaches your :program:`mongod` and :program:`mongos` instances. + +.. warning:: Limitations + + .. versionchanged:: 2.2 + Read only authentication is fully supported in shard + clusters. Previously, in version 2.0, sharded clusters would not + enforce read-only limitations. + + .. versionchanged:: 2.0 + Sharded clusters support authentication. Previously, in version + 1.8, sharded clusters will not support authentication and access + control. You must run your sharded systems in trusted + environments. + +To control access to a sharded cluster, you must set the +:setting:`keyFile` option on all components of the sharded cluster. Use +the :option:`--keyFile ` run-time option or the +:setting:`keyFile` configuration option for all :program:`mongos`, +configuration instances, and shard :program:`mongod` instances. + +There are two classes of security credentials in a sharded cluster: +credentials for "admin" users (i.e. for the :term:`admin database`) and +credentials for all other databases. These credentials reside in +different locations within the cluster and have different roles: + +#. Admin database credentials reside on the config servers, to receive + admin access to the cluster you *must* authenticate a session while + connected to a :program:`mongos` instance using the + :term:`admin database`. + +#. Other database credentials reside on the *primary* shard for the + database. + +This means that you *can* authenticate to these users and databases +while connected directly to the primary shard for a database. However, +for clarity and consistency all interactions between the client and +the database should use a :program:`mongos` instance. + +.. note:: + + Individual shards can store administrative credentials to their + instance, which only permit access to a single shard. 
MongoDB + stores these credentials in the shards' :term:`admin databases ` and these + credentials are *completely* distinct from the cluster-wide + administrative credentials. diff --git a/source/core/sharding.txt b/source/core/sharding.txt index 35a71f8088b..27730096b10 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -7,81 +7,53 @@ Sharding Overview .. default-domain:: mongodb -Sharding is MongoDB’s approach to scaling out. Sharding partitions a -database collection and stores the different portions on different -machines. Sharding automatically balances data and load across machines. -By splitting a collection up across multiple machines, MongoDB increases +Sharding is MongoDB’s approach to scaling out. Sharding partitions +database collections and stores the different portions of a collection +on different machines. When your collections become too large for +existing storage, you need only add a new machine and MongoDB +automatically distributes data to the new server. + +Sharding automatically balances the data and load across machines. By +splitting a collection across multiple machines, sharding increases write capacity, provides the ability to support larger working sets, and raises the limits of total data size beyond the physical resources of a single node. -When your collections become too large for existing storage, you need -only add a new machine and MongoDB automatically distributes data to the -new server. - -Sharded Cluster ---------------- - -Sharding occurs within a :term:`sharded cluster`. A typical sharded cluster -consists of the following: - -- Two or more :program:`mongod` instances that serve as the - :term:`shards ` that hold your database collections. Usually - each shard comprises a :term:`replica set`, not a lone - :program:`mongod` instance. - -- Three :program:`mongod` instances that serve as config servers to hold - metadata about the cluster. The metadata maps :term:`chunks - ` to shards. - -- Two or more :program:`mongos` instances that route the reads and - writes to the shards. A :program:`mongos` instance is a lightweight - routing process. - -Sharding Features ------------------ - -With sharding MongoDB automatically distributes data among a -collection of :program:`mongod` instances. Sharding, as implemented in -MongoDB has the following features: - -.. glossary:: - - Range-based Data Partitioning - MongoDB distributes documents among :term:`shards ` based - on the value of the :ref:`shard key `. Each - :term:`chunk` represents a block of :term:`documents ` - with values that fall within a specific range. When chunks grow - beyond the :ref:`chunk size `, MongoDB - divides the chunks into smaller chunks (i.e. :term:`splitting - `) based on the shard key. - - Automatic Data Volume Distribution - The sharding system automatically balances data across the - cluster without intervention from the application - layer. Effective automatic sharding depends on a well chosen - :ref:`shard key `, but requires no - additional complexity, modifications, or intervention from - developers. - - Transparent Query Routing - Sharding is completely transparent to the application layer, - because all connections to a cluster go through - :program:`mongos`. Sharding in MongoDB requires some - :ref:`basic initial configuration `, - but ongoing function is entirely transparent to the application. - - Horizontal Capacity - Sharding increases capacity in two ways: - - #. 
Effective partitioning of data can provide additional write - capacity by distributing the write load over a number of - :program:`mongod` instances. - - #. Given a shard key with sufficient :ref:`cardinality - `, partitioning data allows - users to increase the potential amount of data to manage - with MongoDB and expand the :term:`working set`. +How Sharding Works +------------------ + +Sharding is enabled on a per-database basis. Once a database is enabled +for sharding, you then choose which collections to shard. + +MongoDB distributes documents among :term:`shards ` based on the +value of the :ref:`shard key `. Each :term:`chunk` +represents a block of :term:`documents ` with values that fall +within a specific range. When chunks grow beyond the :ref:`chunk size +`, MongoDB divides the chunks into smaller chunks +(i.e. :term:`splitting `) based on the shard key. + +The sharding system automatically balances data across the cluster +without intervention from the application layer. Effective automatic +sharding depends on a well chosen :ref:`shard key `, +but requires no additional complexity, modifications, or intervention +from developers. + +Sharding is completely transparent to the application layer, because all +connections to a cluster go through :program:`mongos`. Sharding in +MongoDB requires some :ref:`basic initial configuration +`, but ongoing function is entirely +transparent to the application. + +Sharding increases capacity in two ways: + +#. Effective partitioning of data can provide additional write + capacity by distributing the write load over a number of + :program:`mongod` instances. + +#. Given a shard key with sufficient :ref:`cardinality + `, partitioning data allows users to + increase the potential amount of data to manage with MongoDB and + expand the :term:`working set`. .. index:: shard key single: sharding; shard key @@ -134,252 +106,6 @@ the optimal key. In those situations, computing a special purpose shard key into an additional field or using a compound shard key may help produce one that is more ideal. -.. index:: sharding; config servers -.. index:: config servers -.. _sharding-config-server: - -Config Servers --------------- - -Config servers maintain the shard metadata in a config -database. The :term:`config database ` stores -the relationship between :term:`chunks ` and where they reside -within a :term:`sharded cluster`. Without a config database, the -:program:`mongos` instances would be unable to route queries or write -operations within the cluster. - -Config servers *do not* run as replica sets. Instead, a :term:`cluster -` operates with a group of *three* config servers that use a -two-phase commit process that ensures immediate consistency and -reliability. - -For testing purposes you may deploy a cluster with a single -config server, but this is not recommended for production. - -.. warning:: - - If your cluster has a single config server, this - :program:`mongod` is a single point of failure. If the instance is - inaccessible the cluster is not accessible. If you cannot recover - the data on a config server, the cluster will be inoperable. - - **Always** use three config servers for production deployments. - -The actual load on configuration servers is small because each -:program:`mongos` instances maintains a cached copy of the configuration -database. MongoDB only writes data to the config server to: - -- create splits in existing chunks, which happens as data in - existing chunks exceeds the maximum chunk size. 
- -- migrate a chunk between shards. - -Additionally, all config servers must be available on initial setup -of a sharded cluster, each :program:`mongos` instance must be able -to write to the ``config.version`` collection. - -If one or two configuration instances become unavailable, the -cluster's metadata becomes *read only*. It is still possible to read -and write data from the shards, but no chunk migrations or splits will -occur until all three servers are accessible. At the same time, config -server data is only read in the following situations: - -- A new :program:`mongos` starts for the first time, or an existing - :program:`mongos` restarts. - -- After a chunk migration, the :program:`mongos` instances update - themselves with the new cluster metadata. - -If all three config servers are inaccessible, you can continue to use -the cluster as long as you don't restart the :program:`mongos` -instances until after config servers are accessible again. If you -restart the :program:`mongos` instances and there are no accessible -config servers, the :program:`mongos` would be unable to direct -queries or write operations to the cluster. - -Because the configuration data is small relative to the amount of data -stored in a cluster, the amount of activity is relatively low, and 100% -up time is not required for a functioning sharded cluster. As a result, -backing up the config servers is not difficult. Backups of config -servers are critical as clusters become totally inoperable when -you lose all configuration instances and data. Precautions to ensure -that the config servers remain available and intact are critical. - -.. note:: - - Configuration servers store metadata for a single sharded cluster. - You must have a separate configuration server or servers for each - cluster you administer. - -.. index:: mongos -.. _sharding-mongos: -.. _sharding-read-operations: - -:program:`mongos` and Querying ------------------------------- - -.. seealso:: :doc:`/reference/mongos` and the :program:`mongos`\-only - settings: :setting:`test` and :setting:`chunkSize`. - -Operations -~~~~~~~~~~ - -The :program:`mongos` provides a single unified interface to a sharded -cluster for applications using MongoDB. Except for the selection of a -:term:`shard key`, application developers and administrators need not -consider any of the :doc:`internal details of sharding `. - -:program:`mongos` caches data from the :ref:`config server -`, and uses this to route operations from -applications and clients to the :program:`mongod` instances. -:program:`mongos` have no *persistent* state and consume -minimal system resources. - -The most common practice is to run :program:`mongos` instances on the -same systems as your application servers, but you can maintain -:program:`mongos` instances on the shards or on other dedicated -resources. - -.. note:: - - .. versionchanged:: 2.1 - - Some aggregation operations using the :dbcommand:`aggregate` - command (i.e. :method:`db.collection.aggregate()`,) will cause - :program:`mongos` instances to require more CPU resources than in - previous versions. This modified performance profile may dictate - alternate architecture decisions if you use the :term:`aggregation - framework` extensively in a sharded environment. - -.. _sharding-query-routing: - -Routing -~~~~~~~ - -:program:`mongos` uses information from :ref:`config servers -` to route operations to the cluster as -efficiently as possible. In general, operations in a sharded -environment are either: - -1. 
Targeted at a single shard or a limited group of shards based on - the shard key. - -2. Broadcast to all shards in the cluster that hold documents in a - collection. - -When possible you should design your operations to be as targeted as -possible. Operations have the following targeting characteristics: - -- Query operations broadcast to all shards [#namespace-exception]_ - **unless** the :program:`mongos` can determine which shard or shard - stores this data. - - For queries that include the shard key, :program:`mongos` can target - the query at a specific shard or set of shards, if the portion - of the shard key included in the query is a *prefix* of the shard - key. For example, if the shard key is: - - .. code-block:: javascript - - { a: 1, b: 1, c: 1 } - - The :program:`mongos` *can* route queries that include the full - shard key or either of the following shard key prefixes at a - specific shard or set of shards: - - .. code-block:: javascript - - { a: 1 } - { a: 1, b: 1 } - - Depending on the distribution of data in the cluster and the - selectivity of the query, :program:`mongos` may still have to - contact multiple shards [#possible-all]_ to fulfill these queries. - -- All :method:`insert() ` operations target to - one shard. - -- All single :method:`update() ` operations - target to one shard. This includes :term:`upsert` operations. - -- The :program:`mongos` broadcasts multi-update operations to every - shard. - -- The :program:`mongos` broadcasts :method:`remove() - ` operations to every shard unless the - operation specifies the shard key in full. - -While some operations must broadcast to all shards, you can improve -performance by using as many targeted operations as possible by -ensuring that your operations include the shard key. - -.. [#namespace-exception] If a shard does not store chunks from a - given collection, queries for documents in that collection are not - broadcast to that shard. - -.. [#a/c-as-a-case-of-a] In this example, a :program:`mongos` could - route a query that included ``{ a: 1, c: 1 }`` fields at a specific - subset of shards using the ``{ a: 1 }`` prefix. A :program:`mongos` - cannot route any of the following queries to specific shards - in the cluster: - - .. code-block:: javascript - - { b: 1 } - { c: 1 } - { b: 1, c: 1 } - -.. [#possible-all] :program:`mongos` will route some queries, even - some that include the shard key, to all shards, if needed. - -Sharded Query Response Process -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -To route a query to a :term:`cluster `, -:program:`mongos` uses the following process: - -#. Determine the list of :term:`shards ` that must receive the query. - - In some cases, when the :term:`shard key` or a prefix of the shard - key is a part of the query, the :program:`mongos` can route the - query to a subset of the shards. Otherwise, the :program:`mongos` - must direct the query to *all* shards that hold documents for that - collection. - - .. example:: - - Given the following shard key: - - .. code-block:: javascript - - { zipcode: 1, u_id: 1, c_date: 1 } - - Depending on the distribution of chunks in the cluster, the - :program:`mongos` may be able to target the query at a subset of - shards, if the query contains the following fields: - - .. code-block:: javascript - - { zipcode: 1 } - { zipcode: 1, u_id: 1 } - { zipcode: 1, u_id: 1, c_date: 1 } - -#. Establish a cursor on all targeted shards. - - When the first batch of results returns from the cursors: - - a. For query with sorted results (i.e. 
using - :method:`cursor.sort()`) the :program:`mongos` performs a merge - sort of all queries. - - b. For a query with unsorted results, the :program:`mongos` returns - a result cursor that "round robins" results from all cursors on - the shards. - - .. versionchanged:: 2.0.5 - Before 2.0.5, the :program:`mongos` exhausted each cursor, - one by one. - .. index:: balancing .. _sharding-balancing: @@ -415,64 +141,3 @@ balancing process from impacting production traffic. is entirely transparent to the user and application layer. This documentation is only included for your edification and possible troubleshooting purposes. - -.. index:: sharding; security -.. _sharding-security: - -Security Considerations for Sharded Clusters --------------------------------------------- - -.. todo:: migrate this content to /administration/security.txt when that - document exists. See DOCS-79 for tracking this document. - -.. note:: - - You should always run all :program:`mongod` components in trusted - networking environments that control access to the cluster using - network rules and restrictions to ensure that only known traffic - reaches your :program:`mongod` and :program:`mongos` instances. - -.. warning:: Limitations - - .. versionchanged:: 2.2 - Read only authentication is fully supported in shard - clusters. Previously, in version 2.0, sharded clusters would not - enforce read-only limitations. - - .. versionchanged:: 2.0 - Sharded clusters support authentication. Previously, in version - 1.8, sharded clusters will not support authentication and access - control. You must run your sharded systems in trusted - environments. - -To control access to a sharded cluster, you must set the -:setting:`keyFile` option on all components of the sharded cluster. Use -the :option:`--keyFile ` run-time option or the -:setting:`keyFile` configuration option for all :program:`mongos`, -configuration instances, and shard :program:`mongod` instances. - -There are two classes of security credentials in a sharded cluster: -credentials for "admin" users (i.e. for the :term:`admin database`) and -credentials for all other databases. These credentials reside in -different locations within the cluster and have different roles: - -#. Admin database credentials reside on the config servers, to receive - admin access to the cluster you *must* authenticate a session while - connected to a :program:`mongos` instance using the - :term:`admin database`. - -#. Other database credentials reside on the *primary* shard for the - database. - -This means that you *can* authenticate to these users and databases -while connected directly to the primary shard for a database. However, -for clarity and consistency all interactions between the client and -the database should use a :program:`mongos` instance. - -.. note:: - - Individual shards can store administrative credentials to their - instance, which only permit access to a single shard. MongoDB - stores these credentials in the shards' :term:`admin databases ` and these - credentials are *completely* distinct from the cluster-wide - administrative credentials. diff --git a/source/faq/sharding.txt b/source/faq/sharding.txt index 6592cea0370..16b4d7a2e9b 100644 --- a/source/faq/sharding.txt +++ b/source/faq/sharding.txt @@ -44,8 +44,7 @@ Can I change the shard key after sharding a collection? No. There is no automatic support in MongoDB for changing a shard key -after :ref:`sharding a collection -`. This reality underscores +after sharding a collection. 
This reality underscores the importance of choosing a good :ref:`shard key `. If you *must* change a shard key after sharding a collection, the best option is to: diff --git a/source/sharding.txt b/source/sharding.txt index 29087351ad7..62cbb93b559 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -4,41 +4,40 @@ Sharding .. _sharding-background: -Sharding distributes a -single logical database system across a cluster of machines. Sharding -uses range-based portioning to distribute :term:`documents ` -based on a specific :term:`shard key`. +Sharding distributes a single logical database system across a cluster +of machines. Sharding uses range-based partitioning to distribute +:term:`documents ` based on a specific :term:`shard key`. -Concepts -------- +Sharding Concepts +----------------- .. toctree:: :maxdepth: 1 core/sharding + administration/sharded-clusters core/sharding-requirements administration/sharding-architectures + core/sharding-security faq/sharding core/sharding-internals -Procedures and Tutorials ------------------------- - -Create Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~ +Set Up and Manage Sharded Clusters +---------------------------------- .. toctree:: :maxdepth: 1 - administration/sharding tutorial/deploy-shard-cluster + tutorial/manage-shards + tutorial/manage-chunks tutorial/add-shards-to-shard-cluster tutorial/remove-shards-from-cluster tutorial/enforce-unique-keys-for-sharded-collections tutorial/convert-replica-set-to-replicated-shard-cluster Backup Sharded Clusters -~~~~~~~~~~~~~~~~~~~~~~~ +----------------------- .. toctree:: :maxdepth: 1 @@ -50,10 +49,6 @@ Backup Sharded Clusters tutorial/restore-sharded-cluster tutorial/schedule-backup-window-for-sharded-clusters - - - - Reference --------- diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 335a5e48371..e7eca5e9670 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -1,194 +1,226 @@ +.. _sharding-procedure-setup: + ======================== -Deploy a Sharded Cluster +Set Up a Sharded Cluster ======================== .. default-domain:: mongodb -This document describes how to deploy a :term:`sharded cluster` for a -standalone :program:`mongod` instance. To deploy a cluster for an -existing replica set, see -:doc:`/tutorial/convert-replica-set-to-replicated-shard-cluster`. +The topics on this page are an ordered sequence of the tasks you perform +to set up a sharded cluster. Follow the +tasks in this order: + +- :ref:`sharding-setup-start-cfgsrvr` + +- :ref:`sharding-setup-start-mongos` + +- :ref:`sharding-setup-add-shards` + +- :ref:`sharding-setup-enable-sharding` -Procedure ---------- +- :ref:`sharding-setup-shard-collection` -Before deploying a sharded cluster, see the requirements listed in -:ref:`Requirements for Sharded Clusters `. +Before setting up a cluster, see the following: + +- :ref:`sharding-requirements`. + +- :doc:`/administration/replication-architectures` .. include:: /includes/warning-sharding-hostnames.rst +.. _sharding-setup-start-cfgsrvr: Start the Config Server Database Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------------------ -The config server database processes are small -:program:`mongod` instances that store the cluster's metadata. You must -have exactly *three* instances in production deployments. Each stores a complete copy -of the cluster's metadata. 
These instances should run on different servers -to assure good uptime and data safety. +The config server processes are :program:`mongod` instances that store +the cluster's metadata. You designate a :program:`mongod` as a config +server using the :option:`--configsvr ` option. Each +config server stores a complete copy of the cluster's metadata. -Since config database :program:`mongod` instances receive relatively -little traffic and demand only a small portion of system resources, you -can run the instances on systems that run other cluster components. +In production deployments, you must deploy exactly three config server +instances, each running on different servers to assure good uptime and +data safety. In test environments, you can run all three instances on a +single server. -By default a :program:`mongod` :option:`--configsvr ` process stores its data files -in the `/data/configdb` directory. You can specify a different -location using the :setting:`dbpath` run-time option. The config :program:`mongod` instance -is accessible via port ``27019``. In addition to :setting:`configsvr`, -use other :program:`mongod` -:doc:`runtime options ` as needed. +Config server instances receive relatively little traffic and demand +only a small portion of system resources. Therefore, you can run an +instance on a system that runs other cluster components. -To create a data directory for each config server, issue a command -similar to the following for each: +1. Create data directories for each of the three config server + instances. By default, a config server stores its data files in the + `/data/configdb` directory. You can choose a different location. To + create a data directory, issue a command similar to the following: -.. code-block:: sh + .. code-block:: sh - mkdir /data/db/config + mkdir /data/configdb -To start each config server, issue a command similar to the following -for each: +#. Start the three config server instances. Start each by issuing a + command using the following syntax: -.. code-block:: sh + .. code-block:: sh + + mongod --configsvr --dbpath <path> --port <port> + + The default port for config servers is ``27019``. You can specify a + different port. The following example starts a config server using + the default port and default data directory: + + .. code-block:: sh - mongod --configsvr --dbpath --port + mongod --configsvr --dbpath /data/configdb --port 27019 + + For additional command options, see :doc:`/reference/mongod` or + :doc:`/reference/configuration-options`. + + .. include:: /includes/note-config-server-startup.rst + +.. _sharding-setup-start-mongos: Start the ``mongos`` Instances -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------ + +The :program:`mongos` instances are lightweight and do not require data +directories. You can run a :program:`mongos` instance on a system that +runs other cluster components, such as on an application server or a +server running a :program:`mongod` process. By default, a +:program:`mongos` instance runs on port ``27017``. -The :program:`mongos` instance routes queries and operations -to the appropriate shards and interacts with the config server instances. -All client operations targeting a cluster go through :program:`mongos` -instances. +When you start the :program:`mongos` instance, specify the hostnames of +the three config servers, either in the configuration file or as command +line parameters. For operational flexibility, use DNS names for the +config servers rather than explicit IP addresses. 
If you're not using +resolvable hostnames, you cannot change the config server names or IP +addresses without restarting *every* :program:`mongos` and +:program:`mongod` instance. -:program:`mongos` instances are lightweight and do not require data directories. -A cluster typically -has several instances. For example, you might run one :program:`mongos` -instance on each of your application servers, or you might run a :program:`mongos` instance -on each of the servers running a :program:`mongod` process. +To start a :program:`mongos` instance, issue a command using the following syntax: -You must the specify resolvable hostnames [#names]_ for the *3* config servers -when starting the :program:`mongos` instance. You specify the hostnames either in the -configuration file or as command line parameters. +.. code-block:: sh -The :program:`mongos` instance runs on the default MongoDB TCP port: -``27017``. + mongos --configdb <config server hostnames> -To start :program:`mongos` instance running on the -``mongos0.example.net`` host, that connects to the config server -instances running on the following hosts: +For example, to start a :program:`mongos` that connects to the config server +instances running on the following hosts and on the default ports: -- ``mongoc0.example.net`` -- ``mongoc1.example.net`` -- ``mongoc2.example.net`` +- ``cfg0.example.net`` +- ``cfg1.example.net`` +- ``cfg2.example.net`` You would issue the following command: .. code-block:: sh - mongos --configdb mongoc0.example.net,mongoc1.example.net,mongoc2.example.net + mongos --configdb cfg0.example.net:27019,cfg1.example.net:27019,cfg2.example.net:27019 -.. [#names] Use DNS names for the config servers rather than explicit - IP addresses for operational flexibility. If you're not using resolvable - hostname, - you cannot change the config server names or IP addresses - without a restarting *every* :program:`mongos` and - :program:`mongod` instance. +.. _sharding-setup-add-shards: Add Shards to the Cluster -~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------- -You must deploy at least one :term:`shard` or one :term:`replica set` to -begin. In a production cluster, each shard is a replica set. You -may add additional shards to a running cluster later. For instructions -on deploying replica sets, see :doc:`/tutorial/deploy-replica-set`. +A :term:`shard` can be a standalone :program:`mongod` or a +:term:`replica set`. In a production environment, each shard +should be a replica set. -This procedure assumes you have two active and initiated replica sets -and describes how to add the first two shards to the cluster. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -First, connect to one of the :program:`mongos` instances. For example, -if a :program:`mongos` is accessible at ``mongos0.example.net`` on -port ``27017``, issue the following command: + .. code-block:: sh -.. code-block:: sh + mongo --host <hostname> --port <port> - mongo mongos0.example.net + For example, if a :program:`mongos` is accessible at + ``mongos0.example.net`` on port ``27017``, issue the following + command: -Then, from a :program:`mongo` shell connected to the :program:`mongos` -instance, call the :method:`sh.addShard()` method for each shard that -you want to add to the cluster: + .. code-block:: sh -.. code-block:: javascript + mongo --host mongos0.example.net --port 27017 - sh.addShard( "s0/sfo30.example.net" ) - sh.addShard( "s1/sfo40.example.net" ) +#. 
Add shards to the cluster using the :method:`sh.addShard()` method + issued from the :program:`mongo` shell. Issue :method:`sh.addShard()` + separately for each shard in the cluster. If the shard is a replica + set, specify the name of the set and one member of the set. + :program:`mongos` will discover the names of other members of the + set. In production deployments, all shards should be replica sets. -If the host you are adding is a member of a replica set, you -*must* specify the name of the replica set. :program:`mongos` -will discover the names of other members of the replica set based on -the name and the hostname you provide. + The following example adds a shard for a standalone :program:`mongod`. + To optionally specify a name for the shard or a maximum size, see + :dbcommand:`addShard`: -These operations add two shards, provided by: + .. code-block:: javascript -- the replica set named ``s0``, that includes the - ``sfo30.example.net`` host. + sh.addShard( "mongodb0.example.net:27027" ) -- the replica set name and ``s1``, that includes the - ``sfo40.example.net`` host. + The following example adds a shard for a replica set + named ``rs1``: -.. admonition:: All shards should be replica sets + .. code-block:: javascript - .. versionchanged:: 2.0.3 + sh.addShard( "rs1/mongodb0.example.net:27027" ) - After version 2.0.3, you may use the above form to add replica - sets to a cluster. The cluster will automatically discover - the other members of the replica set and note their names - accordingly. + .. note:: - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - ``sh0``, and the replica set were to have three members, then your :method:`sh.addShard` command might resemble: + .. versionchanged:: 2.0.3 - .. code-block:: javascript + For MongoDB versions prior to 2.0.3, if you add a replica set as a + shard, you must specify all members of the replica set. For + example, if the name of the replica set is ``repl0``, then your + :method:`sh.addShard()` command would be: + + .. code-block:: javascript + + sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + +.. _sharding-setup-enable-sharding: - sh.addShard( "sh0/sfo30.example.net,sfo31.example.net,sfo32.example.net" ) +Enable Sharding for a Database +------------------------------ -The :method:`sh.addShard()` helper in the :program:`mongo` shell is a wrapper for -the :dbcommand:`addShard` :term:`database command`. +Before you can shard a collection, you must enable sharding for the +collection's database. Enabling sharding for a database does not +redistribute data but makes it possible to shard the collections in that +database. -Enable Sharding for Databases -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Once you enable sharding for a database, MongoDB assigns a +:term:`primary shard` for that database where MongoDB stores all data +before sharding begins. -While sharding operates on a per-collection basis, you must enable -sharding for each database that holds collections you want to shard. A -single cluster may have many databases, with each database housing -collections. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. 
Issue a command using the following syntax: -Use the following operation in a :program:`mongo` shell session -connected to a :program:`mongos` instance in your cluster: + .. code-block:: sh + + mongo --host <hostname> --port <port> + +#. Issue the :method:`sh.enableSharding()` method, specifying the name + of the database for which to enable sharding. Use the following syntax: + + .. code-block:: javascript + + sh.enableSharding( "<database>" ) + +Optionally, you can enable sharding for a database using the +:dbcommand:`enableSharding` command, which uses the following syntax: .. code-block:: javascript - sh.enableSharding("records") + db.runCommand( { enableSharding: "<database>" } ) -Where ``records`` is the name of the database that holds the collection -you want to shard. :method:`sh.enableSharding()` is a wrapper -around the :dbcommand:`enableSharding` :term:`database command`. You -can enable sharding for multiple databases in the cluster. +.. _sharding-setup-shard-collection: Enable Sharding for Collections -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +------------------------------- -You can enable sharding on a per-collection basis. Because -MongoDB uses "range based sharding," you must specify the :term:`shard -key` MongoDB uses to distribute your documents among the -shards. For more information, see the -:ref:`overview of shard keys `. +You enable sharding on a per-collection basis. When you do, you +specify the :term:`shard key` MongoDB uses to distribute your documents +among the shards. For more information, see +:ref:`sharding-shard-key`. To enable sharding for a collection, use the -:method:`sh.shardCollection()` helper in the :program:`mongo` shell. -The helper provides a wrapper around the :dbcommand:`shardCollection` -:term:`database command` and has the following prototype form: +:method:`sh.shardCollection()` method in the :program:`mongo` shell. +The method uses the following syntax: .. code-block:: javascript @@ -200,49 +232,48 @@ of your database, which consists of the name of your database, a dot represents your shard key, which you specify in the same form as you would an :method:`index ` key pattern. -Consider the following example invocations of -:method:`sh.shardCollection()`: +.. example:: The following sequence of commands shards four collections: -.. code-block:: javascript + .. code-block:: javascript - sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) - sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) - sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) - sh.shardCollection("events.alerts", { "hashed_id": 1 } ) + sh.shardCollection("records.people", { "zipcode": 1, "name": 1 } ) + sh.shardCollection("people.addresses", { "state": 1, "_id": 1 } ) + sh.shardCollection("assets.chairs", { "type": 1, "_id": 1 } ) + sh.shardCollection("events.alerts", { "hashed_id": 1 } ) -In order, these operations shard: + In order, these operations shard: -#. The ``people`` collection in the ``records`` database using the shard key - ``{ "zipcode": 1, "name": 1 }``. + #. The ``people`` collection in the ``records`` database using the + shard key ``{ "zipcode": 1, "name": 1 }``. - This shard key distributes documents by the value of the - ``zipcode`` field. 
If a number of documents have the same value + for this field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``name`` + field. -#. The ``addresses`` collection in the ``people`` database using the shard key - ``{ "state": 1, "_id": 1 }``. + #. The ``addresses`` collection in the ``people`` database using the + shard key ``{ "state": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``state`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. + This shard key distributes documents by the value of the ``state`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``chairs`` collection in the ``assets`` database using the shard key - ``{ "type": 1, "_id": 1 }``. + #. The ``chairs`` collection in the ``assets`` database using the shard key + ``{ "type": 1, "_id": 1 }``. - This shard key distributes documents by the value of the ``type`` - field. If a number of documents have the same value for this field, - then that :term:`chunk` will be :ref:`splitable - ` by the values of the ``_id`` - field. + This shard key distributes documents by the value of the ``type`` + field. If a number of documents have the same value for this + field, then that :term:`chunk` will be :ref:`splitable + ` by the values of the ``_id`` + field. -#. The ``alerts`` collection in the ``events`` database using the shard key - ``{ "hashed_id": 1 }``. + #. The ``alerts`` collection in the ``events`` database using the shard key + ``{ "hashed_id": 1 }``. - This shard key distributes documents by the value of the - ``hashed_id`` field. Presumably this is a computed value that - holds the hash of some value in your documents and is able to - evenly distribute documents throughout your cluster. + This shard key distributes documents by the value of the + ``hashed_id`` field. Presumably this is a computed value that + holds the hash of some value in your documents and is able to + evenly distribute documents throughout your cluster. diff --git a/source/tutorial/manage-chunks.txt b/source/tutorial/manage-chunks.txt new file mode 100644 index 00000000000..ad4d775384c --- /dev/null +++ b/source/tutorial/manage-chunks.txt @@ -0,0 +1,383 @@ +.. _sharding-manage-chunks: + +================================== +Manage Chunks in a Sharded Cluster +================================== + +.. default-domain:: mongodb + +This page describes various operations on :term:`chunks ` in +:term:`sharded clusters `. MongoDB automates most +chunk management operations. However, these chunk management +operations are accessible to administrators for use in some +situations, typically surrounding initial setup, deployment, and data +ingestion. + +This page includes the following: + +- :ref:`sharding-administration-create-chunks` + +- :ref:`sharding-balancing-modify-chunk-size` + +- :ref:`sharding-balancing-manual-migration` + +- :ref:`sharding-bulk-inserts` + +- :ref:`sharding-procedure-create-split` + +.. _sharding-procedure-create-split: + +Split Chunks +------------ + +Normally, MongoDB splits a :term:`chunk` following inserts when a +chunk exceeds the :ref:`chunk size `. The +:term:`balancer` may migrate recently split chunks to a new shard +immediately if :program:`mongos` predicts future insertions will +benefit from the move. 
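Before splitting chunks manually, it can help to see how many chunks a collection already has and where they reside. The following is a minimal sketch, run from a :program:`mongo` shell connected to a :program:`mongos`; the ``records.people`` namespace is only an illustrative assumption, not part of the procedures on this page.

.. code-block:: javascript

   use config
   // count the chunks that currently exist for the example collection
   db.chunks.count( { "ns" : "records.people" } )
   // list each chunk's range and the shard that owns it
   db.chunks.find( { "ns" : "records.people" }, { "min" : 1, "max" : 1, "shard" : 1 } )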
+ +MongoDB treats all chunks the same, whether split manually or +automatically by the system. + +.. warning:: + + You cannot merge or combine chunks once you have split them. + +You may want to split chunks manually if: + +- you have a large amount of data in your cluster and very few + :term:`chunks `, + as is the case after deploying a cluster using existing data. + +- you expect to add a large amount of data that would + initially reside in a single chunk or shard. + +.. example:: + + You plan to insert a large amount of data with :term:`shard key` + values between ``300`` and ``400``, *but* all values of your shard + keys between ``250`` and ``500`` are in a single chunk. + +Use :method:`sh.status()` to determine the current chunk ranges across +the cluster. + +To split chunks manually, use the :dbcommand:`split` command with +the ``middle`` or ``find`` operator. The equivalent shell helpers are +:method:`sh.splitAt()` or :method:`sh.splitFind()`. + +.. example:: + + The following command will split the chunk that contains + the value of ``63109`` for the ``zipcode`` field in the ``people`` + collection of the ``records`` database: + + .. code-block:: javascript + + sh.splitFind( "records.people", { "zipcode": 63109 } ) + +:method:`sh.splitFind()` will split the chunk that contains the +*first* document returned that matches this query into two equally +sized chunks. You must specify the full namespace +(i.e. "``<database>.<collection>``") of the sharded collection to +:method:`sh.splitFind()`. The query in :method:`sh.splitFind()` need +not contain the shard key, though it almost always makes sense to +query for the shard key in this case, and including the shard key will +expedite the operation. + +Use :method:`sh.splitAt()` to split a chunk in two using the queried +document as the partition point: + +.. code-block:: javascript + + sh.splitAt( "records.people", { "zipcode": 63109 } ) + +However, the location of the document that this query finds with +respect to the other documents in the chunk does not affect how the +chunk splits. + +.. _sharding-administration-pre-splitting: +.. _sharding-administration-create-chunks: + +Create Chunks (Pre-Splitting) +----------------------------- + +In most situations, a :term:`sharded cluster` will create and distribute +chunks automatically without user intervention. However, in a limited +number of use profiles, MongoDB cannot create enough chunks or +distribute data fast enough to support required throughput. Consider +the following scenarios: + +- you must partition an existing data collection that resides on a + single shard. + +- you must ingest a large volume of data into a cluster that + isn't balanced, or where the ingestion of data will lead to an + imbalance of data. + + This can arise during an initial data load, or in a case where you + must insert a large volume of data into a single chunk, as happens + when you must insert at the beginning or end of the chunk + range with a monotonically increasing or decreasing + shard key. + +Preemptively splitting chunks increases cluster throughput for these +operations by reducing the overhead of migrating chunks that hold +data during the write operation. MongoDB only creates splits after an +insert operation, and can only migrate a single chunk at a time. Chunk +migrations are resource intensive and further complicated by large +write volume to the migrating chunk. + +.. warning:: + + You can only pre-split an empty collection. 
When you enable + sharding for a collection that contains data, MongoDB automatically + creates splits. Subsequent attempts to create splits manually can + lead to unpredictable chunk ranges and sizes as well as inefficient + or ineffective balancing behavior. + +To create and migrate chunks manually, use the following procedure: + +#. Split empty chunks in your collection by manually performing the + :dbcommand:`split` command on chunks. + + .. example:: + + To create chunks for documents in the ``myapp.users`` + collection, using the ``email`` field as the :term:`shard key`, + use the following operation in the :program:`mongo` shell: + + .. code-block:: javascript + + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.runCommand( { split : "myapp.users" , middle : { email : prefix } } ); + } + } + + This assumes a collection size of 100 million documents. + +#. Migrate chunks manually using the :dbcommand:`moveChunk` command: + + .. example:: + + To migrate all of the manually created user profiles evenly, + putting each prefix chunk on the next shard from the other, run + the following commands in the mongo shell: + + .. code-block:: javascript + + var shServer = [ "sh0.example.net", "sh1.example.net", "sh2.example.net", "sh3.example.net", "sh4.example.net" ]; + for ( var x=97; x<97+26; x++ ){ + for( var y=97; y<97+26; y+=6 ) { + var prefix = String.fromCharCode(x) + String.fromCharCode(y); + db.adminCommand({moveChunk : "myapp.users", find : {email : prefix}, to : shServer[(y-97)/6]}) + } + } + + You can also let the balancer automatically distribute the new + chunks. For an introduction to balancing, see + :ref:`sharding-balancing`. For lower level information on balancing, + see :ref:`sharding-balancing-internals`. + +.. _sharding-balancing-modify-chunk-size: + +Modify Chunk Size +----------------- + +When you initialize a sharded cluster, the default chunk size is 64 +megabytes. This default chunk size works well for most deployments. However, if you +notice that automatic migrations are incurring a level of I/O that +your hardware cannot handle, you may want to reduce the chunk +size. For the automatic splits and migrations, a small chunk size +leads to more rapid and frequent migrations. + +To modify the chunk size, use the following procedure: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Issue the following :method:`save() ` + operation: + + .. code-block:: javascript + + db.settings.save( { _id:"chunksize", value: <size> } ) + + Where the value of ``<size>`` reflects the new chunk size in + megabytes. Here, you're essentially writing a document whose values + store the global chunk size configuration value. A worked example + appears after the list of limitations below. + +.. note:: + + The :setting:`chunkSize` and :option:`--chunkSize ` + options, passed at runtime to the :program:`mongos` **do not** + affect the chunk size after you have initialized the cluster. + + To eliminate confusion, you should *always* set chunk size using the + above procedure and never use the runtime options. + +Modifying the chunk size has several limitations: + +- Automatic splitting only occurs when inserting :term:`documents + ` or updating existing documents. + +- If you lower the chunk size, it may take time for all chunks to split to + the new size. + +- Splits cannot be "undone." 
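As a concrete illustration of the procedure above, the following sequence lowers the global chunk size to 32 megabytes; the value ``32`` is only an example, not a recommendation, and you should choose a size that suits your hardware and workload.

.. code-block:: javascript

   use config
   // write the new cluster-wide chunk size, in megabytes
   db.settings.save( { _id : "chunksize", value : 32 } )
   // confirm the setting
   db.settings.find( { _id : "chunksize" } )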
+ +If you increase the chunk size, existing chunks must grow through +insertion or updates until they reach the new size. + +.. _sharding-balancing-manual-migration: + +Migrate Chunks +-------------- + +In most circumstances, you should let the automatic balancer +migrate :term:`chunks ` between :term:`shards `. +However, you may want to migrate chunks manually in a few cases: + +- If you create chunks by :term:`pre-splitting` the data in your + collection, you will have to migrate chunks manually to distribute + chunks evenly across the shards. Use pre-splitting in limited + situations, to support bulk data ingestion. + +- If the balancer in an active cluster cannot distribute chunks within + the balancing window, then you will have to migrate chunks manually. + +For more information on how chunks move between shards, see +:ref:`sharding-balancing-internals`, in particular the section +:ref:`sharding-chunk-migration`. + +To migrate chunks, use the :dbcommand:`moveChunk` command. + +.. note:: + + To return a list of shards, use the :dbcommand:`listShards` + command. + + Specify shard names using the :dbcommand:`addShard` command + using the ``name`` argument. If you do not specify a name in the + :dbcommand:`addShard` command, MongoDB will assign a name + automatically. + +The following example assumes that the field ``username`` is the +:term:`shard key` for a collection named ``users`` in the ``myapp`` +database, and that the value ``smith`` exists within the :term:`chunk` +you want to migrate. + +To move this chunk, you would issue the following command from a :program:`mongo` +shell connected to any :program:`mongos` instance. + +.. code-block:: javascript + + db.adminCommand({moveChunk : "myapp.users", find : {username : "smith"}, to : "mongodb-shard3.example.net"}) + +This command moves the chunk that includes the shard key value "smith" to the +:term:`shard` named ``mongodb-shard3.example.net``. The command will +block until the migration is complete. + +See :ref:`sharding-administration-create-chunks` for an introduction +to pre-splitting. + +.. versionadded:: 2.2 + :dbcommand:`moveChunk` command has the: ``_secondaryThrottle`` + parameter. When set to ``true``, MongoDB ensures that + :term:`secondary` members have replicated operations before allowing + new chunk migrations. + +.. warning:: + + The :dbcommand:`moveChunk` command may produce the following error + message: + + .. code-block:: none + + The collection's metadata lock is already taken. + + These errors occur when clients have too many open :term:`cursors + ` that access the chunk you are migrating. You can either + wait until the cursors complete their operation or close the + cursors manually. + + .. todo:: insert link to killing a cursor. + +.. index:: bulk insert +.. _sharding-bulk-inserts: + +Strategies for Bulk Inserts in Sharded Clusters +----------------------------------------------- + +.. todo:: Consider moving to the administrative guide as it's of an + applied nature, or create an applications document for sharding + +.. todo:: link the words "bulk insert" to the bulk insert topic when + it's published + + +Large bulk insert operations including initial data ingestion or +routine data import, can have a significant impact on a :term:`sharded +cluster`. Consider the following strategies and possibilities for +bulk insert operations: + +- If the collection does not have data, then there is only one + :term:`chunk`, which must reside on a single shard. 
MongoDB must + receive data, create splits, and distribute chunks to the available + shards. To avoid this performance cost, you can pre-split the + collection, as described in :ref:`sharding-administration-pre-splitting`. + +- You can parallelize import processes by sending insert operations to + more than one :program:`mongos` instance. If the collection is + empty, pre-split first, as described in + :ref:`sharding-administration-pre-splitting`. + +- If your shard key increases monotonically during an insert then all + the inserts will go to the last chunk in the collection, which will + always end up on a single shard. Therefore, the insert capacity of the + cluster will never exceed the insert capacity of a single shard. + + If your insert volume is never larger than what a single shard can + process, then there is no problem; however, if the insert volume + exceeds that range, and you cannot avoid a monotonically + increasing shard key, then consider the following modifications to + your application: + + - Reverse all the bits of the shard key to preserve the information + while avoiding the correlation of insertion order and increasing + sequence of values. + + - Swap the first and last 16-bit words to "shuffle" the inserts. + + .. example:: The following example, in C++, swaps the leading and + trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` + generated so that they are no longer monotonically increasing. + + .. code-block:: cpp + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + + For information on choosing a shard key, see :ref:`sharding-shard-key` + and see :ref:`Shard Key Internals ` (in + particular, :ref:`sharding-internals-operations-and-reliability` and + :ref:`sharding-internals-choose-shard-key`). + diff --git a/source/tutorial/manage-shards.txt b/source/tutorial/manage-shards.txt new file mode 100644 index 00000000000..07b5f3db6f4 --- /dev/null +++ b/source/tutorial/manage-shards.txt @@ -0,0 +1,308 @@ +.. _sharding-manage-shards: + +============= +Manage Shards +============= + +.. default-domain:: mongodb + +This page includes the following: + +- :ref:`sharding-procedure-add-shard` + +- :ref:`sharding-procedure-remove-shard` + +- :ref:`sharding-procedure-list-databases` + +- :ref:`sharding-procedure-list-shards` + +- :ref:`sharding-procedure-view-clusters` + +.. _sharding-procedure-add-shard: + +Add a Shard to a Cluster +------------------------ + +To add a shard to an *existing* sharded cluster, use the following +procedure: + +#. Connect to a :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. First, you need to tell the cluster where to find the individual + shards. You can do this using the :dbcommand:`addShard` command or + the :method:`sh.addShard()` helper: + + .. code-block:: javascript + + sh.addShard( ":" ) + + Replace ```` and ```` with the hostname and TCP + port number of where the shard is accessible. + Alternately specify a :term:`replica set` name and at least one + hostname which is a member of the replica set. + + For example: + + .. code-block:: javascript + + sh.addShard( "mongodb0.example.net:27027" ) + + .. note:: In production deployments, all shards should be replica sets. 
+ + Repeat for each shard in your cluster. + + .. optional:: + + You may specify a "name" as an argument to the + :dbcommand:`addShard` command, as follows: + + .. code-block:: javascript + + db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) + + You cannot specify a name for a shard using the + :method:`sh.addShard()` helper in the :program:`mongo` shell. If + you use the helper or do not specify a shard name, then MongoDB + will assign a name upon creation. + + .. versionchanged:: 2.0.3 + Before version 2.0.3, you must specify the shard in the + following form: the replica set name, followed by a forward + slash, followed by a comma-separated list of seeds for the + replica set. For example, if the name of the replica set is + "myapp1", then your :method:`sh.addShard()` command might resemble: + + .. code-block:: javascript + + sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + +.. note:: + + It may take some time for :term:`chunks ` to migrate to the + new shard. + + For an introduction to balancing, see :ref:`sharding-balancing`. For + lower level information on balancing, see :ref:`sharding-balancing-internals`. + +.. _sharding-procedure-remove-shard: + +Remove a Shard from a Cluster +----------------------------- + +To remove a :term:`shard` from a :term:`sharded cluster`, you must: + +- Migrate :term:`chunks ` to another shard or database. + +- Ensure that this shard is not the :term:`primary shard` for any databases in + the cluster. If it is, move the "primary" status for these databases + to other shards. + +- Finally, remove the shard from the cluster's configuration. + +.. note:: + + To successfully migrate data from a shard, the :term:`balancer` + process **must** be active. + +The procedure to remove a shard is as follows: + +#. Connect to a :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Determine the name of the shard you will be removing. + + You must specify the name of the shard. You may have specified this + shard name when you first ran the :dbcommand:`addShard` command. If not, + you can find out the name of the shard by running the + :dbcommand:`listShards` or :dbcommand:`printShardingStatus` + commands or the :method:`sh.status()` shell helper. + + The following examples will remove a shard named ``mongodb0`` from the cluster. + +#. Begin removing chunks from the shard. + + Start by running the :dbcommand:`removeShard` command. This will + start "draining" or migrating chunks from the shard you're removing + to another shard in the cluster. + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + This operation will return the following response immediately: + + .. code-block:: javascript + + { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } + + Depending on your network capacity and the amount of data in the + shard, this operation can take anywhere from a few minutes to several + days to complete. + +#. View progress of the migration. + + You can run the :dbcommand:`removeShard` command again at any stage of the + process to view the progress of the migration, as follows: + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + The output should look something like this: + + .. 
code-block:: javascript + + { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } + + In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a + counter displays the remaining number of chunks that MongoDB must + migrate to other shards and the number of MongoDB databases that have + "primary" status on this shard. + + Continue checking the status of the :dbcommand:`removeShard` command + until the remaining number of chunks to transfer is 0. + +#. Move any databases to other shards in the cluster as needed. + + This is only necessary when removing a shard that is also the + :term:`primary shard` for one or more databases. + + Issue the following command at the :program:`mongo` shell: + + .. code-block:: javascript + + db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) + + This command will migrate all remaining non-sharded data in the + database named ``myapp`` to the shard named ``mongodb1``. + + .. warning:: + + Do not run the :dbcommand:`movePrimary` command until you have *finished* + draining the shard. + + The command will not return until MongoDB completes moving all + data. The response from this command will resemble the following: + + .. code-block:: javascript + + { "primary" : "mongodb1", "ok" : 1 } + +#. Run :dbcommand:`removeShard` again to clean up all metadata + information and finalize the shard removal, as follows: + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + When successful, this command will return a document like this: + + .. code-block:: javascript + + { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } + +Once the value of the ``stage`` field is "completed," you may safely +stop the processes comprising the ``mongodb0`` shard. + + +.. _sharding-procedure-list-databases: + +List Databases with Sharding Enabled +------------------------------------ + +To list the databases that have sharding enabled, query the +``databases`` collection in the :ref:`config-database`. +A database has sharding enabled if the value of the ``partitioned`` +field is ``true``. Connect to a :program:`mongos` instance with a +:program:`mongo` shell, and run the following operation to get a full +list of databases with sharding enabled: + +.. code-block:: javascript + + use config + db.databases.find( { "partitioned": true } ) + +.. example:: You can use the following sequence of commands when to + return a list of all databases in the cluster: + + .. code-block:: javascript + + use config + db.databases.find() + + If this returns the following result set: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } + { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } + + Then sharding is only enabled for the ``animals`` database. + +.. _sharding-procedure-list-shards: + +List Shards +----------- + +To list the current set of configured shards, use the :dbcommand:`listShards` +command, as follows: + +.. code-block:: javascript + + use admin + db.runCommand( { listShards : 1 } ) + +.. _sharding-procedure-view-clusters: + +View Cluster Details +-------------------- + +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. + +.. 
example:: In the following example output from :method:`sh.status()` + + - ``sharding version`` displays the version number of the shard + metadata. + + - ``shards`` displays a list of the :program:`mongod` instances + used as shards in the cluster. + + - ``databases`` displays all databases in the cluster, + including database that do not have sharding enabled. + + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + + .. code-block:: javascript + + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "m0.example.net:30001" } + { "_id" : "shard0001", "host" : "m3.example2.net:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } From aa96bd88ea5238110b9159abc0c88c29847c7eff Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 21:47:39 -0500 Subject: [PATCH 05/15] DOCS-987 ongoing --- .../administration/sharding-config-server.txt | 225 ++++++++++ .../sharding-troubleshooting.txt | 152 +++++++ source/administration/sharding.txt | 416 +----------------- source/sharding.txt | 27 +- source/tutorial/manage-balancer.txt | 204 +++++++++ 5 files changed, 608 insertions(+), 416 deletions(-) create mode 100644 source/administration/sharding-config-server.txt create mode 100644 source/administration/sharding-troubleshooting.txt create mode 100644 source/tutorial/manage-balancer.txt diff --git a/source/administration/sharding-config-server.txt b/source/administration/sharding-config-server.txt new file mode 100644 index 00000000000..13cb51cb6ad --- /dev/null +++ b/source/administration/sharding-config-server.txt @@ -0,0 +1,225 @@ +.. index:: config servers; operations +.. _sharding-procedure-config-server: + +===================== +Manage Config Servers +===================== + +.. default-domain:: mongodb + +:ref:`Config servers ` store all cluster metadata, most importantly, +the mapping from :term:`chunks ` to :term:`shards `. +This section provides an overview of the basic +procedures to migrate, replace, and maintain these servers. 
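+
+Several of the procedures below require that every :program:`mongos`
+in the cluster point at the same set of config servers. As a quick
+check, you can confirm which config servers a :program:`mongos` was
+started with by inspecting its runtime options; the following is a
+minimal sketch, assuming you are connected to that :program:`mongos`
+with the :program:`mongo` shell:
+
+.. code-block:: javascript
+
+   // the "parsed" sub-document of the result reports the configdb
+   // string that this mongos was started with
+   db.adminCommand( { getCmdLineOpts: 1 } )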
+ +This page includes the following: + +- :ref:`sharding-config-server-deploy-three` + +- :ref:`sharding-process-config-server-migrate-same-hostname` + +- :ref:`sharding-process-config-server-migrate-different-hostname` + +- :ref:`sharding-config-server-replace` + +- :ref:`sharding-config-server-backup` + +.. _sharding-config-server-deploy-three: + +Deploy Three Config Servers for Production Deployments +------------------------------------------------------ + +For redundancy, all production :term:`sharded clusters ` +should deploy three config servers processes on three different +machines. + +Do not use only a single config server for production deployments. +Only use a single config server deployments for testing. You should +upgrade to three config servers immediately if you are shifting to +production. The following process shows how to convert a test +deployment with only one config server to production deployment with +three config servers. + +#. Shut down all existing MongoDB processes. This includes: + + - all :program:`mongod` instances or :term:`replica sets ` + that provide your shards. + + - the :program:`mongod` instance that provides your existing config + database. + + - all :program:`mongos` instances in your cluster. + +#. Copy the entire :setting:`dbpath` file system tree from the + existing config server to the two machines that will provide the + additional config servers. These commands, issued on the system + with the existing :ref:`config-database`, ``mongo-config0.example.net`` may + resemble the following: + + .. code-block:: sh + + rsync -az /data/configdb mongo-config1.example.net:/data/configdb + rsync -az /data/configdb mongo-config2.example.net:/data/configdb + +#. Start all three config servers, using the same invocation that you + used for the single config server. + + .. code-block:: sh + + mongod --configsvr + +#. Restart all shard :program:`mongod` and :program:`mongos` processes. + +.. _sharding-process-config-server-migrate-same-hostname: + +Migrate Config Servers with the Same Hostname +--------------------------------------------- + +Use this process when you need to migrate a config server to a new +system but the new system will be accessible using the same host +name. + +#. Shut down the config server that you're moving. + + This will render all config data for your cluster :ref:`read only + `. + +#. Change the DNS entry that points to the system that provided the old + config server, so that the *same* hostname points to the new + system. + + How you do this depends on how you organize your DNS and + hostname resolution services. + +#. Move the entire :setting:`dbpath` file system tree from the old + config server to the new config server. This command, issued on the + old config server system, may resemble the following: + + .. code-block:: sh + + rsync -az /data/configdb mongo-config0.example.net:/data/configdb + +#. Start the config instance on the new system. The default invocation + is: + + .. code-block:: sh + + mongod --configsvr + +When you start the third config server, your cluster will become +writable and it will be able to create new splits and migrate chunks +as needed. + +.. _sharding-process-config-server-migrate-different-hostname: + +Migrate Config Servers with Different Hostnames +----------------------------------------------- + +Use this process when you need to migrate a :ref:`config-database` to a new +server and it *will not* be accessible via the same hostname. 
If
+possible, avoid changing the hostname so that you can use the
+:ref:`previous procedure `.
+
+#. Shut down the :ref:`config server ` you're moving.
+
+   This will render all config data for your cluster "read only."
+
+#. Copy the entire :setting:`dbpath` file system tree from the old
+   config server to the new config server. This command, issued on the
+   old config server system, may resemble the following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Start the config instance on the new system. The default invocation
+   is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+#. Shut down all existing MongoDB processes. This includes:
+
+   - all :program:`mongod` instances or :term:`replica sets `
+     that provide your shards.
+
+   - the :program:`mongod` instances that provide your existing
+     :ref:`config databases `.
+
+   - all :program:`mongos` instances in your cluster.
+
+#. Restart all :program:`mongod` processes that provide the shard
+   servers.
+
+#. Update the :option:`--configdb ` parameter (or
+   :setting:`configdb`) for all :program:`mongos` instances and
+   restart all :program:`mongos` instances.
+
+.. _sharding-config-server-replace:
+
+Replace a Config Server
+-----------------------
+
+Use this procedure only if you need to replace one of your config
+servers after it becomes inoperable (e.g. hardware failure). This
+process assumes that the hostname of the instance will not change. If
+you must change the hostname of the instance, use the process for
+:ref:`migrating a config server to a different hostname
+`.
+
+#. Provision a new system with the same hostname as the previous
+   host.
+
+   You will have to ensure that the new system has the same IP address
+   and hostname as the system it's replacing *or* you will need to
+   modify the DNS records and wait for them to propagate.
+
+#. Shut down *one* (and only one) of the existing config servers. Copy
+   all this host's :setting:`dbpath` file system tree from the current system
+   to the system that will provide the new config server. This
+   command, issued on the system with the data files, may resemble the
+   following:
+
+   .. code-block:: sh
+
+      rsync -az /data/configdb mongodb.config2.example.net:/data/configdb
+
+#. Restart the config server process that you used in the previous
+   step to copy the data files to the new config server instance.
+
+#. Start the new config server instance. The default invocation is:
+
+   .. code-block:: sh
+
+      mongod --configsvr
+
+.. note::
+
+   In the course of this procedure *never* remove a config server from
+   the :setting:`configdb` parameter on any of the :program:`mongos`
+   instances. If you need to change the name of a config server,
+   always make sure that all :program:`mongos` instances have three
+   config servers specified in the :setting:`configdb` setting at all
+   times.
+
+.. _sharding-config-server-backup:
+
+Backup Cluster Metadata
+-----------------------
+
+Because the cluster remains operational [#read-only]_ without one
+of the config database's :program:`mongod` instances, creating a backup
+of the cluster metadata from the config database is straightforward:
+
+#. Shut down one of the :term:`config databases `.
+
+#. Create a full copy of the data files (i.e. the path specified by
+   the :setting:`dbpath` option for the config instance).
+
+#. Restart the original configuration server.
+
+.. seealso:: :doc:`backups`.
+
+.. [#read-only] While one of the three config servers is unavailable,
+   the cluster cannot split any chunks nor can it migrate chunks
+   between shards. Your application will be able to write data to the
+   cluster.
The :ref:`sharding-config-server` section of the + documentation provides more information on this topic. + diff --git a/source/administration/sharding-troubleshooting.txt b/source/administration/sharding-troubleshooting.txt new file mode 100644 index 00000000000..99505e99a2e --- /dev/null +++ b/source/administration/sharding-troubleshooting.txt @@ -0,0 +1,152 @@ +.. index:: troubleshooting; sharding +.. index:: sharding; troubleshooting +.. _sharding-troubleshooting: + +================================ +Troubleshooting Sharded Clusters +================================ + +.. default-domain:: mongodb + +The two most important factors in maintaining a successful sharded cluster are: + +- :ref:`choosing an appropriate shard key ` and + +- :ref:`sufficient capacity to support current and future operations + `. + +You can prevent most issues encountered with sharding by ensuring that +you choose the best possible :term:`shard key` for your deployment and +ensure that you are always adding additional capacity to your cluster +well before the current resources become saturated. Continue reading +for specific issues you may encounter in a production environment. + +.. _sharding-troubleshooting-not-splitting: + +All Data Remains on One Shard +----------------------------- + +Your cluster must have sufficient data for sharding to make +sense. Sharding works by migrating chunks between the shards until +each shard has roughly the same number of chunks. + +The default chunk size is 64 megabytes. MongoDB will not begin +migrations until the imbalance of chunks in the cluster exceeds the +:ref:`migration threshold `. While the +default chunk size is configurable with the :setting:`chunkSize` +setting, these behaviors help prevent unnecessary chunk migrations, +which can degrade the performance of your cluster as a whole. + +If you have just deployed a sharded cluster, make sure that you have +enough data to make sharding effective. If you do not have sufficient +data to create more than eight 64 megabyte chunks, then all data will +remain on one shard. Either lower the :ref:`chunk size +` setting, or add more data to the cluster. + +As a related problem, the system will split chunks only on +inserts or updates, which means that if you configure sharding and do not +continue to issue insert and update operations, the database will not +create any chunks. You can either wait until your application inserts +data *or* :ref:`split chunks manually `. + +Finally, if your shard key has a low :ref:`cardinality +`, MongoDB may not be able to create +sufficient splits among the data. + +One Shard Receives Too Much Traffic +----------------------------------- + +In some situations, a single shard or a subset of the cluster will +receive a disproportionate portion of the traffic and workload. In +almost all cases this is the result of a shard key that does not +effectively allow :ref:`write scaling `. + +It's also possible that you have "hot chunks." In this case, you may +be able to solve the problem by splitting and then migrating parts of +these chunks. + +In the worst case, you may have to consider re-sharding your data +and :ref:`choosing a different shard key ` +to correct this pattern. + +The Cluster Does Not Balance +---------------------------- + +If you have just deployed your sharded cluster, you may want to +consider the :ref:`troubleshooting suggestions for a new cluster where +data remains on a single shard `. 
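+
+Before you change anything, confirm that data is actually distributed
+unevenly. :method:`sh.status()` reports per-shard chunk counts
+interactively; another way, sketched below, is to count the chunks
+each shard holds for a given collection directly in the
+:ref:`config-database`. The ``records.people`` namespace is only a
+placeholder for your own sharded collection:
+
+.. code-block:: javascript
+
+   use config
+   // count chunks per shard for one sharded collection
+   db.chunks.aggregate( [
+       { $match : { ns : "records.people" } },
+       { $group : { _id : "$shard", chunks : { $sum : 1 } } }
+   ] )
+
+A markedly higher chunk count on one shard suggests a splitting or
+balancing problem, while even chunk counts combined with uneven data
+size may instead point to the shard key.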
+ +If the cluster was initially balanced, but later developed an uneven +distribution of data, consider the following possible causes: + +- You have deleted or removed a significant amount of data from the + cluster. If you have added additional data, it may have a + different distribution with regards to its shard key. + +- Your :term:`shard key` has low :ref:`cardinality ` + and MongoDB cannot split the chunks any further. + +- Your data set is growing faster than the balancer can distribute + data around the cluster. This is uncommon and + typically is the result of: + + - a :ref:`balancing window ` that + is too short, given the rate of data growth. + + - an uneven distribution of :ref:`write operations + ` that requires more data + migration. You may have to choose a different shard key to resolve + this issue. + + - poor network connectivity between shards, which may lead to chunk + migrations that take too long to complete. Investigate your + network configuration and interconnections between shards. + +Migrations Render Cluster Unusable +---------------------------------- + +If migrations impact your cluster or application's performance, +consider the following options, depending on the nature of the impact: + +#. If migrations only interrupt your clusters sporadically, you can + limit the :ref:`balancing window + ` to prevent balancing activity + during peak hours. Ensure that there is enough time remaining to + keep the data from becoming out of balance again. + +#. If the balancer is always migrating chunks to the detriment of + overall cluster performance: + + - You may want to attempt :ref:`decreasing the chunk size ` + to limit the size of the migration. + + - Your cluster may be over capacity, and you may want to attempt to + :ref:`add one or two shards ` to + the cluster to distribute load. + +It's also possible that your shard key causes your +application to direct all writes to a single shard. This kind of +activity pattern can require the balancer to migrate most data soon after writing +it. Consider redeploying your cluster with a shard key that provides +better :ref:`write scaling `. + +Disable Balancing During Backups +-------------------------------- + +If MongoDB migrates a :term:`chunk` during a :doc:`backup +`, you can end with an inconsistent snapshot +of your :term:`sharded cluster`. Never run a backup while the balancer is +active. To ensure that the balancer is inactive during your backup +operation: + +- Set the :ref:`balancing window ` + so that the balancer is inactive during the backup. Ensure that the + backup can complete while you have the balancer disabled. + +- :ref:`manually disable the balancer ` + for the duration of the backup procedure. + +Confirm that the balancer is not active using the +:method:`sh.getBalancerState()` method before starting a backup +operation. When the backup procedure is complete you can reactivate +the balancer process. diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index 849a2688acd..572a2af9914 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -1,418 +1,12 @@ -.. index:: administration; sharding -.. _sharding-administration: - -============================== -Sharded Cluster Administration -============================== - -.. default-domain:: mongodb - - - -.. index:: balancing; operations -.. 
_sharding-balancing-operations: - -Balancer Operations -------------------- - -This section describes provides common administrative procedures related -to balancing. For an introduction to balancing, see -:ref:`sharding-balancing`. For lower level information on balancing, see -:ref:`sharding-balancing-internals`. - -.. _sharding-balancing-check-lock: - -Check the Balancer Lock -~~~~~~~~~~~~~~~~~~~~~~~ - -To see if the balancer process is active in your :term:`cluster -`, do the following: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use the following query to return the balancer lock: - - .. code-block:: javascript - - db.locks.find( { _id : "balancer" } ).pretty() - -When this command returns, you will see output like the following: - -.. code-block:: javascript - - { "_id" : "balancer", - "process" : "mongos0.example.net:1292810611:1804289383", - "state" : 2, - "ts" : ObjectId("4d0f872630c42d1978be8a2e"), - "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", - "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", - "why" : "doing balance round" } - - -This output confirms that: - -- The balancer originates from the :program:`mongos` running on the - system with the hostname ``mongos0.example.net``. - -- The value in the ``state`` field indicates that a :program:`mongos` - has the lock. For version 2.0 and later, the value of an active lock - is ``2``; for earlier versions the value is ``1``. - -.. optional:: - - You can also use the following shell helper, which returns a - boolean to report if the balancer is active: - - .. code-block:: javascript - - sh.getBalancerState() - -.. _sharding-schedule-balancing-window: - -Schedule the Balancing Window -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, particularly when your data set grows slowly and a -migration can impact performance, it's useful to be able to ensure -that the balancer is active only at certain times. Use the following -procedure to specify a window during which the :term:`balancer` will -be able to migrate chunks: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Use an operation modeled on the following example :method:`update() - ` operation to modify the balancer's - window: - - .. code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "", stop : "" } } }, true ) - - Replace ```` and ```` with time values using - two digit hour and minute values (e.g ``HH:MM``) that describe the - beginning and end boundaries of the balancing window. - These times will be evaluated relative to the time zone of each individual - :program:`mongos` instance in the sharded cluster. - For instance, running the following - will force the balancer to run between 11PM and 6AM local time only: - - .. code-block:: javascript - - db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true ) - -.. note:: - - The balancer window must be sufficient to *complete* the migration - of all data inserted during the day. 
- - As data insert rates can change based on activity and usage - patterns, it is important to ensure that the balancing window you - select will be sufficient to support the needs of your deployment. - -Remove a Balancing Window Schedule -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have :ref:`set the balancing window -` and wish to remove the schedule -so that the balancer is always running, issue the following sequence -of operations: - -.. code-block:: javascript - - use config - db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true }) - -.. _sharding-balancing-disable-temporally: - -Disable the Balancer -~~~~~~~~~~~~~~~~~~~~ - -By default the balancer may run at any time and only moves chunks as -needed. To disable the balancer for a short period of time and prevent -all migration, use the following procedure: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue *one* of the following operations to disable the balancer: - - .. code-block:: javascript - - sh.stopBalancer() - -#. Later, issue *one* the following operations to enable the balancer: - - .. code-block:: javascript - - sh.startBalancer() - -.. note:: - - If a migration is in progress, the system will complete - the in-progress migration. After disabling, you can use the - following operation in the :program:`mongo` shell to determine if - there are no migrations in progress: - - .. code-block:: javascript - - use config - while( db.locks.findOne({_id: "balancer"}).state ) { - print("waiting..."); sleep(1000); - } - - -The above process and the :method:`sh.setBalancerState()`, -:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide -wrappers on the following process, which may be useful if you need to -run this operation from a driver that does not have helper functions: - -#. Connect to any :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Issue the following command to switch to the :ref:`config-database`: - - .. code-block:: javascript - - use config - -#. Issue the following update to disable the balancer: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true ); - -#. To enable the balancer again, alter the value of "stopped" as follows: - - .. code-block:: javascript - - db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); - -.. index:: config servers; operations -.. _sharding-procedure-config-server: - -Config Server Maintenance -------------------------- - -Config servers store all cluster metadata, most importantly, -the mapping from :term:`chunks ` to :term:`shards `. -This section provides an overview of the basic -procedures to migrate, replace, and maintain these servers. - -.. seealso:: :ref:`sharding-config-server` - -Deploy Three Config Servers for Production Deployments -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -For redundancy, all production :term:`sharded clusters ` -should deploy three config servers processes on three different -machines. - -Do not use only a single config server for production deployments. -Only use a single config server deployments for testing. You should -upgrade to three config servers immediately if you are shifting to -production. The following process shows how to convert a test -deployment with only one config server to production deployment with -three config servers. - -#. Shut down all existing MongoDB processes. 
This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instance that provides your existing config - database. - - - all :program:`mongos` instances in your cluster. - -#. Copy the entire :setting:`dbpath` file system tree from the - existing config server to the two machines that will provide the - additional config servers. These commands, issued on the system - with the existing :ref:`config-database`, ``mongo-config0.example.net`` may - resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config1.example.net:/data/configdb - rsync -az /data/configdb mongo-config2.example.net:/data/configdb - -#. Start all three config servers, using the same invocation that you - used for the single config server. - - .. code-block:: sh - - mongod --configsvr - -#. Restart all shard :program:`mongod` and :program:`mongos` processes. - -.. _sharding-process-config-server-migrate-same-hostname: - -Migrate Config Servers with the Same Hostname -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a config server to a new -system but the new system will be accessible using the same host -name. - -#. Shut down the config server that you're moving. - - This will render all config data for your cluster :ref:`read only - `. - -#. Change the DNS entry that points to the system that provided the old - config server, so that the *same* hostname points to the new - system. - - How you do this depends on how you organize your DNS and - hostname resolution services. - -#. Move the entire :setting:`dbpath` file system tree from the old - config server to the new config server. This command, issued on the - old config server system, may resemble the following: - - .. code-block:: sh - - rsync -az /data/configdb mongo-config0.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. code-block:: sh - - mongod --configsvr - -When you start the third config server, your cluster will become -writable and it will be able to create new splits and migrate chunks -as needed. - -.. _sharding-process-config-server-migrate-different-hostname: - -Migrate Config Servers with Different Hostnames -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Use this process when you need to migrate a :ref:`config-database` to a new -server and it *will not* be accessible via the same hostname. If -possible, avoid changing the hostname so that you can use the -:ref:`previous procedure `. - -#. Shut down the :ref:`config server ` you're moving. - - This will render all config data for your cluster "read only:" - - .. code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Start the config instance on the new system. The default invocation - is: - - .. code-block:: sh - - mongod --configsvr - -#. Shut down all existing MongoDB processes. This includes: - - - all :program:`mongod` instances or :term:`replica sets ` - that provide your shards. - - - the :program:`mongod` instances that provide your existing - :ref:`config databases `. - - - all :program:`mongos` instances in your cluster. - -#. Restart all :program:`mongod` processes that provide the shard - servers. - -#. Update the :option:`--configdb ` parameter (or - :setting:`configdb`) for all :program:`mongos` instances and - restart all :program:`mongos` instances. 
- -Replace a Config Server -~~~~~~~~~~~~~~~~~~~~~~~ - -Use this procedure only if you need to replace one of your config -servers after it becomes inoperable (e.g. hardware failure.) This -process assumes that the hostname of the instance will not change. If -you must change the hostname of the instance, use the process for -:ref:`migrating a config server to a different hostname -`. - -#. Provision a new system, with the same hostname as the previous - host. - - You will have to ensure that the new system has the same IP address - and hostname as the system it's replacing *or* you will need to - modify the DNS records and wait for them to propagate. - -#. Shut down *one* (and only one) of the existing config servers. Copy - all this host's :setting:`dbpath` file system tree from the current system - to the system that will provide the new config server. This - command, issued on the system with the data files, may resemble the - following: - - .. code-block:: sh - - rsync -az /data/configdb mongodb.config2.example.net:/data/configdb - -#. Restart the config server process that you used in the previous - step to copy the data files to the new config server instance. - -#. Start the new config server instance. The default invocation is: - - .. code-block:: sh - - mongod --configsvr - -.. note:: - - In the course of this procedure *never* remove a config server from - the :setting:`configdb` parameter on any of the :program:`mongos` - instances. If you need to change the name of a config server, - always make sure that all :program:`mongos` instances have three - config servers specified in the :setting:`configdb` setting at all - times. - -Backup Cluster Metadata -~~~~~~~~~~~~~~~~~~~~~~~ - -The cluster will remain operational [#read-only]_ without one -of the config database's :program:`mongod` instances, creating a backup -of the cluster metadata from the config database is straight forward: - -#. Shut down one of the :term:`config databases `. - -#. Create a full copy of the data files (i.e. the path specified by - the :setting:`dbpath` option for the config instance.) - -#. Restart the original configuration server. - -.. seealso:: :doc:`backups`. - -.. [#read-only] While one of the three config servers is unavailable, - the cluster cannot split any chunks nor can it migrate chunks - between shards. Your application will be able to write data to the - cluster. The :ref:`sharding-config-server` section of the - documentation provides more information on this topic. - .. index:: troubleshooting; sharding .. index:: sharding; troubleshooting .. _sharding-troubleshooting: -Troubleshooting ---------------- +================================ +Troubleshooting Sharded Clusters +================================ + +.. default-domain:: mongodb The two most important factors in maintaining a successful sharded cluster are: diff --git a/source/sharding.txt b/source/sharding.txt index 62cbb93b559..f12feadfd5a 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -19,20 +19,27 @@ Sharding Concepts core/sharding-requirements administration/sharding-architectures core/sharding-security - faq/sharding - core/sharding-internals -Set Up and Manage Sharded Clusters ----------------------------------- +Set Up Sharded Clusters +----------------------- .. toctree:: :maxdepth: 1 tutorial/deploy-shard-cluster + +Manage Sharded Clusters +----------------------- + +.. 
toctree:: + :maxdepth: 1 + tutorial/manage-shards - tutorial/manage-chunks tutorial/add-shards-to-shard-cluster tutorial/remove-shards-from-cluster + tutorial/manage-chunks + tutorial/manage-balancer + administration/sharding-config-server tutorial/enforce-unique-keys-for-sharded-collections tutorial/convert-replica-set-to-replicated-shard-cluster @@ -49,6 +56,16 @@ Backup Sharded Clusters tutorial/restore-sharded-cluster tutorial/schedule-backup-window-for-sharded-clusters +FAQs, Advanced Concepts, and Troubleshooting +-------------------------------------------- + +.. toctree:: + :maxdepth: 1 + + faq/sharding + core/sharding-internals + administration/sharding-troubleshooting + Reference --------- diff --git a/source/tutorial/manage-balancer.txt b/source/tutorial/manage-balancer.txt new file mode 100644 index 00000000000..99e0f52bc5e --- /dev/null +++ b/source/tutorial/manage-balancer.txt @@ -0,0 +1,204 @@ +.. index:: balancing; operations +.. _sharding-balancing-operations: + +=================== +Manage the Balancer +=================== + +.. default-domain:: mongodb + +This page describes provides common administrative procedures related +to balancing. For an introduction to balancing, see +:ref:`sharding-balancing`. For lower level information on balancing, see +:ref:`sharding-balancing-internals`. + +This page includes the following: + +- :ref:`sharding-balancing-check-lock` + +- :ref:`sharding-schedule-balancing-window` + +- :ref:`sharding-balancing-remove-window` + +- :ref:`sharding-balancing-disable-temporally` + +.. _sharding-balancing-check-lock: + +Check the Balancer Lock +----------------------- + +To see if the balancer process is active in your :term:`cluster +`, do the following: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Issue the following command to switch to the :ref:`config-database`: + + .. code-block:: javascript + + use config + +#. Use the following query to return the balancer lock: + + .. code-block:: javascript + + db.locks.find( { _id : "balancer" } ).pretty() + +When this command returns, you will see output like the following: + +.. code-block:: javascript + + { "_id" : "balancer", + "process" : "mongos0.example.net:1292810611:1804289383", + "state" : 2, + "ts" : ObjectId("4d0f872630c42d1978be8a2e"), + "when" : "Mon Dec 20 2010 11:41:10 GMT-0500 (EST)", + "who" : "mongos0.example.net:1292810611:1804289383:Balancer:846930886", + "why" : "doing balance round" } + +This output confirms that: + +- The balancer originates from the :program:`mongos` running on the + system with the hostname ``mongos0.example.net``. + +- The value in the ``state`` field indicates that a :program:`mongos` + has the lock. For version 2.0 and later, the value of an active lock + is ``2``; for earlier versions the value is ``1``. + +.. optional:: + + You can also use the following shell helper, which returns a + boolean to report if the balancer is active: + + .. code-block:: javascript + + sh.getBalancerState() + +.. _sharding-schedule-balancing-window: + +Schedule the Balancing Window +----------------------------- + +In some situations, particularly when your data set grows slowly and a +migration can impact performance, it's useful to be able to ensure +that the balancer is active only at certain times. Use the following +procedure to specify a window during which the :term:`balancer` will +be able to migrate chunks: + +#. Connect to any :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. 
Issue the following command to switch to the :ref:`config-database`:
+
+   .. code-block:: javascript
+
+      use config
+
+#. Modify the balancer's window with an :method:`update()
+   ` operation modeled on the following example:
+
+   .. code-block:: javascript
+
+      db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "<start-time>", stop : "<stop-time>" } } }, true )
+
+   Replace ``<start-time>`` and ``<stop-time>`` with time values using
+   two-digit hour and minute values (e.g. ``HH:MM``) that describe the
+   beginning and end boundaries of the balancing window.
+   These times will be evaluated relative to the time zone of each individual
+   :program:`mongos` instance in the sharded cluster.
+   For instance, running the following
+   will force the balancer to run between 11PM and 6AM local time only:
+
+   .. code-block:: javascript
+
+      db.settings.update({ _id : "balancer" }, { $set : { activeWindow : { start : "23:00", stop : "6:00" } } }, true )
+
+.. note::
+
+   The balancer window must be sufficient to *complete* the migration
+   of all data inserted during the day.
+
+   As data insert rates can change based on activity and usage
+   patterns, it is important to ensure that the balancing window you
+   select will be sufficient to support the needs of your deployment.
+
+.. _sharding-balancing-remove-window:
+
+Remove a Balancing Window Schedule
+----------------------------------
+
+If you have :ref:`set the balancing window
+` and wish to remove the schedule
+so that the balancer is always running, issue the following sequence
+of operations:
+
+.. code-block:: javascript
+
+   use config
+   db.settings.update({ _id : "balancer" }, { $unset : { activeWindow : true } })
+
+.. _sharding-balancing-disable-temporally:
+
+Disable the Balancer
+--------------------
+
+By default, the balancer may run at any time and only moves chunks as
+needed. To disable the balancer for a short period of time and prevent
+all migration, use the following procedure:
+
+#. Connect to any :program:`mongos` in the cluster using the
+   :program:`mongo` shell.
+
+#. Issue the following operation to disable the balancer:
+
+   .. code-block:: javascript
+
+      sh.stopBalancer()
+
+#. Later, issue the following operation to enable the balancer:
+
+   .. code-block:: javascript
+
+      sh.startBalancer()
+
+.. note::
+
+   If a migration is in progress, the system will complete
+   the in-progress migration. After disabling, you can use the
+   following operation in the :program:`mongo` shell to verify that
+   no migrations are in progress:
+
+   .. code-block:: javascript
+
+      use config
+      while( db.locks.findOne({_id: "balancer"}).state ) {
+         print("waiting..."); sleep(1000);
+      }
+
+The above process and the :method:`sh.setBalancerState()`,
+:method:`sh.startBalancer()`, and :method:`sh.stopBalancer()` helpers provide
+wrappers around the following process, which may be useful if you need to
+run this operation from a driver that does not have helper functions:
+
+#. Connect to any :program:`mongos` in the cluster using the
+   :program:`mongo` shell.
+
+#. Issue the following command to switch to the :ref:`config-database`:
+
+   .. code-block:: javascript
+
+      use config
+
+#. Issue the following update to disable the balancer:
+
+   .. code-block:: javascript
+
+      db.settings.update( { _id: "balancer" }, { $set : { stopped: true } } , true );
+
+#. To enable the balancer again, alter the value of "stopped" as follows:
+
+   .. 
code-block:: javascript + + db.settings.update( { _id: "balancer" }, { $set : { stopped: false } } , true ); From 23943c2fd7256ca342469f43847e9319d5aeda9d Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 21:50:58 -0500 Subject: [PATCH 06/15] DOCS-987 ongoing --- source/administration/sharding.txt | 152 ----------------------------- 1 file changed, 152 deletions(-) delete mode 100644 source/administration/sharding.txt diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt deleted file mode 100644 index 572a2af9914..00000000000 --- a/source/administration/sharding.txt +++ /dev/null @@ -1,152 +0,0 @@ -.. index:: troubleshooting; sharding -.. index:: sharding; troubleshooting -.. _sharding-troubleshooting: - -================================ -Troubleshooting Sharded Clusters -================================ - -.. default-domain:: mongodb - -The two most important factors in maintaining a successful sharded cluster are: - -- :ref:`choosing an appropriate shard key ` and - -- :ref:`sufficient capacity to support current and future operations - `. - -You can prevent most issues encountered with sharding by ensuring that -you choose the best possible :term:`shard key` for your deployment and -ensure that you are always adding additional capacity to your cluster -well before the current resources become saturated. Continue reading -for specific issues you may encounter in a production environment. - -.. _sharding-troubleshooting-not-splitting: - -All Data Remains on One Shard -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Your cluster must have sufficient data for sharding to make -sense. Sharding works by migrating chunks between the shards until -each shard has roughly the same number of chunks. - -The default chunk size is 64 megabytes. MongoDB will not begin -migrations until the imbalance of chunks in the cluster exceeds the -:ref:`migration threshold `. While the -default chunk size is configurable with the :setting:`chunkSize` -setting, these behaviors help prevent unnecessary chunk migrations, -which can degrade the performance of your cluster as a whole. - -If you have just deployed a sharded cluster, make sure that you have -enough data to make sharding effective. If you do not have sufficient -data to create more than eight 64 megabyte chunks, then all data will -remain on one shard. Either lower the :ref:`chunk size -` setting, or add more data to the cluster. - -As a related problem, the system will split chunks only on -inserts or updates, which means that if you configure sharding and do not -continue to issue insert and update operations, the database will not -create any chunks. You can either wait until your application inserts -data *or* :ref:`split chunks manually `. - -Finally, if your shard key has a low :ref:`cardinality -`, MongoDB may not be able to create -sufficient splits among the data. - -One Shard Receives Too Much Traffic -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -In some situations, a single shard or a subset of the cluster will -receive a disproportionate portion of the traffic and workload. In -almost all cases this is the result of a shard key that does not -effectively allow :ref:`write scaling `. - -It's also possible that you have "hot chunks." In this case, you may -be able to solve the problem by splitting and then migrating parts of -these chunks. - -In the worst case, you may have to consider re-sharding your data -and :ref:`choosing a different shard key ` -to correct this pattern. 
- -The Cluster Does Not Balance -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you have just deployed your sharded cluster, you may want to -consider the :ref:`troubleshooting suggestions for a new cluster where -data remains on a single shard `. - -If the cluster was initially balanced, but later developed an uneven -distribution of data, consider the following possible causes: - -- You have deleted or removed a significant amount of data from the - cluster. If you have added additional data, it may have a - different distribution with regards to its shard key. - -- Your :term:`shard key` has low :ref:`cardinality ` - and MongoDB cannot split the chunks any further. - -- Your data set is growing faster than the balancer can distribute - data around the cluster. This is uncommon and - typically is the result of: - - - a :ref:`balancing window ` that - is too short, given the rate of data growth. - - - an uneven distribution of :ref:`write operations - ` that requires more data - migration. You may have to choose a different shard key to resolve - this issue. - - - poor network connectivity between shards, which may lead to chunk - migrations that take too long to complete. Investigate your - network configuration and interconnections between shards. - -Migrations Render Cluster Unusable -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If migrations impact your cluster or application's performance, -consider the following options, depending on the nature of the impact: - -#. If migrations only interrupt your clusters sporadically, you can - limit the :ref:`balancing window - ` to prevent balancing activity - during peak hours. Ensure that there is enough time remaining to - keep the data from becoming out of balance again. - -#. If the balancer is always migrating chunks to the detriment of - overall cluster performance: - - - You may want to attempt :ref:`decreasing the chunk size ` - to limit the size of the migration. - - - Your cluster may be over capacity, and you may want to attempt to - :ref:`add one or two shards ` to - the cluster to distribute load. - -It's also possible that your shard key causes your -application to direct all writes to a single shard. This kind of -activity pattern can require the balancer to migrate most data soon after writing -it. Consider redeploying your cluster with a shard key that provides -better :ref:`write scaling `. - -Disable Balancing During Backups -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If MongoDB migrates a :term:`chunk` during a :doc:`backup -`, you can end with an inconsistent snapshot -of your :term:`sharded cluster`. Never run a backup while the balancer is -active. To ensure that the balancer is inactive during your backup -operation: - -- Set the :ref:`balancing window ` - so that the balancer is inactive during the backup. Ensure that the - backup can complete while you have the balancer disabled. - -- :ref:`manually disable the balancer ` - for the duration of the backup procedure. - -Confirm that the balancer is not active using the -:method:`sh.getBalancerState()` method before starting a backup -operation. When the backup procedure is complete you can reactivate -the balancer process. 
From 8be29eb497194f3cb1a2c65b5dbee4954466e06e Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 22:02:35 -0500 Subject: [PATCH 07/15] DOCS-987 semi-solution for administration/sharding doc --- source/administration/sharding.txt | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 source/administration/sharding.txt diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt new file mode 100644 index 00000000000..216badc4e9b --- /dev/null +++ b/source/administration/sharding.txt @@ -0,0 +1,30 @@ +======================= +Sharding Administration +======================= + +This page lists administrative documentation for sharding. + +.. toctree:: + :maxdepth: 2 + + core/sharding + administration/sharded-clusters + core/sharding-requirements + administration/sharding-architectures + core/sharding-security + tutorial/deploy-shard-cluster + tutorial/manage-shards + tutorial/add-shards-to-shard-cluster + tutorial/remove-shards-from-cluster + tutorial/manage-chunks + tutorial/manage-balancer + administration/sharding-config-server + tutorial/enforce-unique-keys-for-sharded-collections + tutorial/convert-replica-set-to-replicated-shard-cluster + tutorial/backup-small-sharded-cluster-with-mongodump + tutorial/backup-sharded-cluster-with-filesystem-snapshots + tutorial/backup-sharded-cluster-with-database-dumps + tutorial/restore-single-shard + tutorial/restore-sharded-cluster + tutorial/schedule-backup-window-for-sharded-clusters + administration/sharding-troubleshooting From 69fd4a42ded1f166625f94cd7171a63fdaa0e396 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Tue, 15 Jan 2013 22:23:15 -0500 Subject: [PATCH 08/15] DOCS-987 small --- source/administration/sharding-config-server.txt | 7 +++---- source/core/sharding-security.txt | 6 +++--- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/source/administration/sharding-config-server.txt b/source/administration/sharding-config-server.txt index 13cb51cb6ad..ea1509f2301 100644 --- a/source/administration/sharding-config-server.txt +++ b/source/administration/sharding-config-server.txt @@ -1,9 +1,9 @@ .. index:: config servers; operations .. _sharding-procedure-config-server: -===================== -Manage Config Servers -===================== +========================= +Manage the Config Servers +========================= .. default-domain:: mongodb @@ -222,4 +222,3 @@ of the cluster metadata from the config database is straight forward: between shards. Your application will be able to write data to the cluster. The :ref:`sharding-config-server` section of the documentation provides more information on this topic. - diff --git a/source/core/sharding-security.txt b/source/core/sharding-security.txt index c2eca455b68..74d8b1f09bd 100644 --- a/source/core/sharding-security.txt +++ b/source/core/sharding-security.txt @@ -1,9 +1,9 @@ .. index:: sharding; security .. _sharding-security: -============================================ -Security Considerations for Sharded Clusters -============================================ +======================== +Sharded Cluster Security +======================== .. 
default-domain:: mongodb From adfe6263dea78c2013552732720611bdb943f81f Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 16 Jan 2013 14:59:51 -0500 Subject: [PATCH 09/15] DOCS-987 ongoing edits --- source/administration/sharded-clusters.txt | 40 +++++--- source/administration/sharding.txt | 27 +----- source/core/sharding.txt | 107 +++++++++------------ source/tutorial/deploy-shard-cluster.txt | 11 ++- 4 files changed, 83 insertions(+), 102 deletions(-) diff --git a/source/administration/sharded-clusters.txt b/source/administration/sharded-clusters.txt index f8d0bf7045a..fdab0543804 100644 --- a/source/administration/sharded-clusters.txt +++ b/source/administration/sharded-clusters.txt @@ -1,25 +1,41 @@ .. index:: sharded clusters .. _sharding-sharded-cluster: -================ -Sharded Clusters -================ +=============================== +Components in a Sharded Cluster +=============================== .. default-domain:: mongodb Sharding occurs within a :term:`sharded cluster`. A sharded cluster -consists of the following. For specific configurations, see -:ref:`sharding-architecture`: +consists of the following components: -- :program:`mongod` instances that serve as the - :term:`shards ` that hold your database collections. +- :ref:`Shards `. Each shard is a separate + :program:`mongod` instance or :term:`replica set` that holds a portion + of the your database collections. -- :program:`mongod` instances that serve as config servers to hold - metadata about the cluster. The metadata maps :term:`chunks - ` to shards. See :ref:`sharding-config-server`. +- :ref:`Config servers `. Each config server is + a :program:`mongod` instances that holds metadata about the cluster. + The metadata maps :term:`chunks ` to shards. -- :program:`mongos` instances that route the reads and - writes to the shards. +- :ref:`mongos instances `. The :program:`mongos` + instances route the reads and writes to the shards. + +.. seealso:: + + - For specific configurations, see :ref:`sharding-architecture`. + + - To set up sharded clusters, see :ref:`sharding-procedure-setup`. + +.. index:: sharding; shards +.. index:: shards +.. _sharding-shards: + +Shards +------ + +A shard is a :program:`mongod` instance that stores some portion of a +sharded cluster’s total data set. .. index:: sharding; config servers .. index:: config servers diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index 216badc4e9b..39190dcdf1a 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -2,29 +2,4 @@ Sharding Administration ======================= -This page lists administrative documentation for sharding. - -.. 
toctree:: - :maxdepth: 2 - - core/sharding - administration/sharded-clusters - core/sharding-requirements - administration/sharding-architectures - core/sharding-security - tutorial/deploy-shard-cluster - tutorial/manage-shards - tutorial/add-shards-to-shard-cluster - tutorial/remove-shards-from-cluster - tutorial/manage-chunks - tutorial/manage-balancer - administration/sharding-config-server - tutorial/enforce-unique-keys-for-sharded-collections - tutorial/convert-replica-set-to-replicated-shard-cluster - tutorial/backup-small-sharded-cluster-with-mongodump - tutorial/backup-sharded-cluster-with-filesystem-snapshots - tutorial/backup-sharded-cluster-with-database-dumps - tutorial/restore-single-shard - tutorial/restore-sharded-cluster - tutorial/schedule-backup-window-for-sharded-clusters - administration/sharding-troubleshooting +This page is no longer used. \ No newline at end of file diff --git a/source/core/sharding.txt b/source/core/sharding.txt index 27730096b10..71d42968a5c 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -7,48 +7,19 @@ Sharding Overview .. default-domain:: mongodb -Sharding is MongoDB’s approach to scaling out. Sharding partitions -database collections and stores the different portions of a collection -on different machines. When your collections become too large for -existing storage, you need only add a new machine and MongoDB -automatically distributes data to the new server. - -Sharding automatically balances the data and load across machines. By -splitting a collection across multiple machines, sharding increases -write capacity, provides the ability to support larger working sets, and -raises the limits of total data size beyond the physical resources of a -single node. +Sharding is MongoDB’s approach to scaling out. Sharding partitions a +database collection and stores the different portions on different +machines. When collections become too large for existing storage, you +need only add a new machine and sharding automatically distributes +collection data to the new server. -How Sharding Works ------------------- - -Sharding is enabled on a per-database basis. Once a database is enabled -for sharding, you then choose which collections to shard. - -MongoDB distributes documents among :term:`shards ` based on the -value of the :ref:`shard key `. Each :term:`chunk` -represents a block of :term:`documents ` with values that fall -within a specific range. When chunks grow beyond the :ref:`chunk size -`, MongoDB divides the chunks into smaller chunks -(i.e. :term:`splitting `) based on the shard key. +Sharding balances the data and load across machines. By splitting a +collection across multiple machines, sharding increases write capacity +and supports larger working sets. Specifically: -The sharding system automatically balances data across the cluster -without intervention from the application layer. Effective automatic -sharding depends on a well chosen :ref:`shard key `, -but requires no additional complexity, modifications, or intervention -from developers. - -Sharding is completely transparent to the application layer, because all -connections to a cluster go through :program:`mongos`. Sharding in -MongoDB requires some :ref:`basic initial configuration -`, but ongoing function is entirely -transparent to the application. - -Sharding increases capacity in two ways: - -#. Effective partitioning of data can provide additional write - capacity by distributing the write load over a number of - :program:`mongod` instances. +#. 
Partitioning of data provides additional write capacity by + distributing the write load over a number of :program:`mongod` + instances. #. Given a shard key with sufficient :ref:`cardinality `, partitioning data allows users to @@ -58,25 +29,41 @@ Sharding increases capacity in two ways: .. index:: shard key single: sharding; shard key +How Sharding Works +------------------ + +To run sharding, you set up a sharded cluster. For a description of +sharded clusters, see :doc:`/administration/sharded-clusters`. + +Within a sharded cluster, you enable sharding on a per-database basis. +Once a database is enabled for sharding, you choose which collections to +shard. For each sharded collection, you specify a :term:`shard key`. + +The shard key determines the distribution of the collection's +:term:`documents ` among the cluster's :term:`shards `. +The shard key is a :term:`field` that exists in every document in the +collection. MongoDB distributes documents according to ranges of values +in the shard key. A given shard holds documents for which the shard key +falls within a specific range of values. Shard keys, like :term:`indexes +`, can be either a single field or multiple fields. + +Within a shard, MongoDB further partitions documents into :term:`chunks +`. Each chunk represents a smaller range of values within the +shard's range. When a chunk grows beyond the :ref:`chunk size +`, MongoDB :term:`splits ` the chunk into +smaller chunks, always based on ranges in the shard key. + +.. _sharding-shard-key-selection: .. _sharding-shard-key: .. _shard-key: -Shard Keys ----------- +Shard Key Selection +------------------- -MongoDB distributes documents among :term:`shards ` based on -:term:`shard keys `. A shard key must be :term:`field` that -exists in every :term:`document` in the collection. Shard keys, like -:term:`indexes `, can be either a single field, or may be a -compound key, consisting of multiple fields. - -MongoDB's sharding is range-based. Each :term:`chunk` holds -documents having specific range of values for the "shard key". Thus, -choosing the correct shard key can have a great impact on the +Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster. - -Appropriate shard key choice depends on the schema of your data and -the way that your application queries and writes data to the database. +Appropriate shard key choice depends on the schema of your data and the +way that your application queries and writes data to the database. The ideal shard key: @@ -109,16 +96,16 @@ help produce one that is more ideal. .. index:: balancing .. _sharding-balancing: -Balancing and Distribution --------------------------- +Shard Balancing +--------------- Balancing is the process MongoDB uses to redistribute data within a :term:`sharded cluster`. When a :term:`shard` has a too many: -term:`chunks ` when compared to other shards, MongoDB balances -the shards. +term:`chunks ` when compared to other shards, MongoDB +automatically balances the shards. MongoDB balances the shards without +intervention from the application layer. -The -balancing process attempts to minimize the impact that balancing can +The balancing process attempts to minimize the impact that balancing can have on the cluster, by: - Moving only one chunk at a time. 
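To make the shard key discussion concrete, the following hypothetical example — the ``records`` database, the ``people`` collection, and the compound key on ``zipcode`` and ``name`` are illustrative only — enables sharding and declares a shard key from a :program:`mongo` shell connected to a :program:`mongos`:

.. code-block:: javascript

   // enable sharding for the database that holds the collection
   sh.enableSharding("records")

   // shard the collection on a compound key; every document must contain
   // both fields, and the combined values should have high enough
   // cardinality to allow fine-grained chunk splits
   sh.shardCollection("records.people", { zipcode: 1, name: 1 })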
diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index e7eca5e9670..88dd59ffed2 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -213,10 +213,13 @@ Optionally, you can enable sharding for a database using the Enable Sharding for Collections ------------------------------- -You enable sharding on a per-collection basis. When you do, you -specify the :term:`shard key` MongoDB uses to distribute your documents -among the shards. For more information, see the -:ref:`sharding-shard-key`. +You enable sharding on a per-collection basis. + +1. Determine what you will use for the :term:`shard key`. Your selection + of the shard key affects the efficiency of sharding. See the + selection considerations listed in the :ref:`sharding-shard-key-selection`. + + To enable sharding for a collection, use the :method:`sh.shardCollection()` method in the :program:`mongo` shell. From 9685360a01018dd5c24db2ad990a85e2cfbffd70 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 16 Jan 2013 15:47:40 -0500 Subject: [PATCH 10/15] DOCS-987 added topic: view cluster configuration --- source/administration/sharded-clusters.txt | 9 +- .../administration/sharding-architectures.txt | 39 +-- source/core/sharding-security.txt | 12 +- source/core/sharding.txt | 26 +- source/sharding.txt | 15 +- source/tutorial/manage-shards.txt | 308 ------------------ .../tutorial/view-cluster-configuration.txt | 108 ++++++ 7 files changed, 151 insertions(+), 366 deletions(-) delete mode 100644 source/tutorial/manage-shards.txt create mode 100644 source/tutorial/view-cluster-configuration.txt diff --git a/source/administration/sharded-clusters.txt b/source/administration/sharded-clusters.txt index fdab0543804..69840e5aa15 100644 --- a/source/administration/sharded-clusters.txt +++ b/source/administration/sharded-clusters.txt @@ -34,8 +34,13 @@ consists of the following components: Shards ------ -A shard is a :program:`mongod` instance that stores some portion of a -sharded cluster’s total data set. +A shard is a container that holds a subset of a collection’s data. Each +shard is either a single :program:`mongod` instance or a :term:`replica +set`. In production, all shards should be replica sets. + +Applications do not access the shards directly. Instead, the +:ref:`mongos instances ` routes reads and writes from +applications to the shards. .. index:: sharding; config servers .. index:: config servers diff --git a/source/administration/sharding-architectures.txt b/source/administration/sharding-architectures.txt index b0e4a84a8e1..9772ca117ca 100644 --- a/source/administration/sharding-architectures.txt +++ b/source/administration/sharding-architectures.txt @@ -11,10 +11,8 @@ Sharded Cluster Architectures This document describes the organization and design of :term:`sharded cluster` deployments. -Test Cluster ------------- - -.. warning:: Use this architecture for testing and development only. +Test Cluster Architecture +------------------------- You can deploy a very minimal cluster for testing and development. These *non-production* clusters have the following @@ -27,10 +25,12 @@ components: - One :program:`mongos` instance. +.. warning:: Use the test cluster architecture for testing and development only. + .. 
_sharding-production-architecture: -Production Cluster ------------------- +Production Cluster Architecture +------------------------------- In a production cluster, you must ensure that data is redundant and that your systems are highly available. To that end, a production-level @@ -39,24 +39,19 @@ cluster must have the following components: - Three :ref:`config servers `, each residing on a discrete system. - .. note:: - - A single :term:`sharded cluster` must have exclusive use of its - :ref:`config servers `. If you have - multiple shards, you will need to have a group of config servers - for each cluster. - -- Two or more :term:`replica sets `, for the :term:`shards - `. + A single :term:`sharded cluster` must have exclusive use of its + :ref:`config servers `. If you have multiple + shards, you will need to have a group of config servers for each + cluster. - .. see:: For more information on replica sets see - :doc:`/administration/replication-architectures` and - :doc:`/replication`. +- Two or more :term:`replica sets ` to serve as + :term:`shards `. For information on replica sets, see + :doc:`/replication`. -- Two or more :program:`mongos` instances. Typically, you will deploy a single - :program:`mongos` instance on each application server. Alternatively, - you may deploy several `mongos` nodes and let your application connect - to these via a load balancer. +- Two or more :program:`mongos` instances. Typically, you deploy a + single :program:`mongos` instance on each application server. + Alternatively, you may deploy several :program:`mongos` nodes and let + your application connect to these via a load balancer. Sharded and Non-Sharded Data ---------------------------- diff --git a/source/core/sharding-security.txt b/source/core/sharding-security.txt index 74d8b1f09bd..836047ef863 100644 --- a/source/core/sharding-security.txt +++ b/source/core/sharding-security.txt @@ -41,13 +41,13 @@ credentials for "admin" users (i.e. for the :term:`admin database`) and credentials for all other databases. These credentials reside in different locations within the cluster and have different roles: -#. Admin database credentials reside on the config servers, to receive - admin access to the cluster you *must* authenticate a session while - connected to a :program:`mongos` instance using the - :term:`admin database`. +- Admin database credentials reside on the config servers, to receive + admin access to the cluster you *must* authenticate a session while + connected to a :program:`mongos` instance using the :term:`admin + database`. -#. Other database credentials reside on the *primary* shard for the - database. +- Other database credentials reside on the *primary* shard for the + database. This means that you *can* authenticate to these users and databases while connected directly to the primary shard for a database. However, diff --git a/source/core/sharding.txt b/source/core/sharding.txt index 71d42968a5c..6d93837af13 100644 --- a/source/core/sharding.txt +++ b/source/core/sharding.txt @@ -8,23 +8,15 @@ Sharding Overview .. default-domain:: mongodb Sharding is MongoDB’s approach to scaling out. Sharding partitions a -database collection and stores the different portions on different -machines. When collections become too large for existing storage, you -need only add a new machine and sharding automatically distributes -collection data to the new server. - -Sharding balances the data and load across machines. 
By splitting a -collection across multiple machines, sharding increases write capacity -and supports larger working sets. Specifically: - -#. Partitioning of data provides additional write capacity by - distributing the write load over a number of :program:`mongod` - instances. - -#. Given a shard key with sufficient :ref:`cardinality - `, partitioning data allows users to - increase the potential amount of data to manage with MongoDB and - expand the :term:`working set`. +collection and stores the different portions on different machines. When +a database's collections become too large for existing storage, you need only add a +new machine. Sharding automatically distributes collection data to +the new server. + +Sharding automatically balances data and load across machines. Sharding +provides additional write capacity by distributing the write load over a +number of :program:`mongod` instances. Sharding allows users to increase +the potential amount of data in the :term:`working set`. .. index:: shard key single: sharding; shard key diff --git a/source/sharding.txt b/source/sharding.txt index f12feadfd5a..6ea0571ce21 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -14,29 +14,22 @@ Sharding Concepts .. toctree:: :maxdepth: 1 - core/sharding + core/sharding administration/sharded-clusters core/sharding-requirements administration/sharding-architectures core/sharding-security -Set Up Sharded Clusters ------------------------ +Set Up and Manage Sharded Clusters +---------------------------------- .. toctree:: :maxdepth: 1 tutorial/deploy-shard-cluster - -Manage Sharded Clusters ------------------------ - -.. toctree:: - :maxdepth: 1 - - tutorial/manage-shards tutorial/add-shards-to-shard-cluster tutorial/remove-shards-from-cluster + tutorial/view-cluster-configuration tutorial/manage-chunks tutorial/manage-balancer administration/sharding-config-server diff --git a/source/tutorial/manage-shards.txt b/source/tutorial/manage-shards.txt deleted file mode 100644 index 07b5f3db6f4..00000000000 --- a/source/tutorial/manage-shards.txt +++ /dev/null @@ -1,308 +0,0 @@ -.. _sharding-manage-shards: - -============= -Manage Shards -============= - -.. default-domain:: mongodb - -This page includes the following: - -- :ref:`sharding-procedure-add-shard` - -- :ref:`sharding-procedure-remove-shard` - -- :ref:`sharding-procedure-list-databases` - -- :ref:`sharding-procedure-list-shards` - -- :ref:`sharding-procedure-view-clusters` - -.. _sharding-procedure-add-shard: - -Add a Shard to a Cluster ------------------------- - -To add a shard to an *existing* sharded cluster, use the following -procedure: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. First, you need to tell the cluster where to find the individual - shards. You can do this using the :dbcommand:`addShard` command or - the :method:`sh.addShard()` helper: - - .. code-block:: javascript - - sh.addShard( ":" ) - - Replace ```` and ```` with the hostname and TCP - port number of where the shard is accessible. - Alternately specify a :term:`replica set` name and at least one - hostname which is a member of the replica set. - - For example: - - .. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - - .. note:: In production deployments, all shards should be replica sets. - - Repeat for each shard in your cluster. - - .. optional:: - - You may specify a "name" as an argument to the - :dbcommand:`addShard` command, as follows: - - .. 
code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. - - .. versionchanged:: 2.0.3 - Before version 2.0.3, you must specify the shard in the - following form: the replica set name, followed by a forward - slash, followed by a comma-separated list of seeds for the - replica set. For example, if the name of the replica set is - "myapp1", then your :method:`sh.addShard()` command might resemble: - - .. code-block:: javascript - - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) - -.. note:: - - It may take some time for :term:`chunks ` to migrate to the - new shard. - - For an introduction to balancing, see :ref:`sharding-balancing`. For - lower level information on balancing, see :ref:`sharding-balancing-internals`. - -.. _sharding-procedure-remove-shard: - -Remove a Shard from a Cluster ------------------------------ - -To remove a :term:`shard` from a :term:`sharded cluster`, you must: - -- Migrate :term:`chunks ` to another shard or database. - -- Ensure that this shard is not the :term:`primary shard` for any databases in - the cluster. If it is, move the "primary" status for these databases - to other shards. - -- Finally, remove the shard from the cluster's configuration. - -.. note:: - - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. - -The procedure to remove a shard is as follows: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Determine the name of the shard you will be removing. - - You must specify the name of the shard. You may have specified this - shard name when you first ran the :dbcommand:`addShard` command. If not, - you can find out the name of the shard by running the - :dbcommand:`listShards` or :dbcommand:`printShardingStatus` - commands or the :method:`sh.status()` shell helper. - - The following examples will remove a shard named ``mongodb0`` from the cluster. - -#. Begin removing chunks from the shard. - - Start by running the :dbcommand:`removeShard` command. This will - start "draining" or migrating chunks from the shard you're removing - to another shard in the cluster. - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - This operation will return the following response immediately: - - .. code-block:: javascript - - { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } - - Depending on your network capacity and the amount of data in the - shard, this operation can take anywhere from a few minutes to several - days to complete. - -#. View progress of the migration. - - You can run the :dbcommand:`removeShard` command again at any stage of the - process to view the progress of the migration, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - The output should look something like this: - - .. code-block:: javascript - - { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } - - In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a - counter displays the remaining number of chunks that MongoDB must - migrate to other shards and the number of MongoDB databases that have - "primary" status on this shard. 
- - Continue checking the status of the :dbcommand:`removeShard` command - until the remaining number of chunks to transfer is 0. - -#. Move any databases to other shards in the cluster as needed. - - This is only necessary when removing a shard that is also the - :term:`primary shard` for one or more databases. - - Issue the following command at the :program:`mongo` shell: - - .. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - - This command will migrate all remaining non-sharded data in the - database named ``myapp`` to the shard named ``mongodb1``. - - .. warning:: - - Do not run the :dbcommand:`movePrimary` command until you have *finished* - draining the shard. - - The command will not return until MongoDB completes moving all - data. The response from this command will resemble the following: - - .. code-block:: javascript - - { "primary" : "mongodb1", "ok" : 1 } - -#. Run :dbcommand:`removeShard` again to clean up all metadata - information and finalize the shard removal, as follows: - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - When successful, this command will return a document like this: - - .. code-block:: javascript - - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } - -Once the value of the ``stage`` field is "completed," you may safely -stop the processes comprising the ``mongodb0`` shard. - - -.. _sharding-procedure-list-databases: - -List Databases with Sharding Enabled ------------------------------------- - -To list the databases that have sharding enabled, query the -``databases`` collection in the :ref:`config-database`. -A database has sharding enabled if the value of the ``partitioned`` -field is ``true``. Connect to a :program:`mongos` instance with a -:program:`mongo` shell, and run the following operation to get a full -list of databases with sharding enabled: - -.. code-block:: javascript - - use config - db.databases.find( { "partitioned": true } ) - -.. example:: You can use the following sequence of commands when to - return a list of all databases in the cluster: - - .. code-block:: javascript - - use config - db.databases.find() - - If this returns the following result set: - - .. code-block:: javascript - - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } - { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } - - Then sharding is only enabled for the ``animals`` database. - -.. _sharding-procedure-list-shards: - -List Shards ------------ - -To list the current set of configured shards, use the :dbcommand:`listShards` -command, as follows: - -.. code-block:: javascript - - use admin - db.runCommand( { listShards : 1 } ) - -.. _sharding-procedure-view-clusters: - -View Cluster Details --------------------- - -To view cluster details, issue :method:`db.printShardingStatus()` or -:method:`sh.status()`. Both methods return the same output. - -.. example:: In the following example output from :method:`sh.status()` - - - ``sharding version`` displays the version number of the shard - metadata. - - - ``shards`` displays a list of the :program:`mongod` instances - used as shards in the cluster. - - - ``databases`` displays all databases in the cluster, - including database that do not have sharding enabled. 
- - - The ``chunks`` information for the ``foo`` database displays how - many chunks are on each shard and displays the range of each chunk. - - .. code-block:: javascript - - --- Sharding Status --- - sharding version: { "_id" : 1, "version" : 3 } - shards: - { "_id" : "shard0000", "host" : "m0.example.net:30001" } - { "_id" : "shard0001", "host" : "m3.example2.net:50000" } - databases: - { "_id" : "admin", "partitioned" : false, "primary" : "config" } - { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } - foo.big chunks: - shard0001 1 - shard0000 6 - { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo - { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo - { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo - { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) - { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - foo.large chunks: - shard0001 1 - shard0000 5 - { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) - { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo - { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo - { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo - { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) - { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) - { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } diff --git a/source/tutorial/view-cluster-configuration.txt b/source/tutorial/view-cluster-configuration.txt new file mode 100644 index 00000000000..81540462a43 --- /dev/null +++ b/source/tutorial/view-cluster-configuration.txt @@ -0,0 +1,108 @@ +.. _sharding-manage-shards: + +========================== +View Cluster Configuration +========================== + +.. default-domain:: mongodb + +.. _sharding-procedure-list-databases: + +List Databases with Sharding Enabled +------------------------------------ + +To list the databases that have sharding enabled, query the +``databases`` collection in the :ref:`config-database`. +A database has sharding enabled if the value of the ``partitioned`` +field is ``true``. Connect to a :program:`mongos` instance with a +:program:`mongo` shell, and run the following operation to get a full +list of databases with sharding enabled: + +.. code-block:: javascript + + use config + db.databases.find( { "partitioned": true } ) + +.. example:: You can use the following sequence of commands when to + return a list of all databases in the cluster: + + .. code-block:: javascript + + use config + db.databases.find() + + If this returns the following result set: + + .. code-block:: javascript + + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "m0.example.net:30001" } + { "_id" : "farms", "partitioned" : false, "primary" : "m1.example2.net:27017" } + + Then sharding is only enabled for the ``animals`` database. + +.. _sharding-procedure-list-shards: + +List Shards +----------- + +To list the current set of configured shards, use the :dbcommand:`listShards` +command, as follows: + +.. 
code-block:: javascript + + use admin + db.runCommand( { listShards : 1 } ) + +.. _sharding-procedure-view-clusters: + +View Cluster Details +-------------------- + +To view cluster details, issue :method:`db.printShardingStatus()` or +:method:`sh.status()`. Both methods return the same output. + +.. example:: In the following example output from :method:`sh.status()` + + - ``sharding version`` displays the version number of the shard + metadata. + + - ``shards`` displays a list of the :program:`mongod` instances + used as shards in the cluster. + + - ``databases`` displays all databases in the cluster, + including database that do not have sharding enabled. + + - The ``chunks`` information for the ``foo`` database displays how + many chunks are on each shard and displays the range of each chunk. + + .. code-block:: javascript + + --- Sharding Status --- + sharding version: { "_id" : 1, "version" : 3 } + shards: + { "_id" : "shard0000", "host" : "m0.example.net:30001" } + { "_id" : "shard0001", "host" : "m3.example2.net:50000" } + databases: + { "_id" : "admin", "partitioned" : false, "primary" : "config" } + { "_id" : "animals", "partitioned" : true, "primary" : "shard0000" } + foo.big chunks: + shard0001 1 + shard0000 6 + { "a" : { $minKey : 1 } } -->> { "a" : "elephant" } on : shard0001 Timestamp(2000, 1) jumbo + { "a" : "elephant" } -->> { "a" : "giraffe" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "giraffe" } -->> { "a" : "hippopotamus" } on : shard0000 Timestamp(2000, 2) jumbo + { "a" : "hippopotamus" } -->> { "a" : "lion" } on : shard0000 Timestamp(2000, 3) jumbo + { "a" : "lion" } -->> { "a" : "rhinoceros" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rhinoceros" } -->> { "a" : "springbok" } on : shard0000 Timestamp(1000, 4) + { "a" : "springbok" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + foo.large chunks: + shard0001 1 + shard0000 5 + { "a" : { $minKey : 1 } } -->> { "a" : "hen" } on : shard0001 Timestamp(2000, 0) + { "a" : "hen" } -->> { "a" : "horse" } on : shard0000 Timestamp(1000, 1) jumbo + { "a" : "horse" } -->> { "a" : "owl" } on : shard0000 Timestamp(1000, 2) jumbo + { "a" : "owl" } -->> { "a" : "rooster" } on : shard0000 Timestamp(1000, 3) jumbo + { "a" : "rooster" } -->> { "a" : "sheep" } on : shard0000 Timestamp(1000, 4) + { "a" : "sheep" } -->> { "a" : { $maxKey : 1 } } on : shard0000 Timestamp(1000, 5) + { "_id" : "test", "partitioned" : false, "primary" : "shard0000" } From ac000eaed5ddad6ac2d67cf1be15332cd0238e23 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 16 Jan 2013 18:04:52 -0500 Subject: [PATCH 11/15] DOCS-987 merged duplicate add shard content --- .../tutorial/add-shards-to-shard-cluster.txt | 142 +++++++----------- source/tutorial/deploy-shard-cluster.txt | 108 +++++++------ .../tutorial/remove-shards-from-cluster.txt | 122 +++++++++++++++ 3 files changed, 233 insertions(+), 139 deletions(-) diff --git a/source/tutorial/add-shards-to-shard-cluster.txt b/source/tutorial/add-shards-to-shard-cluster.txt index 8ed6bbc1e6e..518c1d7ded2 100644 --- a/source/tutorial/add-shards-to-shard-cluster.txt +++ b/source/tutorial/add-shards-to-shard-cluster.txt @@ -1,119 +1,81 @@ -================================= -Add Shards to an Existing Cluster -================================= +======================= +Add Shards to a Cluster +======================= .. 
default-domain:: mongodb -Synopsis --------- +You add shards to a :term:`sharded cluster` after you create the cluster +or anytime that you need to add capacity to the cluster. If you have not +created a sharded cluster, see :ref:`sharding-procedure-setup`. -This document describes how to add a :term:`shard` to an -existing :term:`sharded cluster`. As your -data set grows you must add additional shards to a cluster to provide -additional capacity. For additional sharding -procedures, see :doc:`/administration/sharding`. - -Concerns --------- - -Distributing :term:`chunks ` among your cluster requires some -capacity to support the migration process. When adding a shard to your -cluster, you should always ensure that your cluster has enough -capacity to support the migration without affecting legitimate -production traffic. +When adding a shard to a cluster, you should always ensure that the +cluster has enough capacity to support the migration without affecting +legitimate production traffic. In production environments, all shards should be :term:`replica sets -`. Furthermore, *all* interaction with your sharded -cluster should pass through a :program:`mongos` instance. This -tutorial assumes that you already have a :program:`mongo` shell -connection to a :program:`mongos` instance. - -Process -------- - -Tell the cluster where to find the individual -shards. You can do this using the :dbcommand:`addShard` command: - -.. code-block:: javascript - - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) - -Or you can use the :method:`sh.addShard()` helper in the -:program:`mongo` shell: - -.. code-block:: javascript - - sh.addShard( "[hostname]:[port]" ) +`. -Replace ``[hostname]`` and ``[port]`` with the hostname and TCP -port number of where the shard is accessible. +Add a Shard to a Cluster +------------------------ -.. warning:: +You interact with a sharded cluster by connecting to a :program:`mongos` +instance. - Do not use ``localhost`` for the hostname unless your - :term:`configuration server ` - is also running on ``localhost``. +1. From a :program:`mongo` shell, connect to the :program:`mongos` + instance. Issue a command using the following syntax: -For example: - -.. code-block:: javascript - - sh.addShard( "mongodb0.example.net:27027" ) - -If ``mongodb0.example.net:27027`` is a member of a replica -set, call the :method:`sh.addShard()` method with an argument that -resembles the following: - -.. code-block:: javascript - - sh.addShard( "/mongodb0.example.net:27027" ) + .. code-block:: sh -Replace, ```` with the name of the replica set, and -MongoDB will discover all other members of the replica set. + mongo --host --port -.. note:: In production deployments, all shards should be replica sets. + For example, if a :program:`mongos` is accessible at + ``mongos0.example.net`` on port ``27017``, issue the following + command: - .. versionchanged:: 2.0.3 + .. code-block:: sh - Before version 2.0.3, you must specify the shard in the following - form: + mongo --host mongos0.example.net --port 27017 - .. code-block:: sh +#. Add each shard to the cluster using the :method:`sh.addShard()` + method, as shown in the examples below. Issue :method:`sh.addShard()` + separately for each shard. - replicaSetName/,, + If the shard is a replica set, specify the + name of the replica set and specify a member of the set. In + production deployments, all shards should be replica sets. 
- For example, if the name of the replica set is ``repl0``, then - your :method:`sh.addShard` command would be: + .. optional:: You can instead use the :dbcommand:`addShard` database + command, which lets you specify a name and maximum size for the + shard. If you do not specify these, MongoDB automatically assigns + a name and maximum size. To use the database command, see + :dbcommand:`addShard`. - .. code-block:: javascript + The following are examples of adding a shard with + :method:`sh.addShard()`: - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + - To add a shard for a replica set named ``rs1`` with a member + running on port ``27017`` on ``mongodb0.example.net``, issue the + following command: -Repeat this step for each shard in your cluster. + .. code-block:: javascript -.. optional:: + sh.addShard( "rs1/mongodb0.example.net:27017" ) - You may specify a "name" as an argument to the - :dbcommand:`addShard`, follows: + .. versionchanged:: 2.0.3 - .. code-block:: javascript + For MongoDB versions prior to 2.0.3, you must specify all members of the replica set. For + example: - db.runCommand( { addShard: mongodb0.example.net, name: "mongodb0" } ) + .. code-block:: javascript - You cannot specify a name for a shard using the - :method:`sh.addShard()` helper in the :program:`mongo` shell. If - you use the helper or do not specify a shard name, then MongoDB - will assign a name upon creation. + sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017" ) -.. note:: + - To add a shard for a standalone :program:`mongod` on port ``27017`` + of ``mongodb0.example.net``, issue the following command: - It may take some time for :term:`chunks ` to migrate to the new - shard because the system must copy data from one :program:`mongod` - instance to another while maintaining data consistency. + .. code-block:: javascript - For an overview of the balancing operation, - see the :ref:`Balancing and Distribution ` - section. + sh.addShard( "mongodb0.example.net:27017" ) - For additional information on balancing, see the - :ref:`Balancing Internals ` section. + .. note:: It might take some time for :term:`chunks ` to + migrate to the new shard. diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 88dd59ffed2..14c7072afe2 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -7,24 +7,26 @@ Set Up a Sharded Cluster .. default-domain:: mongodb The topics on this page are an ordered sequence of the tasks you perform -to set up a sharded cluster. To set up a sharded cluster, follow the -ordered sequence of tasks: +to set up a :term:`sharded cluster`. -- :ref:`sharding-setup-start-cfgsrvr` +Before setting up a cluster, see the following: -- :ref:`sharding-setup-start-mongos` +- :ref:`sharding-requirements`. -- :ref:`sharding-setup-add-shards` +- :doc:`/administration/replication-architectures` -- :ref:`sharding-setup-enable-sharding` +To set up a sharded cluster, follow ordered sequence of the tasks on +this page: -- :ref:`sharding-setup-shard-collection` +1. :ref:`sharding-setup-start-cfgsrvr` -Before setting up a cluster, see the following: +#. :ref:`sharding-setup-start-mongos` -- :ref:`sharding-requirements`. +#. :ref:`sharding-setup-add-shards` -- :doc:`/administration/replication-architectures` +#. :ref:`sharding-setup-enable-sharding` + +#. :ref:`sharding-setup-shard-collection` .. 
include:: /includes/warning-sharding-hostnames.rst @@ -128,7 +130,7 @@ should be a replica set. .. code-block:: sh - mongo --host --port + mongo --host --port For example, if a :program:`mongos` is accessible at ``mongos0.example.net`` on port ``27017``, issue the following @@ -138,40 +140,49 @@ should be a replica set. mongo --host mongos0.example.net --port 27017 -#. Add shards to the cluster using the :method:`sh.addShard()` method - issued from the :program:`mongo` shell. Issue :method:`sh.addShard()` - separately for each shard in the cluster. If the shard is a replica - set, specify the name of the set and one member of the set. - :program:`mongos` will discover the names of other members of the - set. In production deployments, all shards should be replica sets. +#. Add each shard to the cluster using the :method:`sh.addShard()` + method, as shown in the examples below. Issue :method:`sh.addShard()` + separately for each shard. - The following example adds a shard for a standalone :program:`mongod`. - To optionally specify a name for the shard or a maximum size, see - :dbcommand:`addShard`: + If the shard is a replica set, specify the + name of the replica set and specify a member of the set. In + production deployments, all shards should be replica sets. - .. code-block:: javascript + .. optional:: You can instead use the :dbcommand:`addShard` database + command, which lets you specify a name and maximum size for the + shard. If you do not specify these, MongoDB automatically assigns + a name and maximum size. To use the database command, see + :dbcommand:`addShard`. - sh.addShard( "mongodb0.example.net:27027" ) + The following are examples of adding a shard with + :method:`sh.addShard()`: - The following example adds a shard for a replica set - named ``rs1``: + - To add a shard for a replica set named ``rs1`` with a member + running on port ``27017`` on ``mongodb0.example.net``, issue the + following command: - .. code-block:: javascript + .. code-block:: javascript + + sh.addShard( "rs1/mongodb0.example.net:27017" ) - sh.addShard( "rs1/mongodb0.example.net:27027" ) + .. versionchanged:: 2.0.3 - .. note:: + For MongoDB versions prior to 2.0.3, you must specify all members of the replica set. For + example: - .. versionchanged:: 2.0.3 + .. code-block:: javascript - For MongoDB versions prior to 2.0.3, if you add a replica set as a - shard, you must specify all members of the replica set. For - example, if the name of the replica set is ``repl0``, then your - :method:`sh.addShard()` command would be: + sh.addShard( "rs1/mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + + - To add a shard for a standalone :program:`mongod` on port ``27017`` + of ``mongodb0.example.net``, issue the following command: .. code-block:: javascript - sh.addShard( "repl0/mongodb0.example.net:27027,mongodb1.example.net:27017,mongodb2.example.net:27017" ) + sh.addShard( "mongodb0.example.net:27017" ) + + .. note:: It might take some time for :term:`chunks ` to + migrate to the new shard. .. _sharding-setup-enable-sharding: @@ -192,14 +203,14 @@ before sharding begins. .. code-block:: sh - mongo --host --port + mongo --host --port #. Issue the :method:`sh.enableSharding()` method, specifying the name of the database for which to enable sharding. Use the following syntax: .. 
code-block:: javascript - sh.enableSharding() + sh.enableSharding("") Optionally, you can enable sharding for a database using the :dbcommand:`enableSharding` command, which uses the following syntax: @@ -210,8 +221,8 @@ Optionally, you can enable sharding for a database using the .. _sharding-setup-shard-collection: -Enable Sharding for Collections -------------------------------- +Enable Sharding for a Collection +-------------------------------- You enable sharding on a per-collection basis. @@ -219,21 +230,20 @@ You enable sharding on a per-collection basis. of the shard key affects the efficiency of sharding. See the selection considerations listed in the :ref:`sharding-shard-key-selection`. +#. Enable sharding for a collection by issuing the + :method:`sh.shardCollection()` method in the :program:`mongo` shell. + The method uses the following syntax: + .. code-block:: javascript -To enable sharding for a collection, use the -:method:`sh.shardCollection()` method in the :program:`mongo` shell. -The command uses the following syntax: - -.. code-block:: javascript - - sh.shardCollection(".", shard-key-pattern) + sh.shardCollection(".", shard-key-pattern) -Replace the ``.`` string with the full namespace -of your database, which consists of the name of your database, a dot -(e.g. ``.``), and the full name of the collection. The ``shard-key-pattern`` -represents your shard key, which you specify in the same form as you -would an :method:`index ` key pattern. + Replace the ``.`` string with the full + namespace of your database, which consists of the name of your + database, a dot (e.g. ``.``), and the full name of the collection. + The ``shard-key-pattern`` represents your shard key, which you + specify in the same form as you would an :method:`index + ` key pattern. .. example:: The following sequence of commands shards four collections: diff --git a/source/tutorial/remove-shards-from-cluster.txt b/source/tutorial/remove-shards-from-cluster.txt index 72098d758d5..06e38b329f4 100644 --- a/source/tutorial/remove-shards-from-cluster.txt +++ b/source/tutorial/remove-shards-from-cluster.txt @@ -141,3 +141,125 @@ A success message appears at completion: When the value of "state" is "completed", you may safely stop the ``mongodb0`` shard. + +.. redundant below + +.. _sharding-procedure-remove-shard: + +Remove a Shard from a Cluster +----------------------------- + +To remove a :term:`shard` from a :term:`sharded cluster`, you must: + +- Migrate :term:`chunks ` to another shard or database. + +- Ensure that this shard is not the :term:`primary shard` for any databases in + the cluster. If it is, move the "primary" status for these databases + to other shards. + +- Finally, remove the shard from the cluster's configuration. + +.. note:: + + To successfully migrate data from a shard, the :term:`balancer` + process **must** be active. + +The procedure to remove a shard is as follows: + +#. Connect to a :program:`mongos` in the cluster using the + :program:`mongo` shell. + +#. Determine the name of the shard you will be removing. + + You must specify the name of the shard. You may have specified this + shard name when you first ran the :dbcommand:`addShard` command. If not, + you can find out the name of the shard by running the + :dbcommand:`listShards` or :dbcommand:`printShardingStatus` + commands or the :method:`sh.status()` shell method. + + The following examples will remove a shard named ``mongodb0`` from the cluster. + +#. Begin removing chunks from the shard. 
+ + Start by running the :dbcommand:`removeShard` command. This will + start "draining" or migrating chunks from the shard you're removing + to another shard in the cluster. + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + This operation will return the following response immediately: + + .. code-block:: javascript + + { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } + + Depending on your network capacity and the amount of data in the + shard, this operation can take anywhere from a few minutes to several + days to complete. + +#. View progress of the migration. + + You can run the :dbcommand:`removeShard` command again at any stage of the + process to view the progress of the migration, as follows: + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + The output should look something like this: + + .. code-block:: javascript + + { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } + + In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a + counter displays the remaining number of chunks that MongoDB must + migrate to other shards and the number of MongoDB databases that have + "primary" status on this shard. + + Continue checking the status of the :dbcommand:`removeShard` command + until the remaining number of chunks to transfer is 0. + +#. Move any databases to other shards in the cluster as needed. + + This is only necessary when removing a shard that is also the + :term:`primary shard` for one or more databases. + + Issue the following command at the :program:`mongo` shell: + + .. code-block:: javascript + + db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) + + This command will migrate all remaining non-sharded data in the + database named ``myapp`` to the shard named ``mongodb1``. + + .. warning:: + + Do not run the :dbcommand:`movePrimary` command until you have *finished* + draining the shard. + + The command will not return until MongoDB completes moving all + data. The response from this command will resemble the following: + + .. code-block:: javascript + + { "primary" : "mongodb1", "ok" : 1 } + +#. Run :dbcommand:`removeShard` again to clean up all metadata + information and finalize the shard removal, as follows: + + .. code-block:: javascript + + db.runCommand( { removeShard: "mongodb0" } ) + + When successful, this command will return a document like this: + + .. code-block:: javascript + + { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } + +Once the value of the ``stage`` field is "completed," you may safely +stop the processes comprising the ``mongodb0`` shard. \ No newline at end of file From 52a4d374bca3b43f72de0dc23261b20148af1158 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Wed, 16 Jan 2013 18:07:05 -0500 Subject: [PATCH 12/15] DOCS-987 minor --- source/tutorial/add-shards-to-shard-cluster.txt | 4 +--- source/tutorial/deploy-shard-cluster.txt | 4 +--- 2 files changed, 2 insertions(+), 6 deletions(-) diff --git a/source/tutorial/add-shards-to-shard-cluster.txt b/source/tutorial/add-shards-to-shard-cluster.txt index 518c1d7ded2..a89e1b7df04 100644 --- a/source/tutorial/add-shards-to-shard-cluster.txt +++ b/source/tutorial/add-shards-to-shard-cluster.txt @@ -38,9 +38,7 @@ instance. #. Add each shard to the cluster using the :method:`sh.addShard()` method, as shown in the examples below. Issue :method:`sh.addShard()` - separately for each shard. 
- - If the shard is a replica set, specify the + separately for each shard. If the shard is a replica set, specify the name of the replica set and specify a member of the set. In production deployments, all shards should be replica sets. diff --git a/source/tutorial/deploy-shard-cluster.txt b/source/tutorial/deploy-shard-cluster.txt index 14c7072afe2..8cfecc7473a 100644 --- a/source/tutorial/deploy-shard-cluster.txt +++ b/source/tutorial/deploy-shard-cluster.txt @@ -142,9 +142,7 @@ should be a replica set. #. Add each shard to the cluster using the :method:`sh.addShard()` method, as shown in the examples below. Issue :method:`sh.addShard()` - separately for each shard. - - If the shard is a replica set, specify the + separately for each shard. If the shard is a replica set, specify the name of the replica set and specify a member of the set. In production deployments, all shards should be replica sets. From 897dd265bc58e710459802cd6a4386f058527a63 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 17 Jan 2013 13:21:07 -0500 Subject: [PATCH 13/15] DOCS-987 merged duplicate remove shard contents --- .../tutorial/remove-shards-from-cluster.txt | 280 ++++++------------ 1 file changed, 97 insertions(+), 183 deletions(-) diff --git a/source/tutorial/remove-shards-from-cluster.txt b/source/tutorial/remove-shards-from-cluster.txt index 06e38b329f4..6a12af04d3e 100644 --- a/source/tutorial/remove-shards-from-cluster.txt +++ b/source/tutorial/remove-shards-from-cluster.txt @@ -4,56 +4,67 @@ Remove Shards from an Existing Sharded Cluster .. default-domain:: mongodb -Synopsis --------- +To remove a :term:`shard` you must ensure the shard's data is migrated +to the remaining shards in the cluster. This procedure describes how to +safely migrate data and how to remove a shard. -This procedure describes the procedure for migrating data from a -shard safely, when you need to decommission a shard. You may also need -to remove shards as part of hardware reorganization and data -migration. - -*Do not* use this procedure to migrate an entire cluster to new -hardware. To migrate an entire shard to new hardware, migrate -individual shards as if they were independent replica sets. +This procedure describes how to safely remove a *single* shard. *Do not* +use this procedure to migrate an entire cluster to new hardware. To +migrate an entire shard to new hardware, migrate individual shards as if +they were independent replica sets. .. DOCS-94 will lead to a tutorial about cluster migrations. In the mean time the above section will necessarily lack links. -To remove a shard, you will: +To remove a shard, first connect to one of the cluster's +:program:`mongos` instances using :program:`mongo` shell. Then follow +the ordered sequence of tasks on this page: + +1. :ref:`remove-shard-ensure-balancer-is-active` + +#. :ref:`remove-shard-determine-name-shard` + +#. :ref:`remove-shard-remove-chunks` + +#. :ref:`remove-shard-check-migration-status` -- Move :term:`chunks ` off of the shard. +#. :ref:`remove-shard-move-unsharded-databases` -- Ensure that this shard is not the :term:`primary shard` for any databases - in the cluster. If it is, move the "primary" status for these - databases to other shards. +#. :ref:`remove-shard-finalize-migration` -- Remove the shard from the cluster's configuration. +.. 
_remove-shard-ensure-balancer-is-active: -Procedure ---------- +Ensure the Balancer Process is Active +------------------------------------- -Complete this procedure by connecting to any :program:`mongos` in the -cluster using the :program:`mongo` shell. +To successfully migrate data from a shard, the :term:`balancer` process +**must** be active. Check the balancer state using the +:method:`sh.getBalancerState()` helper in the :program:`mongo` shell. +For more information, see the section on :ref:`balancer operations +`. -You can only remove a shard by its shard name. To discover or -confirm the name of a shard, use the :dbcommand:`listShards` command, -:dbcommand:`printShardingStatus` command, or :method:`sh.status()` shell helper. +.. _remove-shard-determine-name-shard: -The example commands in this document remove a shard named ``mongodb0``. +Determine the Name of the Shard to Remove +----------------------------------------- -.. note:: +To determine the name of the shard, do one of the following: - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. Check the balancer state using the - :method:`sh.getBalancerState()` helper in the :program:`mongo` - shell. For more information, see the section on :ref:`balancer operations - `. +- From the ``admin`` database, run the :dbcommand:`listShards` command. + +- Run either the :method:`sh.status()` method or the + :method:`sh.printShardingStatus()` method. + +The ``shards._id`` field lists the name of each shard. + +.. _remove-shard-remove-chunks: Remove Chunks from the Shard -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +---------------------------- -Start by running the :dbcommand:`removeShard` command. This -begins "draining" chunks from the shard you are removing. +Run the :dbcommand:`removeShard` command. This begins "draining" chunks +from the shard you are removing to other shards in the cluster. For +example, for a shard named ``mongodb0``, run: .. code-block:: javascript @@ -65,201 +76,104 @@ This operation returns immediately, with the following response: { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } -Depending on your network capacity and the amount of data in your -cluster, this operation can take from a few minutes to -several days to complete. +Depending on your network capacity and the amount of data, this +operation can take from a few minutes to several days to complete. + +.. _remove-shard-check-migration-status: Check the Status of the Migration -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +--------------------------------- -To check the progress of the migration, -run :dbcommand:`removeShard` again at any stage of the -process, as follows: +To check the progress of the migration at any stage in the process, run +:dbcommand:`removeShard`. For example, for a shard named ``mongodb0``, run: .. code-block:: javascript db.runCommand( { removeShard: "mongodb0" } ) -The output resembles the following document: +The command returns output similar to the following: .. code-block:: javascript { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } -In the ``remaining`` sub document, a counter displays the remaining number +In the output, the ``remaining`` document displays the remaining number of chunks that MongoDB must migrate to other shards and the number of MongoDB databases that have "primary" status on this shard. Continue checking the status of the `removeShard` command until the -number of chunks remaining is 0. 
Then proceed to the next step. +number of chunks remaining is ``0``. Then proceed to the next step. -Move Unsharded Databases -~~~~~~~~~~~~~~~~~~~~~~~~ +.. _remove-shard-move-unsharded-databases: -Databases with non-sharded collections store those collections on a -single shard known as the :term:`primary shard` for that database. The -following step is necessary only when the shard to remove is -also the primary shard for one or more databases. +Move Unsharded Data +------------------- -Issue the following command at the :program:`mongo` shell: +If the shard is the :term:`primary shard` for one or more databases in +the cluster, then the shard will have unsharded data. If the shard is +not the primary shard for any databases, skip to the next task, +:ref:`remove-shard-finalize-migration`. -.. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - -This command migrates all remaining non-sharded data in the -database named ``myapp`` to the shard named ``mongodb1``. +In a cluster, a database with unsharded collections stores those +collections only on a single shard. That shard becomes the primary shard +for that database. (Different databases in a cluster can have different +primary shards.) .. warning:: - Do not run the :dbcommand:`movePrimary` until you have *finished* - draining the shard. - -This command will not return until MongoDB completes moving all data, -which may take a long time. The response from this command will -resemble the following: - -.. code-block:: javascript - - { "primary" : "mongodb1", "ok" : 1 } - -Finalize the Migration -~~~~~~~~~~~~~~~~~~~~~~ - -Run :dbcommand:`removeShard` again to clean up all metadata -information and finalize the removal, as follows: - -.. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - -A success message appears at completion: - -.. code-block:: javascript - - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } - -When the value of "state" is "completed", you may safely stop the -``mongodb0`` shard. - -.. redundant below - -.. _sharding-procedure-remove-shard: - -Remove a Shard from a Cluster ------------------------------ - -To remove a :term:`shard` from a :term:`sharded cluster`, you must: - -- Migrate :term:`chunks ` to another shard or database. - -- Ensure that this shard is not the :term:`primary shard` for any databases in - the cluster. If it is, move the "primary" status for these databases - to other shards. + Do not perform this procedure until you have finished draining the + shard. -- Finally, remove the shard from the cluster's configuration. +1. To determine if the shard your are removing is the primary shard for + any of the cluster's databases, issue one of the following methods: -.. note:: + - :method:`sh.status()` - To successfully migrate data from a shard, the :term:`balancer` - process **must** be active. + - :method:`sh.printShardingStatus()` -The procedure to remove a shard is as follows: - -#. Connect to a :program:`mongos` in the cluster using the - :program:`mongo` shell. - -#. Determine the name of the shard you will be removing. - - You must specify the name of the shard. You may have specified this - shard name when you first ran the :dbcommand:`addShard` command. If not, - you can find out the name of the shard by running the - :dbcommand:`listShards` or :dbcommand:`printShardingStatus` - commands or the :method:`sh.status()` shell method. 
- - The following examples will remove a shard named ``mongodb0`` from the cluster. - -#. Begin removing chunks from the shard. - - Start by running the :dbcommand:`removeShard` command. This will - start "draining" or migrating chunks from the shard you're removing - to another shard in the cluster. - - .. code-block:: javascript - - db.runCommand( { removeShard: "mongodb0" } ) - - This operation will return the following response immediately: + In the resulting document, the ``databases`` field lists each + database and its primary shard. For example, the following + ``database`` field shows that the ``products`` database uses + ``mongodb0`` as the primary shard: .. code-block:: javascript - { msg : "draining started successfully" , state: "started" , shard :"mongodb0" , ok : 1 } - - Depending on your network capacity and the amount of data in the - shard, this operation can take anywhere from a few minutes to several - days to complete. - -#. View progress of the migration. + { "_id" : "products", "partitioned" : true, "primary" : "mongodb0" } - You can run the :dbcommand:`removeShard` command again at any stage of the - process to view the progress of the migration, as follows: +#. To move a database to another shard, use the :dbcommand:`movePrimary` + command. For example, to migrate all remaining unsharded data from + ``mongodb0`` to ``mongodb1``, issue the following command: .. code-block:: javascript - db.runCommand( { removeShard: "mongodb0" } ) + db.runCommand( { movePrimary: "products", to: "mongodb1" }) - The output should look something like this: + This command does not return until MongoDB completes moving all data, + which may take a long time. The response from this command will + resemble the following: .. code-block:: javascript - { msg: "draining ongoing" , state: "ongoing" , remaining: { chunks: 42, dbs : 1 }, ok: 1 } - - In the ``remaining`` sub-document ``{ chunks: xx, dbs: y }``, a - counter displays the remaining number of chunks that MongoDB must - migrate to other shards and the number of MongoDB databases that have - "primary" status on this shard. - - Continue checking the status of the :dbcommand:`removeShard` command - until the remaining number of chunks to transfer is 0. - -#. Move any databases to other shards in the cluster as needed. - - This is only necessary when removing a shard that is also the - :term:`primary shard` for one or more databases. - - Issue the following command at the :program:`mongo` shell: - - .. code-block:: javascript - - db.runCommand( { movePrimary: "myapp", to: "mongodb1" }) - - This command will migrate all remaining non-sharded data in the - database named ``myapp`` to the shard named ``mongodb1``. - - .. warning:: - - Do not run the :dbcommand:`movePrimary` command until you have *finished* - draining the shard. - - The command will not return until MongoDB completes moving all - data. The response from this command will resemble the following: + { "primary" : "mongodb1", "ok" : 1 } - .. code-block:: javascript +.. _remove-shard-finalize-migration: - { "primary" : "mongodb1", "ok" : 1 } +Finalize the Migration +---------------------- -#. Run :dbcommand:`removeShard` again to clean up all metadata - information and finalize the shard removal, as follows: +To clean up all metadata information and finalize the removal, run +:dbcommand:`removeShard` again. For example, for a shard named +``mongodb0``, run: - .. code-block:: javascript +.. 
code-block:: javascript - db.runCommand( { removeShard: "mongodb0" } ) + db.runCommand( { removeShard: "mongodb0" } ) - When successful, this command will return a document like this: +A success message appears at completion: - .. code-block:: javascript +.. code-block:: javascript - { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } + { msg: "remove shard completed successfully" , stage: "completed", host: "mongodb0", ok : 1 } -Once the value of the ``stage`` field is "completed," you may safely -stop the processes comprising the ``mongodb0`` shard. \ No newline at end of file +Once the value of the ``stage`` field is "completed", you may safely +stop the processes comprising the ``mongodb0`` shard. From 6041b12a3ca7ae012450fc0b527b26d670496edc Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 17 Jan 2013 13:27:51 -0500 Subject: [PATCH 14/15] DOCS-987 typo --- source/tutorial/remove-shards-from-cluster.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/source/tutorial/remove-shards-from-cluster.txt b/source/tutorial/remove-shards-from-cluster.txt index 6a12af04d3e..05a15614096 100644 --- a/source/tutorial/remove-shards-from-cluster.txt +++ b/source/tutorial/remove-shards-from-cluster.txt @@ -124,7 +124,7 @@ primary shards.) Do not perform this procedure until you have finished draining the shard. -1. To determine if the shard your are removing is the primary shard for +1. To determine if the shard you are removing is the primary shard for any of the cluster's databases, issue one of the following methods: - :method:`sh.status()` From 9a9a617fc85ebb10b8d198c119f246fbe69b7cf9 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Thu, 17 Jan 2013 13:50:46 -0500 Subject: [PATCH 15/15] DOCS-987 added explanatory text to main sharding page --- source/sharding.txt | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/source/sharding.txt b/source/sharding.txt index 6ea0571ce21..5e14823ea88 100644 --- a/source/sharding.txt +++ b/source/sharding.txt @@ -11,6 +11,8 @@ of machines. Sharding uses range-based portioning to distribute Sharding Concepts ----------------- +The following pages describe how sharding works and how to leverage sharding. + .. toctree:: :maxdepth: 1 @@ -23,6 +25,8 @@ Sharding Concepts Set Up and Manage Sharded Clusters ---------------------------------- +The following pages give procedures for setting up and maintaining sharded clusters. + .. toctree:: :maxdepth: 1 @@ -39,6 +43,8 @@ Set Up and Manage Sharded Clusters Backup Sharded Clusters ----------------------- +The following pages describe different approaches to backing up sharded clusters. + .. toctree:: :maxdepth: 1 @@ -52,6 +58,8 @@ Backup Sharded Clusters FAQs, Advanced Concepts, and Troubleshooting -------------------------------------------- +The following pages provide information for troubleshooting sharded clusters and for a deeper understanding of sharding. + .. toctree:: :maxdepth: 1 @@ -62,6 +70,8 @@ FAQs, Advanced Concepts, and Troubleshooting Reference --------- +The following pages list the methods and commands used in sharding. + .. toctree:: :maxdepth: 1
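
The remove-shard tutorial above checks draining progress by re-running
:dbcommand:`removeShard` by hand. As a minimal sketch of the same check,
a :program:`mongo` shell polling loop might look like the following. The
shard name ``mongodb0`` and the ten-second interval are illustrative
assumptions, not values required by the tutorial.

.. code-block:: javascript

   // Sketch only: poll removeShard until the drain reports completion.
   // Assumes the shell is connected to a mongos and that the shard to
   // remove is named "mongodb0"; adjust both for your deployment.
   var shard = "mongodb0";
   var admin = db.getSiblingDB( "admin" );

   var result = admin.runCommand( { removeShard: shard } );
   while ( result.ok === 1 &&
           result.state !== "completed" &&
           result.stage !== "completed" ) {
       if ( result.remaining ) {
           print( "chunks remaining: " + result.remaining.chunks +
                  ", databases with primary status here: " + result.remaining.dbs );
       }
       sleep( 10000 );   // wait ten seconds between checks
       result = admin.runCommand( { removeShard: shard } );
   }
   printjson( result );

If the shard to remove is the :term:`primary shard` for any database,
the loop does not reach the completed state until you move that
database's unsharded data with :dbcommand:`movePrimary`, as described in
the tutorial above.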