From 81a81c8fad68766424e7a4db928c8da92e680a5b Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 29 Oct 2012 09:36:09 -0400 Subject: [PATCH 1/3] DOCS-293 migrate Bulk Inserts page --- source/administration/sharding.txt | 66 ++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index 15c2435042f..f9251907fe6 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -767,6 +767,72 @@ to pre-splitting. .. todo:: insert link to killing a cursor. +.. index:: bulk insert +.. _sharding-bulk-inserts: + +Bulk Inserts and Sharding +~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. todo link the words "bulk insert" to the bulk insert topic when it's + published + +When performing a bulk insert into a :term:`sharded collection`, consider +the following: + +- If the collection is not yet populated, MongoDB must take time to + "learn" what the key distribution is and how to distribute the data. + To avoid this performance cost, you can pre-split the collection, as + described in :ref:`sharding-administration-pre-splitting`. + +- You can parallel import by sending inserts to multiple + :program:`mongos` instances. If the collection is empty, pre-split + first, as described in :ref:`sharding-administration-pre-splitting`. + +Monotonically Increasing Shard Key Values +````````````````````````````````````````` + +If your shard key monotonically increases during an insert then all the +inserts will go to the last chunk in the collection. The system will +adjust the metadata to keep balance, but at a given time ``t`` all +writes will be going to a single shard, which is undesirable if insert +rate is extremely large. A large insert is one in which the insert +volume is beyond the range that a single shard can process at a given +point in time. Increasing values are fine if the insert volume is within +the range the shard can process. + +To avoid sending more writes than a shard can process, use a shard key +that is not increasing in value. For example in some cases you could +reverse all the bits of your shard key, which preserves information +while avoiding the increasing sequence of values. + +:term:`BSON` :term:`ObjectIds ` increase in value with each +insert. To more evenly distribute inserts based on this property, you +might want at generation time to reverse the bits of the ObjectIds or to +swap the first and last 16-bit words, to "shuffle" the inserts. +Alternatively you might use UUIDs instead, but check that your UUID +generator does not generate consistent increasing UUIDs, which would +cause the same behavior. + +.. example:: The following example, in C++, swaps the leading and + trailing 16-bit word of object IDs generated so that they are no + longer monotonically increasing. + + .. code-block:: none + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + .. index:: balancing; operations .. _sharding-balancing-operations: From 2a22699040fd45d4ac5cb5ca5e3ea352ac79622c Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 29 Oct 2012 09:49:01 -0400 Subject: [PATCH 2/3] DOCS-293 minor edits --- source/administration/sharding.txt | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index f9251907fe6..62408e7b8bc 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -805,13 +805,16 @@ that is not increasing in value. For example in some cases you could reverse all the bits of your shard key, which preserves information while avoiding the increasing sequence of values. -:term:`BSON` :term:`ObjectIds ` increase in value with each -insert. To more evenly distribute inserts based on this property, you -might want at generation time to reverse the bits of the ObjectIds or to -swap the first and last 16-bit words, to "shuffle" the inserts. -Alternatively you might use UUIDs instead, but check that your UUID -generator does not generate consistent increasing UUIDs, which would -cause the same behavior. +:term:`BSON` :term:`ObjectIds ` are one case of a value that +monotonically increases during an insert. If you use :term:`ObjectId` as +a shard key, then you can do either of the following at generation time +to more evenly distribute inserts based on this property: + +- Reverse the bits of the ObjectIds, or +- Swap the first and last 16-bit words, to "shuffle" the inserts. +- Use UUIDs instead, but check that your UUID generator does not + generate consistent increasing UUIDs, which would cause the same + behavior. .. example:: The following example, in C++, swaps the leading and trailing 16-bit word of object IDs generated so that they are no From 0b877a89f537fef270037efe5debe61c64bab599 Mon Sep 17 00:00:00 2001 From: Bob Grabar Date: Mon, 29 Oct 2012 13:21:13 -0400 Subject: [PATCH 3/3] DOCS-293 review edits --- source/administration/sharding.txt | 91 ++++++++++++++---------------- source/core/sharding-internals.txt | 2 + 2 files changed, 44 insertions(+), 49 deletions(-) diff --git a/source/administration/sharding.txt b/source/administration/sharding.txt index 62408e7b8bc..9714980a2a3 100644 --- a/source/administration/sharding.txt +++ b/source/administration/sharding.txt @@ -770,8 +770,11 @@ to pre-splitting. .. index:: bulk insert .. _sharding-bulk-inserts: -Bulk Inserts and Sharding -~~~~~~~~~~~~~~~~~~~~~~~~~ +Bulk Insert Strategies +~~~~~~~~~~~~~~~~~~~~~~ + +.. todo Consider moving to the administrative guide as it's of an applied nature, + or create an applications document for sharding .. todo link the words "bulk insert" to the bulk insert topic when it's published @@ -788,53 +791,43 @@ the following: :program:`mongos` instances. If the collection is empty, pre-split first, as described in :ref:`sharding-administration-pre-splitting`. -Monotonically Increasing Shard Key Values -````````````````````````````````````````` - -If your shard key monotonically increases during an insert then all the -inserts will go to the last chunk in the collection. The system will -adjust the metadata to keep balance, but at a given time ``t`` all -writes will be going to a single shard, which is undesirable if insert -rate is extremely large. A large insert is one in which the insert -volume is beyond the range that a single shard can process at a given -point in time. Increasing values are fine if the insert volume is within -the range the shard can process. - -To avoid sending more writes than a shard can process, use a shard key -that is not increasing in value. For example in some cases you could -reverse all the bits of your shard key, which preserves information -while avoiding the increasing sequence of values. - -:term:`BSON` :term:`ObjectIds ` are one case of a value that -monotonically increases during an insert. If you use :term:`ObjectId` as -a shard key, then you can do either of the following at generation time -to more evenly distribute inserts based on this property: - -- Reverse the bits of the ObjectIds, or -- Swap the first and last 16-bit words, to "shuffle" the inserts. -- Use UUIDs instead, but check that your UUID generator does not - generate consistent increasing UUIDs, which would cause the same - behavior. - -.. example:: The following example, in C++, swaps the leading and - trailing 16-bit word of object IDs generated so that they are no - longer monotonically increasing. - - .. code-block:: none - - using namespace mongo; - OID make_an_id() { - OID x = OID::gen(); - const unsigned char *p = x.getData(); - swap( (unsigned short&) p[0], (unsigned short&) p[10] ); - return x; - } - - void foo() { - // create an object - BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); - // now we might insert o into a sharded collection... - } +- If your shard key monotonically increases during an insert then all + the inserts will go to the last chunk in the collection, which is + undesirable if the insert volume is beyond the range that a single + shard can process at a given point in time. + + If the insert volume exceeds that range, and if you can't avoid + picking a monotonically increasing shard key, then you can do either + of the following at generation time to more evenly distribute inserts: + + - Reverse all the bits of your shard key, which preserves information + while avoiding the increasing sequence of values. + - Swap the first and last 16-bit words, to "shuffle" the inserts. + + .. example:: The following example, in C++, swaps the leading and + trailing 16-bit word of :term:`BSON` :term:`ObjectIds ` + generated so that they are no longer monotonically increasing. + + .. code-block:: cpp + + using namespace mongo; + OID make_an_id() { + OID x = OID::gen(); + const unsigned char *p = x.getData(); + swap( (unsigned short&) p[0], (unsigned short&) p[10] ); + return x; + } + + void foo() { + // create an object + BSONObj o = BSON( "_id" << make_an_id() << "x" << 3 << "name" << "jane" ); + // now we might insert o into a sharded collection... + } + + For information on choosing a shard key, see :ref:`sharding-shard-key` + and see :ref:`Shard Key Internals ` (in + particular, :ref:`sharding-internals-operations-and-reliability` and + :ref:`sharding-internals-choose-shard-key`). .. index:: balancing; operations .. _sharding-balancing-operations: diff --git a/source/core/sharding-internals.txt b/source/core/sharding-internals.txt index 356f68ce922..4886455f388 100644 --- a/source/core/sharding-internals.txt +++ b/source/core/sharding-internals.txt @@ -190,6 +190,8 @@ wait for a response from every shard before it can merge the results and return data. If you require high performance sorted queries, ensure that the sort key is a component of the shard key. +.. _sharding-internals-operations-and-reliability: + Operations and Reliability ~~~~~~~~~~~~~~~~~~~~~~~~~~