mongodb · tychoish · Nov 16, 2012 · Nov 16, 2012
diff --git a/draft/core/storage-viz.txt b/draft/core/storage-viz.txt
@@ -0,0 +1,271 @@
+===========
+Storage-viz
+===========
+
+"Storage-viz" is suite of tools that can be used to diagnose issues or
+assess proposed changes related to storage allocation strategy and index
+balancing heuristics.
+
+The ``storageDetails`` command will aggregate statistics related to the
+storage layout (when invoked with ``analyze: "diskStorage"``) or the percentage
+of pages currently in RAM (when invoked with ``analyze: "pagesInRAM"``) for the
+specified collection, extent or part of extent.
+
+The ``indexStats`` command provides detailed and aggregate information and
+statistics for the underlying btree of a particular index.
+Stats are aggregated for the entire tree, per-depth and, if requested through
+the ``expandNodes`` option, per-subtree.
+
+Both commands take a global READ_LOCK and will page in all the extents or btree
+buckets encountered: this will have adverse effects on server performance.
+The commands should never be run on a primary and will cause a secondary to
+fall behind on replication. ``diskStorage`` when run with
+``analyze: "pagesInRAM"`` is the exception as it typically returns rapidly and
+may only page in extent headers.
+
+.. default-domain:: mongodb
+
+.. dbcommand:: storageDetails
+
+   The command can be slow, particularly on larger data sets.
+
+   .. code-block:: javascript
+
+      { storageDetails: "collection_name",
+        analyze: "diskStorage" | "pagesInRAM" }
+
+   This command will aggregate statistics related to the storage layout
+   (when invoked with ``analyze: "diskStorage"``) or the percentage of pages
+   currently in RAM (when invoked with ``analyze: "pagesInRAM"``) for the
+   specified collection.
+   You may also specify one of the following options:
+
+   - ``extent: 4`` (0-based) only processes the 5th extent of the collection
+
+   - ``range: [start, end]`` only processes the range between ``start`` bytes
+     and ``end`` bytes from the start of the extent. Requires an ``extent`` to
+     be specified.
+
+   - ``granularity: 1 << 20`` splits the extents in 20MB slices and
+     reports statistics aggregated per-slice.
+
+   - ``numberOfSlices: 100`` splits the extent(s) in 100 slices and
+     reports statistics aggregated per-slice.
+
+   ``granularity`` and ``numberOfSlices`` are mutually exclusive.
+
+   - ``characteristicField: "dotted.path"`` specifies a field in the
+     documents of the collection to be inspected and averaged to give
+     an hint on what kind of documents belong to an extent or slice.
+     Defaults to ``"_id"``. ObjectIDs, any number and Dates are
+     supported. If the field has the wrong type in some documents
+     it would be silently ignored.
+
+   - ``processDeletedRecords: false`` disables the analysis of deleted
+     records which can be slow as it requires an iteration on all
+     the deletedList bucket for each extent. Defaults to ``true``.
+
+   - ``showRecords: true`` outputs basic information for each document
+     and deletedRecord encountered. It should only be enabled for small
+     ranges on single extents. Produces large output which can exceed
+     the maximum bson object size.
+
+   The typical output, when ``analyze: 'diskStorage'``, has the form:
+
+   .. code-block:: javascript
+
+           { extentHeaderBytes: <size>,
+             recordHeaderBytes: <size>,
+             range: [startOfs, endOfs],     // extent-relative
+             numEntries: <number of records>,
+             bsonBytes: <total size of the bson objects>,
+             recBytes: <total size of the valid records>,
+             onDiskBytes: <length of the extent or range>,
+       (opt) characteristicCount: <number of records containing the field used to tell them apart>
+       (opt) characteristicAvg: <average value of the characteristic field>
+             outOfOrderRecs: <number of records that follow - in the record linked list -
+                              a record that is located further in the extent>
+       (opt) freeRecsPerBucket: [ ... ],
+
+   The nth element in the ``freeRecsPerBucket`` array is the count of deleted records in the
+   nth bucket of the deletedList.
+   ``characteristicCount`` and ``characteristicAvg`` are only present if some documents contain
+   the field specified as ``characteristicField`` and it has a viable type (any number, ObjectID
+   or Date).
+
+   The list of slices follows, with similar information aggregated per-slice:
+
+   .. code-block:: javascript
+
+         slices: [
+             { numEntries: <number of records>,
+               ...
+               freeRecsPerBucket: [ ... ]
+             },
+             ...
+         ]
+
+   If ``showRecords: true`` was set two additional fields are added to the outer document:
+
+   .. code-block:: javascript
+
+             records: [
+                 { ofs: <record offset from start of extent>,
+                   recBytes: <record size>,
+                   bsonBytes: <bson document size>,
+        (optional) characteristic: <value of the characteristic field>
+                 }, 
+                 ... (one element per record)
+             ],
+        (optional) deletedRecords: [
+                 { ofs: <offset from start of extent>,
+                   recBytes: <deleted record size>
+                 },
+                 ... (one element per deleted record)
+             ]
+
+   The typical output, when ``analyze: 'pagesInRAM'``, has the form:
+
+           { pageBytes: <system page size>,
+             onDiskBytes: <size of the extent>,
+             inMem: <ratio of pages in memory for the entire extent>,
+       (opt) slices: [ ... ] (only present if either params.granularity or numberOfSlices is not
+                              zero and there exist more than one slice for this extent)
+       (opt) sliceBytes: <size of each slice>
+           }
+
+   The :program:`mongo` shell also provides wrappers:
+
+   .. code-block:: javascript
+
+      db.collection.diskStorageStats();
+      db.collection.pagesInRAM();
+
+      db.collection.getDiskStorageStats();
+      db.collection.getPagesInRAM();
+
+   ``diskStorageStats`` analyzes storage for the collection
+   (equivalent to invoking the command with ``{analyze: "diskStorage"}``).
+
+   ``pagesInRAM`` reports the percentage of pages in RAM for the collection
+   (equivalent to invoking the command with ``{analyze: "pagesInRAM"}``).
+
+   ``db.collection.getDiskStorageStats`` and ``db.collection.getPagesInRAM``
+   take the same parameters as ``diskStorageStats`` and ``pagesInRAM``,
+   respectively, and provide a human-readable representation of the output.
+
+
+   .. warning:: This command is resource intensive and may have an
+      impact on the performance of your MongoDB instance. It also requires
+      the entire collection or extent to be loaded in RAM and it may
+      end up evicting some of the pages from other collections or extents.
+
+   .. read-lock
+
+.. dbcommand:: indexStats
+
+   The command can be slow, particularly on large indexes.
+
+   .. code-block:: javascript
+
+      { indexStats: "collection_name",
+        index: "index_name" }
+
+   This command provides detailed and aggregate information and statistics for the underlying
+   btree for the index ``index_name`` in the collection ``collection_name``.
+   Stats are aggregated for the entire tree, per-depth and, if requested through the ``expandNodes``
+   option, per-subtree.
+
+   You can specify ``expandNodes: [0, 3]`` to expand the root (node 0 at depth 0) and the 4th child
+   of root (node 3 at depth 1). The first element of the array should always be 0 otherwise no
+   node will be expanded (there's only root ad depth 0). This will provide basic information about
+   the expanded nodes and statistics for the subtrees rooted at the nodes themselves.
+
+   The typical output has the form:
+
+   .. code-block:: javascript
+
+         { name: <index name>,
+           version: <index version (0 or 1),
+           isIdKey: <true if this is the default _id index>,
+           keyPattern: <bson object describing the key pattern>,
+           storageNs: <namespace of the index's underlying storage>,
+           bucketBodyBytes: <bytes available for keynodes and bson objects in the bucket's body>,
+           depth: <index depth (root excluded)>
+           overall: { (statistics for the entire tree)
+               numBuckets: <number of buckets (samples)>
+               keyCount: { (stats about the number of keys in a bucket)
+                   count: <number of samples>,
+                   mean: <mean>
+        (optional) stddev: <standard deviation>
+        (optional) min: <minimum value (number of keys for the bucket that has the least)>
+        (optional) max: <maximum value (number of keys for the bucket that has the most)>
+        (optional) quantiles: {
+                       0.01: <1st percentile>, 0.02: ..., 0.09: ..., 0.25: <1st quartile>,
+                       0.5: <median>, 0.75: <3rd quartile>, 0.91: ..., 0.98: ..., 0.99: ...
+                   }
+        (optional fields are only present if there are enough samples to compute sensible
+         estimates)
+               }
+               usedKeyCount: <stats about the number of used keys in a bucket>
+                   (same structure as keyCount)
+               bsonRatio: <stats about how much of the bucket body is occupied by bson objects>
+                   (same structure as keyCount)
+               keyNodeRatio: <stats about how much of the bucket body is occupied by KeyNodes>
+                   (same structure as keyCount)
+               fillRatio: <stats about how full is the bucket body (bson objects + KeyNodes)>
+                   (same structure as keyCount)
+           },
+           perLevel: [ (statistics aggregated per depth)
+               (one element with the same structure as 'overall' for each btree level,
+                the first refers to the root)
+           ]
+         }
+
+   If 'expandNodes: [array]' was specified in the parameters, an additional field named
+   'expandedNodes' is included in the output. It contains two nested arrays, such that the
+   n-th element of the outer array contains stats for nodes at depth n (root is included) and
+   the i-th element (0-based) of the inner array at depth n contains stats for the subtree
+   rooted at the i-th child of the expanend node at depth (n - 1).
+   Each element of the inner array has the same structure as 'overall' in the description above:
+   it includes the aggregate stats for all the nodes in the subtree excluding the current
+   bucket.
+   It also contains an additional field 'nodeInfo' representing information for the current
+   node:
+
+   .. code-block:: javascript
+
+         { childNum: <i so that this is the (i + 1)-th child of the parent node>
+           keyCount: <number of keys in this bucket>
+           usedKeyCount: <number of non-empty KeyNodes>
+           diskLoc: { (bson representation of the disk location for this bucket)
+               file: <num>
+               offset: <bytes>
+           }
+           depth: <depth of this bucket, root is at depth 0>
+           fillRatio: <a value between 0 and 1 representing how full this bucket is>
+           firstKey: <bson object containing the value for the first key>
+           lastKey: <bson object containing the value for the last key>
+         }
+
+   The :program:`mongo` shell also provides wrappers:
+
+   .. code-block:: javascript
+
+      db.collection.indexStats({index: "index_name"});
+      db.collection.getIndexStats({index: "index_name"});
+
+      ``db.collection.indexStats({index: "index_name"})`` is equivalent to running the command
+      with {indexStats: "collection", index: "index_name"}.
+
+      ``db.collection.getIndexStats`` takes the same parameters as ``indexStats`` and provides
+      a human-readable summary of the output in the shell.
+
+   .. warning:: This command is resource intensive and may have an
+      impact on the performance of your MongoDB instance. It also requires
+      the entire collection or extent to be loaded in RAM and it may
+      end up evicting some of the pages from other collections or extents.
+
+   .. read-lock
+
+