Commit 52981c0

author
Bob Grabar
committed
DOCS-233 read operations: draft 4
1 parent 250800a commit 52981c0

File tree

2 files changed

+129
-166
lines changed

draft/core/read-operations.txt

Lines changed: 122 additions & 166 deletions
Original file line numberDiff line numberDiff line change
@@ -4,10 +4,10 @@ Read Operations
44

55
.. default-domain:: mongodb
66

7-
This document how MongoDB performs read operations.
7+
Read operations determine how MongoDB returns collection data when you issue a query.
88

9-
MongodDB uses read operations when you retrieve collection data by using
10-
a query.
9+
This document describes how MongoDB performs read operations and how
10+
different factors affect the efficiency of reads.
1111

1212
.. TODO intro and high-level read operations info
1313

@@ -20,6 +20,10 @@ a query.
2020
Query Operations
2121
----------------
2222

23+
Queries retrieve data from your database collections. How a query
24+
retrieves data is dependent on MongoDB read operations and on the
25+
indexes you have created.
26+
2327
.. _read-operations-query-syntax:
2428

2529
Query Syntax
@@ -29,111 +33,21 @@ For a list of query operators, see :doc:`/reference/operators`.
2933

3034
.. TODO see the yet-to-be created query operations doc
3135

32-
.. _read-operations-query-optimization:
33-
34-
Query Optimization
35-
~~~~~~~~~~~~~~~~~~
36-
37-
The MongoDB query optimizer matches a query to the best index for
38-
performing that query. When the optimizer finds the best index, it
39-
creates a query plan so that the query will always use the specified
40-
index.
41-
42-
The MongoDB query optimizer deletes a query plan when a collection has
43-
changed to a point that the the specified index might no longer provide
44-
the fastest results.
45-
46-
Query plans take advantage of MongoDB's indexing features. You should
47-
always write indexes that use the same fields and sort in the same order
48-
as do your queries.
49-
50-
MongoDB creates a query plan as follows: When you run a query for which
51-
there is no query plan, either because the query is new or the old plan
52-
is obsolete, the query optimizer runs the query against several indexes
53-
at once in parallel. Though the optimizer queries the indexes in
54-
parallel, it records the results as though all coming from one index.
55-
The optimizer records all matches in a single common buffer.
56-
57-
As each index yields a match, MongoDB records the match in the buffer.
58-
If an index returns a result already returned by another index, the
59-
optimizer recognizes the duplication and skips recording the match
60-
a second time.
61-
62-
The optimizer determines a "winning" index and stops querying when either of
63-
the following occur:
64-
65-
- The optimizer exhausts an index, which means that index has provided
66-
the full result set the fastest.
67-
68-
- The optimizer reaches 101 results. At that point, the optimizer
69-
chooses the plan that has provided the most results *first* and
70-
continues reading only from that plan. Note that another index might
71-
have provided all those results as duplicates but because the "winning"
72-
index provided the results faster, it is the most efficient index.
73-
74-
The "winning" index now becomes the index specified in the query plan as
75-
the one to use the next time that query is run.
76-
77-
To evaluate the optimizer's choice of query plan, run the query again
78-
with the :method:`explain() <cursor.explain()>` method and
79-
:method:`hint() <cursor.hint()>` methods appended. This returns
80-
statistics about how the query runs. (It returns the statistics in place
81-
of returning the query results.)
82-
83-
.. code-block:: javascript
84-
85-
db.people.find( { name:"John"} ).explain().hint()
86-
87-
.. For details on the output of the :method:`explain()
88-
<cursor.explain()>` method, see ...
89-
90-
If you run :method:`explain() <cursor.explain()>` without including
91-
:method:`hint() <cursor.hint()>`, the query optimizer will re-evaluate
92-
the query, running multiple query plans, before it returns the query
93-
statistics. Unless you want the optimizer to re-evaluate the query, do
94-
not leave off :method:`hint() <cursor.hint()>`.
95-
96-
Because your collections will likely change over time, the query
97-
optimizer uses the query plan only to a certain point.
98-
99-
.. Order of buffer results is different because coming from different
100-
indexes. Not ordered on one index.
101-
102-
.. Sorting >> all query plans are ordered vs none vs some.
103-
104-
.. "Optimal" is determined from a past run of multiple plans. But that
105-
cache gets cleared if there's been multiple writes.
106-
107-
.. Speculative scan of multiple plans.
108-
109-
.. Sparce indexes can change a result set.
110-
111-
.. Interleaving of results sets from multiple indexes ocurrs only when
112-
query plan is being determined. Once query plan is cached, then it's
113-
going to use one index.
114-
115-
.. What validates a cache: 1000 doc writes (not write operations but
116-
actual doc writes). Also if reindex or restart mongod.
117-
118-
.. Interweaving/leaving plans is done with cursor.
119-
120-
.. Dupe on disk lock and not on ID.
121-
122-
.. First time it runs the query (the first time it picks a query plan),
123-
it runs union of all query plans deemed to be potentially useful to
124-
return results set. Second time run the same query, it runs a single
125-
query plan.
36+
.. _read-operations-indexing:
12637

127-
.. Therefore, you can run the same query twice in a row and get the
128-
same results ordered differently.
38+
Indexes
39+
~~~~~~~
12940

130-
.. And when you run explain, you also get different statistics.
41+
Indexes significantly reduce the amount of work needed for query read
42+
operations. Indexes record specified keys and key values and the disk
43+
locations of the documents containing those values.
13144

132-
.. END OF MY NOTES ON THE TECH TALK, EXCEPT FOR THE NOTES ON SPECIFIC
133-
OPTIMIZATION OPERATORS, such as $elemMatch
45+
Indexes are typically stored in RAM *or* located sequentially on disk,
46+
and indexes are smaller than the documents they catalog. When a query
47+
can use an index, the read operation is significantly faster than when
48+
the query must scan all documents in a collection.
13449

135-
Selective Indexes Return Fastest Results
136-
````````````````````````````````````````
50+
MongoDB represents indexes internally as B-trees.
13751

13852
The most selective indexes return the fastest results. The most
13953
selective index possible for a given query is an index for which all the
@@ -169,91 +83,116 @@ documents that match the query criteria also match the entire query.
16983
Conversely, not all the documents that match the query's ``x`` key
17084
value also match the entire query.
17185

172-
.. _read-operations-projection:
86+
.. seealso::
17387

174-
Projection
175-
~~~~~~~~~~
88+
- The :doc:`/core/indexes` documentation, in particular :doc:`/applications/indexes`
89+
- :doc:`/reference/operators`
90+
- :method:`find <db.collection.find()>`
91+
- :method:`findOne`
17692

177-
A projection specifies which field values a query should return for
178-
matching documents. If you run a query *without* a projection, the query
179-
returns all fields and values for matching documents, which can
180-
add unnecessary network and deserialization costs.
93+
.. _read-operations-query-optimization:
18194

182-
MongoDB provides special projection operators that let you specify the
183-
fields to return. For documentation on each operator, click the operator name:
95+
Query Optimization
96+
~~~~~~~~~~~~~~~~~~
18497

185-
- :projection:`$elemMatch`
98+
MongoDB provides a query optimizer that matches a query to the index
99+
that performs the fastest read operation for that query.
186100

187-
- :projection:`$slice`
101+
When you issue a query for the first time, the query optimizer runs the
102+
query against several indexes to find the most efficient. The optimizer
103+
then creates a "query plan" that specifies the index for future runs of
104+
the query.
188105

189-
.. _read-operations-indexing:
106+
The MongoDB query optimizer deletes a query plan when a collection has
107+
changed to a point that the specified index might no longer provide
108+
the fastest results.
190109

191-
Indexing
192-
~~~~~~~~
110+
Query plans take advantage of MongoDB's indexing features. You should
111+
always create indexes that use the same fields and sort in the same
112+
order as your queries. For more information, see :doc:`/applications/indexes`.
193113

194-
Indexes significantly reduce the amount of work needed for query read
195-
operations. Indexes record specified keys and key values and the disk
196-
locations of the documents containing those values.
114+
MongoDB creates a query plan as follows: When you run a query for which
115+
there is no query plan, either because the query is new or the old plan
116+
is obsolete, the query optimizer runs the query against several indexes
117+
at once in parallel but records the results in a single common buffer,
118+
as though the results all come from the same index. As each index yields
119+
a match, MongoDB records the match in the buffer. If an index returns a
120+
result already returned by another index, the optimizer recognizes the
121+
duplication and skips the duplicate match.
122+
123+
The optimizer determines a "winning" index when either of
124+
the following occurs:
125+
126+
- The optimizer exhausts an index, which means that the index has
127+
provided the full result set. At this point, the optimizer stops
128+
querying.
129+
130+
- The optimizer reaches 101 results. At this point, the optimizer
131+
chooses the plan that has provided the most results *first* and
132+
continues reading only from that plan. Note that another index might
133+
have provided all those results as duplicates, but because the
134+
"winning" index provided those results first, it is more
135+
efficient.
197136

198-
Without indexes, MongoDB must scan all documents to return query
199-
results.
137+
The "winning" index now becomes the index specified in the query plan as
138+
the one to use the next time the query is run.
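
The selection process described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not MongoDB's implementation: the function name ``pickWinningIndex``, the round-robin stand-in for parallel scanning, and the modeling of each index as an ordered array of document ids are all hypothetical.

```javascript
// Illustrative simulation only; nothing here is a MongoDB API.
// Each candidate index is modeled as the ordered list of document ids
// it would yield for the query.
function pickWinningIndex(candidates, limit = 101) {
  const names = Object.keys(candidates);
  if (names.length === 0) throw new Error("no candidate indexes");
  const buffer = new Set();   // common buffer shared by all indexes
  const unique = {};          // unique matches contributed per index
  for (const name of names) unique[name] = 0;

  // Round-robin over the indexes to mimic parallel scanning.
  for (let step = 0; ; step++) {
    for (const name of names) {
      const ids = candidates[name];
      if (step < ids.length && !buffer.has(ids[step])) {
        buffer.add(ids[step]);  // duplicates are skipped
        unique[name]++;
      }
      if (step + 1 >= ids.length) {
        // This index is exhausted: it has provided the full result set.
        return { winner: name, reason: "exhausted", buffered: buffer.size };
      }
      if (buffer.size >= limit) {
        // Threshold reached: pick the index that contributed the most.
        const winner = names.reduce((a, b) => (unique[b] > unique[a] ? b : a));
        return { winner, reason: "limit", buffered: buffer.size };
      }
    }
  }
}
```

For example, ``pickWinningIndex({ byName: [1, 2, 3], byAge: [3, 1, 5, 2, 9] })`` reports ``byName`` as the winner because that index is exhausted first, with three unique matches in the buffer.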
200139

201-
The order of index keys matters.
140+
To evaluate the optimizer's choice of query plan, run the query again
141+
with the :method:`hint() <cursor.hint()>` and
142+
:method:`explain() <cursor.explain()>` methods appended. Instead of returning
143+
query results, this returns statistics about how the query runs. For example:
202144

203-
In order to fulfill a multi-field query using an index, the query
204-
optimizer first searches the index for the first field in the query.
205-
When the first instance of that entry is found, the query then searches
206-
for the next field within the index entries for the first field.
145+
.. code-block:: javascript
207146

208-
If you structure your index such that the first field ...
147+
// hint() must precede explain(); assumes an index on { name: 1 } exists
db.people.find( { name: "John" } ).hint( { name: 1 } ).explain()
209148

210-
As a general rule, a query where one term demands an exact match and
211-
another specifies a range requires a com- pound index where the range
212-
key comes second.
149+
For details on the output, see :method:`explain() <cursor.explain()>`.
213150

214-
When you create indexes, you must do so with your queries in mind. A
215-
query can use only one index and therefore you must create indexes that
216-
include all the fields in a given query.
151+
.. note::
217152

218-
Because indexes take up space and because MongoDB writes to an index
219-
with every write to the database, you must also be careful with index
220-
creation. Do not create indexes that duplicate each other. For example,
221-
an index that queries on ``a`` and then ``b`` can be used for queries of
222-
``a`` then ``b`` as well as for queries of just ``a``. Do not have two
223-
indexes where one will do.
153+
If you run :method:`explain() <cursor.explain()>` without including
154+
:method:`hint() <cursor.hint()>`, the query optimizer will
155+
re-evaluate the query and run against multiple indexes before
156+
returning the query statistics. Unless you want the optimizer to
157+
re-evaluate the query, do not leave off :method:`hint()
158+
<cursor.hint()>`.
224159

225-
You can also speed read operations by eliminating unnecessary indexes.
160+
Because your collections will likely change over time, the query
161+
optimizer deletes a query plan and re-evaluates the indexes when any
162+
of the following occurs:
226163

227-
Whenever you add a document to a collection, each index on that
228-
collection must be modified to include the new document. So if a
229-
particular collection has 10 indexes, then that makes 10 separate
230-
structures to modify on each insert. This holds for any write operation,
231-
whether you’re removing a document or updating a given document’s
232-
indexed keys.
164+
- The number of writes to the collection reaches 1,000.
233165

234-
For read-intensive applications, the cost of indexes is almost always
235-
justified. Just realize that indexes do impose a cost and that they
236-
therefore must be chosen with care. This means ensuring that all of your
237-
indexes are used and that none of them are redundant. You can do this in
238-
part by profiling your application’s queries.
166+
- You run the :dbcommand:`reIndex` command on the index.
239167

240-
Reading from RAM is faster than reading from disk, so you must make sure
241-
your indexes and working sets together fit into RAM. To check the size
242-
of an index use the :method:`db.collection.totalIndexSize()` helper.
168+
- You restart :program:`mongod`.
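
The invalidation rules in the list above can be modeled roughly as follows. This is a minimal sketch, assuming a single counter of writes per collection; the ``PlanCache`` class and its method names are hypothetical and do not correspond to any MongoDB interface. A restart of ``mongod`` is modeled simply as constructing a new, empty cache.

```javascript
// Hypothetical model of a per-collection query plan cache.
class PlanCache {
  constructor(writeThreshold = 1000) {
    this.writeThreshold = writeThreshold;
    this.writes = 0;
    this.plans = new Map(); // query shape -> name of the winning index
  }

  get(queryShape) {
    return this.plans.get(queryShape);
  }

  set(queryShape, indexName) {
    this.plans.set(queryShape, indexName);
  }

  // Call once per document write to the collection.
  recordWrite() {
    this.writes++;
    if (this.writes >= this.writeThreshold) this.clear();
  }

  // Rebuilding an index also discards cached plans.
  reIndex() {
    this.clear();
  }

  clear() {
    this.plans.clear();
    this.writes = 0;
  }
}
```

In this model, a plan cached with ``set()`` survives 999 calls to ``recordWrite()`` and is dropped on the 1,000th, after which the next run of the query would have to evaluate the candidate indexes again.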
243169

244-
MongoDB represents indexes internally as B-trees.
170+
When the optimizer re-evaluates a query, the query returns the same
171+
results (assuming no data has changed) but might return them in
172+
a different order, and the :method:`explain() <cursor.explain()>` method
173+
and :method:`hint() <cursor.hint()>` method might report different
174+
statistics. This is because the optimizer retrieves the results from
175+
several indexes at once during re-evaluation and the order in which
176+
results appear depends on the order of the indexes within the parallel
177+
querying.
245178

246-
Use the different index types to keep your indexes to only the size
247-
needed. For example, for queries that always return a document only if a
248-
value exists for the search keys, use sparse indexes. Sparse indexes
249-
take up less space than default indexes.
179+
.. _read-operations-projection:
250180

251-
.. seealso::
181+
Projection
182+
~~~~~~~~~~
252183

253-
- The :doc:`/core/indexes` documentation, in particular :doc:`/applications/indexes`
254-
- :doc:`/reference/operators`
255-
- :method:`find <db.collection.find()>`
256-
- :method:`findOne`
184+
A projection specifies which field values from an array a query should
185+
return for matching documents. If you run a query *without* a
186+
projection, the query returns all fields and values for matching
187+
documents, which can add unnecessary network and deserialization costs.
188+
189+
To run the most efficient queries, use the following projection
190+
operators when possible when querying on array values. For documentation
191+
on each operator, click the operator name:
192+
193+
- :projection:`$elemMatch`
194+
195+
- :projection:`$slice`
257196

258197
.. _read-operations-aggregation:
259198

@@ -271,6 +210,23 @@ Aggregation
271210
.. index:: read operation; architecture
272211
.. _read-operations-architecture:
273212

213+
Query Operators that Cannot Use Indexes
214+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
215+
216+
Some query operators cannot take advantage of indexes and require a
217+
collection scan. When using these operators you can narrow the documents
218+
scanned by combining the operator with another operator that does use an
219+
index.
220+
221+
Operators that cannot use indexes include the following:
222+
223+
- :operator:`$nin`
224+
225+
- :operator:`$ne`
226+
227+
.. TODO Regular expressions queries also do not use an index.
228+
.. TODO :method:`cursor.skip()` can cause paginating large numbers of docs
229+
274230
Architecture
275231
------------
276232

source/reference/glossary.txt

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -855,3 +855,10 @@ Glossary
855855
standalone
856856
In MongoDB, a standalone is an instance of :program:`mongod` that
857857
is running as a single server and not as part of a :term:`replica set`.
858+
859+
query optimizer
860+
For each query, the MongoDB query optimizer generates a query plan
861+
that matches the query to the index that produces the fastest
862+
results. The optimizer then uses the query plan each time the
863+
query is run. If a collection changes significantly, the optimizer
864+
creates a new query plan.
