@@ -4,10 +4,10 @@ Read Operations
4
4
5
5
.. default-domain:: mongodb
6
6
7
- This document how MongoDB performs read operations.
7
+ Read operations determine how MongoDB returns collection data when you issue a query.
8
8
9
- MongodDB uses read operations when you retrieve collection data by using
10
- a query.
9
+ This document describes how MongoDB performs read operations and how
10
+ different factors affect the efficiency of reads.
11
11
12
12
.. TODO intro and high-level read operations info
13
13
@@ -20,6 +20,10 @@ a query.
20
20
Query Operations
21
21
----------------
22
22
23
+ Queries retrieve data from your database collections. How a query
24
+ retrieves data depends on MongoDB read operations and on the
25
+ indexes you have created.
26
+
23
27
.. _read-operations-query-syntax:
24
28
25
29
Query Syntax
@@ -29,111 +33,21 @@ For a list of query operators, see :doc:`/reference/operators`.
29
33
30
34
.. TODO see the yet-to-be created query operations doc
31
35
32
- .. _read-operations-query-optimization:
33
-
34
- Query Optimization
35
- ~~~~~~~~~~~~~~~~~~
36
-
37
- The MongoDB query optimizer matches a query to the best index for
38
- performing that query. When the optimizer finds the best index, it
39
- creates a query plan so that the query will always use the specified
40
- index.
41
-
42
- The MongoDB query optimizer deletes a query plan when a collection has
43
- changed to a point that the the specified index might no longer provide
44
- the fastest results.
45
-
46
- Query plans take advantage of MongoDB's indexing features. You should
47
- always write indexes that use the same fields and sort in the same order
48
- as do your queries.
49
-
50
- MongoDB creates a query plan as follows: When you run a query for which
51
- there is no query plan, either because the query is new or the old plan
52
- is obsolete, the query optimizer runs the query against several indexes
53
- at once in parallel. Though the optimizer queries the indexes in
54
- parallel, it records the results as though all coming from one index.
55
- The optimizer records all matches in a single common buffer.
56
-
57
- As each index yields a match, MongoDB records the match in the buffer.
58
- If an index returns a result already returned by another index, the
59
- optimizer recognizes the duplication and skips recording the match
60
- a second time.
61
-
62
- The optimizer determines a "winning" index and stops querying when either of
63
- the following occur:
64
-
65
- - The optimizer exhausts an index, which means that index has provided
66
- the full result set the fastest.
67
-
68
- - The optimizer reaches 101 results. At that point, the optimizer
69
- chooses the plan that has provided the most results *first* and
70
- continues reading only from that plan. Note that another index might
71
- have provided all those results as duplicates but because the "winning"
72
- index provided the results faster, it is the most efficient index.
73
-
74
- The "winning" index now becomes the index specified in the query plan as
75
- the one to use the next time that query is run.
76
-
77
- To evaluate the optimizer's choice of query plan, run the query again
78
- with the :method:`explain() <cursor.explain()>` method and
79
- :method:`hint() <cursor.hint()>` methods appended. This returns
80
- statistics about how the query runs. (It returns the statistics in place
81
- of returning the query results.)
82
-
83
- .. code-block:: javascript
84
-
85
- db.people.find( { name:"John"} ).explain().hint()
86
-
87
- .. For details on the output of the :method:`explain()
88
- <cursor.explain()>` method, see ...
89
-
90
- If you run :method:`explain() <cursor.explain()>` without including
91
- :method:`hint() <cursor.hint()>`, the query optimizer will re-evaluate
92
- the query, running multiple query plans, before it returns the query
93
- statistics. Unless you want the optimizer to re-evaluate the query, do
94
- not leave off :method:`hint() <cursor.hint()>`.
95
-
96
- Because your collections will likely change over time, the query
97
- optimizer uses the query plan only to a certain point.
98
-
99
- .. Order of buffer results is different because coming from different
100
- indexes. Not ordered on one index.
101
-
102
- .. Sorting >> all query plans are ordered vs none vs some.
103
-
104
- .. "Optimal" is determined from a past run of multiple plans. But that
105
- cache gets cleared if there's been multiple writes.
106
-
107
- .. Speculative scan of multiple plans.
108
-
109
- .. Sparce indexes can change a result set.
110
-
111
- .. Interleaving of results sets from multiple indexes ocurrs only when
112
- query plan is being determined. Once query plan is cached, then it's
113
- going to use one index.
114
-
115
- .. What validates a cache: 1000 doc writes (not write operations but
116
- actual doc writes). Also if reindex or restart mongod.
117
-
118
- .. Interweaving/leaving plans is done with cursor.
119
-
120
- .. Dupe on disk lock and not on ID.
121
-
122
- .. First time it runs the query (the first time it picks a query plan),
123
- it runs union of all query plans deemed to be potentially useful to
124
- return results set. Second time run the same query, it runs a single
125
- query plan.
36
+ .. _read-operations-indexing:
126
37
127
- .. Therefore, you can run the same query twice in a row and get the
128
- same results ordered differently.
38
+ Indexes
39
+ ~~~~~~~
129
40
130
- .. And when you run explain, you also get different statistics.
41
+ Indexes significantly reduce the amount of work needed for query read
42
+ operations. Indexes record specified keys and key values and the disk
43
+ locations of the documents containing those values.
131
44
132
- .. END OF MY NOTES ON THE TECH TALK, EXCEPT FOR THE NOTES ON SPECIFIC
133
- OPTIMIZATION OPERATORS, such as $elemMatch
45
+ Indexes are typically stored in RAM *or* located sequentially on disk,
46
+ and indexes are smaller than the documents they catalog. When a query
47
+ can use an index, the read operation is significantly faster than when
48
+ the query must scan all documents in a collection.
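As a rough illustration of why an indexed read examines fewer documents than a collection scan, the following sketch counts the documents each approach touches. It is plain JavaScript rather than MongoDB code: the ``docs`` array and the ``collectionScan()``, ``buildIndex()``, and ``indexedFind()`` functions are hypothetical names, and a ``Map`` stands in for the B-tree that MongoDB uses internally.

```javascript
// Hypothetical in-memory "collection"; not a real MongoDB collection.
const docs = [
  { _id: 1, name: "John", age: 31 },
  { _id: 2, name: "Ana", age: 27 },
  { _id: 3, name: "Wei", age: 45 },
  { _id: 4, name: "John", age: 52 },
  { _id: 5, name: "Sam", age: 38 },
];

// Collection scan: every document is examined, whether it matches or not.
function collectionScan(collection, field, value) {
  let examined = 0;
  const matches = [];
  for (const doc of collection) {
    examined += 1;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, examined };
}

// Toy index: maps each key value to the positions of the documents that
// contain it. (Assumption: a Map replaces the B-tree for simplicity.)
function buildIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, position) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(position);
  });
  return index;
}

// Indexed read: only the documents the index points at are examined.
function indexedFind(collection, index, value) {
  const positions = index.get(value) || [];
  return {
    matches: positions.map((position) => collection[position]),
    examined: positions.length,
  };
}

const nameIndex = buildIndex(docs, "name");
const scanResult = collectionScan(docs, "name", "John");
const indexResult = indexedFind(docs, nameIndex, "John");

console.log(scanResult.examined);  // 5 documents examined
console.log(indexResult.examined); // 2 documents examined
```

Both calls return the same two documents; only the amount of work differs, which is the effect the paragraphs above describe.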
134
49
135
- Selective Indexes Return Fastest Results
136
- ````````````````````````````````````````
50
+ MongoDB represents indexes internally as B-trees.
137
51
138
52
The most selective indexes return the fastest results. The most
139
53
selective index possible for a given query is an index for which all the
@@ -169,91 +83,116 @@ documents that match the query criteria also match the entire query.
169
83
Conversely, not all the documents that match the query's ``x`` key
170
84
value also match the entire query.
171
85
172
- .. _read-operations-projection:
86
+ .. seealso::
173
87
174
- Projection
175
- ~~~~~~~~~~
88
+ - The :doc:`/core/indexes` documentation, in particular :doc:`/applications/indexes`
89
+ - :doc:`/reference/operators`
90
+ - :method:`find <db.collection.find()>`
91
+ - :method:`findOne`
176
92
177
- A projection specifies which field values a query should return for
178
- matching documents. If you run a query *without* a projection, the query
179
- returns all fields and values for matching documents, which can
180
- add unnecessary network and deserialization costs.
93
+ .. _read-operations-query-optimization:
181
94
182
- MongoDB provides special projection operators that let you specify the
183
- fields to return. For documentation on each operator, click the operator name:
95
+ Query Optimization
96
+ ~~~~~~~~~~~~~~~~~~
184
97
185
- - :projection:`$elemMatch`
98
+ MongoDB provides a query optimizer that matches a query to the index
99
+ that performs the fastest read operation for that query.
186
100
187
- - :projection:`$slice`
101
+ When you issue a query for the first time, the query optimizer runs the
102
+ query against several indexes to find the most efficient. The optimizer
103
+ then creates a "query plan" that specifies the index for future runs of
104
+ the query.
188
105
189
- .. _read-operations-indexing:
106
+ The MongoDB query optimizer deletes a query plan when a collection has
107
+ changed to a point that the specified index might no longer provide
108
+ the fastest results.
190
109
191
- Indexing
192
- ~~~~~~~~
110
+ Query plans take advantage of MongoDB's indexing features. You should
111
+ always create indexes that use the same fields and that sort in the same
112
+ order as do your queries. For more information, see :doc:`/applications/indexes`.
193
113
194
- Indexes significantly reduce the amount of work needed for query read
195
- operations. Indexes record specified keys and key values and the disk
196
- locations of the documents containing those values.
114
+ MongoDB creates a query plan as follows: When you run a query for which
115
+ there is no query plan, either because the query is new or the old plan
116
+ is obsolete, the query optimizer runs the query against several indexes
117
+ at once in parallel but records the results in a single common buffer,
118
+ as though the results all come from the same index. As each index yields
119
+ a match, MongoDB records the match in the buffer. If an index returns a
120
+ result already returned by another index, the optimizer recognizes the
121
+ duplication and skips the duplicate match.
122
+
123
+ The optimizer determines a "winning" index when either of
124
+ the following occurs:
125
+
126
+ - The optimizer exhausts an index, which means that the index has
127
+ provided the full result set. At this point, the optimizer stops
128
+ querying.
129
+
130
+ - The optimizer reaches 101 results. At this point, the optimizer
131
+ chooses the plan that has provided the most results *first* and
132
+ continues reading only from that plan. Note that another index might
133
+ have provided all those results as duplicates but because the
134
+ "winning" index provided those results first, it is more
135
+ efficient.
197
136
198
- Without indexes, MongoDB must scan all documents to return query
199
- results.
137
+ The "winning" index now becomes the index specified in the query plan as
138
+ the one to use the next time the query is run.
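The selection process described above can be sketched as a small simulation. This is a toy model in plain JavaScript, not MongoDB's implementation: ``pickWinningIndex()`` and the shape of its input (one array of document ids per candidate index, in the order that index would yield them) are assumptions made for illustration. Only the 101-result threshold and the two stopping conditions come from the text.

```javascript
const RESULT_LIMIT = 101; // threshold quoted in the text

// candidates: { indexName: [documentId, ...] } (hypothetical input shape)
function pickWinningIndex(candidates) {
  const names = Object.keys(candidates);
  if (names.length === 0) throw new Error("at least one candidate index is required");

  const buffer = new Set(); // single common buffer, deduplicated by document id
  const position = {};      // how far each index scan has advanced
  const firstFinds = {};    // results each index delivered before any other index
  for (const name of names) {
    position[name] = 0;
    firstFinds[name] = 0;
  }

  for (;;) {
    // Advance each candidate by one step to mimic scanning the indexes side by side.
    for (const name of names) {
      const scan = candidates[name];
      if (position[name] < scan.length) {
        const id = scan[position[name]];
        position[name] += 1;
        if (!buffer.has(id)) { // skip matches another index already returned
          buffer.add(id);
          firstFinds[name] += 1;
        }
      }
      // Stopping condition 1: this index is exhausted, so it has provided
      // the full result set.
      if (position[name] >= scan.length) {
        return { winner: name, reason: "exhausted", buffered: buffer.size };
      }
      // Stopping condition 2: the buffer reached the limit; choose the index
      // that provided the most results first.
      if (buffer.size >= RESULT_LIMIT) {
        const winner = names.reduce((best, n) => (firstFinds[n] > firstFinds[best] ? n : best));
        return { winner, reason: "limit", buffered: buffer.size };
      }
    }
  }
}

const small = pickWinningIndex({ byName: [1, 2, 3], byAge: [3, 1, 7, 8, 2] });
console.log(small.winner, small.reason); // byName exhausted
```

In this model the "winner" is what a query plan would record for later runs of the same query; how MongoDB actually schedules the competing scans is not specified here.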
200
139
201
- The order of index keys matters.
140
+ To evaluate the optimizer's choice of query plan, run the query again
141
+ with the :method:`explain() <cursor.explain()>` and
142
+ :method:`hint() <cursor.hint()>` methods appended. Instead of returning
143
+ query results, this returns statistics about how the query runs. For example:
202
144
203
- In order to fulfill a multi-field query using an index, the query
204
- optimizer first searches the index for the first field in the query.
205
- When the first instance of that entry is found, the query then searches
206
- for the next field within the index entries for the first field.
145
+ .. code-block:: javascript
207
146
208
- If you structure your index such that the first field ...
147
+ // hint() takes an index; { name: 1 } assumes an index on the name field
+ db.people.find( { name: "John" } ).hint( { name: 1 } ).explain()
209
148
210
- As a general rule, a query where one term demands an exact match and
211
- another specifies a range requires a compound index where the range
212
- key comes second.
149
+ For details on the output, see :method:`explain() <cursor.explain()>`.
213
150
214
- When you create indexes, you must do so with your queries in mind. A
215
- query can use only one index and therefore you must create indexes that
216
- include all the fields in a given query.
151
+ .. note::
217
152
218
- Because indexes take up space and because MongoDB writes to an index
219
- with every write to the database, you must also be careful with index
220
- creation. Do not create indexes that duplicate each other. For example,
221
- an index that queries on ``a`` and then ``b`` can be used for queries of
222
- ``a`` then ``b`` as well as for queries of just ``a``. Do not have two
223
- indexes where one will do.
153
+ If you run :method:`explain() <cursor.explain()>` without including
154
+ :method:`hint() <cursor.hint()>`, the query optimizer will
155
+ re-evaluate the query and run it against multiple indexes before
156
+ returning the query statistics. Unless you want the optimizer to
157
+ re-evaluate the query, do not omit :method:`hint()
158
+ <cursor.hint()>`.
224
159
225
- You can also speed read operations by eliminating unnecessary indexes.
160
+ Because your collections will likely change over time, the query
161
+ optimizer deletes a query plan and re-evaluates the indexes when any
162
+ of the following occurs:
226
163
227
- Whenever you add a document to a collection, each index on that
228
- collection must be modified to include the new document. So if a
229
- particular collection has 10 indexes, then that makes 10 separate
230
- structures to modify on each insert. This holds for any write operation,
231
- whether you’re removing a document or updating a given document’s
232
- indexed keys.
164
+ - The number of writes to the collection reaches 1,000.
233
165
234
- For read-intensive applications, the cost of indexes is almost always
235
- justified. Just realize that indexes do impose a cost and that they
236
- therefore must be chosen with care. This means ensuring that all of your
237
- indexes are used and that none of them are redundant. You can do this in
238
- part by profiling your application’s queries.
166
+ - You run the :dbcommand:`reIndex` command on the index.
239
167
240
- Reading from RAM is faster than reading from disk, so you must make sure
241
- your indexes and working sets together fit into RAM. To check the size
242
- of an index use the :method:`db.collection.totalIndexSize()` helper.
168
+ - You restart :program:`mongod`.
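The three invalidation conditions listed above can be modeled with a small cache object. This is a minimal sketch in plain JavaScript under stated assumptions: ``ToyPlanCache`` and its methods are hypothetical names, the cache covers a single collection, and resetting the write counter after the plans are dropped is an assumption rather than documented behavior. Only the 1,000-write threshold, the re-index trigger, and the restart trigger come from the text.

```javascript
const WRITE_THRESHOLD = 1000; // from the text

// Hypothetical per-collection plan cache; not a MongoDB API.
class ToyPlanCache {
  constructor() {
    // A freshly constructed cache models a restarted mongod: no plans survive.
    this.plans = new Map(); // query shape -> name of the winning index
    this.writes = 0;        // writes seen since the plans were last dropped
  }

  get(queryShape) {
    // undefined means there is no plan and the optimizer must re-evaluate.
    return this.plans.get(queryShape);
  }

  set(queryShape, indexName) {
    this.plans.set(queryShape, indexName);
  }

  recordWrite() {
    this.writes += 1;
    if (this.writes >= WRITE_THRESHOLD) this.clear();
  }

  reIndex() {
    // Rebuilding an index drops the cached plans.
    this.clear();
  }

  clear() {
    this.plans.clear();
    this.writes = 0; // assumption: the counter starts over
  }
}

const planCache = new ToyPlanCache();
planCache.set("people: { name }", "name_1");
for (let i = 0; i < WRITE_THRESHOLD; i++) planCache.recordWrite();
console.log(planCache.get("people: { name }")); // undefined: the plan was dropped
```

Once ``get()`` returns ``undefined``, the next run of the query goes through the multi-index evaluation again, and the new winner is stored with ``set()``.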
243
169
244
- MongoDB represents indexes internally as B-trees.
170
+ When the optimizer re-evaluates a query, the query returns the same
171
+ results (assuming no data has changed) but might return the results in
172
+ a different order, and the :method:`explain() <cursor.explain()>`
173
+ and :method:`hint() <cursor.hint()>` methods might result in different
174
+ statistics. This is because the optimizer retrieves the results from
175
+ several indexes at once during re-evaluation and the order in which
176
+ results appear depends on the order of the indexes within the parallel
177
+ querying.
245
178
246
- Use the different index types to keep your indexes to only the size
247
- needed. For example, for queries that always return a document only if a
248
- value exists for the search keys, use sparse indexes. Sparse indexes
249
- take up less space than default indexes.
179
+ .. _read-operations-projection:
250
180
251
- .. seealso::
181
+ Projection
182
+ ~~~~~~~~~~
252
183
253
- - The :doc:`/core/indexes` documentation, in particular :doc:`/applications/indexes`
254
- - :doc:`/reference/operators`
255
- - :method:`find <db.collection.find()>`
256
- - :method:`findOne`
184
+ A projection specifies which field values from an array a query should
185
+ return for matching documents. If you run a query *without* a
186
+ projection, the query returns all fields and values for matching
187
+ documents, which can add unnecessary network and deserialization costs.
188
+
189
+ To run the most efficient queries, use the following projection
190
+ operators where possible when querying on array values. For documentation
191
+ on each operator, click the operator name:
192
+
193
+ - :projection:`$elemMatch`
194
+
195
+ - :projection:`$slice`
257
196
258
197
.. _read-operations-aggregation:
259
198
@@ -271,6 +210,23 @@ Aggregation
271
210
.. index:: read operation; architecture
272
211
.. _read-operations-architecture:
273
212
213
+ Query Operators that Cannot Use Indexes
214
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
215
+
216
+ Some query operators cannot take advantage of indexes and require a
217
+ collection scan. When using these operators you can narrow the documents
218
+ scanned by combining the operator with another operator that does use an
219
+ index.
220
+
221
+ Operators that cannot use indexes include the following:
222
+
223
+ - :operator:`$nin`
224
+
225
+ - :operator:`$ne`
226
+
227
+ .. TODO Regular expression queries also do not use an index.
228
+ .. TODO :method:`cursor.skip()` can cause paginating large numbers of docs
229
+
274
230
Architecture
275
231
------------
276
232