Commit a757b9c

author
Andrew Brookins
committed
Add more indexing, querying, and aggregations examples.
1 parent a95665d commit a757b9c

2 files changed

+335
-91
lines changed

README.md

+334-24
@@ -16,10 +16,10 @@ It is the "official" client of RediSearch, and should be regarded as its canonic
1616

1717
## Features
1818

19-
RediSearch is a source avaliable ([RSAL](https://raw.githubusercontent.com/RediSearch/RediSearch/master/LICENSE)), high performance search engine implemented as a [Redis Module](https://redis.io/topics/modules-intro).
19+
RediSearch is a source-available ([RSAL](https://raw.githubusercontent.com/RediSearch/RediSearch/master/LICENSE)), high-performance search engine implemented as a [Redis Module](https://redis.io/topics/modules-intro).
2020
It uses custom data types to allow fast, stable, and feature-rich full-text search inside Redis.
2121

22-
This client is a wrapper around the RediSearch API protocol, that allows you to utilize its features easily.
22+
This client is a wrapper around the RediSearch API protocol that lets you use its features easily.
2323

2424
### RediSearch's features include:
2525

@@ -35,44 +35,354 @@ This client is a wrapper around the RediSearch API protocol, that allows you to
3535

3636
For more details, visit [http://redisearch.io](http://redisearch.io)
3737

38-
## Example: Using the Python Client
38+
## Examples
39+
40+
### Creating a client instance
41+
42+
When you create a redisearch-py client instance, the only required argument
43+
is the name of the index.
44+
45+
```py
46+
from redisearch import Client
47+
48+
client = Client("my-index")
49+
```
50+
51+
To connect with a username and/or password, pass those options to the client
52+
initializer.
53+
54+
```py
55+
client = Client("my-index", username="user", password="my-password")
56+
```
57+
58+
### Using core Redis commands
59+
60+
Every instance of `Client` also holds an instance of the redis-py `Redis` client
61+
in its `redis` attribute. Use this object to run core Redis commands.
62+
63+
```py
64+
import datetime
65+
66+
from redisearch import Client
67+
68+
START_TIME = datetime.datetime.now()
69+
70+
client = Client("my-index")
71+
72+
client.redis.set("start-time", START_TIME.isoformat())  # redis-py rejects datetime objects, so store a string
73+
```
74+
75+
### Checking if a RediSearch index exists
76+
77+
To check if a RediSearch index exists, use the `FT.INFO` command and catch
78+
the `ResponseError` raised if the index does not exist.
79+
80+
```py
81+
from redis import ResponseError
82+
from redisearch import Client
83+
84+
client = Client("my-index")
85+
86+
try:
87+
    client.info()
88+
except ResponseError:
89+
    pass  # Index does not exist. We need to create it!
90+
```
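
The try/except pattern above can be wrapped in a small helper. This is a minimal sketch rather than part of redisearch-py: `index_exists` and its `error_cls` parameter are hypothetical names, and the exception class is passed in so the snippet does not depend on any particular import. In practice you would likely pass `redis.ResponseError`.

```python
def index_exists(client, error_cls):
    """Return True if client.info() succeeds, False if it raises error_cls.

    Assumes `client` exposes an `info()` method that raises `error_cls`
    when the index is missing, as described above for `FT.INFO`.
    """
    try:
        client.info()
    except error_cls:
        return False
    return True


# Typical usage (assumes redis-py and redisearch-py are installed):
#   from redis import ResponseError
#   from redisearch import Client
#   if not index_exists(Client("my-index"), ResponseError):
#       ...create the index...
```

Returning a boolean keeps the "create it if missing" decision at the call site instead of inside an `except` block.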
91+
92+
### Defining a search index
93+
94+
Use an instance of `IndexDefinition` to define a search index. You only need
95+
to do this when you create an index.
96+
97+
RediSearch indexes follow Hashes in your Redis databases by watching *key
98+
prefixes*. If a Hash whose key starts with one of the search index's
99+
configured key prefixes is added, updated, or deleted from Redis, RediSearch
100+
will make those changes in the index. You configure a search index's key
101+
prefixes using the `prefix` parameter of the `IndexDefinition` initializer.
102+
103+
**NOTE**: Once you create an index, RediSearch will continuously index these
104+
keys when their Hashes change.
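
To make the prefix rule concrete, here is a plain-Python sketch of the matching described above. It only illustrates the idea: RediSearch performs this matching inside the server, and `is_followed` and the sample prefixes are hypothetical, not part of redisearch-py.

```python
def is_followed(key, prefixes):
    """Return True if `key` starts with any of the configured key prefixes."""
    return key.startswith(tuple(prefixes))


prefixes = ["blog:", "article:"]  # hypothetical prefixes for illustration
print(is_followed("blog:1", prefixes))     # True
print(is_followed("user:1", prefixes))     # False
```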
105+
106+
An index also needs a *schema*. The schema specifies which fields to
107+
index from within the Hashes that the index follows. The field types are:
108+
109+
* TextField
110+
* TagField
111+
* NumericField
112+
* GeoField
113+
114+
For more information on what these field types mean, consult the [RediSearch
115+
documentation](https://oss.redislabs.com/redisearch/Commands/#ftcreate) on
116+
the `FT.CREATE` command.
117+
118+
With redisearch-py, the schema is an iterable of `Field` instances. Once you
119+
have an `IndexDefinition` instance, you can create the index by passing the
120+
schema iterable and the definition to the `create_index()` method.
39121

40122
```py
41-
from redisearch import Client, TextField, IndexDefinition, Query
123+
from redis import ResponseError
from redisearch import Client, IndexDefinition, TextField
42124

43-
# Creating a client with a given index name
44-
client = Client("myIndex")
125+
SCHEMA = (
126+
TextField("title", weight=5.0),
127+
TextField("body")
128+
)
129+
130+
client = Client("my-index")
45131

46-
# IndexDefinition is available for RediSearch 2.0+
47-
definition = IndexDefinition(prefix=['doc:', 'article:'])
132+
definition = IndexDefinition(prefix=['blog:'])
133+
134+
try:
135+
    client.info()
136+
except ResponseError:
137+
    # Index does not exist. We need to create it!
138+
    client.create_index(SCHEMA, definition=definition)
139+
```
48140

49-
# Creating the index definition and schema
50-
client.create_index((TextField("title", weight=5.0), TextField("body")), definition=definition)
141+
### Indexing a document
51142

52-
# Indexing a document for RediSearch 2.0+
53-
client.redis.hset('doc:1',
54-
mapping={
55-
'title': 'RediSearch',
56-
'body': 'Redisearch impements a search engine on top of redis'
57-
})
143+
A RediSearch 2.0 index continually follows Hashes with the key prefixes you
144+
defined, so if you want to add a document to the index, you only need to
145+
create a Hash with one of those prefixes.
58146

147+
```py
148+
# Indexing a document with RediSearch 2.0.
149+
doc = {
150+
'title': 'RediSearch',
151+
'body': 'Redisearch adds querying, indexing, and full-text search to Redis'
152+
}
153+
client.redis.hset('blog:1', mapping=doc)
154+
```
155+
156+
Past versions of RediSearch required that you call the `add_document()`
157+
method. This method is deprecated, but we include its usage here for
158+
reference.
159+
160+
```py
59161
# Indexing a document for RediSearch 1.x
60162
client.add_document(
61163
"doc:2",
62164
title="RediSearch",
63165
body="Redisearch implements a search engine on top of redis",
64166
)
167+
```
168+
169+
### Querying
65170

66-
# Simple search
67-
res = client.search("search engine")
171+
#### Basic queries
68172

69-
# the result has the total number of results, and a list of documents
70-
print(res.total) # "2"
71-
print(res.docs[0].title) # "RediSearch"
173+
Use the `search()` method to perform basic full-text and field-specific
174+
searches. This method doesn't take many of the options available to the
175+
RediSearch `FT.SEARCH` command -- read the section on building complex
176+
queries later in this document for information on how to use those.
72177

73-
# Searching with complex parameters:
74-
q = Query("search engine").verbatim().no_content().with_scores().paging(0, 5)
178+
```py
179+
res = client.search("evil wizards")
180+
```
181+
#### Result objects
182+
183+
Results are wrapped in a `Result` object that includes the number of results
184+
and a list of matching documents.
185+
186+
```py
187+
>>> print(res.total)
188+
2
189+
>>> print(res.docs[0].title)
190+
Wizard Story 2: Evil Wizards Strike Back
191+
```
192+
193+
#### Building complex queries
194+
195+
You can use the `Query` object to build complex queries:
196+
197+
```py
198+
q = Query("evil wizards").verbatim().no_content().with_scores().paging(0, 5)
75199
res = client.search(q)
200+
```
201+
202+
For an explanation of these options, see the [RediSearch
203+
documentation](https://oss.redislabs.com/redisearch/Commands/#ftsearch) for
204+
the `FT.SEARCH` command.
205+
206+
#### Query syntax
207+
208+
The default behavior of queries is to run a full-text search across all
209+
`TEXT` fields in the index for the intersection of all terms in the query.
210+
211+
So the example given in the "Basic queries" section of this README,
212+
`client.search("evil wizards")`, runs a full-text search for the intersection
213+
of "evil" and "wizards" in all `TEXT` fields.
214+
215+
Many more types of queries are possible, however! The string you pass into
216+
the `search()` method or `Query()` initializer has the full range of query
217+
syntax available in RediSearch.
218+
219+
For example, a full-text search against a specific `TEXT` field in the index
220+
looks like this:
221+
222+
```py
223+
# Full-text search restricted to the title field
224+
res = client.search("@title:(evil wizards)")
225+
```
226+
227+
Finding books published in 2020 or 2021 looks like this:
228+
229+
```python
230+
client.search("@published_year:[2020 2021]")
231+
```
232+
233+
To learn more, see the [RediSearch
234+
documentation](https://oss.redislabs.com/redisearch/Query_Syntax/) on query
235+
syntax.
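
Because the query is an ordinary string, small helpers can assemble it. The following is a sketch only: `field_text` and `numeric_range` are hypothetical helpers, not part of redisearch-py, and they do not escape special characters in the terms.

```python
def field_text(field, terms):
    """Build a full-text clause restricted to one field."""
    return "@{}:({})".format(field, terms)


def numeric_range(field, low, high):
    """Build an inclusive numeric range clause."""
    return "@{}:[{} {}]".format(field, low, high)


# Clauses separated by a space are intersected by default.
query = " ".join([
    field_text("title", "evil wizards"),
    numeric_range("published_year", 2020, 2021),
])
print(query)  # @title:(evil wizards) @published_year:[2020 2021]
```

The resulting string can be passed to `client.search()` or to the `Query()` initializer.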
236+
237+
### Aggregations
238+
239+
This library contains a programmatic interface to run [aggregation
240+
queries](https://oss.redislabs.com/redisearch/Aggregations/) with RediSearch.
241+
242+
#### Making an aggregation query
243+
244+
To make an aggregation query, pass an instance of the `AggregateRequest`
245+
class to the `aggregate()` method of an instance of `Client`.
246+
247+
For example, here is how to find the largest number of books published in a
248+
single year:
249+
250+
```py
251+
from redisearch import Client
252+
from redisearch import reducers
253+
from redisearch.aggregation import AggregateRequest
254+
255+
client = Client('books-idx')
256+
257+
request = AggregateRequest('*').group_by(
258+
'@published_year', reducers.count().alias("num_published")
259+
).group_by(
260+
[], reducers.max("@num_published").alias("max_books_published_per_year")
261+
)
262+
263+
result = client.aggregate(request)
264+
```
265+
266+
#### A redis-cli equivalent query
267+
268+
The aggregation query just given is equivalent to the following
269+
`FT.AGGREGATE` command entered directly into the redis-cli:
270+
271+
```sql
272+
FT.AGGREGATE books-idx *
273+
GROUPBY 1 @published_year
274+
REDUCE COUNT 0 AS num_published
275+
GROUPBY 0
276+
REDUCE MAX 1 @num_published AS max_books_published_per_year
277+
```
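
In the command above, each argument list is preceded by its length: `GROUPBY 1 @published_year` groups by one field, `REDUCE COUNT 0` takes no arguments, and `GROUPBY 0` groups by nothing. The sketch below rebuilds the same command from plain Python to make that convention visible. `groupby_args` is a hypothetical helper, and this is not necessarily how redisearch-py builds commands internally.

```python
def groupby_args(fields, reducer, reducer_args=(), alias=None):
    """Build one GROUPBY ... REDUCE ... clause as a list of arguments."""
    args = ["GROUPBY", str(len(fields)), *fields]
    args += ["REDUCE", reducer, str(len(reducer_args)), *reducer_args]
    if alias is not None:
        args += ["AS", alias]
    return args


command = ["FT.AGGREGATE", "books-idx", "*"]
command += groupby_args(["@published_year"], "COUNT", alias="num_published")
command += groupby_args([], "MAX", ["@num_published"],
                        alias="max_books_published_per_year")
print(" ".join(command))
```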
278+
279+
#### The AggregateResult object
280+
281+
Aggregation queries return an `AggregateResult` object that contains the rows
282+
returned for the query and a cursor if you're using the [cursor
283+
API](https://oss.redislabs.com/redisearch/Aggregations/#cursor_api).
284+
285+
```py
286+
from redisearch import reducers
from redisearch.aggregation import AggregateRequest, Asc
287+
288+
request = AggregateRequest('*').group_by(
289+
['@published_year'], reducers.avg('average_rating').alias('average_rating_for_year')
290+
).sort_by(
291+
Asc('@average_rating_for_year')
292+
).limit(
293+
0, 10
294+
).filter('@published_year > 0')
295+
296+
...
297+
298+
299+
In [53]: resp = c.aggregate(request)
300+
In [54]: resp.rows
301+
Out[54]:
302+
[['published_year', '1914', 'average_rating_for_year', '0'],
303+
['published_year', '2009', 'average_rating_for_year', '1.39166666667'],
304+
['published_year', '2011', 'average_rating_for_year', '2.046'],
305+
['published_year', '2010', 'average_rating_for_year', '3.125'],
306+
['published_year', '2012', 'average_rating_for_year', '3.41'],
307+
['published_year', '1967', 'average_rating_for_year', '3.603'],
308+
['published_year', '1970', 'average_rating_for_year', '3.71875'],
309+
['published_year', '1966', 'average_rating_for_year', '3.72666666667'],
310+
['published_year', '1927', 'average_rating_for_year', '3.77']]
311+
```
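
Each row in the output above is a flat list of alternating field names and values. A minimal sketch for turning such rows into dictionaries follows. `rows_to_dicts` is a hypothetical helper, and it assumes string values like those shown above.

```python
def rows_to_dicts(rows):
    """Convert flat [name, value, name, value, ...] rows into dictionaries."""
    return [dict(zip(row[0::2], row[1::2])) for row in rows]


rows = [
    ['published_year', '1914', 'average_rating_for_year', '0'],
    ['published_year', '2009', 'average_rating_for_year', '1.39166666667'],
]
for record in rows_to_dicts(rows):
    # Values arrive as strings, so convert them before doing arithmetic.
    print(record['published_year'], float(record['average_rating_for_year']))
```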
312+
313+
#### Reducer functions
314+
315+
Notice from the example that we used an object from the `reducers` module.
316+
See the [RediSearch documentation](https://oss.redislabs.com/redisearch/Aggregations/#groupby_reducers)
317+
for more examples of reducer functions you can use when grouping results.
318+
319+
Reducer functions include an `alias()` method that gives the result of the
320+
reducer a specific name. If you don't supply a name, RediSearch will generate
321+
one.
322+
323+
#### Grouping by zero, one, or multiple fields
324+
325+
The `group_by` statement can take a single field name as a string, or multiple
326+
field names as a list of strings.
327+
328+
```py
329+
AggregateRequest('*').group_by('@published_year', reducers.count())
330+
331+
AggregateRequest('*').group_by(
332+
['@published_year', '@average_rating'],
333+
reducers.count())
334+
```
335+
336+
To run a reducer function on every result from an aggregation query, pass an
337+
empty list to `group_by()`, which is equivalent to passing the option
338+
`GROUPBY 0` when writing an aggregation in the redis-cli.
339+
340+
```py
341+
AggregateRequest('*').group_by([], reducers.max("@num_published"))
342+
```
343+
344+
**NOTE**: Aggregation queries require at least one `group_by()` method call.
345+
346+
#### Sorting and limiting
347+
348+
Using an `AggregateRequest` instance, you can sort with the `sort_by()` method
349+
and limit with the `limit()` method.
350+
351+
For example, finding the average rating of books published each year, sorting
352+
by the average rating for the year, and returning only the first ten results:
353+
354+
```py
355+
from redisearch import Client, reducers
356+
from redisearch.aggregation import AggregateRequest, Asc
357+
358+
c = Client('books-idx')
359+
360+
request = AggregateRequest('*').group_by(
361+
['@published_year'], reducers.avg('average_rating').alias('average_rating_for_year')
362+
).sort_by(
363+
Asc('@average_rating_for_year')
364+
).limit(0, 10)
365+
366+
c.aggregate(request)
367+
```
368+
369+
**NOTE**: The first option to `limit()` is a zero-based offset, and the second
370+
option is the number of results to return.
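
Since `limit()` takes an offset rather than a page number, paging needs a small conversion. Here is a sketch, assuming one-based page numbers. `page_to_limit` is a hypothetical helper, not part of redisearch-py.

```python
def page_to_limit(page, page_size):
    """Translate a one-based page number into (offset, count) for limit()."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return (page - 1) * page_size, page_size


offset, count = page_to_limit(3, 10)
print(offset, count)  # 20 10
```

With an `AggregateRequest` named `request`, this could be used as `request.limit(*page_to_limit(3, 10))`.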
371+
372+
#### Filtering
373+
374+
Use filtering to reject results of an aggregation query after your reducer
375+
functions run. For example, calculating the average rating of books published
376+
each year and only returning years with an average rating higher than 3:
377+
378+
```py
379+
from redisearch import reducers
from redisearch.aggregation import AggregateRequest, Asc
380+
381+
req = AggregateRequest('*').group_by(
382+
['@published_year'], reducers.avg('average_rating').alias('average_rating_for_year')
383+
).sort_by(
384+
Asc('@average_rating_for_year')
385+
).filter('@average_rating_for_year > 3')
76386
```
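
To see why the filter can refer to `@average_rating_for_year`, it may help to walk through the same stages in plain Python. This is an illustration only, using made-up sample data. RediSearch runs these stages on the server.

```python
from collections import defaultdict

# Hypothetical sample data: (published_year, average_rating) per book.
books = [(2009, 1.0), (2009, 2.0), (2012, 3.0), (2012, 4.0), (1967, 4.5)]

# Stage 1: group by year and reduce each group to its average rating.
ratings_by_year = defaultdict(list)
for year, rating in books:
    ratings_by_year[year].append(rating)
averages = {year: sum(r) / len(r) for year, r in ratings_by_year.items()}

# Stage 2: sort ascending by the reduced value.
ordered = sorted(averages.items(), key=lambda item: item[1])

# Stage 3: filter on the reduced value, which exists only after stage 1.
kept = [(year, avg) for year, avg in ordered if avg > 3]
print(kept)  # [(2012, 3.5), (1967, 4.5)]
```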
77387

78388
## Installing
@@ -115,6 +425,6 @@ Finally, invoke the virtual environment and run the tests:
115425

116426
```
117427
. ./venv3/bin/activate
118-
REDIS_PORT=6379 python test/test.py
428+
REDIS_PORT=6379 python test/test.py
119429
REDIS_PORT=6379 python test/test_builder.py
120430
```
