Initial service commit #1
`.gitignore`:

```diff
@@ -127,3 +127,7 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# Pycharm
+.idea/
+.DS_Store
```
`README.md`:

# openfoodfacts-search
Open Food Facts Search API V3 using ElasticSearch - https://wiki.openfoodfacts.org/Search_API_V3

This API is currently in development. It is not serving any production traffic. The [Work Plan](https://wiki.openfoodfacts.org/Search_API_V3#Work_Plan) will be updated as development continues.

### Organization
The main file is `api.py`, and the Product schema is in `models/product.py`.

The `scripts/` directory contains various scripts for manual validation, constructing the product schema, importing, etc.

### Running locally
Docker spins up:
- Two Elasticsearch nodes
- [Elasticvue](https://elasticvue.com/)

You will then need to import from CSV (see instructions below).

Make sure your environment is configured:
```commandline
export ELASTIC_PASSWORD=PASSWORD_HERE
```

### Helpful commands

To start docker:
```console
docker-compose up -d
```

To start the server:
```console
uvicorn api:app --reload
```

To import data from the [CSV export](https://world.openfoodfacts.org/data):
```console
python scripts/perform_import.py --filename=/path/to/file.csv
```

Review comment on `uvicorn api:app --reload`:

> Sorry, I'm a bit lost as I know little docker and python unfortunately. Where do I need to run uvicorn? In one of the docker containers (elasticvue?), or in a Python virtual env outside of docker, where I should install everything listed in requirements.txt? If you can put a bit more detail in the Helpful commands for newbies like me, it would be very helpful indeed! :) Our python and docker expert is @alexgarel, but he's on vacation until August 16th.

Author's reply:

> Absolutely! So, I've now structured this so that the search service runs in docker. I've also added commands for both using docker and using locally. Note, as part of this I moved everything inside an
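For orientation, `scripts/perform_import.py` (not shown in this diff) consumes the CSV export, which is large, so streaming it row by row is the natural approach. A minimal standard-library sketch, assuming a tab-delimited export; the column names in the sample are illustrative, not the real export schema:

```python
import csv
import io

def iter_products(fileobj):
    # Stream rows as dicts so a multi-gigabyte export is never fully in memory.
    reader = csv.DictReader(fileobj, delimiter="\t")
    for row in reader:
        yield row

# Tiny in-memory stand-in for the real export file.
sample = "code\tproduct_name\tbrands\n0123456789012\tExample Bar\tExampleBrand\n"
rows = list(iter_products(io.StringIO(sample)))
print(rows[0]["product_name"])  # prints "Example Bar"
```

The generator keeps memory flat regardless of file size, which matters for the full Open Food Facts dump.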
`api.py` (new file):

```python
from elasticsearch_dsl import Q
from fastapi import FastAPI, HTTPException

from models.product import Product
from models.request import AutocompleteRequest, SearchRequest
from utils import connection, constants, response

app = FastAPI()
connection.get_connection()


# TODO: Remove this commented out code, so that it's not confusing about where the current GET API is served
# (retaining temporarily as a proof of concept)
# @app.get("/{barcode}")
# def get_product(barcode: str):
#     results = Product.search().query("match", code=barcode).execute()
#     results_dict = [r.to_dict() for r in results]
#
#     if not results_dict:
#         raise HTTPException(status_code=404, detail="Barcode not found")
#
#     product = results_dict[0]
#     return product


@app.post("/autocomplete")
def autocomplete(request: AutocompleteRequest):
    # TODO: This function needs unit testing
    if not request.search_fields:
        request.search_fields = constants.AUTOCOMPLETE_FIELDS
    for field in request.search_fields:
        if field not in constants.AUTOCOMPLETE_FIELDS:
            raise HTTPException(status_code=400, detail="Invalid field: {}".format(field))

    match_queries = []
    for field in request.search_fields:
        match_queries.append(Q('match', **{field: request.text}))

    results = Product.search().query('bool', should=match_queries).extra(size=request.get_num_results()).execute()
    resp = response.create_response(results, request)
    return resp


@app.post("/search")
def search(request: SearchRequest):
    raise NotImplementedError()
```
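To make the query construction concrete, here is a dependency-free sketch of the request body that `autocomplete` effectively sends to Elasticsearch via `elasticsearch_dsl`. The field names are illustrative; the allowed set lives in `utils/constants.AUTOCOMPLETE_FIELDS`, which is not shown in this diff:

```python
def build_autocomplete_query(text, search_fields, size):
    # One "match" clause per requested field, combined under bool/should,
    # mirroring Q('match', **{field: text}) and Q('bool', should=...) above.
    match_queries = [{"match": {field: text}} for field in search_fields]
    return {"query": {"bool": {"should": match_queries}}, "size": size}

body = build_autocomplete_query("nutel", ["product_name", "brands"], 10)
print(body)
```

With `bool`/`should`, a document matching the text in any requested field scores, and documents matching several fields rank higher.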
`docker-compose.yml` (new file):

```yaml
version: "2.2"

services:
  setup:
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
    user: "0"
    command: >
      bash -c '
        if [ x${ELASTIC_PASSWORD} == x ]; then
          echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
          exit 1;
        fi;
        if [ ! -f config/certs/ca.zip ]; then
          echo "Creating CA";
          bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
          unzip config/certs/ca.zip -d config/certs;
        fi;
        if [ ! -f config/certs/certs.zip ]; then
          echo "Creating certs";
          echo -ne \
          "instances:\n"\
          "  - name: es01\n"\
          "    dns:\n"\
          "      - es01\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          "  - name: es02\n"\
          "    dns:\n"\
          "      - es02\n"\
          "      - localhost\n"\
          "    ip:\n"\
          "      - 127.0.0.1\n"\
          > config/certs/instances.yml;
          bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
          unzip config/certs/certs.zip -d config/certs;
        fi;
        echo "Setting file permissions"
        chown -R root:root config/certs;
        find . -type d -exec chmod 750 \{\} \;;
        find . -type f -exec chmod 640 \{\} \;;
        echo "Waiting for Elasticsearch availability";
        until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
        echo "All done!";
      '
    healthcheck:
      test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
      interval: 1s
      timeout: 5s
      retries: 120

  es01:
    depends_on:
      setup:
        condition: service_healthy
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    environment:
      - node.name=es01
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01,es02
      - discovery.seed_hosts=es02
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.license.self_generated.type=${LICENSE}
      - http.cors.enabled=true
      - http.cors.allow-origin=http://localhost:8080,http://127.0.0.1:8080
      - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
      - http.cors.allow-credentials=true
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  es02:
    depends_on:
      - es01
    image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
    volumes:
      - certs:/usr/share/elasticsearch/config/certs
      - esdata02:/usr/share/elasticsearch/data
    environment:
      - node.name=es02
      - cluster.name=${CLUSTER_NAME}
      - cluster.initial_master_nodes=es01,es02
      - discovery.seed_hosts=es01
      - bootstrap.memory_lock=true
      - xpack.security.enabled=false
      - xpack.license.self_generated.type=${LICENSE}
      - http.cors.enabled=true
      - http.cors.allow-origin=http://localhost:8080,http://127.0.0.1:8080
      - http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
      - http.cors.allow-credentials=true
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test:
        [
          "CMD-SHELL",
          "curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
        ]
      interval: 10s
      timeout: 10s
      retries: 120

  # elasticsearch browser
  elasticvue:
    image: cars10/elasticvue
    container_name: elasticvue
    ports:
      - '8080:8080'
    links:
      - es01

volumes:
  certs:
    driver: local
  esdata01:
    driver: local
  esdata02:
    driver: local
  esdata03:
    driver: local
```
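Once the containers are up, it helps to confirm the cluster is reachable before importing data. A minimal sketch in Python: since `xpack.security.enabled=false` in this compose file, the standard `/_cluster/health` endpoint needs no credentials; the sample payload below is illustrative:

```python
import json

def is_cluster_ready(payload: str) -> bool:
    """Decide from a /_cluster/health JSON body whether the cluster is usable."""
    health = json.loads(payload)
    # "green": all shards allocated; "yellow": primaries only, still queryable.
    return health.get("status") in ("green", "yellow")

# Illustrative payload; a real one comes from e.g.
#   curl -s http://localhost:${ES_PORT}/_cluster/health
sample = '{"cluster_name": "elasticcluster", "status": "green", "number_of_nodes": 2}'
print(is_cluster_ready(sample))  # prints "True"
```

A `"red"` status (or an unreachable endpoint) means imports should wait; the setup container's own wait loop polls the root endpoint the same way.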
Review comment on `docker-compose.yml`:

> Could you also suggest sensible defaults for the other variables? Or provide an env file maybe? E.g. for MEM_LIMIT, STACK_VERSION, CLUSTER_NAME, ES_PORT, etc.

Reply:

> export STACK_VERSION=8.3.3
> export CLUSTER_NAME=elasticcluster
> export MEM_LIMIT=1g
> export ES_PORT=7777
>
> seem to be enough to launch the docker containers.

Author's reply:

> Absolutely - there was an `.env` file, but it wasn't committed due to the `.gitignore`. I'll add this now. Note, I found that 2GB of memory led to better performance, so I'll put that in. It still works well with 1GB if we're memory constrained.
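Putting the values from this thread together, a plausible `.env` file for local development could look like the following. The `ELASTIC_PASSWORD` value is a placeholder, `MEM_LIMIT=2g` follows the author's note above, and `LICENSE=basic` is an assumption (the compose file references `${LICENSE}` but no value appears in the thread):

```
STACK_VERSION=8.3.3
CLUSTER_NAME=elasticcluster
MEM_LIMIT=2g
ES_PORT=7777
LICENSE=basic
ELASTIC_PASSWORD=PASSWORD_HERE
```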