Initial service commit #1

Merged (2 commits) on Aug 4, 2022
4 changes: 4 additions & 0 deletions .gitignore
@@ -127,3 +127,7 @@ dmypy.json

# Pyre type checker
.pyre/

# Pycharm
.idea/
.DS_Store
36 changes: 36 additions & 0 deletions README.md
@@ -1,2 +1,38 @@
# openfoodfacts-search
Open Food Facts Search API V3 using ElasticSearch - https://wiki.openfoodfacts.org/Search_API_V3

This API is currently in development. It is not serving any production traffic. The [Work Plan](https://wiki.openfoodfacts.org/Search_API_V3#Work_Plan) will be updated as development continues.

### Organization
The main file is `api.py`, and the Product schema is in `models/product.py`.
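
`models/product.py` itself is not shown in this excerpt of the diff; for orientation, a schema declared with elasticsearch_dsl is typically a `Document` subclass along the following lines (the field names and index name here are illustrative assumptions, not the actual schema):

```python
# Illustrative sketch only - the real schema lives in models/product.py,
# which is not shown in this excerpt. elasticsearch_dsl schemas are Document
# subclasses that declare their fields and target index.
from elasticsearch_dsl import Document, Keyword, Text


class Product(Document):
    code = Keyword()          # barcode (assumed field)
    product_name = Text()     # assumed field

    class Index:
        name = "product"      # assumed index name
```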

The `scripts/` directory contains various scripts for manual validation, constructing the product schema, importing, etc.

### Running locally
Docker spins up:
- Two Elasticsearch nodes
- [Elasticvue](https://elasticvue.com/)

You will then need to import from CSV (see instructions below).

Make sure your environment is configured:
```commandline
export ELASTIC_PASSWORD=PASSWORD_HERE
```

> **Contributor:** Could you also suggest sensible defaults for the other variables, or provide an env file? e.g. for MEM_LIMIT, STACK_VERSION, CLUSTER_NAME, ES_PORT, etc.
>
> **Contributor:**
>
> ```commandline
> export STACK_VERSION=8.3.3
> export CLUSTER_NAME=elasticcluster
> export MEM_LIMIT=1g
> export ES_PORT=7777
> ```
>
> seems to be enough to launch the docker containers.
>
> **Contributor (author):** Absolutely - there was an .env file, but it wasn't committed because of the .gitignore. I'll add it now.
>
> Note: I found that 2GB of memory led to better performance, so I'll use that. It still works well with 1GB if we're memory constrained.


### Helpful commands:

To start docker:
```console
docker-compose up -d
```

To start server:
```console
uvicorn api:app --reload
```

> **Contributor:** Sorry, I'm a bit lost as I know little Docker and Python, unfortunately. Where do I need to run uvicorn? In one of the docker containers (elasticvue?), or in a Python virtual env outside of Docker, where I should install everything listed in requirements.txt?
>
> If you can put a bit more detail in the helpful commands for newbies like me, it would be very helpful indeed! :)
>
> Our Python and Docker expert is @alexgarel, but he's on vacation until August 16th.
>
> **Contributor (author):** Absolutely! I've now structured this so that the search service runs in Docker, and I've added commands for both running in Docker and running locally.
>
> Note: as part of this I moved everything inside an `app` directory, so the PR change looks bigger than it is.

To import data from the [CSV export](https://world.openfoodfacts.org/data):
```console
python scripts/perform_import.py --filename=/path/to/file.csv
```
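
`scripts/perform_import.py` is likewise not shown in this excerpt; conceptually, an import built on the Product document reads the export row by row and indexes each product. A rough sketch, under assumed column names and a tab-separated export:

```python
# Conceptual sketch only - scripts/perform_import.py is not shown in this
# excerpt. The column names and tab-separated dialect are assumptions.
import csv
import sys

from models.product import Product
from utils import connection

connection.get_connection()  # set up the default Elasticsearch connection

with open(sys.argv[1], newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        Product(code=row.get("code"), product_name=row.get("product_name")).save()
```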
45 changes: 45 additions & 0 deletions api.py
@@ -0,0 +1,45 @@
from elasticsearch_dsl import Q
from fastapi import FastAPI, HTTPException

from models.product import Product
from models.request import AutocompleteRequest, SearchRequest
from utils import connection, constants, response

app = FastAPI()
connection.get_connection()


# TODO: Remove this commented out code, so that it's not confusing about where the current GET API is served
# (retaining temporarily as a proof of concept)
# @app.get("/{barcode}")
# def get_product(barcode: str):
# results = Product.search().query("match", code=barcode).execute()
# results_dict = [r.to_dict() for r in results]
#
# if not results_dict:
# raise HTTPException(status_code=404, detail="Barcode not found")
#
# product = results_dict[0]
# return product

@app.post("/autocomplete")
def autocomplete(request: AutocompleteRequest):
# TODO: This function needs unit testing
if not request.search_fields:
request.search_fields = constants.AUTOCOMPLETE_FIELDS
for field in request.search_fields:
if field not in constants.AUTOCOMPLETE_FIELDS:
raise HTTPException(status_code=400, detail="Invalid field: {}".format(field))

match_queries = []
for field in request.search_fields:
match_queries.append(Q('match', **{field: request.text}))

results = Product.search().query('bool', should=match_queries).extra(size=request.get_num_results()).execute()
resp = response.create_response(results, request)
return resp


@app.post("/search")
def search(request: SearchRequest):
raise NotImplementedError()
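
The request models (`models/request.py`) are not shown in this excerpt, so the exact body fields are assumptions; as a rough sketch, calling the `/autocomplete` endpoint from Python against a locally running uvicorn server could look like this:

```python
# Rough sketch of querying /autocomplete. Assumptions not confirmed by this
# excerpt: the server runs on uvicorn's default http://127.0.0.1:8000, and
# AutocompleteRequest accepts "text" and "search_fields" in the JSON body.
import requests

payload = {
    "text": "chocolate spread",
    "search_fields": ["product_name"],  # must be a subset of AUTOCOMPLETE_FIELDS
}

resp = requests.post("http://127.0.0.1:8000/autocomplete", json=payload)
resp.raise_for_status()
print(resp.json())
```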
144 changes: 144 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,144 @@
version: "2.2"

services:
setup:
image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
volumes:
- certs:/usr/share/elasticsearch/config/certs
user: "0"
command: >
bash -c '
if [ x${ELASTIC_PASSWORD} == x ]; then
echo "Set the ELASTIC_PASSWORD environment variable in the .env file";
exit 1;
fi;
if [ ! -f config/certs/ca.zip ]; then
echo "Creating CA";
bin/elasticsearch-certutil ca --silent --pem -out config/certs/ca.zip;
unzip config/certs/ca.zip -d config/certs;
fi;
if [ ! -f config/certs/certs.zip ]; then
echo "Creating certs";
echo -ne \
"instances:\n"\
" - name: es01\n"\
" dns:\n"\
" - es01\n"\
" - localhost\n"\
" ip:\n"\
" - 127.0.0.1\n"\
" - name: es02\n"\
" dns:\n"\
" - es02\n"\
" - localhost\n"\
" ip:\n"\
" - 127.0.0.1\n"\
> config/certs/instances.yml;
bin/elasticsearch-certutil cert --silent --pem -out config/certs/certs.zip --in config/certs/instances.yml --ca-cert config/certs/ca/ca.crt --ca-key config/certs/ca/ca.key;
unzip config/certs/certs.zip -d config/certs;
fi;
echo "Setting file permissions"
chown -R root:root config/certs;
find . -type d -exec chmod 750 \{\} \;;
find . -type f -exec chmod 640 \{\} \;;
echo "Waiting for Elasticsearch availability";
until curl -s --cacert config/certs/ca/ca.crt https://es01:9200 | grep -q "missing authentication credentials"; do sleep 30; done;
echo "All done!";
'
healthcheck:
test: ["CMD-SHELL", "[ -f config/certs/es01/es01.crt ]"]
interval: 1s
timeout: 5s
retries: 120

es01:
depends_on:
setup:
condition: service_healthy
image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
volumes:
- certs:/usr/share/elasticsearch/config/certs
- esdata01:/usr/share/elasticsearch/data
ports:
- ${ES_PORT}:9200
environment:
- node.name=es01
- cluster.name=${CLUSTER_NAME}
- cluster.initial_master_nodes=es01,es02
- discovery.seed_hosts=es02
- ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
- bootstrap.memory_lock=true
- xpack.security.enabled=false
- xpack.license.self_generated.type=${LICENSE}
- http.cors.enabled=true
- http.cors.allow-origin=http://localhost:8080,http://127.0.0.1:8080
- http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
- http.cors.allow-credentials=true
mem_limit: ${MEM_LIMIT}
ulimits:
memlock:
soft: -1
hard: -1
healthcheck:
test:
[
"CMD-SHELL",
"curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
]
interval: 10s
timeout: 10s
retries: 120

es02:
depends_on:
- es01
image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
volumes:
- certs:/usr/share/elasticsearch/config/certs
- esdata02:/usr/share/elasticsearch/data
environment:
- node.name=es02
- cluster.name=${CLUSTER_NAME}
- cluster.initial_master_nodes=es01,es02
- discovery.seed_hosts=es01
- bootstrap.memory_lock=true
- xpack.security.enabled=false
- xpack.license.self_generated.type=${LICENSE}
- http.cors.enabled=true
- http.cors.allow-origin=http://localhost:8080,http://127.0.0.1:8080
- http.cors.allow-headers=X-Requested-With,X-Auth-Token,Content-Type,Content-Length,Authorization
- http.cors.allow-credentials=true
mem_limit: ${MEM_LIMIT}
ulimits:
memlock:
soft: -1
hard: -1
healthcheck:
test:
[
"CMD-SHELL",
"curl -s --cacert config/certs/ca/ca.crt https://localhost:9200 | grep -q 'missing authentication credentials'",
]
interval: 10s
timeout: 10s
retries: 120


# elasticsearch browser
elasticvue:
image: cars10/elasticvue
container_name: elasticvue
ports:
- '8080:8080'
links:
- es01

volumes:
certs:
driver: local
esdata01:
driver: local
esdata02:
driver: local
esdata03:
driver: local
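
`utils/connection.py` is not part of this excerpt either; a rough sketch of what `get_connection()` could do, wiring elasticsearch_dsl to the es01 node that this compose file exposes on `${ES_PORT}` (the variable names mirror docker-compose.yml, everything else is an assumption):

```python
# Rough sketch only - utils/connection.py is not shown in this excerpt.
# Registers the default elasticsearch_dsl connection against the es01 node
# exposed on ${ES_PORT}; with xpack.security.enabled=false, credentials are
# not strictly required.
import os

from elasticsearch_dsl import connections


def get_connection():
    return connections.create_connection(
        hosts=[f"http://localhost:{os.getenv('ES_PORT', '9200')}"],
        http_auth=("elastic", os.getenv("ELASTIC_PASSWORD", "")),
    )
```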
Empty file added models/__init__.py