Products index #1

Merged 11 commits on Oct 4, 2023
2 changes: 2 additions & 0 deletions .coveragerc
@@ -0,0 +1,2 @@
[report]
exclude_lines = pass
38 changes: 38 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,38 @@
name: CI

on:
  push:
    branches:
      - '*'
  pull_request:
    branches:
      - '*'

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.10'

      - name: Install project dependencies
        run: |
          pip install poetry
          poetry install
        working-directory: ${{ github.workspace }}

      - name: Run tests
        run: |
          poetry run coverage run -m unittest discover ./app/tests/
          poetry run coverage report
          poetry run coverage xml
        working-directory: ${{ github.workspace }}

      - name: Upload coverage report
        uses: codecov/codecov-action@v2
5 changes: 5 additions & 0 deletions .gitignore
@@ -158,3 +158,8 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

.vscode
*cache*
.coverage
htmlcov
20 changes: 20 additions & 0 deletions Dockerfile
@@ -0,0 +1,20 @@
FROM python:3.10-slim

WORKDIR /app

RUN pip install poetry

COPY pyproject.toml poetry.lock ./

RUN poetry config virtualenvs.create false && \
    poetry install --no-dev

COPY . .

EXPOSE 8000

COPY entrypoint.sh /entrypoint.sh

RUN chmod +x /entrypoint.sh

CMD ["/entrypoint.sh"]
189 changes: 188 additions & 1 deletion README.md
@@ -1 +1,188 @@
# microservice-ia
[![CI](https://github.com/weni-ai/SentenX/actions/workflows/ci.yaml/badge.svg)](https://github.com/weni-ai/SentenX/actions/workflows/ci.yaml)

# SentenX

A microservice that uses a sentence-transformer model to index and search records.

## Table of Contents

1. [Requirements](#requirements)
2. [Quickstart](#quickstart)
3. [Usage](#usage)
4. [Test](#test)

## Requirements

* Python 3.10
* Elasticsearch 8.9.1

## Quickstart
From the root directory of this project, run the commands below.

Set the AWS keys required by SageMaker and the Elasticsearch URL as environment variables:

```
export AWS_ACCESS_KEY_ID=YOUR_SAGEMAKER_AWS_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SAGEMAKER_AWS_SECRET_ACCESS_KEY
export ELASTICSEARCH_URL=YOUR_ELASTICSEARCH_URL
```

Install Poetry:
```
pip install poetry
```

Create a Python 3.10 virtual environment:
```
poetry env use 3.10
```

Activate the environment:
```
poetry shell
```

Install the dependencies:
```
poetry install
```

Start the microservice:
```
uvicorn app.main:main_app.api --reload
```

### Docker compose

To start SentenX together with Elasticsearch using Docker Compose, set `AWS_SECRET_ACCESS_KEY` and `AWS_ACCESS_KEY_ID` in `docker-compose.yml`, then run:
```
docker compose up -d
```

To stop:
```
docker compose down
```

To rebuild and start again after changing the source:
```
docker compose up -d --build
```


## Usage

### To index a product

request:
```bash
curl -X PUT http://localhost:8000/products/index \
  -H 'Content-Type: application/json' \
  -d '{
    "catalog_id": "cat1",
    "product": {
      "facebook_id": "123456789",
      "title": "massa para bolo de baunilha",
      "org_id": "1",
      "channel_id": "5",
      "catalog_id": "cat1",
      "product_retailer_id": "pp1"
    }
  }'
```
response (status 200):
```json
{
  "catalog_id": "cat1",
  "documents": [
    "cac65148-8c1d-423c-a022-2a52cdedcd3c"
  ]
}
```
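
The same request can be built from Python. The sketch below uses only the standard library and mirrors the curl call above; `BASE_URL` and the helper name `build_index_request` are assumptions made for illustration, not part of SentenX. It builds the request without sending it, so it runs without a live service.

```python
import json
import urllib.request

# Assumption: the service listens on localhost:8000, as in the curl example.
BASE_URL = "http://localhost:8000"


def build_index_request(catalog_id: str, product: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /products/index request."""
    payload = {"catalog_id": catalog_id, "product": product}
    return urllib.request.Request(
        url=f"{BASE_URL}/products/index",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


product = {
    "facebook_id": "123456789",
    "title": "massa para bolo de baunilha",
    "org_id": "1",
    "channel_id": "5",
    "catalog_id": "cat1",
    "product_retailer_id": "pp1",
}
request = build_index_request("cat1", product)

# Sending it requires a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.status, json.load(response))
```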

### To index products in batch

request:
```bash
curl -X PUT http://localhost:8000/products/batch \
  -H 'Content-Type: application/json' \
  -d '{
    "catalog_id": "asdfgh",
    "products": [
      {
        "facebook_id": "1234567891",
        "title": "banana prata 1kg",
        "org_id": "1",
        "channel_id": "5",
        "catalog_id": "asdfgh",
        "product_retailer_id": "p1"
      },
      {
        "facebook_id": "1234567892",
        "title": "doce de banana 250g",
        "org_id": "1",
        "channel_id": "5",
        "catalog_id": "asdfgh",
        "product_retailer_id": "p2"
      }
    ]
  }'
```

response (status 200):
```json
{
  "catalog_id": "asdfgh",
  "documents": [
    "f5b8d394-eb62-4c92-9501-51a8ebcf1380",
    "bcb551e8-0bd1-4ca7-825b-cf8aa8a3f0e0"
  ]
}
```
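
For a large catalog, a client may want to split its products across several batch requests. The sketch below shows one way to produce request bodies in the shape used above. The helper `batch_payloads` and the `batch_size` parameter are hypothetical: this README does not state any limit on batch size.

```python
from typing import Iterator


def batch_payloads(
    catalog_id: str, products: list[dict], batch_size: int = 100
) -> Iterator[dict]:
    """Yield bodies for PUT /products/batch with at most batch_size products each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(products), batch_size):
        yield {
            "catalog_id": catalog_id,
            "products": products[start:start + batch_size],
        }


# Five made-up products, split into batches of two.
products = [
    {
        "facebook_id": str(n),
        "title": f"product {n}",
        "org_id": "1",
        "channel_id": "5",
        "catalog_id": "asdfgh",
        "product_retailer_id": f"p{n}",
    }
    for n in range(1, 6)
]
payloads = list(batch_payloads("asdfgh", products, batch_size=2))
```

Each yielded dictionary can be serialized with `json.dumps` and sent as the body of one batch request.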

### To search for products

request:
```bash
curl http://localhost:8000/products/search \
  -H 'Content-Type: application/json' \
  -d '{
    "search": "massa",
    "filter": {
      "catalog_id": "cat1"
    },
    "threshold": 1.6
  }'
```
response (status 200):
```json
{
  "products": [
    {
      "facebook_id": "1",
      "title": "massa para bolo de baunilha",
      "org_id": "1",
      "channel_id": "5",
      "catalog_id": "asdfgh4321",
      "product_retailer_id": "abc321"
    }
  ]
}
```
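
On the client side, the search body and the response can be handled with a couple of small helpers. This is only a sketch based on the shapes shown above; `build_search_payload` and `extract_titles` are hypothetical names, and the meaning of `threshold` is not specified here, so the value is simply passed through.

```python
import json


def build_search_payload(search: str, catalog_id: str, threshold: float) -> dict:
    """Build the JSON body used by the search example above."""
    return {
        "search": search,
        "filter": {"catalog_id": catalog_id},
        "threshold": threshold,
    }


def extract_titles(response_body: str) -> list[str]:
    """Return the title of every product in a search response body."""
    return [p["title"] for p in json.loads(response_body).get("products", [])]


payload = build_search_payload("massa", "cat1", 1.6)

# A shortened copy of the sample response, used here instead of a live call.
sample_response = (
    '{"products": [{"facebook_id": "1", "title": "massa para bolo de baunilha"}]}'
)
titles = extract_titles(sample_response)
```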

## Test

Tests live in `./app/tests` and are run with unittest discovery, wrapped in coverage:
```
coverage run -m unittest discover -s app/tests
```
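
By default, unittest discovery collects files matching `test*.py`. A test module could look roughly like the sketch below. The file name `app/tests/test_example.py` and the helper `normalize_title` are invented for illustration and are not part of this project.

```python
# Hypothetical file: app/tests/test_example.py
import unittest


def normalize_title(title: str) -> str:
    """Illustrative helper: lowercase a title and collapse repeated whitespace."""
    return " ".join(title.lower().split())


class NormalizeTitleTest(unittest.TestCase):
    def test_lowercases_and_collapses_whitespace(self):
        self.assertEqual(normalize_title("  Banana   Prata 1KG "), "banana prata 1kg")

    def test_empty_title_stays_empty(self):
        self.assertEqual(normalize_title(""), "")
```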

Empty file added app/__init__.py
Empty file.
26 changes: 26 additions & 0 deletions app/config.py
@@ -0,0 +1,26 @@
import os


class AppConfig:
    def __init__(self):
        self.product_index_name = os.environ.get(
            "INDEX_PRODUCTS_NAME", "catalog_products"
        )
        self.es_url = os.environ.get("ELASTICSEARCH_URL", "http://localhost:9200")
        self.embedding_type = os.environ.get("EMBEDDING_TYPE", "sagemaker")
        self.sagemaker = {
            "endpoint_name": os.environ.get(
                "SAGEMAKER_ENDPOINT_NAME",
                "huggingface-pytorch-inference-2023-07-28-21-01-20-147",
            ),
            "region_name": os.environ.get("SAGEMAKER_REGION_NAME", "us-east-1"),
        }
        self.huggingfacehub = {
            "repo_id": os.environ.get(
                "HUGGINGFACE_REPO_ID", "sentence-transformers/all-MiniLM-L6-v2"
            ),
            "task": os.environ.get("HUGGINGFACE_TASK", "feature-extraction"),
            "huggingfacehub_api_token": os.environ.get(
                "HUGGINGFACE_API_TOKEN", ""
            ),
        }
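
Every setting in `AppConfig` is read from the environment when the object is constructed, so variables have to be set before `AppConfig()` is called. The sketch below illustrates that with a trimmed copy of the class holding two of its settings. The URL `http://es.internal:9200` is made up, and treating `huggingfacehub` as an accepted `EMBEDDING_TYPE` value is an assumption based on the attribute name.

```python
import os
from unittest import mock


class AppConfig:
    """Trimmed copy of the class above, reduced to two settings for illustration."""

    def __init__(self):
        self.es_url = os.environ.get("ELASTICSEARCH_URL", "http://localhost:9200")
        self.embedding_type = os.environ.get("EMBEDDING_TYPE", "sagemaker")


# With no variables set, the defaults apply.
with mock.patch.dict(os.environ, {}, clear=True):
    default_config = AppConfig()

# Hypothetical overrides; values are captured at construction time.
overrides = {
    "ELASTICSEARCH_URL": "http://es.internal:9200",
    "EMBEDDING_TYPE": "huggingfacehub",  # assumed value, see the note above
}
with mock.patch.dict(os.environ, overrides):
    custom_config = AppConfig()
```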
23 changes: 23 additions & 0 deletions app/handlers/__init__.py
@@ -0,0 +1,23 @@
from abc import ABC, abstractmethod


class IDocumentHandler(ABC):
    @abstractmethod
    def index(self):
        pass

    @abstractmethod
    def batch_index(self):
        pass

    @abstractmethod
    def search(self):
        pass

    @abstractmethod
    def delete(self):
        pass

    @abstractmethod
    def delete_batch(self):
        pass
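
A concrete handler has to override all five abstract methods before it can be instantiated. The sketch below repeats the interface so that it runs on its own and adds a hypothetical `InMemoryHandler`, which is not part of this PR. Its method parameters are assumptions, since the interface declares none beyond `self`, and its `search` is a plain substring match rather than an embedding-based search.

```python
from abc import ABC, abstractmethod


class IDocumentHandler(ABC):
    """Same five abstract methods as above, repeated to keep the sketch runnable."""

    @abstractmethod
    def index(self):
        pass

    @abstractmethod
    def batch_index(self):
        pass

    @abstractmethod
    def search(self):
        pass

    @abstractmethod
    def delete(self):
        pass

    @abstractmethod
    def delete_batch(self):
        pass


class InMemoryHandler(IDocumentHandler):
    """Hypothetical handler that keeps documents in a dict (illustration only)."""

    def __init__(self):
        self.documents = {}

    def index(self, doc_id, document):
        self.documents[doc_id] = document
        return doc_id

    def batch_index(self, documents):
        return [self.index(doc_id, doc) for doc_id, doc in documents.items()]

    def search(self, text):
        # Substring match on the title, standing in for a vector search.
        return [doc for doc in self.documents.values() if text in doc["title"]]

    def delete(self, doc_id):
        return self.documents.pop(doc_id, None) is not None

    def delete_batch(self, doc_ids):
        return [self.delete(doc_id) for doc_id in doc_ids]


handler = InMemoryHandler()
handler.batch_index(
    {"p1": {"title": "banana prata 1kg"}, "p2": {"title": "doce de banana 250g"}}
)
handler.index("pp1", {"title": "massa para bolo de baunilha"})
matches = handler.search("banana")
```

Instantiating `IDocumentHandler` directly, or a subclass that leaves any of the five methods unimplemented, raises `TypeError`.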