This project consists of 5 folders:
- data-embedding: the folder that automates the process of creating the index emails and embedding it into the zincsearch database.
- web: the folder that contains the web application.
- server: the folder that contains the Go server that handles the requests to the zincsearch database and retrieves the results (limited to 200).
- docker: the folder that contains the docker-compose file that can be used to run the entire project.
- terraform: the folder that contains the Terraform code that allows you to deploy the project to an AWS EC2 instance.
- Go == 1.22
- Docker
- Docker-compose
- Node >= 20.10.0 (recommended)
- Graphviz (if you want to generate the profiling graphs)
- Terraform (optional, more info here)
- Give permissions:
chmod +x envs.sh
chmod a+rwx ./data-embedding
- Follow only the data-embedding instructions from the setup steps below.
- Then run the following commands:
. envs.sh
cd docker
docker-compose up
Note
Remember: if you make changes in web or server, remove their images so that docker compose rebuilds them.
To set up the project, first run the following commands:
- Set up env variables (you can modify the file to set the values you want)
chmod +x envs.sh
. envs.sh
- Give write permission
chmod a+rwx ./data-embedding
- Start zincsearch docker image
docker-compose up
Note
All commands were run on a Linux-based OS.
- Download the data from enron_mail
- Unzip it, e.g.
tar -xvzf enron_mail_20110402.tgz
- Enter the folder and move maildir to the project directory, e.g.
cd enron_mail_20110402
mv maildir /path/your_project_path/data-embedding
- Run the script
cd data-embedding
go run main.go
Warning
This step consumes a lot of CPU resources; I recommend running it with everything else closed.
- Generate profiling graphs (optional)
- CPU profiling
cd profs
go tool pprof cpu.prof
(pprof) pdf
- Memory profiling
cd profs
go tool pprof mem.prof
(pprof) pdf
Right now the server doesn't need any external configuration. Just make sure that the zincsearch server is running at localhost:4080 and that the user credentials are the same as the ones set in config/credentials.go.
- Change zincsearchEndpoint to http://localhost:4080/api in the zincsearch_repo.go file.
- Start server
cd ../server
go run main.go
- Try it out: you can use any HTTP client (Postman, Insomnia, etc.); for simplicity I'm going to use curl.
curl -i -X GET http://localhost:3001/
Result:
HTTP/1.1 200 OK
Vary: Origin
Date: Sat, 09 Mar 2024 02:03:11 GMT
Content-Length: 0
curl -X POST http://localhost:3001/emailSearch -H "Content-Type: application/json" --data '{"term": "manipulated", "max_results": 10, "field": "content", "sort_fields": []}'
Result:
{"time": 712,"emails": [{"id": "26rdgTPY702","from": "linda.robertson@enron.com",...]}
To start the web server run:
cd ../web
npm i
npm run lint
npm run dev
Now just open a web browser at http://localhost:5173/ and use the app.
- Add additional options to the search emails request.
- Add at least one new feature to the web project.
- Dockerize server and web projects.
- Add tests to server project.
- Investigate how to keep data-embedding from using all of the computer's CPU, because a computer with low specs will probably crash.
- Add tests to web project.