Improve documentation #15

Merged
merged 2 commits into from
May 31, 2022
40 changes: 0 additions & 40 deletions deployment/ansible/run_searchengine_index_cache_services.yml

This file was deleted.

24 changes: 24 additions & 0 deletions deployment/ansible/run_searchengine_index_services.yml
@@ -0,0 +1,24 @@
# Issue: set up the IP address inside the pg_hba config file for postgres to accept the connection from it
- name: Deploying search engine cache and indexing
  connection: local
  hosts: local
  vars_files:
    - searchengine_vars.yml
  tasks:

    - name: Get data from the postgres database and insert it into the Elasticsearch index using the searchengine docker image
      become: yes
      docker_container:
        image: "{{ searchengine_docker_image }}"
        name: searchengine_index
        cleanup: True
        auto_remove: yes
        command: "get_index_data_from_database"
        networks:
          - name: searchengine-net
            ipv4_address: 10.11.0.11
        published_ports:
          - "5577:5577"
        state: started
        volumes:
          - "{{ apps_folder }}/searchengine/searchengine/:/etc/searchengine/"
4 changes: 2 additions & 2 deletions deployment/ansible/searchengine_vars.yml
@@ -8,8 +8,8 @@ database_user_password: pass1234
cache_rows: 10000
#searchenginecache_folder: /data/searchengine/searchengine/cacheddata/
search_engineelasticsearch_docker_image: docker.elastic.co/elasticsearch/elasticsearch:7.16.2
searchengine_docker_image: openmicroscopy/omero-searchengine:latest
searchengineclient_docker_image: openmicroscopy/omero-searchengineclient:latest
searchengine_docker_image: searchengine
searchengineclient_docker_image: searchengineclient
ansible_python_interpreter: path/to/bin/python
searchengine_cache: searchengine_cache
searchengine_index: searchengine_index
41 changes: 20 additions & 21 deletions docs/configuration/configuration_installtion.rst
@@ -10,9 +10,11 @@ The application should have the access attributes (e.g, URL, username, password,
* DATABASE_USER
* DATABASE_PASSWORD
* DATABASE_NAME
* CASH_FOLDER
* ELASTICSEARCH__URL
* PAGE_SIZE
* Although the user can edit this file to set the values, there are some methods inside manage.py which can help to set the configuration, e.g.:
* set_database_configuration
* set_elasticsearch_configuration

* When the app runs for the first time, it will look for the application configuration file.
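
As a quick sanity check, the configuration file can be read with a few lines of Python. This is a minimal sketch only: it assumes the file is plain YAML and uses the .app_config.yml name shown in the Docker examples later on this page, so the path should be adjusted for the actual installation::

    import os
    import yaml  # PyYAML

    # File name and location are assumptions taken from the Docker examples on this page.
    CONFIG_PATH = os.path.expanduser("~/.app_config.yml")

    # Attribute names as listed above.
    EXPECTED_KEYS = [
        "DATABASE_USER",
        "DATABASE_PASSWORD",
        "DATABASE_NAME",
        "CASH_FOLDER",
        "ELASTICSEARCH__URL",
        "PAGE_SIZE",
    ]

    with open(CONFIG_PATH) as f:
        config = yaml.safe_load(f) or {}

    missing = [key for key in EXPECTED_KEYS if key not in config]
    print("Missing configuration keys:", ", ".join(missing) if missing else "none")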

@@ -26,40 +28,38 @@

There is a need to create the Elasticsearch indices and insert the data into them to be able to use the application.

* There is another method inside manage.py (get_index_data_from_database) to allow indexing automatically from the app.
* There is a method inside manage.py (get_index_data_from_database) to allow indexing automatically from the app.

* Another method to index the data by
* Another method to index the data by:
* The data is extracted from the IDR/OMERO database using some SQL queries and saved to CSV files ({path/to/project}/omero_search_engine/search_engine/cache_functions/elasticsearch/sql_to_csv.py)
* The image index data is generated in one big file, so it is recommended to split it into several files to facilitate processing the data and inserting it into the index. On Linux, users can use the split command to divide the file, for example (a Python alternative is sketched after this list):
* split -l 2600000 images.csv
* create_index: Creates the Elasticsearch indices; it can be used to create a single index or all the indices; the default is to create all of them.
* The index templates are saved in this script ({path/to/project}/omero_search_engine/search_engine/cache_functions/elasticsearch/elasticsearch_templates.py)
* add_resource_data_to_es_index: Inserts the data into the Elasticsearch index; the data can be in a single file (CSV format) or multiple files.
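
If the Linux split command is not available, the large CSV file can be divided with a short Python sketch along the same lines. The chunk size mirrors the split example above; the output file names are only illustrative, and unlike the plain split command this version repeats the header row in every part::

    import csv

    SOURCE = "images.csv"
    ROWS_PER_FILE = 2600000  # mirrors the split -l example above

    with open(SOURCE, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        part, rows, out, writer = 0, 0, None, None
        for row in reader:
            if writer is None or rows == ROWS_PER_FILE:
                # Close the previous part and start a new one with the header.
                if out:
                    out.close()
                part += 1
                rows = 0
                out = open("images_part_%d.csv" % part, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
            writer.writerow(row)
            rows += 1
        if out:
            out.close()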

* There are some utility functions inside the manage.py script to build HDF5 cache files.
* These files contain the available key and value pairs inside the database.
* The user builds them using a direct connection to the PostgreSQL database server.
* The cached data is available to the user through URLs, as described in the user manual (see the sketch below).
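
The exact layout of these cache files is not described here. Purely as an illustration, and assuming they are standard HDF5 files, their contents could be inspected with a few lines of Python (the file name below is hypothetical)::

    import h5py  # assumes the cache files are standard HDF5

    # Hypothetical file name; the real files are created by the manage.py utilities.
    with h5py.File("image_keyvalue_cache.h5", "r") as cache:
        # Print every group/dataset stored in the file.
        cache.visititems(lambda name, obj: print(name, type(obj).__name__))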

Application installation using docker:
======================================
Ubuntu and Centos7 images are provided
* The user should pull the image from:
* The user may build the docker image using the following command:

* Ubuntu: [imageurl]
* Centos: [imageurl]
* docker build . -f deployment/docker/centos/Dockerfile -t searchengine

* The user should first pull the image and then run using a command docker run and then the image name.
* The image runs on port 5569 so mapping this port is required to expose the port to the host machine
* Also, folders (i.e. /etc/searchengine) and user home folder ($HOME) should be mapped to folder inside the the host machine.
* Alternatively, the user can pull the openmicroscopy docker image by using the following command::
* docker pull openmicroscopy/omero-searchengine:latest

* The image runs on port 5577 so mapping this port is required to expose the port to the host machine
* Also, folders (i.e. /etc/searchengine) and the local data folder (e.g. the user home folder) should be mapped to folders inside the host machine.
* It will be used to save the configuration file so the user can configure their instance.
* In addition, it will be used to save the log files and other cached data.

* Example of running the docker run command for Centos image: which maps the etc/searchengine to the user home folder to save the log files, in addition, to mapping the application configuration file
* docker run --rm -p 5569:5569 v /home/kmohamed001/.app_config.yml:/opt/app-root/src/.app_config.yml -v $HOME/:/etc/searchengine/ searchengine
* docker run --rm -p 5577:5577 -d -v $HOME/:/etc/searchengine/ searchengine
* This is an example of a Docker command to run indexing and re-indexing:
* docker run -d --name searchengine_2 -v $HOME/:/etc/searchengine/ -v $HOME/:/opt/app-root/src/logs/ --network=searchengine-net searchengine get_index_data_from_database
* The user can call any method inside manage.py by adding the method name at the end of the run command, e.g.:
* docker run --rm -p 5569:5569 v /home/kmohamed001/.app_config.yml:/opt/app-root/src/.app_config.yml -v $HOME/:/etc/searchengine/ searchengine show_saved_indices

* docker run --rm -p 5577:5577 -v $HOME/:/etc/searchengine/ searchengine show_saved_indices

Searchengine installation and configuration using Ansible:
==========================================================
@@ -69,16 +69,15 @@ There is an ansible playbook (management-searchengine.yml) that has been written
* It will configure and create the required folders
* It will configure the three apps and run them
* There is a variables file (searchengine_vars.yml) that the user needs to edit before running the playbook
* The variable names are self-explained
* The variable names are self-explanatory and should be customized for the host machine
* To check that the apps have been installed and run, the user can use wget or curl to call:
* for searchengine, http://127.0.0.1:5556/api/v2/resources/
* for searchengine, http://127.0.0.1:5556/api/v1/resources/
* for searchengine client, http://127.0.0.1:5556
* for Elasticsearch, http://127.0.0.1:9201
* After deploying the apps using the playbook, it is needed to run another playbook for indexing:
* After deploying the apps, it is needed to run another playbook for indexing:
* run_searchengine_index_services.yml
* If the PostgreSQL database server is located on the same machine which hosts the searchengine, it is needed to:
* Edit pg_hba.conf file (one of the postgresql configuration files) and add two client ips (i.e. 10.11.0.10 and 10.11.0.11)
* Edit the pg_hba.conf file (one of the PostgreSQL configuration files) and add the client IP (i.e. 10.11.0.11)
* Reload the configuration so that PostgreSQL accepts the connection from the indexing and caching services.
* As the caching and indexing processes take a long time, there are two other playbooks that enable the user to check whether they have finished:
* check_indexing_service.yml
* check_caching_service.yml
18 changes: 9 additions & 9 deletions readme.rst
@@ -1,6 +1,6 @@
OMERO Search Engine
--------------------
* OMERO search engine app is used to search metadata (key-value pairs)
* OMERO search engine app is used to search metadata (key-value pairs)

* The search engine query is a dict that has three parts:

@@ -13,11 +13,12 @@

* The second part of the query is or_filters; it contains alternatives for searching the database; it answers a question like finding the images which satisfy one or more of the conditions inside this list. It is also a list of dicts and has the same format as the dicts inside and_filters.

* The third part is the main_attributes, it allows the user to search using one or more of project _id, dataset_id, owner_id, group_id, owner_id, etc. It supports two operators, equals and not_equals. Hence, it is possible to search one project instead of all the projects, also it is possible to search the results which belong to a specific user or a group.
* The third part is the main_attributes; it allows the user to search using one or more of project_id, dataset_id, group_id, owner_id, etc. It supports two operators, equals and not_equals. Hence, it is possible to search one project instead of all the projects; it is also possible to restrict the results to those which belong to a specific user or a group (see the sketch below).
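
As an illustration, a query dict with the three parts described above can be posted to a local instance with the Python requests library. This is a sketch only: the port and API prefix follow the examples later in this document, the searchannotation endpoint name is inferred from the searchannotation_page route shown at the end of this page, the key/value pairs are placeholders, and the exact nesting and operator spellings should be checked against the user manual::

    import requests

    base_url = "http://127.0.0.1:5577/api/v1/"
    # Endpoint name inferred from the searchannotation_page route; verify before use.
    search_url = base_url + "resources/image/searchannotation/"

    query = {
        # and_filters: conditions that must all be satisfied.
        "and_filters": [
            {"name": "Organism", "value": "Homo sapiens", "operator": "equals"},
        ],
        # or_filters: alternatives; at least one should be satisfied.
        "or_filters": [
            {"name": "Organism", "value": "Mus musculus", "operator": "equals"},
        ],
        # main_attributes: e.g. restrict the search to a single project or owner.
        "main_attributes": [
            {"name": "project_id", "value": 101, "operator": "equals"},
        ],
    }

    response = requests.post(search_url, json=query)
    response.raise_for_status()
    results = response.json()
    print(results.get("notice"), results.get("server_query_time"))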

* The search engine returns the results in a JSON which has the following keys:

* 'notice': A message to report an error or a message to the sender.
* 'notice': reports a message to the sender, which may include an error message.
* 'Error': a specific error message
* 'query_details': The submitted query.
* 'resource': The resource, e.g. image
* 'server_query_time': The server query times in seconds
@@ -30,11 +31,11 @@ OMERO Search Engine

* It is possible to query the search engine to get all the available resources (e.g. image) and their keys (names) using the following URL:

* 127.0.0.01:5556/api/v1/resources/all/keys
* 127.0.0.01:5577/api/v1/resources/all/keys

* The user can get the available values for a specific key for a resource, e.g. what are the available values for Organism:

* http://127.0.0.1:5556/api/v1/resources/image/getannotationvalueskey/?key=Organism
* http://127.0.0.1:5577/api/v1/resources/image/getannotationvalueskey/?key=Organism
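
The same calls can be made from Python; a minimal sketch using the requests library, with the host, port and paths taken from the URLs above::

    import requests

    base_url = "http://127.0.0.1:5577/api/v1/"

    # All available resources (e.g. image) and their keys (names).
    keys = requests.get(base_url + "resources/all/keys").json()

    # Available values for one key of one resource, e.g. Organism for images.
    values = requests.get(
        base_url + "resources/image/getannotationvalueskey/",
        params={"key": "Organism"},
    ).json()

    print(keys)
    print(values)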

* The following python script sends a query to the search engine and gets the results

@@ -49,7 +50,7 @@ OMERO Search Engine
# url to get the next page for a query, bookmark is needed
image_page_ext = "/resources/image/searchannotation_page/"
# search engine url
base_url = "http://idr-testing.openmicroscopy.org/searchengineapi/api/v1/"
base_url = "http://127.0.0.1:5577/api/v1/"

import sys

@@ -142,7 +143,6 @@ OMERO Search Engine
* It is used to build the query
* It will display the results when they are ready


* The app uses Elasticsearch
* There is a method inside manage.py (create_index) to create a separate index for image, project, dataset, screen, plate and well using two templates:
* image template (image_template) for the image index. It is derived from several OMERO tables (image, annotation_mapvalue, imageannotationlink, project, dataset, well, plate and screen) to generate a single Elasticsearch index.
@@ -153,7 +153,7 @@
* There is a method inside the manage.py script (add_resource_data_to_es_index) that reads the CSV files and inserts the data into the Elasticsearch index.
* I am investigating automatic updates of the Elasticsearch data in case the data inside the PostgreSQL database has been changed.

* The data can be transferred directly from the Omero database to the Elasticsearch using a method inside manage.py (get_index_data_from_database):
* The data can be transferred directly from the OMERO database to the Elasticsearch using a method inside manage.py (get_index_data_from_database):
* It creates the Elasticsearch indices for each resource
* It queries the OMERO database; after receiving the data, it processes and pushes it to the Elasticsearch indices.
* This process takes a relatively long time, depending on the hosting machine specs. The user can adjust how many rows are processed in one call to the OMERO database:
@@ -164,4 +164,4 @@
* There is a method inside the manage.py script (add_resource_data_to_es_index) which reads the CSV files and inserts the data into the Elasticsearch index.
* I am investigating automatic updates of the Elasticsearch data in case the data inside the PostgreSQL database has been changed.

For the configuration and installation instructions, please read the following document doc/configuration/configuration_installtion.rs
For the configuration and installation instructions, please read the following document doc/configuration/configuration_installtion.rst
3 changes: 1 addition & 2 deletions search_engine/api/v1/resources/urls.py
@@ -8,7 +8,7 @@

@resources.route('/',methods=['GET'])
def index():
return "Omero search engine (API V1)"
return "OMERO search engine (API V1)"

@resources.route('/<resource_table>/searchannotation_page/',methods=['POST'])
def search_resource_page(resource_table):
@@ -185,4 +185,3 @@ def search(resource_table):
from search_engine.api.v1.resources.query_handler import simple_search
results=simple_search(key, value, operator,case_sensitive,bookmark, resource_table)
return jsonify(results)