RastogiAbhijeet edited this page Jun 6, 2018 · 2 revisions

Health US News

Using this repository, a user can index doctors by the US cities in which they practise. For this particular assignment, the area of focus is

NEW JERSEY

About

The data used in this application is extracted from the website https://health.usnews.com and stored in an Elasticsearch index. The aim is to present that data as a report that meets the following requirements:

  1. Total number of doctors by city
  2. Total number of doctors by specialty (element g of the scraped elements)
  3. Total number of doctors by experience range (0 – 4 years, 5 – 10 years, 11 – 16 years, 17 – 20 years, and more than 20 years)
  4. Total number of doctors by zipcode (the last five digits of the address; numeric field), e.g. 222 New Rd, Linwood, NJ 08221 <- zipcode
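Requirements 3 and 4 both derive a grouping key from a raw profile field. A minimal sketch of that derivation follows, assuming the address is a single string ending in a five-digit zipcode and the experience is a whole number of years. The function names `extract_zipcode` and `experience_bucket` are hypothetical and are not taken from the repository.

```python
import re

# A run of exactly five digits at the very end of the address string.
_ZIP_RE = re.compile(r"(?<!\d)(\d{5})\s*$")

# (lower bound, upper bound, label); both bounds are inclusive whole years.
_BUCKETS = [
    (0, 4, "0-4 years"),
    (5, 10, "5-10 years"),
    (11, 16, "11-16 years"),
    (17, 20, "17-20 years"),
]


def extract_zipcode(address):
    """Return the trailing five-digit zipcode as a string, or None if absent."""
    match = _ZIP_RE.search(address)
    return match.group(1) if match else None


def experience_bucket(years):
    """Map whole years of experience to one of the report's ranges."""
    if years < 0:
        raise ValueError("experience cannot be negative")
    for low, high, label in _BUCKETS:
        if low <= years <= high:
            return label
    return "more than 20 years"
```

The zipcode is returned as a string so that a leading zero, as in 08221, survives. Converting it with `int()` for a numeric field would store it as 8221.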

Output of the generated report: https://github.com/RastogiAbhijeet/python_elasticsearch_businesscase/blob/master/summ2.png?raw=true
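Each of the four counts maps naturally onto an Elasticsearch bucket aggregation. The sketch below only builds the request bodies as plain dictionaries, so it runs without a cluster. The field names (`city`, `experience`) and aggregation names are assumptions about the index mapping, not taken from the repository.

```python
def terms_count(name, field, size=100):
    """Body for a terms aggregation: one bucket, with a doc_count, per distinct value.

    The field is assumed to be a keyword or numeric field in the mapping.
    """
    return {
        "size": 0,  # return aggregation results only, no individual hits
        "aggs": {name: {"terms": {"field": field, "size": size}}},
    }


def experience_ranges(name="doctors_by_experience", field="experience"):
    """Body for a range aggregation over whole years of experience.

    In a range aggregation "from" is inclusive and "to" is exclusive,
    so the range 0-4 years is expressed as from=0, to=5, and so on.
    """
    ranges = [
        {"key": "0-4", "from": 0, "to": 5},
        {"key": "5-10", "from": 5, "to": 11},
        {"key": "11-16", "from": 11, "to": 17},
        {"key": "17-20", "from": 17, "to": 21},
        {"key": "more than 20", "from": 21},
    ]
    return {
        "size": 0,
        "aggs": {name: {"range": {"field": field, "ranges": ranges}}},
    }
```

For example, `terms_count("doctors_by_city", "city")` yields a body that can be sent to the `_search` endpoint of the index; the same helper would cover specialty and zipcode with different field names.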


Requirements

The code is written for Python 2.7.x, and Elasticsearch 6.2.4 is used to store the indexed data. Kibana 6.2.4 is used for management and visualisation.

Before installing Elasticsearch, install Java on your system by running the following command:

sudo apt-get install openjdk-8-jdk

Installing Elasticsearch 6.2.4 on Ubuntu 16.04

  1. First, update your package index.

sudo apt-get update

  2. Download the Elasticsearch 6.2.4 Debian package.

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.2.4.deb

  3. Then install it in the usual Ubuntu way with dpkg.

sudo dpkg -i elasticsearch-6.2.4.deb

This results in Elasticsearch being installed in /usr/share/elasticsearch/ with its configuration files placed in /etc/elasticsearch and its init script added in /etc/init.d/elasticsearch.

To make sure Elasticsearch starts and stops automatically with the server, enable its systemd service, then start it for the current session.

sudo systemctl enable elasticsearch.service

sudo systemctl start elasticsearch.service

To test whether the service is running, run the following command. By default, the Elasticsearch service listens on localhost:9200.

curl -X GET "localhost:9200"

This returns output similar to the following:


{
  "name" : "My First Cluster",
  "cluster_name" : "MyCluster",
  "cluster_uuid" : "CN-Gtg7rRvai3VAx8TC1dw",
  "version" : {
    "number" : "6.2.4",
    "build_hash" : "ccec39f",
    "build_date" : "2018-04-12T20:37:28.497551Z",
    "build_snapshot" : false,
    "lucene_version" : "7.2.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
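The same check can be scripted. This is a minimal sketch, assuming Python 3's standard library (the project itself targets Python 2.7, where the equivalent module is `urllib2`) and the default address localhost:9200. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def summarize_root_response(body):
    """Extract the cluster name and version number from the root endpoint's JSON."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def check_elasticsearch(url="http://localhost:9200", expected="6.2.4"):
    """Return True if the node at `url` answers and reports the expected version."""
    try:
        with urlopen(url, timeout=5) as response:
            body = response.read().decode("utf-8")
    except OSError:
        # URLError is a subclass of OSError: connection refused, timeout, etc.
        return False
    _, version = summarize_root_response(body)
    return version == expected
```

Calling `check_elasticsearch()` returns False rather than raising when nothing is listening on the port, which makes it easy to use in a setup script.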


If you do not see the output above and instead get a connection error on port 9200, open /etc/elasticsearch/elasticsearch.yml, set path.logs and path.data to valid directories, and make sure the Elasticsearch service has permission to read and write those directories.
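Before restarting the service, it can help to confirm that the configured directories actually exist. This is a rough sketch, assuming the settings appear in elasticsearch.yml as flat `path.data: ...` and `path.logs: ...` lines with a single directory each. It uses simple line parsing rather than a YAML parser, and the function names are hypothetical.

```python
import os


def read_path_settings(config_text):
    """Collect path.data and path.logs values from flat 'key: value' lines."""
    settings = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or ":" not in line:
            continue  # skip comments and lines without a key
        key, value = line.split(":", 1)
        key = key.strip()
        if key in ("path.data", "path.logs"):
            settings[key] = value.strip()
    return settings


def missing_directories(settings):
    """Return the names of settings whose directory does not exist."""
    return sorted(k for k, v in settings.items() if not os.path.isdir(v))
```

Reading the file with `open("/etc/elasticsearch/elasticsearch.yml").read()` and passing the text through both functions lists any setting that points at a missing directory. This does not verify ownership or permissions for the user the service runs as; that still has to be checked separately.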

Installing Kibana 6.2.4 on Ubuntu 16.04

Follow the guide: https://www.elastic.co/guide/en/kibana/current/setup.html
