B.Tech (Computer Science & Engineering) Final Year Project - VNIT, Nagpur

Team Members -

  1. Abdul Sattar Mapara
  2. Saket Chopade
  3. Rohan Salvi
  4. Pritam Kumar Sahoo

Guided by -

  1. Dr. U.A. Deshpande Sir (VNIT, Nagpur)
  2. Dr. Sagar Sunkle Sir (TRDDC, Pune)

About the Project

The aim of the project is to gather time-stamped factual information about a given topic/entity from a given set of documents (Brokerage Reports).

More precisely, the input is a set of documents (brokerage reports in PDF format) about a company, a bank, or any other organization, published over a period of 1-2 years. Factual information about that entity is extracted in the form of semi-structured statements, and each statement is classified as an increasing or decreasing trend. The extracted facts are grouped by date/month.
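
To make the expected output concrete, here is a minimal Python sketch of how time-stamped facts could be represented and grouped by month. It is an illustration only, not the project's actual data model: the `Fact` class, its field names, the `group_by_month` helper, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One extracted fact. The class and all field names are hypothetical."""
    entity: str       # company or bank the fact is about
    statement: str    # semi-structured statement text
    trend: str        # "increasing" or "decreasing"
    published: date   # timestamp taken from the report


def group_by_month(facts):
    """Group facts under a 'YYYY-MM' key, oldest month first."""
    grouped = defaultdict(list)
    for fact in sorted(facts, key=lambda f: f.published):
        grouped[fact.published.strftime("%Y-%m")].append(fact)
    return dict(grouped)


# Made-up sample values, for illustration only (not taken from any report).
facts = [
    Fact("ExampleBank", "net interest income | grew | 12% YoY", "increasing", date(2018, 4, 20)),
    Fact("ExampleBank", "gross NPA ratio | fell | to 1.3%", "decreasing", date(2018, 4, 25)),
    Fact("ExampleBank", "operating costs | rose | 8% QoQ", "increasing", date(2018, 1, 15)),
]

for month, items in group_by_month(facts).items():
    print(month, [(f.statement, f.trend) for f in items])
```

Sorting before grouping keeps the months in chronological order, which matches the goal of presenting facts as a timeline.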

Summary of Tasks Accomplished

  1. Collecting and Processing the reports

    1. Brokerage Reports collected from - trendlyne.com

    2. PDF -> Text conversion

    3. Text -> Sentence (Sentence Tokenization)

    4. Sentences passed through the spaCy pipeline for tokenization, lemmatization, part-of-speech tagging, dependency parse tree generation, and named entity recognition

  2. Extraction of Date/Timestamp

    1. Using Named Entity Recognition
    2. Using Metadata associated with the reports
  3. Extraction of Facts in the form of Semi Structured Statements

    1. Using Textacy library
    2. Using Dependency Parse tree generated by spaCy (Custom Approach)
    3. Explored relation extraction using Stanford Open IE
  4. Sentiment Analysis (Sentence Classification)

    1. Dictionary based approach
    2. Machine learning based approach (using Support Vector Machines)
    3. Deep learning based approach (using Convolutional Neural Networks)

    Note: Words are converted to numeric vectors using a custom word2vec model

  5. Application (using Flask framework) for demonstration of the project
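
As an illustration of the dictionary-based approach listed under Sentiment Analysis, the following Python sketch labels a sentence as an increasing or decreasing trend by counting hits against two word lists. The word lists, the `classify_trend` function, and the tie-breaking rule are assumptions made for this example; the dictionary actually used in the project is not described here.

```python
import re

# Hypothetical, deliberately tiny word lists chosen for this example.
INCREASING = {"increase", "increased", "rise", "rose", "grew", "growth",
              "improved", "higher", "up"}
DECREASING = {"decrease", "decreased", "fall", "fell", "declined", "decline",
              "dropped", "lower", "down"}


def classify_trend(sentence):
    """Label a sentence by counting dictionary hits.

    Returns "unknown" when there are no hits or the counts are tied.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    up = sum(token in INCREASING for token in tokens)
    down = sum(token in DECREASING for token in tokens)
    if up > down:
        return "increasing"
    if down > up:
        return "decreasing"
    return "unknown"


print(classify_trend("Net profit rose 15% on higher loan growth."))
print(classify_trend("Margins declined as deposit costs went up and fees fell."))
print(classify_trend("The board met on Tuesday."))
```

A plain word count like this ignores negation and context (for example, "costs did not rise"), which is one reason the machine learning and deep learning approaches in the list are natural next steps.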

About this Repository

This repository contains the source code written during the project for accomplishing the required tasks and experimentation.

This branch (master) contains the source code of the application developed for demonstration.

Demonstration

Video - Download Final-Year-Project-Demo OR View Final-Year-Project-Demo