Xabier branch to merge with main #11

Open
wants to merge 19 commits into
base: main

@XabUG7 XabUG7 commented Oct 23, 2024

No description provided.

takaokakegawa and others added 19 commits February 26, 2024 05:01
put in new files for summary generation to fill in from colab notebooks.
updated summary_generation.py with abstractive summary stage of summary process.
summary generation added to hangul with necessary files:
summary_generation.py and sentence_ranking.py
new disaster detection module added
update agg_summary_input in hangul.py
fixed new_disaster_detection in hangul.py
Only ranked sentences for summary generation
Collaborator

@sarahaptumich sarahaptumich left a comment

Code ran locally as expected. Still needed: remove some large commented-out sections in hangul.py and add modification notes to the header.

Tested using:
# Create virtual environment
pip install virtualenv
python -m venv env

# ACTIVATE VIRTUAL ENVIRONMENT
git checkout xabier_branch
source env/bin/activate
pip install -r requirements.txt

# OPEN THE APP
waitress-serve --port=8080 wsgi:app

# Open another terminal to use the app
# TEST CHETAH
curl -X POST http://127.0.0.1:8080/api/v1/products/chetah \
  -H "Content-Type: application/json" \
  -d '{"query": "Are there hurricane in Mexico?"}'
Result:
[{"cluster":"Camp Coordination","date":"2019-11-13","link":"https://reliefweb.int/sites/reliefweb.int/files/resources/DPR-Synthesis-Report-Final_02.08.2019.pdf","summary_full":"Legal and Institutional Frameworks .............................................................................................. 13 2. Disaster risk finance ...................................................................................................................... 13 3. Legal facilities ................................................................................................................................ 16 7. Disaster-Related Human Mobility ................................................................................................. 16 8. Emergency Shelter and Housing, Land and Property Ri ......................................
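
The Chetah call above can also be scripted. The following is a minimal sketch, not the project's own test code: it builds the same JSON POST request with Python's standard library and checks that each returned item carries the fields visible in the truncated result above. The helper names `build_chetah_request`, `query_chetah` and `missing_fields` are hypothetical, and the field list is inferred only from that output, not from a documented schema.

```python
import json
import urllib.request

# Endpoint taken from the curl command above; adjust host and port as needed.
CHETAH_URL = "http://127.0.0.1:8080/api/v1/products/chetah"

# Fields visible in the truncated result above (assumed, not a documented schema).
EXPECTED_FIELDS = ("cluster", "date", "link", "summary_full")


def build_chetah_request(query, url=CHETAH_URL):
    """Build (but do not send) the JSON POST request for a Chetah query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_chetah(query, url=CHETAH_URL):
    """Send the query; requires the app to be running locally."""
    with urllib.request.urlopen(build_chetah_request(query, url)) as resp:
        return json.load(resp)


def missing_fields(results):
    """Return the sorted expected fields that are absent from any result item."""
    missing = set()
    for item in results:
        missing.update(f for f in EXPECTED_FIELDS if f not in item)
    return sorted(missing)
```

With the server started via waitress-serve, `missing_fields(query_chetah("Are there hurricane in Mexico?"))` should return an empty list if every item has the four fields.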

PASSED TEST

API was moved to wsgi.py

Partial file, not used in Chetah. Chetah_v1.py PASSED TEST.

Tested Hangul V1:
curl -X POST http://127.0.0.1:8080/api/v1/products/hangul \
  -F "file=@han-21 Report Type Detection/Data/News and Press Release/3266793.pdf" \
  -F "kw_num=5"

TESTED HANGUL V2 LOCALLY:
Memory usage after Hangul 2.0 second API call:
Memory usage: 3122.53 MB
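
How the memory figure above was measured is not stated. As one possible approach, the sketch below reports the peak resident set size of the current process using only the standard library `resource` module (Unix only). The function names `peak_rss_mb` and `log_memory` are hypothetical, and the number is a peak rather than a current reading, so it may differ from what the reviewer's tool reported.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(label):
    """Format a one-line memory report for the given label."""
    return f"Memory usage after {label}: {peak_rss_mb():.2f} MB"


print(log_memory("Hangul 2.0 second API call"))
```

Calling `log_memory` inside the request handler after each API call would show whether memory keeps growing between calls.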

REMOVE LARGE COMMENTED SECTIONS
NEED TO ADD HEADER, EXAMPLE:
"""
Filename: hangul.py
Description: Extracts and analyzes content and metadata from PDF files, detects language, generates keywords, etc.

Author: Sidra Effendi
Created on: ?

Modification History:
- 2024-10-29: Explain update by Xabier Urruchua
- 2024-10-30: Explain update by Xabier Urruchua
"""

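The header requested above could be checked automatically before merging. This is a minimal sketch under the assumption that the header is written as the module docstring and uses the labels from the example; `REQUIRED_FIELDS` and `missing_header_fields` are hypothetical names, not part of the repository.

```python
import ast

# Labels taken from the example header above.
REQUIRED_FIELDS = (
    "Filename:",
    "Description:",
    "Author:",
    "Created on:",
    "Modification History:",
)


def missing_header_fields(source):
    """Return the required labels missing from the module docstring of `source`."""
    docstring = ast.get_docstring(ast.parse(source)) or ""
    return [label for label in REQUIRED_FIELDS if label not in docstring]
```

Reading a file such as hangul.py into a string and passing it to `missing_header_fields` returns an empty list when every label is present.
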
Code works during testing; need to add a header with modification notes.

need to add header with created date and author

new title extractions added, and tested

tested in local virtual environment, and works as expected.

testing script:
# Specify python version
python3.10 -m venv /path/to/your/venv
# Create virtual environment
pip install virtualenv
python -m venv env

# ACTIVATE VIRTUAL ENVIRONMENT
git checkout xabier_branch
source env/bin/activate
pip install -r requirements.txt

# OPEN THE APP
waitress-serve --port=8080 wsgi:app

# Open another terminal to use the app
# TEST CHETAH
curl -X POST http://127.0.0.1:8080/api/v1/products/chetah \
  -H "Content-Type: application/json" \
  -d '{"query": "Are there hurricane in Mexico?"}'

# HANGUL_V1
curl -X POST http://127.0.0.1:8080/api/v1/products/hangul \
  -F "file=@$HOME/3266793.pdf" \
  -F "kw_num=5"

# HANGUL V2
curl -X POST http://127.0.0.1:8080/api/v2/products/hangul -F "file=@$HOME/3266793.pdf" -F "kw_num=5"

# SUMMARY
curl -X POST http://127.0.0.1:8080/api/v2/products/summary -H "Content-Type: application/json" -d '{
  "themes_detected": [
    "Food and Nutrition",
    "Contributions",
    "Health",
    "Water Sanitation Hygiene",
    "Protection and Human Rights",
    "Shelter and Non-Food Items",
    "Education"
  ],
  "ranked_sentences": [
    "In Iraq, support for Syrian refugees has been included in the wider UK Iraq Crisis response from 2015.",
    "Agriculture/Livelihoods and Education include results achieved under the DFID CSSF portfolio in Syria, in addition to the DFID Syria humanitarian portfolio.",
    "Syria Lebanon Jordan Turkey Iraq",
    "Figures do not include allocations made and spend incurred under the Home Office resettlement scheme for Syrian refugees or UK support to Syrian refugees who have migrated to Europe.",
    "Note: UK support for Syrian refugees in Turkey is ongoing.",
    "As the brutal conflict continues in Syria, millions of people continue to be in need.",
    "This includes DFID allocations to over 30 implementing partners (including United Nations agencies, international non-governmental organisations and the Red Cross) and is helping to meet the immediate needs of vulnerable people in Syria and of refugees in the region.",
    "Our support is reaching millions of people and has saved lives in Syria, Jordan, Lebanon, Turkey, Iraq and Egypt.",
    "Key Facts: 11.7 million People in need of humanitarian assistance in Syria (Syria 2019 HNO March 2019)",
    "The 2018 UN inter-agency appeals for the Syria crisis are an estimated $9 billion, including $3.36 billion for projects inside Syria and $5.6 billion for regional projects."
  ],
  "top_locations": [
    {"name": "Syrian Arab Republic", "occurrences": 20},
    {"name": "United Kingdom of Great Britain and Northern Ireland", "occurrences": 11},
    {"name": "Iraq", "occurrences": 6},
    {"name": "Türkiye", "occurrences": 6},
    {"name": "Jordan", "occurrences": 4},
    {"name": "Lebanon", "occurrences": 4},
    {"name": "Egypt", "occurrences": 4}
  ],
  "_detected_disasters": ["Earthquake", "Epidemic"]
}'

# EXIT
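
A hand-edited payload as long as the summary request is easy to break; a single doubled quote is enough to make it invalid JSON. The sketch below checks a payload string before it is sent. It is an illustration only: `check_summary_payload` is a hypothetical helper, the key list is copied from the example payload, and whether the endpoint actually requires every key is an assumption.

```python
import json

# Top-level keys used in the example payload; treating all of them as
# required is an assumption, not documented behaviour of the endpoint.
PAYLOAD_KEYS = (
    "themes_detected",
    "ranked_sentences",
    "top_locations",
    "_detected_disasters",
)


def check_summary_payload(text):
    """Parse a payload string and return a list of problems (empty if none)."""
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}, column {exc.colno}"]
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    problems = [f"missing key: {key}" for key in PAYLOAD_KEYS if key not in payload]
    locations = payload.get("top_locations", [])
    if isinstance(locations, list):
        for loc in locations:
            # Each location in the example has a name and an occurrence count.
            if not isinstance(loc, dict) or not {"name", "occurrences"} <= loc.keys():
                problems.append(f"malformed location entry: {loc!r}")
    return problems
```

Running the check on the body passed to `-d` before calling curl surfaces quoting mistakes locally instead of as a server-side error.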

3 participants