
elizavetaRa/sis-exercise-ragozina


Assignment: INSPIREHEP Search & Summarization Web App

Setup Application

Backend

Prerequisites: Docker, Docker-compose

  1. Rename the ".env.example" file to ".env" and adjust the values as needed
  2. Setup
    • Run the Django app
        make up 
        make bootstrap 
        # visit localhost:8000
        # user: admin
        # password: admin
    

The backend app should run on http://localhost:8000/. To access the admin view, go to http://localhost:8000/admin and log in with the user and password listed above.

Celery Data Harvester

  1. Run the backend
  2. In another terminal, copy the celery Docker container ID by running
        docker ps
  3. Run the celery worker
        docker exec -it <celery-docker-container-id> celery -A sis_exercise worker --loglevel=info
     The command should print something like "celery@00769f42c709 ready." Depending on the system, sudo might be needed for this command.
  4. Run celery beat
        docker exec -it <celery-docker-container-id> celery -A sis_exercise beat --loglevel=info
     The command should print something like "beat: Starting... Scheduler: Sending due task scheduler (api.tasks.harvest_literature)". Depending on the system, sudo might be needed for this command.

The harvester fills the database with the API results at the interval set in CELERY_SCHEDULE_MINUTES in the .env file (default: 1 day). For testing purposes, setting it to 1 minute is recommended.
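How the project reads CELERY_SCHEDULE_MINUTES is not shown in this README. As a minimal sketch, assuming the value is read from the environment and turned into a Celery beat schedule entry, the logic could look like this. The entry name "harvest-literature" and the helper build_beat_schedule are hypothetical; only the task path api.tasks.harvest_literature and the one-day default come from the text above.

```python
import os
from datetime import timedelta

DEFAULT_SCHEDULE_MINUTES = 24 * 60  # one day, the default stated in the README


def build_beat_schedule(env=None):
    """Return a beat-schedule dict for the harvest task (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("CELERY_SCHEDULE_MINUTES", "")
    try:
        minutes = int(raw)
    except ValueError:
        # Missing or non-numeric value: fall back to the daily default.
        minutes = DEFAULT_SCHEDULE_MINUTES
    if minutes <= 0:
        minutes = DEFAULT_SCHEDULE_MINUTES
    return {
        "harvest-literature": {  # hypothetical entry name
            "task": "api.tasks.harvest_literature",
            "schedule": timedelta(minutes=minutes),
        }
    }
```

A dict of this shape is what Celery accepts as its beat schedule setting; with CELERY_SCHEDULE_MINUTES=1 the task would be due every minute, which matches the recommendation for testing.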

Frontend

Prerequisites: Node.js (developed on version v22.9.0)

  1. Go to the frontend folder, rename frontend/.env.example to frontend/.env, and adjust REACT_APP_BACKEND_URL if needed (the default matches the backend URL above)
  2. In the frontend folder, run
        npm install
        npm start

The frontend app should run on http://localhost:3000/


Overview solution: Searching App

The solution provides a simple web application that allows users to search for high-energy physics papers using the INSPIREHEP REST API and receive a summary generated by the OpenAI API and the list of results. The application consists of a Django backend and a React frontend.

  • The celery harvester queries the INSPIREHEP REST API every day, fetches the first 40 papers, and saves them
  • The frontend UI provides an accessible search bar for search queries and displays the OpenAI-generated summary and the list of results
  • The backend provides the endpoint for the search query, extracts titles and abstracts from the top search results, summarizes the information using the OpenAI API or a mock function, and returns the summary together with the search results
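The backend flow described above (query INSPIREHEP, pull titles and abstracts from the top hits, summarize) can be sketched roughly as follows. This is not the repository's code: the function names, the limit of 5 top results, and the mock summary format are assumptions, and the payload shape (hits.hits[].metadata with titles[].title and abstracts[].value) is assumed from the public INSPIREHEP literature API.

```python
from urllib.parse import urlencode

INSPIRE_LITERATURE_URL = "https://inspirehep.net/api/literature"


def build_search_url(query, size=40):
    """Build a literature search URL; size=40 mirrors the harvester's page size."""
    params = {"q": query, "size": size, "fields": "titles,abstracts"}
    return f"{INSPIRE_LITERATURE_URL}?{urlencode(params)}"


def extract_titles_and_abstracts(payload, limit=5):
    """Pick the first title and abstract of each top hit (limit is an assumption)."""
    papers = []
    for hit in payload.get("hits", {}).get("hits", [])[:limit]:
        metadata = hit.get("metadata", {})
        titles = metadata.get("titles") or [{}]
        abstracts = metadata.get("abstracts") or [{}]
        papers.append({
            "title": titles[0].get("title", ""),
            "abstract": abstracts[0].get("value", ""),
        })
    return papers


def mock_summary(papers):
    """Stand-in for the OpenAI call: list the titles instead of summarizing."""
    if not papers:
        return ""
    titles = "; ".join(paper["title"] for paper in papers)
    return f"Summary of {len(papers)} paper(s): {titles}"
```

Keeping the extraction separate from the summarizer makes it easy to swap the mock for a real OpenAI call without touching the parsing logic.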

Assumptions or shortcuts

  • For the tested results, the API returns only the publication year, not the full publication date. The date is therefore mocked as "01/01/year"
  • ant.design is used for quick prototyping of the frontend
  • The unit tests do not cover all use cases and are implemented only as examples
  • The frontend uses the simplest setup, create-react-app with JavaScript. For real-world applications, a TypeScript setup is recommended
  • The CORS policy allows all origins; this would need to be changed for production
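The publication-date shortcut can be illustrated with a small sketch. The helper names are hypothetical, and reading "01/01/year" as day/month/year is an assumption (both fields are 01, so the result is the same either way).

```python
from datetime import date


def mock_publication_date(year):
    """Map a publication year to January 1st of that year."""
    return date(int(year), 1, 1)


def format_publication_date(year):
    """Render the mocked date in the "01/01/year" form used above."""
    return mock_publication_date(year).strftime("%d/%m/%Y")
```

For example, format_publication_date(2021) yields "01/01/2021".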

Given task: Create a task description for the following request:

  • The product owner wants to have metrics about the OPENAI API response time and the most common user queries.
    • Create an issue for this task that you will give to your team to solve. Please be specific about the implementation.
    • OPTIONAL: Implement your suggestion.

Task Description: Metrics for OPENAI API Response Time and User Queries

  • Task Title: Implement Metrics for OPENAI API Performance and User Interaction

  • User Story: As a product owner, I want to gather metrics about the OPENAI API response time and the most common user queries, displayed in the Django admin view, so that we can analyze its performance.

  • Acceptance Criteria:

    • Response Time Measurement: Implement functionality to track the response time of each call to the OPENAI API. Log response times in milliseconds for each request.
    • User Queries Tracking: Capture and store all user queries sent to the OPENAI API, together with their counts. Ensure queries are logged with timestamps so that trends can be tracked over time.
    • Data Aggregation:
    • Reporting Dashboard: Create a user-friendly dashboard in the Django admin panel that displays:
      • Logs with real-time response time metrics for each query (OpenAI API metrics)
      • A list of the user queries along with their frequency and their average, maximum, and minimum response times
    • Error Handling: Implement robust error handling to log and report failed API requests separately.
    • Documentation: Provide clear documentation on how metrics are collected and reported, and instructions for accessing and interpreting the dashboard.

  • Definition of Done:

  • Sprint considerations: Estimate story points for this task and prioritize it in the upcoming sprint. After implementation, gather feedback from the product owner to refine the metrics further.
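To make the response-time and error-handling criteria concrete, here is a minimal sketch of a timing wrapper. It is an illustration only, not the implementation in this repository: timed_call and the logger name are hypothetical, and func stands for whatever function performs the OpenAI request.

```python
import logging
import time

logger = logging.getLogger("openai_metrics")  # hypothetical logger name


def timed_call(func, query):
    """Call func(query) and return (result, elapsed_ms, error)."""
    start = time.perf_counter()
    try:
        result, error = func(query), None
    except Exception as exc:  # failed requests are recorded, not re-raised
        result, error = None, exc
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if error is None:
        logger.info("query=%r response_time_ms=%.1f", query, elapsed_ms)
    else:
        # Failures go to a separate log level so they can be reported apart.
        logger.error("query=%r failed after %.1f ms: %s", query, elapsed_ms, error)
    return result, elapsed_ms, error
```

The returned tuple could then be stored as one log row per request; time.perf_counter is used because it is monotonic and suited to measuring short intervals.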

Overview solution: Metrics for OPENAI API

  1. Run the backend following the instructions above
  2. Access the metrics in the Django admin:
    • http://localhost:8000/admin/api/openaiapimetrics/ (logs with timestamps and response times per query)
    • http://localhost:8000/admin/api/openaiapistatistics/ (statistics per query: count and average, minimum, and maximum response time)
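The per-query statistics (count, average, minimum, and maximum response time) can be derived from the per-request logs. The sketch below shows that aggregation in plain Python, assuming each log entry is a (query, response_time_ms) pair; the function name and field names are hypothetical, and the actual project may compute these values differently, for example with database aggregates.

```python
from collections import defaultdict


def aggregate_by_query(log_entries):
    """Group (query, response_time_ms) pairs and compute per-query statistics."""
    grouped = defaultdict(list)
    for query, response_time_ms in log_entries:
        grouped[query].append(response_time_ms)
    stats = {
        query: {
            "count": len(times),
            "avg_ms": sum(times) / len(times),
            "min_ms": min(times),
            "max_ms": max(times),
        }
        for query, times in grouped.items()
    }
    # Most frequent queries first, so the most common ones lead the list.
    return dict(sorted(stats.items(), key=lambda item: item[1]["count"], reverse=True))
```

Sorting by count puts the most common user queries at the top, which is the view the product owner asked for.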

Constraints, assumptions or shortcuts

  • Queries that return 0 results from Elasticsearch are not sent to the OpenAI API and therefore do not appear in the statistics
