TODOS.md

TO DO LIST:

  • [] COLLECT MORE TWEETS FOR MORE ANALYSIS
  • REFACTOR PREPROCESSING STEP
  • WORK ON TOPIC MODELING
  • EMBED IT IN A REAL-TIME PROCESS WITH A DATABASE (check Apache Airflow)
  • Learn about topic modeling and write the first draft of that blog post
  • Refactor the Twitter code to collect geographical data for the blog
  • [] refactor the blog post and put it in a publishable state.
  • find a way to run the streaming code at midnight and collect tweets for one date
  • ask a question on SO about how to retrieve data about the country
  • other step for cleaning
  • Remove one-word or two-word strings and characters
  • Remove Kinshasa
  • [for each word in a topic plot the distribution]
  • Read Aspitel project and check how she implemented it
  • create the Django or Flask project and plan the deployment
  • [] write unit tests for the project
  • [] refactor the project and make it maintainable
  • [] get geocode location for all cities in DRC
  • fix the issue of retrieving tweets by date
  • use Apache Airflow to run cron jobs and data retrieval tasks at a specific time of day
  • [] Deploy everything to DO
  • [] Improve the processing by excluding Congolese names from stemming
  • Add a job that tweets the word count every day after getting it
  • [] Create a job that goes to every tweet and collects all the replies to it
  • [] Get all the data for year 2020 and save it in a raw json file without cleaning
  • [] Save the data to json file without cleaning
  • [] https://dagshub.com/ investigate the usage of this
  • Move the project from Airflow to Prefect or any other workflow manager
  • [] get all the tweets from my timeline
  • [] Add a script to initialize the database migration
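
The cleaning items in the list above (dropping very short strings and removing "kinshasa") could look roughly like the sketch below. It reads the short-string item one way: single characters are dropped and tweets with fewer than three remaining words are discarded. The names `EXTRA_STOPWORDS`, `MIN_WORDS`, and `clean_tweet` are hypothetical and not taken from the project.

```python
import re
from typing import List

# Assumption: "kinshasa" is removed because it is too frequent to be informative.
EXTRA_STOPWORDS = {"kinshasa"}
# Assumption: tweets with only one or two words left are discarded.
MIN_WORDS = 3


def clean_tweet(text: str) -> List[str]:
    """Return the cleaned tokens of a tweet, or an empty list if it is too short."""
    # Lowercase and split on anything that is not a word character.
    tokens = re.findall(r"\w+", text.lower())
    # Drop single characters and the extra stopwords.
    tokens = [t for t in tokens if len(t) > 1 and t not in EXTRA_STOPWORDS]
    # Discard the whole tweet when fewer than MIN_WORDS tokens remain.
    return tokens if len(tokens) >= MIN_WORDS else []


if __name__ == "__main__":
    print(clean_tweet("Kinshasa a la pluie ce soir"))
    print(clean_tweet("Bonjour Kinshasa"))
```

Keeping the thresholds as module-level constants makes it easy to tune them once more tweets are collected.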

DRC Coordinates

The whole country [11.94,-13.64,30.54,5.19]
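
A minimal sketch of how this bounding box could be used to check whether a tweet's coordinates fall inside the country. It assumes the four values are ordered west, south, east, north (longitude first); that order is an assumption based on the values, not something stated here. `BoundingBox`, `DRC_BBOX`, and `contains` are hypothetical names.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Rectangle in degrees, assumed to be ordered west, south, east, north."""
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        # A point is inside when both coordinates lie within the box limits.
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Values copied from the note above.
DRC_BBOX = BoundingBox(11.94, -13.64, 30.54, 5.19)

if __name__ == "__main__":
    # Approximate coordinates (longitude, latitude) of Kinshasa and Paris.
    print(DRC_BBOX.contains(15.31, -4.32))
    print(DRC_BBOX.contains(2.35, 48.86))
```

A rectangle like this also covers parts of neighbouring countries, so it only works as a coarse first filter.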

Postmortem Feb 2023

I finally got time to work on this project after it had been down for around 6 months. It took me around a day to set up a new server and get the project running again. The GitHub Actions are working again, and I have learned a lot since the last time I worked on this project.

I would like to improve it by adding new tools.

Here is the next roadmap.

Before adding new features to the project, I would like to replace Airflow with Prefect as the workflow manager and replace Docker with Kubernetes as the container manager. Then I will add more features and improve the modelling aspect of the project.