Tikvah Telegram channel analysis repo
! This repo's analysis was done for learning purposes.
- bs4
git clone https://github.com/hmhard/tikvah-tg-channel-analysis.git
# make the script executable
chmod +x process.sh
# 500 is the default top-n value; change it to whatever you want
./process.sh 500
- Fetched HTML data.
- Extracted data into JSON format.
- Filtered Amharic keywords, removing entries with:
  - Emojis
  - English characters
  - Special characters
  - Numbers
- Filtered out stop words.
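
The filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `process.sh`: the function name `count_amharic_words`, the regex range, and the tiny `STOP_WORDS` sample are all assumptions made for the example. It keeps only tokens made entirely of Ethiopic letters, which drops any entry containing emojis, English characters, special characters, or numbers in one pass.

```python
import re
from collections import Counter

# Assumption: U+1200-U+135A covers the Ethiopic syllables used for Amharic.
# The range stops before Ethiopic punctuation and digits (U+1360-U+137C).
AMHARIC_WORD = re.compile(r"[\u1200-\u135A]+")

# Hypothetical stop-word sample; the repo's actual list is not shown here.
STOP_WORDS = {"ነው", "እና", "ላይ"}


def count_amharic_words(text, stop_words=STOP_WORDS):
    """Count whitespace-separated tokens that consist only of Amharic letters."""
    counts = Counter()
    for token in text.split():
        # fullmatch rejects any token with even one non-Amharic character
        if AMHARIC_WORD.fullmatch(token) and token not in stop_words:
            counts[token] += 1
    return counts
```

Note that a token with attached punctuation is discarded rather than trimmed, which matches the "removing entries" wording above; stripping punctuation first would be a different design choice.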
data = {
"ሰዎች": 14109,
"ከተማ": 10457,
"ክልል": 9968,
"ቤት": 8725,
"አበባ": 7661,
"ሲሆን": 7288,
"ዛሬ": 6636,
"መሆኑን": 6383,
"ተማሪዎች": 5937,
"ቀን": 5896,
"ከፍተኛ": 5741,
"የኢትዮጵያ": 5684,
"ሰዓት": 5294,
"ፖሊስ": 5167,
"እንዲሁም": 5136,
"መንግስት": 4997,
"ትምህርት": 4992,
"መረጃ": 4845,
"ስራ": 4820,
"ብር": 4673,
"ኢትዮጵያ": 4528,
"ቫይረስ": 4229,
"ደግሞ": 4185,
"ቁጥር": 4154,
"ሚኒስትር": 4130,
"አገልግሎት": 4026,
"ዞን": 3999,
"ዩኒቨርሲቲ": 3820,
"ዓመት": 3709,
...
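
As a rough sketch of how a frequency table like the one above could be cut down to the top n entries, the snippet below uses only the first three counts shown. The helper name `top_n` is hypothetical, and the default of 500 mirrors the argument passed to `process.sh`; how the script itself does the selection is not shown here.

```python
import json
from collections import Counter

# First three entries of the frequency table shown above.
data = {"ሰዎች": 14109, "ከተማ": 10457, "ክልል": 9968}


def top_n(freqs, n=500):
    """Return the n most frequent words as a dict, highest count first."""
    return dict(Counter(freqs).most_common(n))


# ensure_ascii=False keeps the Amharic text readable instead of \uXXXX escapes.
print(json.dumps(top_n(data, 2), ensure_ascii=False, indent=2))
```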