Commit

Merge pull request #4 from juanantoniodelgado/bundle
Added support for multiple languages and fixed minor errors
juanantoniodelgado authored May 29, 2021
2 parents 066ce6a + fbfa639 commit ece3726
Showing 12 changed files with 5,031 additions and 2 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.idea/*
/vendor/
/src/cache.json
/src/words/test.json
11 changes: 10 additions & 1 deletion README.md
@@ -16,7 +16,7 @@ PHP StopWords removal library with support for multiple languages.
$stopwords->clean('your text to clean');

## Supported languages
-Arabic, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Ukrainian.
+Arabic, Basque, Bulgarian, Catalan, Czech, Danish, Dutch, English, Finnish, French, German, Gujarati, Hebrew, Hindi, Hungarian, Indonesian, Italian, Malay, Norwegian, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Turkish, Ukrainian, and Vietnamese.

### Notes
Language files are set according to [ISO 639-2][standard].
@@ -26,23 +26,32 @@ Language | Source
---------- | -----------------
Arabic | https://github.com/Alir3z4/stop-words/blob/master/arabic.txt
Basque | http://www.ranks.nl/stopwords/basque
Bulgarian | https://github.com/Alir3z4/stop-words/blob/master/bulgarian.txt
Catalan | http://www.ranks.nl/stopwords/catalan http://latel.upf.edu/morgana/altres/pub/ca_stop.htm
Czech | https://github.com/Alir3z4/stop-words/blob/master/czech.txt
Danish | https://github.com/Alir3z4/stop-words/blob/master/danish.txt
Dutch | https://github.com/Alir3z4/stop-words/blob/master/dutch.txt
English | http://www.ranks.nl/stopwords
Finnish | https://github.com/Alir3z4/stop-words/blob/master/finnish.txt
French | http://www.ranks.nl/stopwords/french https://github.com/Alir3z4/stop-words/blob/master/french.txt
German | https://github.com/Alir3z4/stop-words/blob/master/german.txt
Gujarati | https://github.com/Alir3z4/stop-words/blob/master/gujarati.txt
Hebrew | https://github.com/Alir3z4/stop-words/blob/master/hebrew.txt
Hindi | https://github.com/Alir3z4/stop-words/blob/master/hindi.txt
Hungarian | https://github.com/Alir3z4/stop-words/blob/master/hungarian.txt
Indonesian | https://github.com/Alir3z4/stop-words/blob/master/indonesian.txt
Italian | https://raw.githubusercontent.com/Alir3z4/stop-words/master/italian.txt
Malay | https://github.com/Alir3z4/stop-words/blob/master/malaysian.txt
Norwegian | https://raw.githubusercontent.com/Alir3z4/stop-words/master/norwegian.txt
Portuguese | https://raw.githubusercontent.com/Alir3z4/stop-words/master/portuguese.txt
Romanian | https://raw.githubusercontent.com/Alir3z4/stop-words/master/romanian.txt
Russian | https://raw.githubusercontent.com/Alir3z4/stop-words/master/russian.txt
Slovak | https://github.com/Alir3z4/stop-words/blob/master/slovak.txt
Spanish | http://www.ranks.nl/stopwords/spanish http://snowball.tartarus.org/algorithms/spanish/stop.txt https://github.com/Alir3z4/stop-words/blob/master/spanish.txt
Swedish | https://raw.githubusercontent.com/Alir3z4/stop-words/master/swedish.txt
Turkish | https://raw.githubusercontent.com/Alir3z4/stop-words/master/turkish.txt
Ukrainian | https://raw.githubusercontent.com/Alir3z4/stop-words/master/ukrainian.txt
Vietnamese | https://github.com/Alir3z4/stop-words/blob/master/vietnamese.txt

## License
Contents of this repository are available under [Attribution 4.0 International (CC BY 4.0)][license].
269 changes: 269 additions & 0 deletions src/words/bulgarian.json
@@ -0,0 +1,269 @@
{
"name": "bulgarian",
"handlers": [
"bulgarian",
"bul",
"bg"
],
"words": [
"а",
"автентичен",
"аз",
"ако",
"ала",
"бе",
"без",
"беше",
"би",
"бивш",
"бивша",
"бившо",
"бил",
"била",
"били",
"било",
"благодаря",
"близо",
"бъдат",
"бъде",
"бяха",
"в",
"вас",
"ваш",
"ваша",
"вероятно",
"вече",
"взема",
"ви",
"вие",
"винаги",
"внимава",
"време",
"все",
"всеки",
"всички",
"всичко",
"всяка",
"във",
"въпреки",
"върху",
"г",
"ги",
"главен",
"главна",
"главно",
"глас",
"го",
"година",
"години",
"годишен",
"д",
"да",
"дали",
"два",
"двама",
"двамата",
"две",
"двете",
"ден",
"днес",
"дни",
"до",
"добра",
"добре",
"добро",
"добър",
"докато",
"докога",
"дори",
"досега",
"доста",
"друг",
"друга",
"други",
"е",
"евтин",
"едва",
"един",
"една",
"еднаква",
"еднакви",
"еднакъв",
"едно",
"екип",
"ето",
"живот",
"за",
"забавям",
"зад",
"заедно",
"заради",
"засега",
"заспал",
"затова",
"защо",
"защото",
"и",
"из",
"или",
"им",
"има",
"имат",
"иска",
"й",
"каза",
"как",
"каква",
"какво",
"както",
"какъв",
"като",
"кога",
"когато",
"което",
"които",
"кой",
"който",
"колко",
"която",
"къде",
"където",
"към",
"лесен",
"лесно",
"ли",
"лош",
"м",
"май",
"малко",
"ме",
"между",
"мек",
"мен",
"месец",
"ми",
"много",
"мнозина",
"мога",
"могат",
"може",
"мокър",
"моля",
"момента",
"му",
"н",
"на",
"над",
"назад",
"най",
"направи",
"напред",
"например",
"нас",
"не",
"него",
"нещо",
"нея",
"ни",
"ние",
"никой",
"нито",
"нищо",
"но",
"нов",
"нова",
"нови",
"новина",
"някои",
"някой",
"няколко",
"няма",
"обаче",
"около",
"освен",
"особено",
"от",
"отгоре",
"отново",
"още",
"пак",
"по",
"повече",
"повечето",
"под",
"поне",
"поради",
"после",
"почти",
"прави",
"пред",
"преди",
"през",
"при",
"пък",
"първата",
"първи",
"първо",
"пъти",
"равен",
"равна",
"с",
"са",
"сам",
"само",
"се",
"сега",
"си",
"син",
"скоро",
"след",
"следващ",
"сме",
"смях",
"според",
"сред",
"срещу",
"сте",
"съм",
"със",
"също",
"т",
"тази",
"така",
"такива",
"такъв",
"там",
"твой",
"те",
"тези",
"ти",
"т.н.",
"то",
"това",
"тогава",
"този",
"той",
"толкова",
"точно",
"три",
"трябва",
"тук",
"тъй",
"тя",
"тях",
"у",
"утре",
"харесва",
"хиляди",
"ч",
"часа",
"че",
"често",
"чрез",
"ще",
"щом",
"юмрук",
"я",
"як"
]
}
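
Each language file added in this commit has the same shape: a `name`, a list of `handlers` (aliases such as `bulgarian`, `bul`, `bg`), and a flat `words` array. The sketch below shows how a file of that shape could be used to strip stop words. It is an illustration only, not the library's implementation: `removeStopWords` is a hypothetical helper, the sample JSON is a trimmed-down stand-in for `src/words/bulgarian.json`, and the code assumes PHP 7.4+ with the `mbstring` extension.

```php
<?php
// Hypothetical helper (not part of the library): removes stop words using a
// language definition shaped like the JSON files in src/words/.
function removeStopWords(string $text, string $languageJson): string
{
    $language = json_decode($languageJson, true, 512, JSON_THROW_ON_ERROR);

    // Flip the word list into a hash map so each lookup is a single isset().
    $stopWords = array_flip($language['words']);

    // Split on runs of whitespace; the "u" flag makes the pattern UTF-8 aware.
    // preg_split() returns false on failure, so fall back to an empty list.
    $tokens = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $kept = array_filter(
        $tokens,
        // mb_strtolower() lowercases Cyrillic correctly; strtolower() would not.
        static fn (string $token): bool => !isset($stopWords[mb_strtolower($token, 'UTF-8')])
    );

    return implode(' ', $kept);
}

// Trimmed-down sample with the same keys as src/words/bulgarian.json.
$sample = '{"name": "bulgarian", "handlers": ["bulgarian", "bul", "bg"], "words": ["аз", "и", "на", "съм"]}';

// prints "училище чета книга"
echo removeStopWords('Аз съм на училище и чета книга', $sample), PHP_EOL;
```

The sketch splits on whitespace only, so a token with attached punctuation (for example `и,`) would not match its stop word. Real input would likely need punctuation handling before the lookup.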
