add blacklisted url #39

Open
wants to merge 19 commits into
base: master
1 change: 1 addition & 0 deletions public_html/carik/files/.gitignore
@@ -0,0 +1 @@
blacklist.txt
3 changes: 3 additions & 0 deletions public_html/carik/files/blacklist-global.txt
@@ -1,10 +1,12 @@
Test Luri,62878764690001
Ace Hardware,6281314789856
AstraPay,6288807619898
Bank Allo,62811681846110
Bank BCA,62811615006998
Bank BTPN,6281218976774
Bank Mega,6282208223322
Bank Mega Kartu Kredit,628226082262107
Blibli Promotion,6281517551356
Blibli Tiket,6281119215050
Cinepolis Indonesia,6287777731078
Citilink,6281110110808
@@ -15,6 +17,7 @@ Honest OTP,6281119959581
Jenius,6281218976774
KAI121,6281112111121
Mega Life,6281197900111
MRT Jakarta,6285195901552
Prioritas Info,6281808800055
Shopee Security,6285574670796
Telkomsel,6281111111111
122 changes: 122 additions & 0 deletions public_html/carik/files/blacklist-url.txt
@@ -0,0 +1,122 @@
# Telegram Bot
@majorwbot
@BiggesTonBot
@DinoAirdBot
@dogshouse_bot
@dogsqhouse_bot
@dogsprize_bot
@EmpirTokenxBot
@lacaknomer_bot
@MajorAidrops_Bot
@MajorAllBot
@MAJORSTORYBOT
@Notcoin_DroopBot
@Notcoin_litebot
@Notcoin_luckybot
@Notcoin_moonbot
@Notcoin
@PAWSOG_nbot
@PAWSQO_BOT
@SpinTetherBOT
@TetherSpinsBOT
@TondropAI_bot
@TONEWYERBOT
@XEmpireTelegramBot
@xFreeSpinx_bot
@XTONXEBOT
@xtoncoinspin_bot
@xtonspin_bot
@XTONCHRISTMASBOT
putri_aniss334
Raymond_adminFx

# Keyword
$DOGS

⚠️ Potential issue

Remove or clarify the "$DOGS" identifier

The "$DOGS" entry appears to be a variable-like identifier that may not be effective for URL blocking. Consider either removing it or replacing it with the actual URLs/patterns you want to block.

listrikgratis

# Telegram
t.me/+dIRnvlnAAd5jZjc8
t.me/+mNW8EcFfbHw5ZTQ0
t.me/JOIN_THE_WINNING_PROCESS
t.me/major
t.me/SecureStocks
t.me/TATTIAHOE

# Site/Link
bansosupdate2024.trustklik.live
bansos2024.regist-report22.com
claim-danabansos.directklick.com
claim-danabansos.sosialasia.com
claimm-danabansos.smediax.online
click-bansosvia-telegram.web.id
cliick-informaasi-baansos2024.my.id
Comment on lines +48 to +52

🛠️ Refactor suggestion

Enhance patterns for social assistance claim scams

The new entries show variations in claim-related URLs. Consider adding these patterns:

+ # Social assistance claim patterns
+ ^cla?i[m]+[-]?dana?[-]?bansos\..*$
+ ^click[-]?bansos(?:via)?[-]?telegram\..*$

This would catch variations in spelling and formatting used to avoid detection.
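
As a quick sanity check, the two suggested patterns can be run against the entries from this hunk. The sketch below uses Python's `re` module rather than the project's Pascal code. The `matches_any` helper is hypothetical, and the sketch assumes the blacklist loader would be extended to treat such lines as regular expressions; the `SpamScore` check in this PR compares entries with a plain substring search.

```python
import re

# Patterns copied from the review suggestion above.
CLAIM_PATTERNS = [
    re.compile(r"^cla?i[m]+[-]?dana?[-]?bansos\..*$"),
    re.compile(r"^click[-]?bansos(?:via)?[-]?telegram\..*$"),
]

# Entries taken from the blacklist-url.txt hunk.
SAMPLES = [
    "claim-danabansos.directklick.com",
    "claim-danabansos.sosialasia.com",
    "claimm-danabansos.smediax.online",
    "click-bansosvia-telegram.web.id",
    "cliick-informaasi-baansos2024.my.id",
]


def matches_any(host: str) -> bool:
    """Return True if any claim pattern matches the lowercased host."""
    host = host.lower()
    return any(p.match(host) for p in CLAIM_PATTERNS)


if __name__ == "__main__":
    for host in SAMPLES:
        print(f"{host}: {matches_any(host)}")
```

The first four samples match. The last one (`cliick-informaasi-baansos2024.my.id`) does not, so literal entries would still be needed alongside the patterns.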

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 claim-danabansos.directklick.com
 claim-danabansos.sosialasia.com
 claimm-danabansos.smediax.online
 click-bansosvia-telegram.web.id
 cliick-informaasi-baansos2024.my.id
+# Social assistance claim patterns
+^cla?i[m]+[-]?dana?[-]?bansos\..*$
+^click[-]?bansos(?:via)?[-]?telegram\..*$

cpxsppk.plx-8.systems
feji.us/informasi-bansos2024t.me/mh_aripin

⚠️ Potential issue

Fix malformed URL combining multiple destinations

The URL feji.us/informasi-bansos2024t.me/mh_aripin appears to combine a URL shortener with a Telegram link. This should be split into separate entries.

-feji.us/informasi-bansos2024t.me/mh_aripin
+feji.us/informasi-bansos2024
+t.me/mh_aripin
📝 Committable suggestion

Suggested change
-feji.us/informasi-bansos2024t.me/mh_aripin
+feji.us/informasi-bansos2024
+t.me/mh_aripin

feji.us/lapangankerja-bumn
feji.us/mytelkomsel2024
Comment on lines +54 to +56

⚠️ Potential issue

Critical: URL shorteners require additional security measures

URL shorteners (feji.us, s.id) pose significant risks:

  1. Destinations can change after being blacklisted
  2. Multiple redirects can bypass checks
  3. Short URLs are easy to regenerate

Consider:

  1. Implementing URL expansion before blacklist checks
  2. Storing expanded URLs in the blacklist
  3. Using URL reputation services

Also applies to: 59-65
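
The "expand before checking" idea can be sketched as follows, in Python rather than the project's Pascal. This is a minimal illustration, not the bot's implementation: `expand_url`, `is_blacklisted`, and the `Resolver` type are hypothetical names, and the HTTP lookup is injected as a function so the logic can be exercised without network access.

```python
from typing import Callable, Optional
from urllib.parse import urljoin

# A resolver takes a URL and returns its redirect target, or None if the
# URL does not redirect. A real one would issue an HTTP HEAD request and
# read the Location header; it is injected here (assumption) for testing.
Resolver = Callable[[str], Optional[str]]


def expand_url(url: str, resolve: Resolver, max_hops: int = 5) -> list[str]:
    """Follow redirects and return every URL visited, starting with `url`."""
    chain = [url]
    seen = {url}
    for _ in range(max_hops):
        target = resolve(chain[-1])
        if target is None:
            break
        target = urljoin(chain[-1], target)  # handle relative Location values
        if target in seen:  # redirect loop, stop following
            break
        chain.append(target)
        seen.add(target)
    return chain


def is_blacklisted(url: str, blacklist: list[str], resolve: Resolver) -> bool:
    """Check every hop of the redirect chain against the blacklist entries."""
    return any(
        entry in hop.lower()
        for hop in expand_url(url, resolve)
        for entry in blacklist
    )
```

Checking every hop, not only the final destination, means a short link is still caught when either the shortener entry or the landing domain is on the list.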

infoloker.aplly-my.com/Jobs
linkin.bio/layananmo
lokerbumn-2024.guirel.win
lokerr-update2024.trustklik.live
lokerindonesia2024.lokerblog.net
lowongan-kerja2024.apd-d1.com/jb
lowongankerja.vrole.uk
lowongankerja2024.online
lowonganbumn-terbaru2024.directklick.com
Comment on lines +59 to +65

🛠️ Refactor suggestion

Enhance domain patterns for job recruitment scams

The new entries reveal additional patterns in job scam URLs. Consider adding these patterns:

+ # Additional job scam patterns
+ ^rekrutmen(?:bersama)?bumn\d+\..*\.win$
+ ^lowongan.*(?:bumn|kerja).*\.(?:directklick\.com|vrole\.uk)$

This would complement existing patterns while catching more variants of job scam domains.

Also applies to: 50-51

pendaftaran-kerja.info-1d.com
qclaim-danaabnsos.smediax.com
rakyat.me/dftxrkn
rekrutmenbersamabumn05.byfux.win
rekrutmenbersamabumn19.byfux.win
s.id/bantuansosial
s.id/dtkssbans0sterupdate
s.id/Loker_Update_2024
s.id/Rekrutmenttssbersama
s.id/lOkerinsdonesiaterupdatessn
s.id/lokerterbaru
s.id/rekrutsmentsbersmbumn2024isn
satuin.web.id/1AVxF/?Lowongankerja2024
searchloker02.newinfo.cc
telegra.ph/Free-TON-Giveaway-11-30
voxmn.pl/nKHr8/?LokerBUMN24

Comment on lines +45 to +82

🛠️ Refactor suggestion

⚠️ Potential issue

Critical: Implement URL expansion and pattern-based blocking

  1. URL shorteners (s.id, feji.us) pose security risks because destinations can change:
#!/bin/bash
# Print the redirect target of every s.id and feji.us entry in the blacklist
grep -E '^(s\.id|feji\.us)/' public_html/carik/files/blacklist-url.txt |
while IFS= read -r url; do
  echo "Checking $url redirects"
  curl -sI "$url" | grep -i '^location:'
done
  2. Replace individual URLs with patterns:
+# Social assistance scams
+^(?:bansos|bantuan).*(?:2024|update)\.
+^(?:claim|clalm)[-]?(?:dana)?[-]?(?:bansos)\.
+# Job recruitment scams
+^(?:loker|lowongan).*(?:bumn|kerja|2024)\.
+^rekrutmen(?:bersama)?bumn\d+\.
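
Before replacing individual URLs with patterns, it may help to check which existing entries the patterns would miss. The sketch below is in Python rather than the project's Pascal, and `uncovered` is a hypothetical helper; the patterns are copied from the suggestion above and the sample entries from the `# Site/Link` section.

```python
import re

# Patterns copied from the suggestion above.
PATTERNS = [
    r"^(?:bansos|bantuan).*(?:2024|update)\.",
    r"^(?:claim|clalm)[-]?(?:dana)?[-]?(?:bansos)\.",
    r"^(?:loker|lowongan).*(?:bumn|kerja|2024)\.",
    r"^rekrutmen(?:bersama)?bumn\d+\.",
]
COMPILED = [re.compile(p) for p in PATTERNS]


def uncovered(entries: list[str]) -> list[str]:
    """Return the entries that none of the patterns match."""
    return [
        e for e in entries
        if not any(p.search(e.lower()) for p in COMPILED)
    ]


if __name__ == "__main__":
    entries = [
        "bansosupdate2024.trustklik.live",
        "claim-danabansos.directklick.com",
        "claimm-danabansos.smediax.online",
        "qclaim-danaabnsos.smediax.com",
        "infoloker.aplly-my.com/Jobs",
        "lokerbumn-2024.guirel.win",
        "rekrutmenbersamabumn05.byfux.win",
    ]
    for entry in uncovered(entries):
        print("not covered:", entry)
```

With these samples, `claimm-danabansos.smediax.online`, `qclaim-danaabnsos.smediax.com`, and `infoloker.aplly-my.com/Jobs` are reported as not covered, which suggests keeping the literal entries and treating patterns as a supplement.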


# Domain/Site
AirdropMaga.lol
aply1-id.com
antgpt.org
best-value.ltd

⚠️ Potential issue

Remove duplicate domain entry

The domain best-value.ltd appears twice in the list. Remove one instance:

 best-value.ltd
 bansos-2024.info
 bansos2024.info
 bantuan.us.to
-best-value.ltd
 byfux.win

Also applies to: 91-91
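
Duplicates like this one are easy to catch mechanically. A minimal sketch in Python (not the project's Pascal), with `find_duplicates` as a hypothetical helper; it assumes comment lines start with `#` and that entries compare case-insensitively, as the lowercase comparison in `SpamScore` suggests.

```python
from collections import Counter


def find_duplicates(lines: list[str]) -> list[str]:
    """Return entries that occur more than once, ignoring case, blanks and comments."""
    entries = [
        line.strip().lower()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
    counts = Counter(entries)
    return sorted(entry for entry, n in counts.items() if n > 1)


if __name__ == "__main__":
    sample = [
        "# Domain/Site",
        "best-value.ltd",
        "bansos-2024.info",
        "",
        "best-value.ltd",
        "byfux.win",
    ]
    print(find_duplicates(sample))
```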

bansos-2024.info
bansos2024.info
bantuan.us.to
bantuanbansospkh.net
best-value.ltd
byfux.win
clalm.one
directklick.com
eth-spin.lol
geets-cliks.com
gets-offcial.com
ghiju.us
gshortlink.com
heylink.me
indodock.com
info-ind.com
informasi.us.to
klikhere.website
kminfo.app
newsupdate.asia
ppkh.site
register2024.live
rakyat.me
rkyt.eu
site-klik.com
st-rg.com
sosialasia.com
spoo.me
tribunsinfo.cc
trustklik.live
uasx11.com
vitur.me
xbeack.asia
ze-me.xyz
1 change: 1 addition & 0 deletions public_html/carik/files/word-standard.txt
@@ -9513,6 +9513,7 @@ indologi
indonesia
indonesianisasi
indra
indraja
indraloka
indranila
indriawi
3 changes: 3 additions & 0 deletions source/common/carik.inc
@@ -76,6 +76,7 @@ const
LINE_TOKEN = 'line/default/token';
LINE_BOT_REPLY_DISABLE = 'line/default/reply_disable';
LINE_BOT_FIRST_SESSION_RESPONSE = 'line/default/first_session_response';
TELEGRAM_GROUP_MAXIMUM_MEMBER_COUNT = 1500;

WITAI_TOKEN = 'witai/default/token';
GOOGLE_KEY = 'google/default/key';
@@ -138,10 +139,12 @@ const

NEW_MEMBER_INTERVAL_POST_PERMITTED = 20;
SPAM_SCORE_THRESHOLD = 80;
SPAM_SCORE_FORWARD_STORY = 80;
SPAM_WORD = 'ai/default/spam_word';
SPAM_CAS_OFFENSE = 1; // api.cas.chat

GROUP_DATA_FILENAME = 'files/carik/carik-groupdata.dat';
BLACKLIST_URL_FILENAME = 'files/blacklist-url.txt';

🛠️ Refactor suggestion

Improve blacklist file path handling and documentation

Several concerns regarding the blacklist file configuration:

  1. The relative path could cause issues depending on execution context
  2. There are now three different blacklist files without clear distinction:
    • BLACKLIST_URL_FILENAME
    • BLACKLIST_GLOBAL_FILENAME
    • BLACKLIST_FILENAME

Consider:

  1. Using absolute paths or environment-based path resolution
  2. Adding documentation to clarify the purpose of each blacklist file
  3. Consolidating the blacklists if their purposes overlap

Example documentation:

+ // Blacklist files:
+ // - blacklist-global.txt: Contains globally banned entities
+ // - blacklist.txt: Contains locally banned entities
+ // - blacklist-url.txt: Contains banned URLs and domains
  BLACKLIST_GLOBAL_FILENAME = 'files/blacklist-global.txt';
  BLACKLIST_FILENAME = 'files/blacklist.txt';
  BLACKLIST_URL_FILENAME = 'files/blacklist-url.txt';

Committable suggestion skipped: line range outside the PR's diff.
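
The path-resolution idea can be sketched independently of the project. The example below is in Python rather than Pascal; `blacklist_path` and the `CARIK_FILES_DIR` environment variable are hypothetical and only illustrate resolving a relative filename against an explicit base directory instead of the process's working directory.

```python
import os
from pathlib import Path

# Hypothetical environment variable; not part of the project.
ENV_VAR = "CARIK_FILES_DIR"


def blacklist_path(filename: str, base_dir: Path) -> Path:
    """Resolve a blacklist file against an explicit base directory.

    The environment variable, when set, overrides `base_dir`.
    """
    root = Path(os.environ.get(ENV_VAR, base_dir))
    return (root / filename).resolve()


if __name__ == "__main__":
    print(blacklist_path("files/blacklist-url.txt", Path.cwd()))
```

The returned path is always absolute, so the lookup no longer depends on where the process was started.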

CALLBACK_QUERY_TIMEOUT = 5; // 5 minutes
CALLBACK_QUERY_TIMEOUT_PREFIX = 30; // 5 minutes
MESSAGE_TYPE = 'message_type';
68 changes: 55 additions & 13 deletions source/common/carik_webmodule.pas
@@ -114,6 +114,7 @@ TCarikWebModule = class(TMyCustomWebModule)
FVenueLongitude: double;
FVenueName: string;
FGroupData: TIniFile;
FIsCustomActionFileExist:boolean;

// TELEGRAM
function getActiveContext: string;
@@ -419,6 +420,7 @@ TCarikWebModule = class(TMyCustomWebModule)
property CustomReplyURLFromExternalNLP: string read FCustomReplyURLFromExternalNLP;
property CustomReplyActionTypeFromExternalNLP: string read FCustomReplyActionTypeFromExternalNLP;
property CustomReplyDataFromExternalNLP: TJSONUtil read FCustomReplyDataFromExternalNLP;
property IsCustomActionFileExist: boolean read FIsCustomActionFileExist write FIsCustomActionFileExist;
procedure SaveActionToUserDataFromCard(AData: TJSONObject);
procedure SaveActionToUserDataFromForm(AData: TJSONObject);
procedure SaveActionToUserData(AActionType: string; AData: TJSONObject = nil);
@@ -2609,6 +2611,8 @@ function TCarikWebModule.customMathHandler(const IntentName: string;
Result := Format(FFormatNumber,[resultValue]);
Result := Result.Replace(',000','');
except
Result := ExternalNLP('berapa '+Params.Values['Formula_value']);
if Result.IsEmpty then Result := 'duuhh... saya bingung dehh.. ';
end;
mathParser.Free;
end;
@@ -3209,6 +3213,7 @@ function TCarikWebModule.SpamScore(AUserID: string; AText: string;
i, j, additionScore: integer;
s, triggerName, triggerWord: string;
jData: TJSONData;
lstURL: TStringList;
begin
Result := 0;
AText := AText.ToLower;
@@ -3247,15 +3252,32 @@ function TCarikWebModule.SpamScore(AUserID: string; AText: string;
triggerWord := jData.Items[i].Items[j].AsString;
if preg_match(triggerWord, AText) then
begin
LogUtil.Add(triggerName + '/' + triggerWord, 'SPAM-CHECK');
Result := Result + additionScore;
end;
end;
end;
jData.Free;

-if Result < 80 then
+if Result < SPAM_SCORE_THRESHOLD then
begin
// check blacklisted URL
lstURL := TStringList.Create;
lstURL.LoadFromFile(BLACKLIST_URL_FILENAME);
for i := 0 to lstURL.Count -1 do
begin
s := LowerCase(lstURL[i]).Trim;
if s.IsEmpty then Continue;
if s.IsExists('#') then Continue;
if Pos( s, AText) > 0 then
begin
Result := Result + SPAM_SCORE_THRESHOLD;
end;
end;
lstURL.Free

//TODO: check from spam-score api;

end;

if Result >= SPAM_SCORE_THRESHOLD then
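
For readers less familiar with Pascal, the blacklist check added in this hunk can be sketched in Python. This is an illustration under assumptions, not the bot's code: `load_blacklist` and `url_spam_score` are hypothetical names, `IsExists('#')` is assumed to mean "contains `#`", and the sketch returns the threshold once instead of adding it for every matching entry.

```python
SPAM_SCORE_THRESHOLD = 80  # value from carik.inc


def load_blacklist(lines):
    """Lowercase and trim entries; skip blanks and lines containing '#'."""
    entries = []
    for raw in lines:
        entry = raw.strip().lower()
        if not entry or "#" in entry:
            continue
        entries.append(entry)
    return entries


def url_spam_score(text, entries):
    """Return SPAM_SCORE_THRESHOLD if any entry is a substring of the text."""
    text = text.lower()
    if any(entry in text for entry in entries):
        return SPAM_SCORE_THRESHOLD
    return 0


if __name__ == "__main__":
    entries = load_blacklist(["# Telegram", "t.me/major", "", "heylink.me"])
    print(url_spam_score("Join https://t.me/major now", entries))
```

Because the comparison is a substring search, a short entry such as `t.me/major` also matches any longer link that contains it.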
@@ -4015,6 +4037,11 @@ function TCarikWebModule.GenerateResponseJson: string;
jsonOutput['processing_time'] := SimpleBOT.SimpleAI.ElapsedTime.ToString.ToInteger;
end;

if FIsCustomActionFileExist then
begin
jsonOutput.ValueArray['action/files'] := FCustomActionFiles;
end;

if FIsDebug then
Result := jsonOutput.AsJSONFormated
else
@@ -4484,17 +4511,19 @@ procedure TCarikWebModule.LogChat(AChannelID: string; AGroupID: string;
begin
requestJson['client_id'] := FClientId;
end;
-requestJson['message/message_id'] := AMessageID.ToString;
+requestJson['message/message_id'] := MessageID;
requestJson['message/text'] := (AText);
requestJson['message/reply'] := (AReply);
requestJson['message/from/id'] := AUserID;
requestJson['message/from/name'] := AFullName;
requestJson['message/from/username'] := AUserName;
requestJson['message/chat/channel'] := AChannelID;
-requestJson['message/chat/id'] := AMessageID.ToString;
+requestJson['message/chat/id'] := MessageID;
requestJson['message/chat/is_mentioned'] := is_mentioned;
requestJson['message/chat/is_group'] := is_group;
requestJson['message/intents/name'] := SimpleBOT.SimpleAI.IntentName;
requestJson['message/dashboard_device_id'] := DashboardDeviceID;

if AReplyFromMessageId > 0 then requestJson['message/chat/is_reply'] := 0;
if AIsGroup then
begin
@@ -4522,8 +4551,10 @@ procedure TCarikWebModule.LogChat(AChannelID: string; AGroupID: string;
httpResponse := Post;
if _GET['_DEBUG'] <> '1' then
begin
//LogUtil.Add(httpResponse.ResultText, 'logchat');
end;
//LogUtil.Add(responseJson.AsJSON, 'logchat');
//LogUtil.Add(log_url, 'logchat');
//LogUtil.Add(httpResponse.ResultText, 'logchat');
Free;
end;
try
@@ -4591,6 +4622,7 @@ procedure TCarikWebModule.LogJoin(AChannelID: string; AGroupID: string;
ContentType := 'application/json';
RequestBody := TStringStream.Create(requestJson.AsJSON);
try
LogUtil.Add( 'submit: ' + requestJson.AsJSON, 'JOIN', False, AppData.logDir + 'logjoin.log');
http_response := Post;
LogUtil.Add( 'response-join: ' + http_response.ResultText, 'JOIN');
except
@@ -4735,6 +4767,7 @@ constructor TCarikWebModule.CreateNew(AOwner: TComponent; CreateMode: integer);
SimpleBOT.StorageType := stRedis;
end;
Carik := TCarikController.Create;
FCustomActionFiles := nil;
FLanguage := 'en-id';
FSendAudio := False;
FSendPhoto := False;
@@ -4775,6 +4808,7 @@ constructor TCarikWebModule.CreateNew(AOwner: TComponent; CreateMode: integer);
FCustomReplyTypeFromExternalNLP := '';
FCustomReplyURLFromExternalNLP := '';
FCustomReplyActionTypeFromExternalNLP := '';
FIsCustomActionFileExist := False;
FExternalNLPStarted := False;
FGPTTimeout := 0;
FPackageName := '';
@@ -4923,7 +4957,7 @@ function TCarikWebModule.ProcessText(AMessage: string): string;
begin
SimpleBOT.SimpleAI.AdditionalParameters.Values['ClientId'] := ClientId;;
SimpleBOT.SimpleAI.AdditionalParameters.Values['client_id'] := ClientId;
-if DeviceId.IsNotEmpty then SimpleBOT.SimpleAI.AdditionalParameters.Values['dashboard_device_id'] := DeviceId;
+if DeviceId.IsNotEmpty then SimpleBOT.SimpleAI.AdditionalParameters.Values['dashboard_device_id'] := DashboardDeviceID.ToString;
if IsDelayReplay then SimpleBOT.SimpleAI.AdditionalParameters.Values['delay_reply'] := '1';
end;
if Carik.IsGroup then
@@ -5399,6 +5433,7 @@ function TCarikWebModule.FormInputHandler: boolean;
FCustomReplyActionTypeFromExternalNLP := 'text';
if FCustomReplyTypeFromExternalNLP.IsNotEmpty then
begin
LogUtil.Add('external data not empty', 'FORM');
FCustomReplyActionTypeFromExternalNLP := FCustomReplyDataFromExternalNLP['action/type'];
FCustomReplyURLFromExternalNLP := FCustomReplyDataFromExternalNLP['action/url'];
FCustomReplyName := FCustomReplyDataFromExternalNLP['action/name'];
@@ -5706,7 +5741,22 @@ function TCarikWebModule.FormInputHandler: boolean;
FCustomReplyDataFromExternalNLP.LoadFromJsonString(httpResponse.ResultText, False);
Suffix := FCustomReplyDataFromExternalNLP['text'];

// check files - placed here because it conflicts with the code below *1
try
if not (FCustomReplyDataFromExternalNLP.Data.FindPath('action.files') = nil) then
begin
FIsCustomActionFileExist := True;
FCustomActionFiles := TJSONArray(GetJSON(FCustomReplyDataFromExternalNLP.Data.GetPath('action.files').AsJSON));
end;
except
on E:Exception do
begin
//LogUtil.Add('Error: ' + E.Message, 'FORM');
end;
end;

//TODO: build custom action
//TODO: ref *1
FCustomReplyTypeFromExternalNLP := FCustomReplyDataFromExternalNLP['type'];
FCustomReplyActionTypeFromExternalNLP := 'text';
if FCustomReplyTypeFromExternalNLP.IsNotEmpty then
@@ -5732,14 +5782,6 @@ function TCarikWebModule.FormInputHandler: boolean;
}
end;

-// files
-FCustomActionFiles := Nil;
-try
-LogUtil.Add('ada file', 'FORM');
-FCustomActionFiles := TJSONArray(FCustomReplyDataFromExternalNLP.Data.GetPath('files'));
-except
-end;

end;

end;
10 changes: 7 additions & 3 deletions source/common/direct_handler.pas
@@ -111,7 +111,7 @@ procedure TCarikHandler.Post;
if Text = 'False' then
Text := '';

-messageID := jsonData.Value['message/message_id'];
+MessageID := jsonData.Value['message/message_id'];
chatID := jsonData.Value['message/chat/id'];
chatType := jsonData.Value['message/chat/type'];
groupID := jsonData.Value['message/chat/group_id'];
@@ -210,8 +210,12 @@ procedure TCarikHandler.Post;
SimpleBOT.FirstSessionResponse := s2b(Config[CONFIG_FIRST_SESSION_RESPONSE]);

try
-Carik.GroupName := jsonData.GetPath('message.chat.title').AsString;
+Carik.GroupName := jsonData.GetPath('message.chat.group_name').AsString;
except
try
Carik.GroupName := jsonData.GetPath('message.chat.title').AsString;
except
end;
end;
Carik.UserPrefix := channelID;
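
The nested `try ... except` in this hunk implements a fallback: use `message.chat.group_name` when present, otherwise `message.chat.title`. The same lookup order can be sketched without exceptions, shown here in Python rather than Pascal; `get_path` and `group_name` are hypothetical helpers written for this illustration.

```python
from typing import Any, Optional


def get_path(data: dict, path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any key is missing."""
    node: Any = data
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def group_name(payload: dict) -> str:
    """Prefer message.chat.group_name, fall back to message.chat.title."""
    for path in ("message.chat.group_name", "message.chat.title"):
        value = get_path(payload, path)
        if value is not None:
            return str(value)
    return ""


if __name__ == "__main__":
    print(group_name({"message": {"chat": {"title": "Pascal Indonesia"}}}))
```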

@@ -321,7 +325,7 @@ procedure TCarikHandler.Post;
LogChatPayload.Text:= Response.Content;
LogChat(ChannelId, Carik.GroupChatID, Carik.GroupName, Carik.UserID, Carik.UserName, Carik.FullName, OriginalText, '', Carik.IsGroup, True);
//OutputJson(11, 'muted: ' + MutedUntil.AsString);
-Response.Content:= SimpleBOT.SimpleAI.ResponseJson;
+Response.Content := SimpleBOT.SimpleAI.ResponseJson;
Exit;
end;
