Match subject base64 #127

Open
amerlyq opened this issue May 12, 2016 · 7 comments

Comments

@amerlyq

amerlyq commented May 12, 2016

At my workplace, the mail server encodes the whole subject in base64 if it contains at least one non-ASCII character.
I am not the only one with this problem: if you google "decode mail subject", you will find many other servers doing the same.
So far I haven't found any way to make imapfilter decode the subject, which completely eliminates the usefulness of imapfilter for me.
I can't easily replace match_subject with contain_subject, because in my spoken language I need regexes to match the many variations of each word. Moreover, in almost all cases the subject is the only way to distinguish work spam from useful work messages, and urgent from pending ones, as I can't make that decision based on the to/from/etc. fields.

Would it be too much to ask for the appropriate piece of code to be added to imapfilter? :)
If you are really tight on time to write and test it (as everyone is), please point me to the places in the code where I could start implementing it myself.

@lefcha
Owner

lefcha commented May 22, 2016

IIUC, neither contain_subject() nor match_subject() works if the mail Subject is encoded this way? Are you trying to search using a word that is not encoded in base64?

I think it should not be that hard to add support for encoding/decoding strings, as the OpenSSL library that imapfilter requires already has a C API for doing that. But first let's clarify what you want to do, and what works and what doesn't...

@amerlyq
Author

amerlyq commented May 24, 2016

Let's clarify: contain_subject() always works, whether the subject is base64-encoded or not.
It's match_subject() that doesn't work.
Consider the following two Subject formats from my mailbox, which I can't match:

First:

=?UTF-8?B?0JrQvtC80LjRgdGB0LjRjyDQv9GA0Lgg0LzQtdC20LTRg9C90LDRgNC+?=
 =?UTF-8?B?0LTQvdGL0YUg0L/QtdGA0LXRh9C40LvQtdC90LjRj9GFINCh0J/QlA==?=
 =?utf-8?B?0J3Rg9C20L3QviDQv9C10YDQtdC00LDRgtGMINC/0L7RgdGL0LvQvtGH0Lo=?=
 =?utf-8?B?0YMg0LjQtyDQmtC40LXQstCwINCyINCc0L7RgdC60LLRgy4g0L/QvtC/0Ys=?=
 =?utf-8?B?0YLQutCwIOKEljI=?=

Second:

=?utf-8?Q?=D0=97_=D0=94=D0=BD=D0=B5=D0=BC_=D0=9D=D0=B0=D1=80=D0=BE=D0=B4=D0=B6=D0=B5=D0=BD=D0=BD=D1=8F=21?=
21 =?utf-8?Q?=D1=80=D1=96=D1=87=D0=BD=D0=B8=D1=86=D1=8F_?=Java!
 =?utf-8?Q?=D0=9E=D1=82=D1=87=D0=B5=D1=82_=D0=BF=D1=80=D0=BE_=D0=B8=D0=B3=D1=80=D1=83_?=19
 =?utf-8?Q?=D1=82=D1=83=D1=80=D0=B0_=D0=92=D1=82=D0=BE=D1=80=D0=BE=D0=B9_=D0=9B=D0=B8=D0=B3=D0=B8_=D0=9A=D0=90=D0=A4_
 =D0=9E=D1=82=D1=87=D0=B5=D1=82_=D0=BF=D1=80=D0=BE_=D0=B8=D0=B3=D1=80=D1=83_?=19
 =?utf-8?Q?=D1=82=D1=83=D1=80=D0=B0_=D0=92=D1=82=D0=BE=D1=80=D0=BE=D0=B9_=D0=9B=D0=B8=D0=B3=D0=B8_=D0=9A=D0=90=D0=A4_

One block = one subject.
Some of them are split across multiple lines in the raw mail, although each is really a single line.
It seems the B? and Q? markers denote different formats, without and with = symbols.
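
For reference, these are RFC 2047 "encoded words" of the form =?charset?encoding?text?=, where B means base64 and Q means a quoted-printable variant in which _ stands for a space and =XX for one byte. The following is only a minimal sketch in plain Lua of how such a Subject could be decoded before matching. It assumes the charset is UTF-8 (no charset conversion is attempted), and the names decode_b, decode_q and decode_mime_header are hypothetical helpers, not part of imapfilter.

```lua
-- Sketch only: decode RFC 2047 encoded words, assuming a UTF-8 (or ASCII) charset.
local B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

-- "B" encoding: plain base64. Padding and unknown characters are skipped.
local function decode_b(text)
    local out, acc, nbits = {}, 0, 0
    for ch in text:gmatch('.') do
        local pos = B64:find(ch, 1, true)      -- plain (non-pattern) search
        if pos then
            acc = acc * 64 + (pos - 1)         -- append 6 bits
            nbits = nbits + 6
            if nbits >= 8 then                 -- a full byte is available
                nbits = nbits - 8
                local unit = 2 ^ nbits
                local byte = math.floor(acc / unit)
                out[#out + 1] = string.char(byte)
                acc = acc - byte * unit        -- keep the leftover bits
            end
        end
    end
    return table.concat(out)
end

-- "Q" encoding: "_" is a space, "=XX" is a byte given as two hex digits.
local function decode_q(text)
    return (text:gsub('_', ' '):gsub('=(%x%x)', function(hex)
        return string.char(tonumber(hex, 16))
    end))
end

local function decode_mime_header(value)
    -- Whitespace between two adjacent encoded words is not part of the text.
    value = value:gsub('%?=%s+=%?', '?==?')
    return (value:gsub('=%?([^?]+)%?([BbQq])%?([^?]*)%?=', function(_, enc, text)
        if enc:upper() == 'B' then
            return decode_b(text)
        end
        return decode_q(text)
    end))
end

print(decode_mime_header('=?utf-8?B?0YLQutCwIOKEljI=?='))   --> тка №2
```

Because the decoded pieces are simply concatenated as bytes, a multi-byte character that is split across two adjacent encoded words is still reassembled correctly, as long as both words use the same charset.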

@lefcha
Owner

lefcha commented Jun 25, 2016

I see, I'll have to look into this when I have some time, as it looks useful to be able to match such Subject header fields...

@onoraba

onoraba commented Aug 22, 2016

A workaround with maildrop (http://www.courier-mta.org/maildrop/), which works with base64-encoded headers and message bodies.

maildrop configuration:

    $ cat ~/.mailfilter
    if ( /^Subject:.*(путевка|тунис|романтика)/ )
    {
        EXITCODE=5
        exit
    }
    else
    {
        EXITCODE=0
        exit
    }
    $

configuration test:

    $ cat ~/spam/test | maildrop ; echo $?
    5
    $

Example imapfilter part:

```lua
-- Messages addressed to the "all@" list.
local all = account1['mailbox']:match_to('(?i)all@')
local spam = Set {}

for _, mesg in ipairs(all) do
    local mbox, uid = table.unpack(mesg)
    local text = mbox[uid]:fetch_message()
    -- maildrop exits with 5 when the decoded Subject matches the spam pattern.
    local mail_status = pipe_to('maildrop', text)
    if mail_status == 5 then
        table.insert(spam, mesg)
    end
end

all = all - spam

spam:copy_messages(account1['spam'])
spam:mark_deleted()

all:copy_messages(account1['mailbox2'])
all:mark_deleted()
```

@amerlyq
Author

amerlyq commented Apr 13, 2017

Also, it seems those names conform to RFC 2047. So even though using them in mail is supposedly prohibited now, they still turn up often in the wild, for example in mail received from a misconfigured Outlook.

@Cybolic

Cybolic commented Apr 8, 2021

For what it's worth, I got around this by creating a match_utf8_field function that I call instead of match_field or match_subject.

I put it up here: https://paste.sr.ht/~cybolic/902986c795599f558165c63bcb65a3d4ae15881e

@newhinton

This also affects the match_from method. It seems spam relies heavily on UTF-8 encoding to bypass "simple" filters, and imapfilter does not catch those either.

How would I decode the header before it is passed to match_from?
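
One possible approach, sketched here under assumptions rather than taken from imapfilter itself, is to skip match_from and do the matching on the client side: fetch the raw header with fetch_field(), decode it, and test the decoded text in Lua. The function name match_decoded_field and the decode argument are hypothetical. decode stands for any caller-supplied function that turns a raw header into readable text (for instance an RFC 2047 decoder), and fetch_field() is assumed to return the raw field as a string, or nil on failure. Note that the pattern is a Lua pattern, not a PCRE regex.

```lua
-- Sketch only: select messages whose *decoded* header field matches a Lua pattern.
-- `messages` is a list of {mailbox, uid} pairs, as returned by imapfilter searches.
local function match_decoded_field(messages, field, pattern, decode)
    -- Inside imapfilter, Set {} builds a result set; elsewhere use a plain table.
    local results = Set and Set {} or {}
    for _, mesg in ipairs(messages) do
        local mbox, uid = mesg[1], mesg[2]
        local raw = mbox[uid]:fetch_field(field)   -- assumed: raw header or nil
        if raw and decode(raw):find(pattern) then
            table.insert(results, mesg)
        end
    end
    return results
end

-- Hypothetical usage inside an imapfilter config (my_decoder is not defined here):
-- local hits = match_decoded_field(account1['mailbox']:select_all(),
--                                  'From', 'casino', my_decoder)
-- hits:move_messages(account1['spam'])
```

This fetches one header per message, so it is slower than a server-side search. Narrowing the candidate set first with a cheap server-side query keeps the number of fetches down.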
