Fails on large message count #41

Open
vortek opened this issue Jan 29, 2018 · 8 comments
Comments

@vortek

vortek commented Jan 29, 2018

It worked for all the folders. Then I did the dry run for the INBOX folder and it found 113000 duplicates. When I remove the -n option, it fails. If I try the dry run again now, it also fails.

$ ./imapdedup.py -s mail.server.com -u user@mail.com -x l
Password:
Spam
Drafts
Deleted Items
Sent
INBOX
$ ./imapdedup.py -s mail.server.com -u user@mail.com -x INBOX
Password: 
There are 170714 messages in INBOX.
No message(s) currently marked as deleted in INBOX
170714 others in INBOX
Traceback (most recent call last):
  File "./imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "./imapdedup.py", line 321, in main
    process(options, mboxes)
  File "./imapdedup.py", line 248, in process
    ms = check_response(server.fetch(message_ids, '(RFC822.HEADER)'))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 456, in fetch
    typ, dat = self._simple_command(name, message_set, message_parts)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 1088, in _simple_command
    return self._command_complete(name, self._command(name, *args))
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/imaplib.py", line 912, in _command_complete
    raise self.abort('command: %s => %s' % (name, val))
imaplib.abort: command: FETCH => socket error: EOF
@quentinsf
Owner

Hi João,

It looks as if you may have hit some limit on your server, or maybe it's timing out. I'd need to look more carefully at this and I'm afraid I'm not likely to manage that in the near future.

If you need a temporary fix, can I suggest splitting your inbox into folders, e.g. by year, running the program against each folder, and then (if you really want an inbox that large!) recombining them again?

Best,
Quentin

@vortek
Author

vortek commented Jan 29, 2018

Hello Quentin,
How do you suggest that I split the inbox?
Thanks!

@quentinsf
Owner

Well, there are ways you could script it, but I would just use an email program to create a new folder, select all the messages in one year, and move them over. Then do the next year...

Depending on your email client, you may be able to do something clever with smart mailboxes to make the selection process easier...
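
For anyone who would rather script the split than do it in a mail client, here is a rough sketch of one way to do it with Python 3's imaplib. It is not part of IMAPdedup. The function name `move_year`, the `Archive-<year>` folder name and the chunk size are all made up for illustration, and `conn` is assumed to be an already logged-in `imaplib.IMAP4` or `IMAP4_SSL` connection. It selects messages by the server's internal date (IMAP `SINCE`/`BEFORE`), copies them to the new folder in chunks, flags the originals as deleted, and expunges at the end. Try it on a test folder first.

```python
def move_year(conn, year, source="INBOX", dest=None, chunk_size=500):
    """Move all messages from `year` out of `source` into `dest`.

    `conn` is assumed to behave like a logged-in imaplib.IMAP4 connection.
    `dest` defaults to a hypothetical "Archive-<year>" folder name.
    Returns the number of messages moved.
    """
    dest = dest or "Archive-%d" % year
    conn.create(dest)  # returns a "NO" response (no exception) if it already exists
    conn.select(source)

    # SINCE is inclusive, BEFORE is exclusive; both use the server's internal date.
    typ, data = conn.search(None, "SINCE", "01-Jan-%d" % year,
                            "BEFORE", "01-Jan-%d" % (year + 1))
    if typ != "OK":
        raise RuntimeError("SEARCH failed: %r" % (data,))
    ids = [i.decode() for i in data[0].split()]

    # Work in chunks so that no single command carries a huge message set.
    for start in range(0, len(ids), chunk_size):
        message_set = ",".join(ids[start:start + chunk_size])
        typ, resp = conn.copy(message_set, dest)
        if typ != "OK":
            raise RuntimeError("COPY failed: %r" % (resp,))
        conn.store(message_set, "+FLAGS", "\\Deleted")

    # Sequence numbers stay stable until this point, so expunge only once.
    conn.expunge()
    return len(ids)
```

Calling `move_year(conn, 2017)` for each year in turn would leave one folder per year to run the dedup against. Some servers require a namespace prefix or delimiter in folder names (for example `INBOX.` on some Dovecot setups), so the default destination name may need adjusting.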

@vortek
Author

vortek commented Jan 29, 2018

Thanks for the tips!

@Bill48105

Bill48105 commented Mar 4, 2018

I ran into this as well on an inbox with 300K+ messages. (Don't ask.) The first run was great: it deleted 100K dupes and I was excited, but there were still dupes showing up in Roundcube, so I figured I'd run it again. That time I got the EOF error on the same fetch-headers line. I changed (RFC822.HEADER) to (BODY.PEEK[HEADER]) and it worked again, for one run. Then the dreaded EOF error on every run after. So I changed (BODY.PEEK[HEADER]) back to (RFC822.HEADER) and it worked, for one run. Then it failed until I let it sit a while, after which it worked again, for one run, then EOF. By that time it was clear something funky was up, so I decided to dig deeper to try to narrow it down. I tried many things, including adjusting MAXLINE and wrapping the IMAP commands in try/except in the hope that it would continue (it doesn't), but it wasn't until I enabled debugging with imaplib.Debug and m.debug = True that I finally got a big clue as to what was going on:

35:55.56 BYE response: Server shutting down.

So yeah, it seems the remote server is shutting down mid-session? That would explain why it works after editing (the time that passed allowed the server to come back online). And note that it happened on a folder with only 39 messages. I had switched to another folder with fewer messages to try to narrow down the issue. I thought it was a fluke, but I was able to reproduce this shutdown multiple times.
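
For reference, this is roughly how the imaplib trace that surfaced the BYE line can be switched on. A minimal sketch for Python 3: `enable_imap_debug` is a made-up helper name, and `conn` stands for whatever connection object the script holds (called `m` or `server` elsewhere in this thread). `imaplib.Debug` is the module-wide default picked up by new connections, and each connection also has its own `debug` attribute. According to the imaplib documentation, values greater than three trace each command.

```python
import imaplib


def enable_imap_debug(conn=None, level=4):
    """Turn on imaplib's protocol trace (written to stderr).

    Sets the module-wide default, which connections created afterwards
    pick up, and also the `debug` attribute of an existing connection
    if one is passed in.
    """
    imaplib.Debug = level
    if conn is not None:
        conn.debug = level
    return level
```

The trace is only emitted when Python is not running with `-O`, since imaplib guards it with `__debug__`.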

There are many guesses as to what is up, from corrupt messages on the server, to overloading the server, to a bug in Python's imaplib, to who knows what. Clearly there's an issue, but I can't say it's in IMAPdedup (in fact I'd say it's not, since I get similar issues with other programs and scripts), beyond the fact that it would be helpful if it handled this better and recovered.

Btw not sure about OP but in my case this is all on InMotion shared business hosting which is Dovecot:

* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE NAMESPACE STARTTLS AUTH=PLAIN AUTH=LOGIN] Dovecot ready.

EDIT: OK, it seems that post may be about syslog rate limiting, so it may be unrelated and a weird coincidence.
A little searching suggests it may be rate limiting:
"server dovecot: imap(account@tld.com): Server shutting down. in=7140 out=70598"
https://www.howtoforge.com/community/threads/server-dovecot-imap-account-tld-com-server-shutting-down-in-7140-out-70598.74887/

If that's the case maybe need option to limit max # of messages it does at a time and/or add sleeps in the loop to help?
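
A rough sketch of what "limit the batch size and sleep between batches" could look like, in Python 3. This is not IMAPdedup's actual code: `chunked`, `fetch_headers_in_chunks`, and the `chunk_size`, `pause` and `sleep` parameters are all hypothetical names, and `conn` is assumed to behave like a logged-in `imaplib.IMAP4` connection with a mailbox already selected. Whether a pause actually avoids the server-side shutdown is untested; that depends on what the server is limiting.

```python
import time


def chunked(ids, size):
    """Yield successive slices of `ids` with at most `size` items each."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_headers_in_chunks(conn, message_ids, chunk_size=100, pause=1.0,
                            sleep=time.sleep):
    """Fetch headers a chunk at a time, pausing between chunks.

    `conn.fetch` is expected to return (typ, data) like imaplib.IMAP4.fetch.
    `sleep` is injectable only so the pause can be observed in tests.
    """
    results = []
    for index, chunk in enumerate(chunked(message_ids, chunk_size)):
        if index > 0 and pause > 0:
            sleep(pause)  # give the server some breathing room between commands
        message_set = ",".join(str(m) for m in chunk)
        typ, data = conn.fetch(message_set, "(BODY.PEEK[HEADER])")
        if typ != "OK":
            raise RuntimeError("FETCH failed: %r" % (data,))
        results.extend(data)
    return results
```

A smaller `chunk_size` and a longer `pause` trade run time for a gentler load on the server.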

@shubhammatta

1150749 others in INBOX
30:37.28 > LCJD5 FETCH 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 (RFC822.HEADER)
30:38.69 last 0 IMAP4 interactions:
30:38.69 > LCJD6 LOGOUT
30:38.69 last 0 IMAP4 interactions:
Traceback (most recent call last):
  File "imapdedup.py", line 324, in <module>
    main(sys.argv[1:])
  File "imapdedup.py", line 321, in main
    process(options, mboxes)
  File "imapdedup.py", line 248, in process
    ms = check_response(server.fetch(message_ids, '(RFC822.HEADER)'))
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 443, in fetch
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 1070, in _simple_command
  File "/root/daily_build/64_23/4.3.4/SysUtil/Python-2.7.5-cross/install_path_full/lib/python2.7/imaplib.py", line 899, in _command_complete
imaplib.abort: command: FETCH => socket error: EOF

I turned imaplib debugging on.
I understand that INBOX holds a huge number of mails, but the very first fetch results in a socket error (EOF). Does anyone have any insights?

@quentinsf
Owner

Mmm. Do you have access to the server logs?

The imaplib source says that '"abort" exceptions imply the connection should be reset, and
the command re-tried.'

So perhaps that's what we should do (if anyone who can test this would like to submit a pull request!)
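
As a starting point for such a pull request, here is an untested-against-a-real-server sketch in Python 3 of the "reset the connection and retry" idea from the imaplib note quoted above. None of these names exist in IMAPdedup: `fetch_with_retry` is hypothetical, and `connect` is assumed to be a zero-argument callable that returns a freshly logged-in `imaplib.IMAP4` or `IMAP4_SSL` connection. `imaplib.IMAP4.abort` is the real exception class behind the `imaplib.abort` in the tracebacks.

```python
import imaplib
import time


def fetch_with_retry(connect, mailbox, message_set, parts="(RFC822.HEADER)",
                     retries=3, delay=5.0, sleep=time.sleep):
    """Run FETCH, reconnecting and retrying when imaplib raises `abort`.

    `connect` is a hypothetical zero-argument callable returning a
    logged-in connection. Returns (conn, response) so the caller can
    keep using whichever connection finally worked.
    """
    conn = connect()
    conn.select(mailbox)
    for attempt in range(retries + 1):
        try:
            return conn, conn.fetch(message_set, parts)
        except imaplib.IMAP4.abort:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(delay)  # let the server recover before reconnecting
            conn = connect()
            conn.select(mailbox)
```

One caveat: message sequence numbers are only guaranteed within a session, so if anything else expunges messages between the drop and the reconnect, the retried message set may point at different messages. Working with UIDs would be the more robust choice.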

I guess your mail server may be very heavily loaded and timing out trying to do this even for 100 messages. However, you may be asking for problems with any IMAP server if you keep more than a million messages in a single mailbox! Not to mention using a lot of RAM on your local machine if you do manage to download even their headers...

@shubhammatta

Thanks for the info. I reduced the chunk size to 1 and the script ran. If it aborts again, I will try adding the reconnect part to the script, and I will comment here if that works.
I do hope it does not abort, though. I have been at this for quite some time now.

4 participants