Fails on large message count #41
Comments
Hi João, it looks as if you may have hit some limit on your server, or maybe it's timing out. I'd need to look at this more carefully, and I'm afraid I'm not likely to manage that in the near future. If you need a temporary fix, can I suggest splitting your inbox into folders, e.g. by year, running the program against each folder, and then (if you really want an inbox that large!) recombining them? Best,
Hello Quentin,
Well, there are ways you could script it, but I would just use an email program to create a new folder, select all the messages in one year, and move them over. Then do the next year... Depending on your email client, you may be able to do something clever with smart mailboxes to make the selection process easier...
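For anyone who does want to script the split, here is a minimal sketch of the idea, not something taken from IMAPdedup. It assumes `conn` is an already logged-in `imaplib` connection; the function names, the `batch` parameter and the folder names are made up for illustration. It selects by the server's internal date (SINCE/BEFORE) rather than the Date header, and copies before flagging anything for deletion.

```python
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d):
    """Format a date the way IMAP SEARCH expects (e.g. 1-Jan-2015), locale-independent."""
    return f"{d.day}-{_MONTHS[d.month - 1]}-{d.year}"


def year_criteria(year):
    """SEARCH criteria for messages whose internal date falls within `year`."""
    start = imap_date(date(year, 1, 1))
    end = imap_date(date(year + 1, 1, 1))
    return f"SINCE {start} BEFORE {end}"


def move_year(conn, source, target, year, batch=500):
    """Move one year's messages from `source` to `target`; return how many were moved.

    Hypothetical helper: `conn` is assumed to be a logged-in imaplib connection.
    """
    conn.create(target)  # a NO reply (e.g. folder already exists) is ignored here
    typ, _ = conn.select(source)
    if typ != "OK":
        raise RuntimeError(f"cannot select {source!r}")
    typ, data = conn.uid("SEARCH", None, year_criteria(year))
    uids = data[0].split() if typ == "OK" and data and data[0] else []
    for start in range(0, len(uids), batch):
        uid_set = b",".join(uids[start:start + batch]).decode("ascii")
        typ, _ = conn.uid("COPY", uid_set, target)
        if typ != "OK":
            raise RuntimeError(f"COPY to {target!r} failed")
        # Flag for deletion only after the copy has succeeded.
        conn.uid("STORE", uid_set, "+FLAGS", r"(\Deleted)")
    # Note: EXPUNGE removes every \Deleted message in the folder,
    # including any that were already flagged before this run.
    conn.expunge()
    return len(uids)
```

It would be called as something like `move_year(conn, "INBOX", "Archive-2015", 2015)`, once per year. Trying it on a small test folder first seems wise.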
Thanks for the tips!
I ran into this as well with an inbox of 300K+ messages. (Don't ask.) The first run was great: it deleted 100K dupes and I was excited. But there were still dupes showing up in Roundcube, so I ran it again and got that EOF error on the same fetch-headers line. I changed (RFC822.HEADER) to (BODY.PEEK[HEADER]) and it worked again, for one run. Then the dreaded EOF error on every run after that. So I changed (BODY.PEEK[HEADER]) back to (RFC822.HEADER) and it worked, again for one run. After I let it sit a while it worked once more, for one run, then EOF.

By that time it was clear something funky was up, so I dug deeper to narrow it down. I tried many things, including adjusting MAXLINE and wrapping the IMAP commands in try/except in the hope that it would continue (it doesn't), but it wasn't until I enabled debugging with imaplib.Debug and m.debug = True that I finally got a big clue as to what was going on: 35:55.56 BYE response: Server shutting down.

So it seems the remote server is shutting down mid-session. That would explain why it worked after each edit (enough time had passed for the server to come back online). Note that it happened on a folder with only 39 messages; I had switched to a folder with fewer messages to narrow down the issue. I thought it was a fluke, but I was able to reproduce the shutdown multiple times.

There are many guesses as to what is up, from corrupt messages on the server, to overloading the server, to a bug in Python's imaplib, but clearly there's an issue. I just can't say it's in IMAPdedup (in fact it isn't, since I get similar issues with other programs and scripts), beyond that it might be helpful if it handled the error and recovered better. Btw, not sure about the OP, but in my case this is all on InMotion shared business hosting, which is Dovecot:
EDIT: OK, it seems that may be syslog rate limiting in that post, so maybe it's unrelated and a weird coincidence. If that's the case, maybe we need an option to limit the max number of messages it handles at a time and/or add sleeps in the loop to help?
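To make the "limit the batch size and add sleeps" idea concrete, here is a rough sketch. It is not IMAPdedup's actual code: `fetch_headers_throttled`, `chunk_size` and `pause` are names invented for this example, and `conn` is assumed to be an `imaplib` connection with a folder already selected. Whether pausing actually keeps the server from dropping the session is untested.

```python
import time


def chunked(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_headers_throttled(conn, msg_ids, chunk_size=100, pause=1.0, sleep=time.sleep):
    """Fetch headers for `msg_ids` (message numbers as strings) in small, spaced-out batches.

    Hypothetical helper: `conn` is assumed to be an imaplib connection
    with the target folder already selected.
    """
    results = []
    for index, chunk in enumerate(chunked(msg_ids, chunk_size)):
        if index:
            sleep(pause)  # give the server a breather between batches
        typ, data = conn.fetch(",".join(chunk), "(BODY.PEEK[HEADER])")
        if typ != "OK":
            raise RuntimeError(f"FETCH failed for batch {index}: {data!r}")
        results.extend(data)
    return results
```

Setting `conn.debug = 4` before calling it should make imaplib print the protocol exchange, which is how a BYE from the server would show up.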
1150749 others in INBOX
I turned the imaplib debug on.
Mmm. Do you have access to the server logs? The imaplib source says that '"abort" exceptions imply the connection should be reset, and the command re-tried.' So perhaps that's what we should do (if anyone who can test this would like to submit a pull request!). I guess your mail server may be very heavily loaded and timing out trying to do this even for 100 messages. However, you may be asking for problems with any IMAP server if you keep more than a million messages in a single mailbox! Not to mention using a lot of RAM on your local machine if you do manage to download even their headers...
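A reset-and-retry wrapper along those lines might look roughly like the sketch below. This is untested against a real server and is not part of IMAPdedup: `run_with_reconnect`, `connect` and `operation` are hypothetical names, where `connect()` is assumed to return a fresh, logged-in connection with the folder selected, and `operation(conn)` is the command to retry. Only `imaplib.IMAP4.abort` is caught, since that is the exception imaplib raises for server-side failures.

```python
import imaplib
import time


def run_with_reconnect(connect, operation, max_retries=3, delay=5.0, sleep=time.sleep):
    """Run operation(conn), reconnecting and retrying if imaplib raises abort.

    Returns (conn, result) so the caller can keep using the live connection.
    `connect` and `operation` are caller-supplied callables (assumptions of this sketch).
    """
    conn = connect()
    for attempt in range(max_retries + 1):
        try:
            return conn, operation(conn)
        except imaplib.IMAP4.abort:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the failure
            try:
                conn.logout()  # best effort; the socket is probably dead already
            except Exception:
                pass
            sleep(delay)  # give the server time to come back
            conn = connect()
```

Any operation retried this way should be safe to repeat; fetching headers is, but a half-completed delete needs more care.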
Thanks for the info. I reduced the chunk size to 1 and the script ran. If it aborts again, I will try adding the reconnect step to the script. Will comment if that works.
It worked for all the folders. Then I did a dry run for the INBOX folder and it found 113000 duplicates. When I remove the -n option, it fails. If I try the dry run again now, it also fails.