Mailbagit Input Output Examples
mailbagit follows bagit-python's directory input, but accepts paths to either individual files or directories:
mailbagit path/to/account --mailbag-name account -i eml -d pdf
mailbagit path/to/account.pst --mailbag-name account -i pst -d eml pdf
mailbagit path/to/email.msg --mailbag-name email -i msg -d pdf
mailbagit path/to/account.mbox -m account -i mbox -d eml
mailbagit path/to/email -m email -i mbox -d pdf
bagit-python supports multiple directory input arguments:
bagit.py path/to/stuff path/to/more/stuff
Unlike bagit-python, mailbagit supports only one input path argument. If you give it more than one, it raises an error:
mailbagit path/to/mail.mbox path/to/more.mbox -m account -i mbox -d eml
> 2022-01-27 13:39.05 [error ] Multiple input paths provided. Mailbagit only supports single input paths. You may want to try providing a directory of email or running the command multiple times to create multiple mailbags.
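The error message suggests running the command once per input to create multiple mailbags. A minimal sketch of that workaround follows. The helper name mailbagit_commands and the choice to name each mailbag after the file stem are assumptions made for illustration, not part of mailbagit. It only builds the argument lists, using the -m, -i, and -d options shown above.

```python
import shlex
from pathlib import PurePosixPath


def mailbagit_commands(input_paths, input_format, derivatives):
    """Build one mailbagit argument list per input path (hypothetical helper)."""
    commands = []
    for path in input_paths:
        # Assumption: name each mailbag after the input file's stem.
        name = PurePosixPath(path).stem
        commands.append(
            ["mailbagit", path, "-m", name, "-i", input_format, "-d", *derivatives]
        )
    return commands


commands = mailbagit_commands(
    ["path/to/mail.mbox", "path/to/more.mbox"], "mbox", ["eml"]
)
for command in commands:
    # shlex.join quotes any argument that contains spaces.
    print(shlex.join(command))
```

Each argument list could then be passed to subprocess.run, one mailbag at a time.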
A mailbag is not opinionated about whether you are packaging a single email account or multiple accounts, but it does try to maintain the arrangement structure of email present in the input's directory structure and/or filenames, in X-Folder, X-Gmail-Labels, or other header fields, or within a PST file. This may be problematic, as the arrangement of a directory of email data may or may not define the intellectual arrangement of the email. A Mailbag merely maintains this information in mailbag.csv fields, and a human must infer any arrangement based on context.
A Mailbag uses three fields to maintain this information: Original-File, Message-Path, and Derivatives-Path. An additional Message-Path-Escaped field is also used in case the Message-Path field is incompatible with the filesystem used during packaging.
- Original-File
- The relative path to any email source file within a mailbag.
- Message-Path
- Any email folder structure read from PST directory structure or email headers like X-Folder or X-Gmail-Labels.
- Derivatives-Path
- The relative path within a mailbag where derivatives are written. This is a join of Original-File without the extension and Message-Path. Any characters that are invalid in the packaging filesystem need to be escaped here.
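The join described for Derivatives-Path can be sketched roughly as follows. This is an illustration, not mailbagit's implementation: the function names, the INVALID_CHARS set, and the percent-encoding scheme are assumptions, chosen so that *Important* becomes %2AImportant%2A as in the PST example on this page. The single_message switch is also an assumption, inferred from the MSG and EML examples, where only the directory of Original-File appears in Derivatives-Path.

```python
import posixpath

# Assumption: characters treated as invalid on the packaging filesystem.
INVALID_CHARS = '<>:"\\|?*'


def escape_segment(segment):
    """Percent-encode each assumed-invalid character, e.g. '*' becomes '%2A'."""
    return "".join(
        f"%{ord(ch):02X}" if ch in INVALID_CHARS else ch for ch in segment
    )


def derivatives_path(original_file, message_path, single_message=False):
    """Join Original-File (without extension) and Message-Path.

    For single-message formats (assumed: MSG, EML) only the directory of
    Original-File is kept.
    """
    if single_message:
        base = posixpath.dirname(original_file)
    else:
        base = posixpath.splitext(original_file)[0]
    folders = [escape_segment(part) for part in message_path.split("/") if part]
    return "/".join([part for part in [base] if part] + folders)


print(derivatives_path("account.pst", "Top of Outlook data file/*Important*"))
print(derivatives_path("Inbox/message1.msg", "", single_message=True))
```

Escaping each folder separately keeps the "/" separators intact, so the folder hierarchy survives in the derivatives directory.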
Command
mailbagit path/to/account.pst --mailbag-name allfacstaff --input pst --derivatives eml pdf
Input
path/to/account.pst
path/to/otherFile.docx
ignores all files without .pst extension (not case sensitive)
Metadata Examples
- message1
- Original-File: account.pst
- Message-Path: Top of Outlook data file/Inbox
- Derivatives-Path: account/Top of Outlook data file/Inbox
- message2
- Original-File: account.pst
- Message-Path: Top of Outlook data file/Inbox
- Derivatives-Path: account/Top of Outlook data file/Inbox
- message3
- Original-File: account.pst
- Message-Path: Top of Outlook data file/*Important*
- Derivatives-Path: account/Top of Outlook data file/%2AImportant%2A
- message4
- Original-File: account.pst
- Message-Path: Top of Outlook data file/Trash
- Derivatives-Path: account/Top of Outlook data file/Trash
Output
path/to/allfacstaff/bagit.txt
path/to/allfacstaff/mailbag.csv
...
path/to/allfacstaff/data/pst/account.pst
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Inbox/1.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Inbox/2.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/%2AImportant%2A/3.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Trash/4.eml
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Inbox/1.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Inbox/2.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/%2AImportant%2A/3.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Trash/4.pdf
path/to/otherFile.docx
mailbagit could potentially ignore "Top of Outlook data file," but it is unclear if this is consistent across all PST files. This top folder may also contain messages.
Command
mailbagit path/to/parentDirectory -m allfacstaff -i pst -d pdf
Input
path/to/parentDirectory/faculty/Dave.pst
path/to/parentDirectory/faculty/Fatima.pst
path/to/parentDirectory/staff/Eric.pst
path/to/parentDirectory/staff/save.pst
path/to/parentDirectory/otherFile.docx
Metadata Examples
- message1
- Original-File: faculty/Dave.pst
- Message-Path: Top of Outlook data file/Inbox
- Derivatives-Path: faculty/Dave/Top of Outlook data file/Inbox
- message2
- Original-File: faculty/Dave.pst
- Message-Path: Top of Outlook data file/Inbox/Listservs
- Derivatives-Path: faculty/Dave/Top of Outlook data file/Inbox/Listservs
- message3
- Original-File: faculty/Fatima.pst
- Message-Path: Top of Outlook data file/Inbox
- Derivatives-Path: faculty/Fatima/Top of Outlook data file/Inbox
- message4
- Original-File: staff/Eric.pst
- Message-Path: Top of Outlook data file/Inbox
- Derivatives-Path: staff/Eric/Top of Outlook data file/Inbox
- message5
- Original-File: staff/save.pst
- Message-Path: Top of Outlook data file/Save
- Derivatives-Path: staff/save/Top of Outlook data file/Save
Output
path/to/parentDirectory/allfacstaff/bagit.txt
path/to/parentDirectory/allfacstaff/mailbag.csv
...
path/to/parentDirectory/allfacstaff/data/pst/faculty/Dave.pst
path/to/parentDirectory/allfacstaff/data/pst/faculty/Fatima.pst
path/to/parentDirectory/allfacstaff/data/pst/staff/Eric.pst
path/to/parentDirectory/allfacstaff/data/pst/staff/save.pst
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Dave/Top of Outlook data file/Inbox/1.pdf
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Dave/Top of Outlook data file/Inbox/Listservs/2.pdf
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Fatima/Top of Outlook data file/Inbox/3.pdf
path/to/parentDirectory/allfacstaff/data/pdf/staff/Eric/Top of Outlook data file/Inbox/4.pdf
path/to/parentDirectory/allfacstaff/data/pdf/staff/save/Top of Outlook data file/Save/5.pdf
path/to/parentDirectory/otherFile.docx
Command
mailbagit path/to/parentDirectory -m account_export -i mbox -d eml pdf
Input
path/to/parentDirectory/Inbox.mbox
path/to/parentDirectory/Inbox/Listservs.mbox
path/to/parentDirectory/Trash.mbox
path/to/parentDirectory/otherFile.docx
ignores all files without .mbox extension (not case sensitive)
Metadata Examples
- message1
- Original-File: Inbox.mbox
- Message-Path: Inbox
- Derivatives-Path: Inbox/Inbox
- message2
- Original-File: Inbox.mbox
- Message-Path: Inbox
- Derivatives-Path: Inbox/Inbox
- message3
- Original-File: Inbox/Listservs.mbox
- Message-Path: Inbox/Listservs
- Derivatives-Path: Inbox/Listservs/Inbox/Listservs
- message4
- Original-File: Inbox/Listservs.mbox
- Message-Path: Inbox/Listservs
- Derivatives-Path: Inbox/Listservs/Inbox/Listservs
- message5
- Original-File: Trash.mbox
- Message-Path: Trash
- Derivatives-Path: Trash/Trash
In this case, the Message-Path was extracted from the X-Folder header and duplicates the arrangement of the MBOX files on disk.
Output
path/to/parentDirectory/account_export/bagit.txt
path/to/parentDirectory/account_export/mailbag.csv
...
path/to/parentDirectory/account_export/data/mbox/Inbox.mbox
path/to/parentDirectory/account_export/data/mbox/Inbox/Listservs.mbox
path/to/parentDirectory/account_export/data/mbox/Trash.mbox
path/to/parentDirectory/account_export/data/eml/Inbox/Inbox/1.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Inbox/2.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Listservs/Inbox/Listservs/3.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Listservs/Inbox/Listservs/4.eml
path/to/parentDirectory/account_export/data/eml/Trash/Trash/5.eml
path/to/parentDirectory/account_export/data/pdf/Inbox/Inbox/1.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Inbox/2.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Listservs/Inbox/Listservs/3.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Listservs/Inbox/Listservs/4.pdf
path/to/parentDirectory/account_export/data/pdf/Trash/Trash/5.pdf
path/to/parentDirectory/otherFile.docx
Command
mailbagit path/to/parentDirectory -m faculty -i mbox -d eml pdf
Input
path/to/parentDirectory/Dave/inbox.mbox
path/to/parentDirectory/Dave/trash.mbox
path/to/parentDirectory/Fatima/inbox.mbox
path/to/parentDirectory/Fatima/listservs.mbox
Metadata Examples
- message1
- Original-File: Dave/inbox.mbox
- Message-Path: Inbox
- Derivatives-Path: Dave/inbox/Inbox
- message2
- Original-File: Dave/inbox.mbox
- Message-Path: Inbox
- Derivatives-Path: Dave/inbox/Inbox
- message3
- Original-File: Dave/trash.mbox
- Message-Path: Trash
- Derivatives-Path: Dave/trash/Trash
- message4
- Original-File: Fatima/inbox.mbox
- Message-Path: Inbox
- Derivatives-Path: Fatima/inbox/Inbox
- message5
- Original-File: Fatima/listservs.mbox
- Message-Path: Inbox/Listservs
- Derivatives-Path: Fatima/listservs/Inbox/Listservs
Output
path/to/parentDirectory/faculty/bagit.txt
path/to/parentDirectory/faculty/mailbag.csv
...
path/to/parentDirectory/faculty/data/mbox/Dave/inbox.mbox
path/to/parentDirectory/faculty/data/mbox/Dave/trash.mbox
path/to/parentDirectory/faculty/data/mbox/Fatima/inbox.mbox
path/to/parentDirectory/faculty/data/mbox/Fatima/listservs.mbox
path/to/parentDirectory/faculty/data/eml/Dave/inbox/Inbox/1.eml
path/to/parentDirectory/faculty/data/eml/Dave/inbox/Inbox/2.eml
path/to/parentDirectory/faculty/data/eml/Dave/trash/Trash/3.eml
path/to/parentDirectory/faculty/data/eml/Fatima/inbox/Inbox/4.eml
path/to/parentDirectory/faculty/data/eml/Fatima/listservs/Inbox/Listservs/5.eml
path/to/parentDirectory/faculty/data/pdf/Dave/inbox/Inbox/1.pdf
path/to/parentDirectory/faculty/data/pdf/Dave/inbox/Inbox/2.pdf
path/to/parentDirectory/faculty/data/pdf/Dave/trash/Trash/3.pdf
path/to/parentDirectory/faculty/data/pdf/Fatima/inbox/Inbox/4.pdf
path/to/parentDirectory/faculty/data/pdf/Fatima/listservs/Inbox/Listservs/5.pdf
Command
mailbagit "path/to/All mail Including Spam and Trash.mbox" -m account -i mbox -d eml pdf
Input
path/to/All mail Including Spam and Trash.mbox
path/to/otherFile.docx
ignores all files without .mbox extension (not case sensitive)
Metadata Examples
- message1
- Original-File: All mail Including Spam and Trash.mbox
- Message-Path: Inbox
- Derivatives-Path: All mail Including Spam and Trash/Inbox
- message2
- Original-File: All mail Including Spam and Trash.mbox
- Message-Path: Inbox
- Derivatives-Path: All mail Including Spam and Trash/Inbox
- message3
- Original-File: All mail Including Spam and Trash.mbox
- Message-Path: Inbox
- Derivatives-Path: All mail Including Spam and Trash/Inbox
- message4
- Original-File: All mail Including Spam and Trash.mbox
- Message-Path: Trash
- Derivatives-Path: All mail Including Spam and Trash/Trash
Output
path/to/account/bagit.txt
path/to/account/mailbag.csv
...
path/to/account/data/mbox/All mail Including Spam and Trash.mbox
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/1.eml
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/2.eml
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/3.eml
path/to/account/data/eml/All mail Including Spam and Trash/Trash/4.eml
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/1.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/2.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/3.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Trash/4.pdf
This case is a Gmail export example. Here the .mbox is a flat file even though it contains multiple folders. The folder of a message is documented only by a custom, Gmail-specific X-Gmail-Labels header.
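As an illustration of where that folder information lives, the sketch below reads the X-Gmail-Labels header from a single message with Python's standard email package. The sample message and the helper name gmail_labels are made up for this example, and the sketch assumes the header value is a comma-separated list of labels. How a single Message-Path is chosen when a message carries several labels is left open here.

```python
from email import message_from_string

# Hypothetical sample message; a real Gmail export stores many of these in one .mbox.
RAW_MESSAGE = """\
From: dave@example.org
Subject: Example
X-Gmail-Labels: Inbox,Important

Body text.
"""


def gmail_labels(raw_message):
    """Return the labels listed in X-Gmail-Labels, or an empty list if absent."""
    message = message_from_string(raw_message)
    value = message.get("X-Gmail-Labels", "")
    return [label.strip() for label in value.split(",") if label.strip()]


print(gmail_labels(RAW_MESSAGE))
```

For a whole export, the standard library's mailbox.mbox class can iterate over the messages in the file, and the same header lookup applies to each one.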
Command
mailbagit path/to/parentDirectory -m allfacstaff -i msg -d pdf
Input
path/to/parentDirectory/Inbox/message1.msg
path/to/parentDirectory/Inbox/message2.msg
path/to/parentDirectory/Inbox/Listservs/message1.msg
path/to/parentDirectory/Inbox/Listservs/message2.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message1.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message2.msg
path/to/parentDirectory/Trash/spam.msg
path/to/parentDirectory/otherFile.docx
ignores all files without .msg extension (not case sensitive) and directories that do not contain .msg files
Metadata Examples
- message1
- Original-File: Inbox/message1.msg
- Message-Path:
- Derivatives-Path: Inbox
- message2
- Original-File: Inbox/message2.msg
- Message-Path:
- Derivatives-Path: Inbox
- message3
- Original-File: Inbox/Listservs/message1.msg
- Message-Path:
- Derivatives-Path: Inbox/Listservs
- message4
- Original-File: Inbox/Listservs/message2.msg
- Message-Path:
- Derivatives-Path: Inbox/Listservs
- message5
- Original-File: Inbox/Listservs/Code4Lib/message1.msg
- Message-Path:
- Derivatives-Path: Inbox/Listservs/Code4Lib
- message6
- Original-File: Inbox/Listservs/Code4Lib/message2.msg
- Message-Path:
- Derivatives-Path: Inbox/Listservs/Code4Lib
- message7
- Original-File: Trash/spam.msg
- Message-Path:
- Derivatives-Path: Trash
.msg files rarely seem to contain arrangement information in their headers.
Output
parentDirectory/allfacstaff/bagit.txt
parentDirectory/allfacstaff/mailbag.csv
...
parentDirectory/allfacstaff/data/msg/Inbox/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/message2.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/message2.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/Code4Lib/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/Code4Lib/message2.msg
parentDirectory/allfacstaff/data/msg/Trash/spam.msg
parentDirectory/allfacstaff/data/pdf/Inbox/1.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/2.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/3.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/4.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/5.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/6.pdf
parentDirectory/allfacstaff/data/pdf/Trash/7.pdf
Problem: What to do if a directory contains both .msg and other files?
path/to/parentDirectory/Inbox/message1.msg
path/to/parentDirectory/Inbox/message2.msg
--> path/to/parentDirectory/Inbox/otherFile.docx
path/to/parentDirectory/Inbox/Listservs/message1.msg
path/to/parentDirectory/Inbox/Listservs/message2.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message1.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message2.msg
path/to/parentDirectory/Trash/spam.msg
Right now, the option is to include these files in the mailbag. This could be an issue if a user tries to package a directory of, say, .eml files and another email format. In these cases the other formats will be included in the mailbag, but won't be documented in mailbag.csv or used to create any derivative files.
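The split described above can be sketched as a case-insensitive extension check. This is a sketch under stated assumptions rather than mailbagit's own logic: the function name partition_by_format is hypothetical, and it works on a plain list of relative paths instead of walking the filesystem.

```python
from pathlib import PurePosixPath


def partition_by_format(relative_paths, input_format):
    """Split paths into those matching the input format and everything else.

    Matching files would be documented in mailbag.csv and used for derivatives.
    The rest would be carried along in the mailbag undocumented.
    """
    suffix = "." + input_format.lower()
    documented, undocumented = [], []
    for path in relative_paths:
        # Compare extensions case-insensitively, so .MSG matches .msg.
        if PurePosixPath(path).suffix.lower() == suffix:
            documented.append(path)
        else:
            undocumented.append(path)
    return documented, undocumented


documented, undocumented = partition_by_format(
    ["Inbox/message1.msg", "Inbox/otherFile.docx", "Trash/SPAM.MSG"], "msg"
)
print(documented)
print(undocumented)
```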
Command
mailbagit path/to/parentDirectory -m allfacstaff -i eml -d pdf
Input
path/to/parentDirectory/message1.eml
path/to/parentDirectory/message2.eml
path/to/parentDirectory/message3.eml
path/to/parentDirectory/message4.eml
path/to/parentDirectory/message5.eml
path/to/parentDirectory/message6.eml
path/to/parentDirectory/message7.eml
path/to/parentDirectory/otherFile.docx
ignores all files without .eml extension (not case sensitive) and directories that do not contain .eml files
Metadata Examples
- message1
- Original-File: message1.eml
- Message-Path: Inbox
- Derivatives-Path: Inbox
- message2
- Original-File: message2.eml
- Message-Path: Inbox
- Derivatives-Path: Inbox
- message3
- Original-File: message3.eml
- Message-Path: Inbox/Listservs
- Derivatives-Path: Inbox/Listservs
- message4
- Original-File: message4.eml
- Message-Path: Inbox/Listservs
- Derivatives-Path: Inbox/Listservs
- message5
- Original-File: message5.eml
- Message-Path: Inbox/Listservs/Code4Lib
- Derivatives-Path: Inbox/Listservs/Code4Lib
- message6
- Original-File: message6.eml
- Message-Path: Inbox/Listservs/Code4Lib
- Derivatives-Path: Inbox/Listservs/Code4Lib
- message7
- Original-File: message7.eml
- Message-Path: Trash
- Derivatives-Path: Trash
In this example, the Message-Path was read using the X-Folder header in each EML file.
Output
parentDirectory/allfacstaff/bagit.txt
parentDirectory/allfacstaff/mailbag.csv
...
parentDirectory/allfacstaff/data/eml/message1.eml
parentDirectory/allfacstaff/data/eml/message2.eml
parentDirectory/allfacstaff/data/eml/message3.eml
parentDirectory/allfacstaff/data/eml/message4.eml
parentDirectory/allfacstaff/data/eml/message5.eml
parentDirectory/allfacstaff/data/eml/message6.eml
parentDirectory/allfacstaff/data/eml/message7.eml
parentDirectory/allfacstaff/data/pdf/Inbox/1.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/2.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/3.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/4.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/5.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/6.pdf
parentDirectory/allfacstaff/data/pdf/Trash/7.pdf