
Mailbagit Input Output Examples

Gregory Wiedeman edited this page Mar 24, 2022 · 35 revisions

Mailbag Input

mailbagit follows bagit-python in taking the input path as a positional argument, but it accepts a path to either an individual file or a directory:

mailbagit path/to/account --mailbag-name account -i eml -d pdf
mailbagit path/to/account.pst --mailbag-name account -i pst -d eml pdf
mailbagit path/to/email.msg --mailbag-name email -i msg -d pdf
mailbagit path/to/account.mbox -m account -i mbox -d eml 
mailbagit path/to/email -m email -i mbox -d pdf

bagit-python supports multiple directory input arguments:

bagit.py path/to/stuff path/to/more/stuff

Unlike bagit-python, mailbagit supports only one input path argument. If you give it more than one, it raises an error:

mailbagit path/to/mail.mbox path/to/more.mbox -m account -i mbox -d eml
>  2022-01-27 13:39.05 [error    ] Multiple input paths provided. Mailbagit only supports single input paths. You may want to try providing a directory of email or running the command multiple times to create multiple mailbags.
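
One way to follow the suggestion in the error message is to generate one command per source file. The following is a minimal sketch: build_commands is a hypothetical helper, not part of mailbagit, and it assumes each mailbag should be named after its source file without the extension. It uses only the -m, -i, and -d options shown above.

```python
from pathlib import PurePosixPath


def build_commands(paths, input_format, derivatives):
    """Return one mailbagit argument list per input path (hypothetical helper)."""
    commands = []
    for path in paths:
        # Assumption: name each mailbag after the source file without its extension.
        name = PurePosixPath(path).stem
        commands.append(
            ["mailbagit", path, "-m", name, "-i", input_format, "-d", *derivatives]
        )
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True) to create one mailbag per source file.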

Original-File, Message-Path, and Derivatives-Path

A mailbag is not opinionated about whether you are packaging a single email account or multiple accounts, but it does try to maintain the arrangement of email found in the input's directory structure and/or filenames, in X-Folder, X-Gmail-Labels, or other header fields, or within a PST file. This may be problematic, as the arrangement of a directory of email data may or may not reflect the intellectual arrangement of the email. A Mailbag merely records this information in mailbag.csv fields, and a human must infer any arrangement based on context.

A Mailbag uses three fields to maintain this information: Original-File, Message-Path, and Derivatives-Path. An additional Message-Path-Escaped field is also used when the Message-Path value is incompatible with the filesystem used during packaging.

  • Original-File
    • The relative path to any email source file within a mailbag.
  • Message-Path
    • Any email folder structure read from the PST directory structure or from email headers like X-Folder or X-Gmail-Labels.
  • Derivatives-Path
    • The relative path within a mailbag where derivatives are written. This joins Original-File (without its extension) with Message-Path. Any characters that are invalid in the packaging filesystem must be escaped here.
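
The join described for Derivatives-Path can be sketched as follows. This is an illustration under stated assumptions, not mailbagit's actual code: derivatives_path and escape_segment are hypothetical names, the set of invalid characters is assumed to be the Windows-reserved ones, and percent-encoding is assumed because the examples on this page show *Important* written as %2AImportant%2A. The sketch covers container formats (pst, mbox); the message-level examples on this page use the source file's directory instead.

```python
import posixpath
from urllib.parse import quote

# Assumption: characters treated as invalid by the packaging filesystem.
INVALID_CHARS = '<>:"\\|?*'


def escape_segment(segment):
    """Percent-encode only the characters assumed to be invalid."""
    return "".join(
        quote(char, safe="") if char in INVALID_CHARS else char for char in segment
    )


def derivatives_path(original_file, message_path):
    """Join Original-File (minus its extension) with the escaped Message-Path."""
    stem, _extension = posixpath.splitext(original_file)
    folders = [escape_segment(part) for part in message_path.split("/") if part]
    return posixpath.join(stem, *folders)
```

For example, derivatives_path("account.pst", "Top of Outlook data file/*Important*") gives "account/Top of Outlook data file/%2AImportant%2A".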

Account-level files (pst)

Example 1

Command

mailbagit path/to/account.pst --mailbag-name allfacstaff --input pst --derivatives eml pdf

Input

path/to/account.pst
path/to/otherFile.docx

mailbagit ignores all files without a .pst extension (the match is not case sensitive).
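
The case-insensitive extension filter can be sketched as below. This is only an illustration of the rule stated above; find_source_files is a hypothetical name, and the returned relative paths are meant to resemble Original-File values.

```python
import os


def find_source_files(input_path, extension):
    """List files whose extension matches, ignoring case.

    Returns paths relative to the input directory (or just the file name
    when input_path is a single file), using forward slashes.
    """
    wanted = "." + extension.lower().lstrip(".")
    if os.path.isfile(input_path):
        matches = os.path.splitext(input_path)[1].lower() == wanted
        return [os.path.basename(input_path)] if matches else []
    found = []
    for folder, _subfolders, filenames in os.walk(input_path):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() == wanted:
                relative = os.path.relpath(os.path.join(folder, filename), input_path)
                found.append(relative.replace(os.sep, "/"))
    return sorted(found)
```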

Metadata Examples

  • message1
    • Original-File: account.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: account/Top of Outlook data file/Inbox
  • message2
    • Original-File: account.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: account/Top of Outlook data file/Inbox
  • message3
    • Original-File: account.pst
    • Message-Path: Top of Outlook data file/*Important*
    • Derivatives-Path: account/Top of Outlook data file/%2AImportant%2A
  • message4
    • Original-File: account.pst
    • Message-Path: Top of Outlook data file/Trash
    • Derivatives-Path: account/Top of Outlook data file/Trash

Output

path/to/allfacstaff/bagit.txt
path/to/allfacstaff/mailbag.csv
...
path/to/allfacstaff/data/pst/account.pst
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Inbox/1.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Inbox/2.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/%2AImportant%2A/3.eml
path/to/allfacstaff/data/eml/account/Top of Outlook data file/Trash/4.eml
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Inbox/1.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Inbox/2.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/%2AImportant%2A/3.pdf
path/to/allfacstaff/data/pdf/account/Top of Outlook data file/Trash/4.pdf
path/to/otherFile.docx

mailbagit could potentially ignore "Top of Outlook data file," but it is unclear if this is consistent across all PST files. This top folder may also contain messages.
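
The derivative locations in the output listing above follow a regular pattern: data, then the derivative format, then the Derivatives-Path, then a numbered file. A minimal sketch of that pattern, assuming (as in the listing) that each file is named after a numeric message ID; derivative_file is a hypothetical helper:

```python
import posixpath


def derivative_file(mailbag_dir, derivative_format, derivatives_path, message_id):
    """Build the path of one derivative file, following the listing's layout."""
    filename = f"{message_id}.{derivative_format}"
    return posixpath.join(
        mailbag_dir, "data", derivative_format, derivatives_path, filename
    )
```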

Example 2

Command

mailbagit path/to/parentDirectory -m allfacstaff -i pst -d pdf

Input

path/to/parentDirectory/faculty/Dave.pst
path/to/parentDirectory/faculty/Fatima.pst
path/to/parentDirectory/staff/Eric.pst
path/to/parentDirectory/staff/save.pst
path/to/parentDirectory/otherFile.docx

Metadata Examples

  • message1
    • Original-File: faculty/Dave.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: faculty/Dave/Top of Outlook data file/Inbox
  • message2
    • Original-File: faculty/Dave.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: faculty/Dave/Top of Outlook data file/Inbox
  • message3
    • Original-File: faculty/Fatima.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: faculty/Fatima/Top of Outlook data file/Inbox
  • message4
    • Original-File: staff/Eric.pst
    • Message-Path: Top of Outlook data file/Inbox
    • Derivatives-Path: staff/Eric/Top of Outlook data file/Inbox
  • message5
    • Original-File: staff/save.pst
    • Message-Path: Top of Outlook data file/Save
    • Derivatives-Path: staff/save/Top of Outlook data file/Save

Output

path/to/parentDirectory/allfacstaff/bagit.txt
path/to/parentDirectory/allfacstaff/mailbag.csv
...
path/to/parentDirectory/allfacstaff/data/pst/faculty/Dave.pst
path/to/parentDirectory/allfacstaff/data/pst/faculty/Fatima.pst
path/to/parentDirectory/allfacstaff/data/pst/staff/Eric.pst
path/to/parentDirectory/allfacstaff/data/pst/staff/save.pst
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Dave/Top of Outlook data file/Inbox/1.pdf
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Dave/Top of Outlook data file/Inbox/2.pdf
path/to/parentDirectory/allfacstaff/data/pdf/faculty/Fatima/Top of Outlook data file/Inbox/3.pdf
path/to/parentDirectory/allfacstaff/data/pdf/staff/Eric/Top of Outlook data file/Inbox/4.pdf
path/to/parentDirectory/allfacstaff/data/pdf/staff/save/Top of Outlook data file/Save/5.pdf
path/to/parentDirectory/otherFile.docx
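
Comparing Examples 1 and 2, the mailbag appears to be created next to a single input file but inside an input directory. The sketch below captures that observation only; it is inferred from the examples on this page rather than from mailbagit's source, and mailbag_location is a hypothetical name.

```python
import posixpath


def mailbag_location(input_path, mailbag_name, input_is_directory):
    """Return where the mailbag directory would be created (inferred rule)."""
    # Directory input: the mailbag is created inside that directory.
    # File input: the mailbag is created alongside the file.
    parent = input_path if input_is_directory else posixpath.dirname(input_path)
    return posixpath.join(parent, mailbag_name)
```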

Folder-level files (mbox)

Example 3

Command

mailbagit path/to/parentDirectory -m account_export -i mbox -d eml pdf

Input

path/to/parentDirectory/Inbox.mbox
path/to/parentDirectory/Inbox/Listservs.mbox
path/to/parentDirectory/Trash.mbox
path/to/parentDirectory/otherFile.docx

mailbagit ignores all files without a .mbox extension (the match is not case sensitive).

Metadata Examples

  • message1
    • Original-File: Inbox.mbox
    • Message-Path: Inbox
    • Derivatives-Path: Inbox/Inbox
  • message2
    • Original-File: Inbox.mbox
    • Message-Path: Inbox
    • Derivatives-Path: Inbox/Inbox
  • message3
    • Original-File: Inbox/Listservs.mbox
    • Message-Path: Inbox/Listservs
    • Derivatives-Path: Inbox/Listservs/Inbox/Listservs
  • message4
    • Original-File: Inbox/Listservs.mbox
    • Message-Path: Inbox/Listservs
    • Derivatives-Path: Inbox/Listservs/Inbox/Listservs
  • message5
    • Original-File: Trash.mbox
    • Message-Path: Trash
    • Derivatives-Path: Trash/Trash

In this case, the Message-Path was extracted from the X-Folder header and duplicates the arrangement of the MBOX files on disk.

Output

path/to/parentDirectory/account_export/bagit.txt
path/to/parentDirectory/account_export/mailbag.csv
...
path/to/parentDirectory/account_export/data/mbox/Inbox.mbox
path/to/parentDirectory/account_export/data/mbox/Inbox/Listservs.mbox
path/to/parentDirectory/account_export/data/mbox/Trash.mbox
path/to/parentDirectory/account_export/data/eml/Inbox/Inbox/1.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Inbox/2.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Listservs/Inbox/Listservs/3.eml
path/to/parentDirectory/account_export/data/eml/Inbox/Listservs/Inbox/Listservs/4.eml
path/to/parentDirectory/account_export/data/eml/Trash/Trash/5.eml
path/to/parentDirectory/account_export/data/pdf/Inbox/Inbox/1.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Inbox/2.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Listservs/Inbox/Listservs/3.pdf
path/to/parentDirectory/account_export/data/pdf/Inbox/Listservs/Inbox/Listservs/4.pdf
path/to/parentDirectory/account_export/data/pdf/Trash/Trash/5.pdf
path/to/parentDirectory/otherFile.docx

Example 4

Command

mailbagit path/to/parentDirectory -m faculty -i mbox -d eml pdf

Input

path/to/parentDirectory/Dave/inbox.mbox
path/to/parentDirectory/Dave/trash.mbox
path/to/parentDirectory/Fatima/inbox.mbox
path/to/parentDirectory/Fatima/listservs.mbox

Metadata Examples

  • message1
    • Original-File: Dave/inbox.mbox
    • Message-Path: Inbox
    • Derivatives-Path: Dave/inbox/Inbox
  • message2
    • Original-File: Dave/inbox.mbox
    • Message-Path: Inbox
    • Derivatives-Path: Dave/inbox/Inbox
  • message3
    • Original-File: Dave/trash.mbox
    • Message-Path: Trash
    • Derivatives-Path: Dave/trash/Trash
  • message4
    • Original-File: Fatima/inbox.mbox
    • Message-Path: Inbox
    • Derivatives-Path: Fatima/inbox/Inbox
  • message5
    • Original-File: Fatima/listservs.mbox
    • Message-Path: Inbox/Listservs
    • Derivatives-Path: Fatima/listservs/Inbox/Listservs

Output

path/to/parentDirectory/faculty/bagit.txt
path/to/parentDirectory/faculty/mailbag.csv
...
path/to/parentDirectory/faculty/data/mbox/Dave/inbox.mbox
path/to/parentDirectory/faculty/data/mbox/Dave/trash.mbox
path/to/parentDirectory/faculty/data/mbox/Fatima/inbox.mbox
path/to/parentDirectory/faculty/data/mbox/Fatima/listservs.mbox
path/to/parentDirectory/faculty/data/eml/Dave/inbox/Inbox/1.eml
path/to/parentDirectory/faculty/data/eml/Dave/inbox/Inbox/2.eml
path/to/parentDirectory/faculty/data/eml/Dave/trash/Trash/3.eml
path/to/parentDirectory/faculty/data/eml/Fatima/inbox/Inbox/4.eml
path/to/parentDirectory/faculty/data/eml/Fatima/listservs/Inbox/Listservs/5.eml
path/to/parentDirectory/faculty/data/pdf/Dave/inbox/Inbox/1.pdf
path/to/parentDirectory/faculty/data/pdf/Dave/inbox/Inbox/2.pdf
path/to/parentDirectory/faculty/data/pdf/Dave/trash/Trash/3.pdf
path/to/parentDirectory/faculty/data/pdf/Fatima/inbox/Inbox/4.pdf
path/to/parentDirectory/faculty/data/pdf/Fatima/listservs/Inbox/Listservs/5.pdf

Example 5

Command

mailbagit "path/to/All mail Including Spam and Trash.mbox" -m account -i mbox -d eml pdf

Input

path/to/All mail Including Spam and Trash.mbox
path/to/otherFile.docx

mailbagit ignores all files without a .mbox extension (the match is not case sensitive).

Metadata Examples

  • message1
    • Original-File: All mail Including Spam and Trash.mbox
    • Message-Path: Inbox
    • Derivatives-Path: All mail Including Spam and Trash/Inbox
  • message2
    • Original-File: All mail Including Spam and Trash.mbox
    • Message-Path: Inbox
    • Derivatives-Path: All mail Including Spam and Trash/Inbox
  • message3
    • Original-File: All mail Including Spam and Trash.mbox
    • Message-Path: Inbox
    • Derivatives-Path: All mail Including Spam and Trash/Inbox
  • message4
    • Original-File: All mail Including Spam and Trash.mbox
    • Message-Path: Trash
    • Derivatives-Path: All mail Including Spam and Trash/Trash

Output

path/to/account/bagit.txt
path/to/account/mailbag.csv
...
path/to/account/data/mbox/All mail Including Spam and Trash.mbox
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/1.eml
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/2.eml
path/to/account/data/eml/All mail Including Spam and Trash/Inbox/3.eml
path/to/account/data/eml/All mail Including Spam and Trash/Trash/4.eml
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/1.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/2.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Inbox/3.pdf
path/to/account/data/pdf/All mail Including Spam and Trash/Trash/4.pdf

This case is a Gmail export example. Here the .mbox is a flat file even though it contains multiple folders. The folder of a message is only documented by using a custom Gmail-specific X-Gmail-Labels header.
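
Reading the X-Gmail-Labels header from a flat MBOX file can be sketched with Python's standard mailbox module. This is a minimal illustration, not mailbagit's implementation: gmail_labels is a hypothetical name, the labels are assumed to be comma-separated, and how several labels on one message should map to a single Message-Path is left open here.

```python
import mailbox


def gmail_labels(mbox_path):
    """Return a list of (message key, labels) pairs read from X-Gmail-Labels."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        results = []
        for key, message in box.iteritems():
            # Messages without the header yield an empty label list.
            raw = message.get("X-Gmail-Labels", "")
            labels = [label.strip() for label in str(raw).split(",") if label.strip()]
            results.append((key, labels))
        return results
    finally:
        box.close()
```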

Message-level files (eml, msg)

Example 6

Command

mailbagit path/to/parentDirectory -m allfacstaff -i msg -d pdf

Input

path/to/parentDirectory/Inbox/message1.msg
path/to/parentDirectory/Inbox/message2.msg
path/to/parentDirectory/Inbox/Listservs/message1.msg
path/to/parentDirectory/Inbox/Listservs/message2.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message1.msg
path/to/parentDirectory/Inbox/Listservs/Code4Lib/message2.msg
path/to/parentDirectory/Trash/spam.msg
path/to/parentDirectory/otherFile.docx

mailbagit ignores all files without a .msg extension (the match is not case sensitive) and directories that do not contain .msg files.

Metadata Examples

  • message1
    • Original-File: Inbox/message1.msg
    • Message-Path:
    • Derivatives-Path: Inbox
  • message2
    • Original-File: Inbox/message2.msg
    • Message-Path:
    • Derivatives-Path: Inbox
  • message3
    • Original-File: Inbox/Listservs/message1.msg
    • Message-Path:
    • Derivatives-Path: Inbox/Listservs
  • message4
    • Original-File: Inbox/Listservs/message2.msg
    • Message-Path:
    • Derivatives-Path: Inbox/Listservs
  • message5
    • Original-File: Inbox/Listservs/Code4Lib/message1.msg
    • Message-Path:
    • Derivatives-Path: Inbox/Listservs/Code4Lib
  • message6
    • Original-File: Inbox/Listservs/Code4Lib/message2.msg
    • Message-Path:
    • Derivatives-Path: Inbox/Listservs/Code4Lib
  • message7
    • Original-File: Trash/spam.msg
    • Message-Path:
    • Derivatives-Path: Trash

MSG files rarely seem to contain arrangement information in their headers.

Output

parentDirectory/allfacstaff/bagit.txt
parentDirectory/allfacstaff/mailbag.csv
...
parentDirectory/allfacstaff/data/msg/Inbox/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/message2.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/message2.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/Code4Lib/message1.msg
parentDirectory/allfacstaff/data/msg/Inbox/Listservs/Code4Lib/message2.msg
parentDirectory/allfacstaff/data/msg/Trash/spam.msg
parentDirectory/allfacstaff/data/pdf/Inbox/1.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/2.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/3.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/4.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/5.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/6.pdf
parentDirectory/allfacstaff/data/pdf/Trash/7.pdf

Problem: What to do if a directory contains both .msg and other files?

    path/to/parentDirectory/Inbox/message1.msg 
    path/to/parentDirectory/Inbox/message2.msg 
--> path/to/parentDirectory/Inbox/otherFile.docx
    path/to/parentDirectory/Inbox/Listservs/message1.msg 
    path/to/parentDirectory/Inbox/Listservs/message2.msg 
    path/to/parentDirectory/Inbox/Listservs/Code4Lib/message1.msg 
    path/to/parentDirectory/Inbox/Listservs/Code4Lib/message2.msg 
    path/to/parentDirectory/Trash/spam.msg 

Right now, these files are included in the mailbag. This could be an issue if a user tries to package a directory of, say, EML files and another email format. In these cases the other formats will be included in the mailbag, but they won't be documented in mailbag.csv or used to create any derivative files.
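
Before packaging, it may help to see which files in a listing would be carried into the mailbag without being documented. The sketch below splits a list of relative paths by extension; partition_by_format is a hypothetical helper and works on plain path strings, not on the filesystem.

```python
import posixpath


def partition_by_format(relative_paths, extension):
    """Split a file listing into email sources and other files, ignoring case."""
    wanted = "." + extension.lower().lstrip(".")
    sources, others = [], []
    for path in relative_paths:
        suffix = posixpath.splitext(path)[1].lower()
        if suffix == wanted:
            sources.append(path)
        else:
            # Carried into the mailbag, but not listed in mailbag.csv.
            others.append(path)
    return sources, others
```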

Example 7

Command

mailbagit path/to/parentDirectory -m allfacstaff -i eml -d pdf

Input

path/to/parentDirectory/message1.eml
path/to/parentDirectory/message2.eml
path/to/parentDirectory/message3.eml
path/to/parentDirectory/message4.eml
path/to/parentDirectory/message5.eml
path/to/parentDirectory/message6.eml
path/to/parentDirectory/message7.eml
path/to/parentDirectory/otherFile.docx

mailbagit ignores all files without a .eml extension (the match is not case sensitive) and directories that do not contain .eml files.

Metadata Examples

  • message1
    • Original-File: message1.eml
    • Message-Path: Inbox
    • Derivatives-Path: Inbox
  • message2
    • Original-File: message2.eml
    • Message-Path: Inbox
    • Derivatives-Path: Inbox
  • message3
    • Original-File: message3.eml
    • Message-Path: Inbox/Listservs
    • Derivatives-Path: Inbox/Listservs
  • message4
    • Original-File: message4.eml
    • Message-Path: Inbox/Listservs
    • Derivatives-Path: Inbox/Listservs
  • message5
    • Original-File: message5.eml
    • Message-Path: Inbox/Listservs/Code4Lib
    • Derivatives-Path: Inbox/Listservs/Code4Lib
  • message6
    • Original-File: message6.eml
    • Message-Path: Inbox/Listservs/Code4Lib
    • Derivatives-Path: Inbox/Listservs/Code4Lib
  • message7
    • Original-File: message7.eml
    • Message-Path: Trash
    • Derivatives-Path: Trash

In this example, the Message-Path was read using the X-Folder header in each EML file.
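
Reading the X-Folder header from a single EML file can be sketched with Python's standard email package. This is a minimal illustration rather than mailbagit's own parsing code; read_x_folder is a hypothetical name, and the header value is returned as-is without any normalization.

```python
from email import policy
from email.parser import BytesParser


def read_x_folder(raw_message):
    """Return the X-Folder header of a raw EML message, or "" if it is absent."""
    # headersonly=True skips parsing the body, which is not needed here.
    headers = BytesParser(policy=policy.default).parsebytes(
        raw_message, headersonly=True
    )
    value = headers.get("X-Folder")
    return str(value).strip() if value is not None else ""
```

For a file on disk, the bytes can be obtained with open(path, "rb").read() before calling read_x_folder.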

Output

parentDirectory/allfacstaff/bagit.txt
parentDirectory/allfacstaff/mailbag.csv
...
parentDirectory/allfacstaff/data/eml/message1.eml
parentDirectory/allfacstaff/data/eml/message2.eml
parentDirectory/allfacstaff/data/eml/message3.eml
parentDirectory/allfacstaff/data/eml/message4.eml
parentDirectory/allfacstaff/data/eml/message5.eml
parentDirectory/allfacstaff/data/eml/message6.eml
parentDirectory/allfacstaff/data/eml/message7.eml
parentDirectory/allfacstaff/data/pdf/Inbox/1.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/2.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/3.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/4.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/5.pdf
parentDirectory/allfacstaff/data/pdf/Inbox/Listservs/Code4Lib/6.pdf
parentDirectory/allfacstaff/data/pdf/Trash/7.pdf