BUG: Aleph doesn't process all .msg email formats correctly #3733

Open
brrttwrks opened this issue May 13, 2024 · 0 comments
Labels
bug Things that should work, but don’t Major issue that requires attention


Describe the bug
The .msg email file format has gone through several versions, and Aleph does not appear to parse all of them correctly. As a result, we have to convert these files to .eml format before ingesting them into Aleph. The tool I've been using for the conversion is msgconvert (https://www.matijs.net/software/msgconv/).

The current state is problematic because Aleph gives the impression that it processes these files. In practice, some are processed correctly, while others show only part of the email body and none of the attachments. If Aleph could detect the different versions and parse each one accordingly, we wouldn't need to pre-process them, and journalists wouldn't be surprised by the results.
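To illustrate the kind of detection step meant here, below is a minimal sketch of sniffing a file that carries a .msg extension before deciding how to handle it. It is not Aleph's code: the function names `sniff_msg` and `sniff_path` and the labels they return are hypothetical. It assumes only that Outlook .msg files are stored in an OLE2 / Compound File Binary container, which always begins with the same 8-byte signature, and it does not tell apart variants inside that container.

```python
import re
from pathlib import Path

# Signature that opens every OLE2 / Compound File Binary container,
# the container format Outlook uses for .msg files.
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Loose pattern for an RFC 822 style header line such as "Received: ...".
# This is a heuristic chosen for the sketch, not a full header parser.
HEADER_LINE = re.compile(rb"^[A-Za-z][A-Za-z0-9-]*:[ \t]")


def sniff_msg(data: bytes) -> str:
    """Classify the leading bytes of a file that has a .msg extension.

    The returned labels are hypothetical names used only in this sketch.
    """
    if data.startswith(CFB_MAGIC):
        return "outlook-cfb"
    if HEADER_LINE.match(data.lstrip(b"\r\n")):
        return "rfc822-text"
    return "unknown"


def sniff_path(path: Path) -> str:
    """Read only the first 512 bytes of the file and classify them."""
    with path.open("rb") as fh:
        return sniff_msg(fh.read(512))
```

A pre-processing script could call `sniff_path` on each file and send anything that is not recognised to a manual review queue, rather than letting it be ingested with a partial body and no attachments.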

To Reproduce
Steps to reproduce the behavior:

  1. Will share with you separately as the only examples I have are sensitive.

Expected behavior
All msg versions get parsed and ingested properly in Aleph.

Aleph version
4.0.0rc1

Screenshots
Cannot share.

Additional context
None

brrttwrks added the labels "bug Things that should work, but don’t" and "triage These issues need to be reviewed by the Aleph team" on May 13, 2024
Rosencrantz added the label "Major issue that requires attention" and removed the label "triage These issues need to be reviewed by the Aleph team" on Jun 4, 2024
