smscalls: parse mms from smscalls export #370

Merged: 8 commits into karlicoss:master, Jun 5, 2024


seanbreckenridge
Contributor

@seanbreckenridge seanbreckenridge commented May 31, 2024

parses all the MMS file/content parts, left comments alongside me exploring the data object

want to actually try consuming the output of this somewhere before I think it should be merged

also, there are some items parsed as MMS which are actually just text (like, they're SMIL (XML-format) containers which contain a single text/plain file), with no pictures attached. So it's just a text that's stored in an MMS container; in an ideal world we can filter those out and add them to messages? But that may not always work perfectly

So, we should probably at least note that mms can have text items, or... maybe combine them/have some helper .property methods on the MMS that give you nicer filtering/output?

will post some examples when I'm playing with this, gonna go sleep now

$ hpi doctor -S my.smscalls
✅ OK  : my.smscalls                                      
✅     - stats: {..... 'mms': {'count': 2491, 'first': datetime.datetime(2019, 12, 18, 4, 22, 46, tzinfo=datetime.timezone.utc), 'last': datetime.datetime(2024, 5, 29, 1, 31, 19, tzinfo=datetime.timezone.utc)}}

@seanbreckenridge
Contributor Author

seanbreckenridge commented May 31, 2024

Probably instead of adding it to messages, we can just add helpers...? If there are multiple content parts, pretty common case is for there to be one text/plain and multiple image/jpeg attached.

Maybe a property that checks if there's only one part and it's text, with no other parts attached
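A rough sketch of what such a helper property could look like. `ContentPart` and `MMS` here are hypothetical stand-ins, not the types in this PR, and treating `application/smil` parts as layout wrappers to skip is an assumption:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContentPart:
    # Hypothetical stand-in for one parsed content part, not the module's real type.
    content_type: str
    text: Optional[str] = None


@dataclass
class MMS:
    # Hypothetical container; a real model would carry many more fields.
    parts: List[ContentPart] = field(default_factory=list)

    @property
    def payload_parts(self) -> List[ContentPart]:
        # Assumption: SMIL parts only describe layout, so they are not counted as content.
        return [p for p in self.parts if p.content_type.lower() != "application/smil"]

    @property
    def is_text_only(self) -> bool:
        # True when the only real content is a single text/plain part.
        payload = self.payload_parts
        return len(payload) == 1 and payload[0].content_type.lower() == "text/plain"


group_text = MMS(parts=[ContentPart("application/smil"), ContentPart("text/plain", text="Loved an image")])
with_photo = MMS(parts=[ContentPart("text/plain", text="look"), ContentPart("image/jpeg")])
print(group_text.is_text_only, with_photo.is_text_only)
```

A property like this keeps the MMS items as they are and leaves it to the consumer to decide whether text-only ones should be shown alongside regular messages.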

it might be nice to be able to handle the common mime types as well, like parsing images into PIL images

here's what I have:

$ hpi query my.smscalls.mms -s | jq '.content.[].content_type' -r | tally
      1 text/x-vcard
      4 video/3gpp
      5 image/gif
      7 text/x-vCard
     43 image/png
    429 image/jpeg
   2405 text/plain
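The tally above lists text/x-vcard and text/x-vCard separately. MIME type names are case-insensitive, so a consumer may want to fold case before counting. A minimal sketch, using a made-up sample list rather than the real export (`tally_content_types` is a hypothetical helper):

```python
from collections import Counter
from typing import Iterable


def tally_content_types(content_types: Iterable[str]) -> Counter:
    # Fold case so 'text/x-vCard' and 'text/x-vcard' land in the same bucket.
    return Counter(ct.strip().lower() for ct in content_types)


# Made-up sample values for illustration only.
sample = ["text/plain", "text/x-vCard", "text/x-vcard", "image/jpeg", "text/plain"]

counts = tally_content_types(sample)
for content_type, count in sorted(counts.items(), key=lambda kv: kv[1]):
    print(f"{count:7d} {content_type}")
```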

Other todos:

  • Want to see if I can decode the target for the translated 'Liked an image'/'Liked a message' Apple stuff
  • need to check on the emitted key, I just copied the message one, not sure how accurate it is

@seanbreckenridge
Contributor Author

On decoding the target of the 'Liked an image' stuff: looks like probably not

seems that stuff often breaks when things are re-imported or when you move SIM cards, so it's likely some translation happening in-app and it's not saved perfectly in an export

Was looking to see if I could match any ids like in here:

<mms date="1648436193000" rr="129" sub="null" ct_t="application/vnd.wap.multipart.mixed" read_status="null" seen="1" msg_box="1" address="<REDACTED>" sub_cs="null" resp_st="null" retr_st="128" d_tm="null" text_only="1" exp="null" locked="0" m_id="mavodi-6-89-1e8-8-ba-628c370c-7d583aca2b" st="null" retr_txt_cs="null" retr_txt="null" creator="com.google.android.apps.messaging" date_sent="1648436191" read="1" m_size="243" rpt_a="null" ct_cls="null" pri="null" sub_id="2" tr_id="null" resp_txt="null" ct_l="<REDACTED>" m_cls="null" d_rpt="129" v="18" _id="740" m_type="132" readable_date="Mar 27, 2022 7:56:33 PM" contact_name="<REDACTED>">
  <parts>
    <part seq="0" ct="text/plain" name="null" chset="3" cd="null" fn="null" cid="&lt;0&gt;" cl="null" ctt_s="null" ctt_t="null" text="Loved an image"/>
  </parts>
</mms>

but my basic exploration of grepping IDs across the file doesn't seem to have worked. There are lots of random keys/values/IDs though, maybe they could be indexes into the conversation/some auto-indexed ID...? can't figure it out, just guessing
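For anyone poking at the same data, a minimal sketch of reading an element like the one above with the standard library. This is not the parser from this PR; the sample is trimmed to a few attributes, and `attr` and `parse_mms` are hypothetical helpers. It assumes `date` is milliseconds since the Unix epoch (which matches the `readable_date` in the example) and that missing values are written as the literal string "null":

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

# Trimmed-down version of the element above, keeping only a few attributes.
SAMPLE = """
<mms date="1648436193000" msg_box="1" sub="null" text_only="1" m_type="132">
  <parts>
    <part seq="0" ct="text/plain" name="null" text="Loved an image"/>
  </parts>
</mms>
"""


def attr(el: ET.Element, key: str) -> Optional[str]:
    # Treat the literal string "null" the same as a missing attribute.
    value = el.get(key)
    return None if value in (None, "null") else value


def parse_mms(xml_text: str) -> dict:
    mms = ET.fromstring(xml_text.strip())
    # Assumption: 'date' is in milliseconds since the epoch.
    when = datetime.fromtimestamp(int(mms.attrib["date"]) / 1000, tz=timezone.utc)
    parts = [
        {"content_type": attr(p, "ct"), "name": attr(p, "name"), "text": attr(p, "text")}
        for p in mms.findall("./parts/part")
    ]
    return {"dt": when, "subject": attr(mms, "sub"), "parts": parts}


parsed = parse_mms(SAMPLE)
print(parsed["dt"].isoformat(), parsed["parts"][0]["text"])
```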

@seanbreckenridge
Contributor Author

Ahhh, any group convo is converted into an MMS since otherwise it can't accurately encode who sent which message.

So that's why there are messages that are 'just text'

message_type is just # 1 = Received, 2 = Sent, 3 = Draft, 4 = Outbox, and for group messages contact_name is just a list -- so you just know if you received it but don't know from whom. The addresses are described here: https://www.synctech.com.au/sms-backup-restore/fields-in-xml-backup-files/

<addr address="<--->" type="130" charset="106"/>
<addr address="<--->" type="130" charset="106"/>
<addr address="<--->" type="130" charset="106"/>
<addr address="<--->" type="130" charset="106"/>
<addr address="<--->" type="137" charset="3"/>
<addr address="<--->" type="151" charset="106"/>

where it specifies which number and then a type tells you who actually sent it...

129 = BCC, 130 = CC, 151 = To, 137 = From ... weird schema.
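A small sketch of how those type codes could be used to find the sender of a group message. The phone numbers are made-up placeholders, wrapping the `addr` elements in an `addrs` element is an assumption about the export layout, and `addresses_by_role` and `sender` are hypothetical helpers, not part of this PR:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Type codes as listed above.
ADDR_ROLES = {129: "bcc", 130: "cc", 137: "from", 151: "to"}

# Made-up placeholder numbers; the wrapping <addrs> element is an assumption.
SAMPLE = """
<addrs>
  <addr address="+15550000001" type="130" charset="106"/>
  <addr address="+15550000002" type="137" charset="3"/>
  <addr address="+15550000003" type="151" charset="106"/>
</addrs>
"""


def addresses_by_role(addrs_xml: str) -> Dict[str, List[str]]:
    # Group every address under the role its numeric type maps to.
    roles: Dict[str, List[str]] = {}
    for addr in ET.fromstring(addrs_xml.strip()).iter("addr"):
        role = ADDR_ROLES.get(int(addr.attrib["type"]), "unknown")
        roles.setdefault(role, []).append(addr.attrib["address"])
    return roles


def sender(addrs_xml: str) -> Optional[str]:
    # The address with type 137 ("from") is the one that sent the message.
    senders = addresses_by_role(addrs_xml).get("from", [])
    return senders[0] if senders else None


print(sender(SAMPLE))
```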

@seanbreckenridge changed the title from "initial mms exploration" to "smscalls: parse mms from smscalls export" on Jun 3, 2024
@seanbreckenridge
Contributor Author

seanbreckenridge commented Jun 3, 2024

Was able to use it nicely in some scripts I have to preview convos, and save any images found to ~/.cache/sms-images, synced those scripts up here:

seanbreckenridge/HPI-personal@d8b0539

$ sms-images
....
Saving /home/sean/.cache/sms-images/.../1716941561.0-IMG_7345.jpg.jpg
Done, saved 465 images, using 82.697 MB

I'm sure there are some small issues I may have missed, but those will get found with more usage - I think this is good enough to merge and start using.

@seanbreckenridge
Contributor Author

@karlicoss this should be good to review/merge

@karlicoss
Owner

Thanks! I literally never received or sent an MMS, so don't really have any opinion, happy to merge :)

@karlicoss karlicoss merged commit 35dd5d8 into karlicoss:master Jun 5, 2024
10 checks passed
@seanbreckenridge
Contributor Author

never received or sent an MMS

I didn't think I had that many either, but any group chat turns itself into an MMS because there's no other way to encode who the message is coming from when there are multiple people (see comment)...

So, apparently I had a couple hundred because am in group chats with family etc.

I mostly care about this for random images that are stored in it though, am going to embed the correct date into that and add it to my eventual photos module.

thanks 👍

@seanbreckenridge
Contributor Author

Oh, you may have to edit this line then, I assumed you had a few so I just put the standard, eh, "check if there are 10 things in this" test

@karlicoss
Owner

Yeah, noticed this as well, but sadly this isn't running regularly anyway. Ideally we'd add some test file to https://github.com/karlicoss/hpi-testdata or something like that, so it can run on CI

@seanbreckenridge
Contributor Author

Yeah, I always get paranoid about pushing location data test files or a modified XML export just because I always think I've missed something. I've been wanting to write some tool that takes JSON/XML as input and creates valid-looking dummy data.
