-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discord parser #1859
Discord parser #1859
Conversation
Performance improvement and disk usage in the CacheAddr class, now it returns the InputStream of exactly the necessary size.
Suitability for using the corrected getInputStream method of the CacheAddr class
The DiscordParser class is now able to parse any cache that has the Google Chrome cache structure, and all of its subitems. When a cached Discord chat file is found, the original JSON file is parsed separately from the chat file. Therefore, for each chat found, there will be an original JSON (found in cache) and a chat interpreted in HTML.
Deduplication of participant names in calls
Change to display AM/PM indication and time in 24h format for Brazilian language.
BUGFIX - If there are two files with the same name, the parser could show the wrong file. Now the file is fetched based on the name and id contained in the URL.
BUGFIX - The character ")" was being printed wrongly in the HTML;
Thank you @felipecampanini! Have you noticed merge conflicts below? Are you comfortable to fix them? |
Hello @lfcnassif, sorry for the delay. I will review the conflicts |
bc701bd
to
52d9b07
Compare
@felipecampanini, @patrickdalla was trying to review this but he found merge conflicts in thousands of files. I took a look and seems you did a bad reverse merge of master here. I needed to force-push to fix that incorrect merge. Please take a look and check if I reverted some of your logic changes unintentionally, so @patrickdalla can proceed, thanks! |
At a first look, seems the date format in localization files were fixed just into the pt-BR localization and an AM/PM marker was introduced just in the English one, missing the other 3 localization files. I suggest always using 24h in all languages. |
Hi @felipecampanini . It seems the implementation mixes generic chrome cache index parsing with discord specific parsing. Wouldn't it be better to create a generic chrome index cache parser, and a specific discord parser? |
@felipecampanini does this contain all proposed changes on #1568, right? |
Exactly |
@patrickdalla, I even discussed this with lfcnassif in #1568. The discord app uses the exact cache structure of Google Chrome. Therefore, I believe it would be ideal to use the same parser for both applications (Chrome and Discord). However, the name should be ChromeCacheParser or something similar, but as explained by @lfcnassif, renaming the parser could cause some problems. The same chat file that can be found in Chrome's cache can also be found in Discord's cache, so I believe they should be the same parser. |
I will review it calmly on Monday, sorry for the delay, this week I needed to focus on some urgent demands from the sector. |
One option is having 2 parsers: renaming the current one to ChromeCacheParser and creating a new one with the old name, extending the renamed parser. So things would continue working with old profiles.
Out of curiosity, is this situation detected and deduplicated by current code?
I totally understand, don't worry. |
After commit c5eb90d (propagate exception instead of silently catching it), I surprisingly got 25% more decoded chats and almost 50% more instant messages, although more parsing exceptions happen now. @felipecampanini or @patrickdalla, could some of you double check the commit to confirm it is fine? |
Are you using as sample your browser navigation? I came across some similar strange situation also, when using my own browser and discord session cache to test. My chats did not appear at first in IPED result, although I have seen them on the browser. On a latter test, many chats appeared in IPED, from channels I have subscribed, although I have not seen them on browser. It seemed some Discord background processing that works with the cache, in second case, and delayed cache flush, in first, as the real index files content used (seeing them with other tools) were consistent with IPED results. So I ignored. If yes, could you revert your simple change on commit and test again to see if something the chat appears. At first sight, this change isn't enough to recognize or recover any new chat. If not, could you share your test base? |
No, I'm using a fixed and immutable data set extracted from real cases... |
Sure. |
@lfcnassif I could not repeat the first image of last msg with the test sample you sent me. One thing I can observe in it is: It does not have any "browser artifacts" category items. So it seems you have run a yet older code commit, where the "Chrome Cache" subcategory wasn't defined. |
@lfcnassif: About the removal of Whatsapp related CSS, if I well remember, I have put it to make Audio and Video attachments reproducible directly from HTML exported via html report. |
Thanks @patrickdalla for your tests! I'll try to reproduce the first image I sent. About whatsapp css removal, I made it because the whatsapp chat background image was being displayed in some Discord chats, giving a very weird look. Please send me a sample with one audio or video as attachment, so I can test the change with them. |
Hi @patrickdalla, you're right, I did a mvn clean install and the same number of artifacts were recovered before and after my logic changes. About the WhatsApp CSS removal, this was what motivated me, this background shows up in many Discord chats: |
I think it would be better to have another CSS, even if it means some duplication. Otherwise changes in WhatsApp CSS (which is already complex) may cause undesired side effects in other parsers. |
I finished my review and pushed some fixes and adjustments. Thank you very much @felipecampanini and @patrickdalla for this great work! |
No description provided.