Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: decode media names before inserting into the database #18019

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Sahil06012002
Copy link
Contributor

@Sahil06012002 Sahil06012002 commented Feb 25, 2025

Issue:

Check Media shows incorrect values when media filenames contain special characters (spaces, %, &, =, etc.) on Android, while the desktop version works as expected.

Cause:

According to the Anki manual (https://docs.ankiweb.net/media.html#media), filenames with spaces or special characters appear differently in the HTML editor compared to how they are stored on disk. For example, hello 100%.jpg is displayed as hello%20100%25.jpg in the editor.

HTML encoding is likely applied because certain characters have special meanings in URLs and HTML syntax. Without proper encoding, the card rendering engine may fail to locate or display media files with special characters in their names.

For instance, if a file named test,.jpg is referenced in a card, it gets encoded as test%2C.jpg in the editor for proper rendering, while the actual file on disk remains test,.jpg.

Observations:

  • When adding a card using the desktop version, the original filename is stored in the collection.anki2 database (notes table).
  • When adding media via Android, the filename is stored in an encoded format.
    Scrrenshot of the notes table :
synv_ev

This discrepancy causes the check_media function to return incorrect results for media files added from Android when their names include special characters.

Approach

Now, while adding card in a deck addNote() function is called
https://github.com/Sahil06012002/Anki-Android/blob/fb3eec394463e19f0b3db3fdaa5b6b1eb156879a/AnkiDroid/src/main/java/com/ichi2/anki/NoteEditor.kt#L1198
that stores the note in the database.

Decoding the media filenames before storing them in the database resolves the issue.
Before calling addNote(editorNote!!, deckId), the fields parameter in editorNote is updated to replace encoded filenames with their decoded versions :
https://github.com/Sahil06012002/Anki-Android/blob/fb3eec394463e19f0b3db3fdaa5b6b1eb156879a/AnkiDroid/src/main/java/com/ichi2/anki/NoteEditor.kt#L2579 .

Fixes

How Has This Been Tested?

Checklist

  • You have a descriptive commit message with a short title (first line, max 50 chars).
  • You have commented your code, particularly in hard-to-understand areas
  • You have performed a self-review of your own code
  • UI changes: include screenshots of all affected screens (in particular showing any new or changed strings)
  • UI Changes: You have tested your change using the Google Accessibility Scanner

Copy link
Member

@BrayanDSO BrayanDSO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A description of the issue and how this fix it would be appreciated. You also said that you had thought about other ways to fix it. What are they?

@@ -2575,7 +2575,8 @@ class NoteEditor :

private fun updateField(field: FieldEditText?): Boolean {
val fieldContent = field!!.text?.toString() ?: ""
val correctedFieldContent = NoteService.convertToHtmlNewline(fieldContent, shouldReplaceNewlines())
// Decode the file name when preparing data for saveNote() to handle special characters correctly
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the // should follow the surrounding indent

@@ -2575,7 +2575,8 @@ class NoteEditor :

private fun updateField(field: FieldEditText?): Boolean {
val fieldContent = field!!.text?.toString() ?: ""
val correctedFieldContent = NoteService.convertToHtmlNewline(fieldContent, shouldReplaceNewlines())
// Decode the file name when preparing data for saveNote() to handle special characters correctly
val correctedFieldContent = Uri.decode(NoteService.convertToHtmlNewline(fieldContent, shouldReplaceNewlines()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this affect any kind of encoded content the user has added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of any instances where users would intentionally add encoded content, but this is something to consider. If necessary, we might need to ensure that only media filenames are decoded rather than applying decoding to all fields.

Do you know of any cases where users might intentionally add encoded content?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From memory, no. But in Anki there are always someone doing stuff like that

@BrayanDSO BrayanDSO added the Needs Author Reply Waiting for a reply from the original author label Feb 28, 2025
@Sahil06012002
Copy link
Contributor Author

A description of the issue and how this fix it would be appreciated. You also said that you had thought about other ways to fix it. What are they?

I have updated the PR description for better clarity.

Regarding the second question, modifying the editorNote parameter anywhere before passing it to addNote() would effectively resolve the issue. Initially, I tried to fix it by updating the fields parameter in NoteEditor.kt at

as follows:

editorNote!!.fields.replaceAll { Uri.decode(it) }
addNote(editorNote!!, deckId)

But, this introduces an extra iteration, which is unnecessary. If needed, we can update editorNote at any suitable point before calling addNote() to optimize the fix.

This commit ensures that media file names remain unencoded when stored in the database, as they are already encoded during rendering. This prevents unnecessary encoding and ensures correct handling of special characters in filenames.
@Sahil06012002
Copy link
Contributor Author

Sahil06012002 commented Mar 6, 2025

Hi @BrayanDSO , Would you be able to take some time to review this?
While rendering a card on Android, the encoding function is called from the backend:

The method encodes media file names for rendering the HTML. This implies that encoding media file names during card creation should not be necessary.

Based on the comments in the backend code as well :

https://github.com/ankitects/anki/blob/63c2a09ef6760890c03be4bd83f613c03c512d1f/rslib/src/text.rs#L577

the media file names are not encoded when adding a card in the desktop version and the database contains the actual file name contrary to current implementation of android version. I confirmed this by checking the database as mentioned earlier and also by exporting a card in plain text format, which resulted in:

#separator:tab
#html:true
"<img src=""txm 232.jpeg"">"	test&nbsp;
"<img src=""test,hello.jpeg"">"

Proposed Fix :

Instead of decoding media file names before storing them in the database, a better solution would be to avoid encoding them when adding cards in the first place, since they are already encoded during rendering. I have already made the necessary changes in the commit, and I kindly ask you to review and verify it.

This change will cause a slight difference in the UI. Previously, when an image was attached from the gallery, the encoded file name was saved in the text field. With this fix, the actual file name will be displayed in the text field.

Additionally,In my opinion if encoding media file names is necessary anywhere in the code base, instead of using Uri.encode() as seen here:

we should use the following function to maintain consistency:

Copy link
Member

@Arthur-Milchior Arthur-Milchior left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR.

Can you please send the commit(s) as you want it/them in the repo. So please, squash them and ensure the description is stating exactly what you are doing and what you do it.

This really needs a regression test. Can you please add one?

And the biggest issue. What will occur to notes that were created with the current version of ankidroi?
Indeed, suppose that:

  • you create a card with today buggy version of ankidroid. You add an image with a comma in its name as in the original bug,
  • you update ankidroid so that your PR is incorporated,
  • you show a card of the note
  • you use check media

Is the image still correctly displayed in the reviewer? Is the media still detected as unused in the "check media" ? I don't understand the reviewer enough to know what to expect for the first case. However, I suspect that we'll still offer to delete the media as unused while it's used, because nothing in your code seems to have any impact over the "unused media" feature.

And while it's great that we'll stop introducing buggy entry in the notes, we should ideally be able to deal with all notes that were created in the buggy version of ankidroid, so that we don't break cards of our users.

It'd be also nice that you test a card created with today version of ankidroid on anki desktop. Ensuring that anki desktop does not detect the media as unused. If it does, we'll need to discuss with Damien Elmes what we want to do about it, but ideally, we'd need to update anki so that it does not remove used media that were created by our users when we had a bug

@@ -45,7 +45,7 @@ interface IField : Serializable {
/**
* Returns the formatted value for this field. Each implementation of IField should return in a format which will be
* used to store in the database
*
* - The formatted value now contains the actual file name instead of the encoded name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the "now". That seems to indicate that during the execution of the program, there is a past time where something else occurred. Not clear what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier, the formattedValue held the encoded file name within the tag because of its setter behavior in the ImageField.kt file that implements the IField.kt interface. In this commit, it has been updated to contain the actual file name.

I added this comment for this particular commit, but it doesn’t align with the codebase actually, I’ll be removing it in the next commit.

// No need to encode the file name, as it will be encoded during rendering.
// val encodedName = Uri.encode(file.name)
val fileName = file.name
"""<img src="$fileName">"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src="${file.name}" would be fine too, no need to use two lines here.

@@ -80,8 +79,10 @@ class ImageField :
@NeedsTest("files with HTML illegal chars can be imported and rendered")
fun formatImageFileName(file: File): String =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please document the function? I fail to understand when we'd want to return the empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "else" condition was already present, possibly as a fallback, and I haven't made any changes to that part. Would you like me to take a look at it?

@Sahil06012002
Copy link
Contributor Author

Thanks for this PR.

Can you please send the commit(s) as you want it/them in the repo. So please, squash them and ensure the description is stating exactly what you are doing and what you do it.

This really needs a regression test. Can you please add one?

And the biggest issue. What will occur to notes that were created with the current version of ankidroi? Indeed, suppose that:

  • you create a card with today buggy version of ankidroid. You add an image with a comma in its name as in the original bug,
  • you update ankidroid so that your PR is incorporated,
  • you show a card of the note
  • you use check media

Is the image still correctly displayed in the reviewer? Is the media still detected as unused in the "check media" ? I don't understand the reviewer enough to know what to expect for the first case. However, I suspect that we'll still offer to delete the media as unused while it's used, because nothing in your code seems to have any impact over the "unused media" feature.

And while it's great that we'll stop introducing buggy entry in the notes, we should ideally be able to deal with all notes that were created in the buggy version of ankidroid, so that we don't break cards of our users.

It'd be also nice that you test a card created with today version of ankidroid on anki desktop. Ensuring that anki desktop does not detect the media as unused. If it does, we'll need to discuss with Damien Elmes what we want to do about it, but ideally, we'd need to update anki so that it does not remove used media that were created by our users when we had a bug

  • When a card is added in the buggy version of AnkiDroid, using the "Check Media" tool results in the following:
    • Missing: The encoded media name (as this is what gets referenced with the card in the database).
    • Unused: The actual media name (since this is present in the media folder but isn't referenced by any card).

This behavior is consistent across:

  • The current app version
  • The app including this PR
  • The desktop version (we’ll need to discuss with Damien Elmes)

This PR does not address the existing issue with the "Check Media" feature. After this PR, the tool will still display the previous buggy entries.

To resolve this, we could update the database to store the actual file name instead of the encoded version, or we could explore a better fix..

  • In the app including this PR, the card containing the (,) in the file name which was added from the current version of the app is visible in the reviewer. The reviewer requires the encoded file name to render the HTML, so there is no issue.
  • When adding cards from the app in this PR, the "Check Media" feature works fine in both the Android and desktop versions.

PS: We are not altering the results from the "Check Media" function; we are simply displaying them as they are.

@@ -80,8 +79,10 @@ class ImageField :
@NeedsTest("files with HTML illegal chars can be imported and rendered")
fun formatImageFileName(file: File): String =
if (file.exists()) {
val encodedName = Uri.encode(file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use git blame and check the commit that introduced this, it tried to fix #15844. Since you are reverting the change, please check if this has created a regression

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BrayanDSO , I checked, and it seems that the current PR reintroduces the issue. Reverting the change has caused the regression. I'll look into this further.

@Sahil06012002
Copy link
Contributor Author

@Arthur-Milchior @BrayanDSO The current implementation uses Uri.encode() here:

This resolves the issue of rendering media files with encoded characters in their names. For example, when uploading a file named test%20.png, it gets stored as test%2520.png. As a result, <img src='test%2520.png'/> correctly looks for test%20.png, which matches the actual file name on disk.

However, this approach causes the Check Media functionality to fail because the database now contains the encoded filename, while the actual file on disk remains unencoded.
Point to note here is we also encode the file names at display time but contrary to its expected behaivour this


does not encode media names that already contain encoded characters as expected. For instance, if the file name is test%20.png, it remains unchanged as test%20.png, resulting in <img src='test%20.png'/>. This causes the system to look for test .png (without %20), which does not exist, leading to rendering issues in both the Android and desktop versions.

I’ve raised this issue with Dae (comment here) to determine whether this should be fixed upstream, as both the desktop and Android versions are affected.

Proposed Fix (If No Upstream Changes)
Instead of encoding the filename during upload (which breaks Check Media), we should encode it at display time using Uri.encode().

This would work as follows:

  • A file named test%20.png remains unchanged on disk and the database.
  • When rendering, we encode it using Uri.encode(), producing <img src='test%2520.png'/>.
  • This ensures both Check Media and rendering work correctly.

I’ve tested this approach, and it appears to work as expected. That said, I’ll wait for Dae’s response before proceeding with this PR. Let me know if you have any suggestions.

Copy link
Contributor

Hello 👋, this PR has had no activity for more than 2 weeks and needs a reply from the author. If you think this is a mistake please comment and ping a maintainer to get this merged ASAP! Thanks for contributing! You have 7 days until this gets closed automatically

@github-actions github-actions bot added the Stale label Mar 25, 2025
@BrayanDSO BrayanDSO removed the Stale label Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Author Reply Waiting for a reply from the original author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check Media Deletes Used Files
3 participants